Archaeobiologists and ecologists have a long-standing interest in how best to estimate the number of species in an assemblage (past or present) with limited samples. The sampling to redundancy method for evaluating species richness and diversity is a well-established approach for assessing sample adequacy and has been used by archaeologists for various classes of remains. In a recent article in this journal, Lyman and Ames (2004) explore the utility of this method for zooarchaeological specimens. In this note, we discuss some fundamental issues associated with the sampling to redundancy method, and make some recommendations for using this method to evaluate richness and diversity of archaeobiological assemblages.
Over 80 years ago, ecologist Olaf Arrhenius (1921) presented a formula for predicting species “richness” (or “poorness”) in a given area. Since that time, ecologists and archaeologists have been grappling with how best to estimate the number of species in an assemblage (past or present) with limited samples (e.g., Baxter 2001; Peet 1974). Lyman and Ames' recent article in this journal (2004) describes a particular approach to estimating taxonomic richness and diversity of zooarchaeological assemblages from limited samples. As a paleoethnobotanist and an ecologist, we aim in this paper to broaden their contribution by situating it in an ecological context and by bringing to the discussion earlier attempts by paleoethnobotanists and other archaeologists to evaluate richness. We also explore some of the implications of the approach they suggest for evaluating taxonomic richness and diversity of archaeobiological assemblages.
Taxonomic richness is the number of taxa in a sample, while diversity refers to any of various measures which combine richness with an estimate of the relative abundance of taxa (typically expressed as “evenness”; Spellerberg and Fedor 2003). In their article, Lyman and Ames (2004) describe a simple application of the relationship of taxonomic richness and diversity to sample size or effort. Ecologists have observed for nearly a century that as sample size increases, the number of taxa sampled from a local or regional pool increases, first relatively rapidly and then more slowly, until it approaches the true value for the assemblage of interest. Lyman and Ames are interested in the idea that when the richness or diversity is plotted against sample size, an assemblage can be considered to have been adequately sampled when the plot levels off—when further samples do not change the result. This can be termed having reached “redundancy” in the sampling effort. Lyman and Ames illustrate this method with zooarchaeological assemblages from the Portland Basin in Oregon, although they recognize the utility of the method for some other classes of archaeological data.
Ecological and Archaeological Contexts for Examining Richness
The predictable relationship between number of species and sampling effort described by Lyman and Ames is one of the fundamental concepts of community ecology (e.g., Krebs 1989:368). It was first described by plant ecologists in the early part of the last century, who recognized the relationship between number of species and the area sampled (Evans et al. 1955). The relationship described in species-effort curves has since been given various names (e.g., collector's curves, species-accumulation curves; Flather 1996:155), and has been used for a variety of purposes, such as determining adequacy of the sample size and developing formal methods to estimate and compare species richness (Gotelli and Colwell 2001; He and Legendre 1996). Initially, such indices were relatively simple formulae which could be calculated by hand (e.g., Fisher et al. 1943; Gleason 1922), but now a variety of more computationally intensive and statistically rigorous approaches are available (e.g., Colwell et al. 2004). Gotelli and Colwell (2001) provide a useful description of how the sampling to redundancy approach described by Lyman and Ames fits in this broader methodological context.
In many ecological situations and archaeological contexts, however, it may not be possible to achieve the redundancy criterion. Limited resources, limited sampling opportunities, or a desire to minimize impact on archaeological deposits can often mean that some taxa necessarily remain unsampled. In these cases, the “true” asymptotic species richness cannot be obtained directly; ecologists (e.g., Colwell et al. 2004; Gotelli and Colwell 2001; Palmer 1990) and archaeologists (e.g., Kintigh 1984; McCartney and Glass 1990) have put considerable effort into developing methods that estimate this value.
Because estimates of richness are highly contingent on the number of samples used in the estimation, comparison among assemblages sampled with different effort is problematic. Archaeobiologists have long recognized the relationship between richness and sample size (e.g., Grayson 1984) and for that reason have grappled with different ways to effectively compare richness among samples of different size. In Orton's (2000) comprehensive text on sampling in archaeology, he notes that four main approaches have been used. In addition to sampling to redundancy, he reviews computer simulations (e.g., Kintigh 1984), regression approaches (e.g., Rhode 1988), and rarefaction analyses (e.g., Baxter 2001). The validity of both computer simulations and regression approaches as methods of comparing archaeobiological assemblages that have different sample sizes has been heavily critiqued (e.g., Baxter 2001; Byrd 1997; Rhode 1988), and we do not discuss them further here. Orton (2000) acknowledges that the rarefaction method provides the best approach; we present an application of rarefaction to a zooarchaeological data set below.
It is important to distinguish between a taxon (for example, species) accumulation curve and a rarefaction curve (Gotelli and Colwell 2001). A taxon accumulation curve represents the total number of taxa obtained sequentially during the data collection process. This is what Lyman and Ames discuss and to which they apply the redundancy criterion. Taxon accumulation curves can be either individual-based (equivalent to NISP—Number of Identified Specimens) or sample-based, as in Lyman and Ames' examples. In contrast, a rarefaction curve is produced by repeated random resampling of the total pool of individuals or samples, and thus represents an average curve of taxon accumulation with sampling effort. A rarefaction curve provides a smoothed curve which removes any artifacts due to sampling sequence and allows comparison among different curves standardized to equivalent sample sizes. This latter comparison is facilitated by the recent development of confidence intervals for rarefaction curves (Colwell 2004; Gotelli and Entsminger 2001).
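The distinction can be made concrete with a minimal Python sketch. It is an illustration only, not the procedure implemented in the software cited above: the sample contents, the function names, and the number of random orderings are all assumptions made for the example. The accumulation curve follows the samples in the order given, whereas the rarefaction curve is approximated here by averaging accumulation curves over many random orderings of the same samples.

```python
import random

def accumulation_curve(samples):
    """Cumulative number of taxa after each sample, in the order given."""
    seen = set()
    curve = []
    for taxa in samples:
        seen.update(taxa)
        curve.append(len(seen))
    return curve

def rarefaction_curve(samples, n_perm=200, seed=0):
    """Mean accumulation curve over n_perm random orderings of the samples."""
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_perm):
        order = list(samples)
        rng.shuffle(order)
        for i, richness in enumerate(accumulation_curve(order)):
            totals[i] += richness
    return [t / n_perm for t in totals]

# Hypothetical data: each set holds the taxa identified in one sample.
samples = [
    {"salmon", "herring"},
    {"salmon"},
    {"salmon", "deer"},
    {"herring", "rockfish"},
    {"salmon", "herring", "seal"},
]

observed = accumulation_curve(samples)   # depends on the sampling sequence
smoothed = rarefaction_curve(samples)    # sequence effects averaged out
print(observed)
print(smoothed)
```

Both curves end at the same total richness; they differ only in how they approach it, which is why the averaged curve is the better basis for comparisons at intermediate sample sizes.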
Lyman and Ames observe that the graphical method they describe for assessing taxonomic richness has been used by archaeologists, though rarely. In fact, the method may be more commonly recognized, both by archaeologists in general (e.g., Kirch et al. 1987:123–124; Meltzer et al. 1992:376; Orton 2000) and by zooarchaeologists in particular (e.g., Moss 1989:143; Reitz and Wing 1999:107; Trost 2005; Zohar and Belmaker 2005). We suspect, however, that it has been most consistently applied in paleoethnobotanical research, which is not addressed by Lyman and Ames. To our knowledge, this method was first applied to paleoethnobotanical assemblages in the 1970s for plant remains from England (Fasham and Monk 1978; Green 1979; see also van der Veen and Fieller 1982). Since then, it has been used by archaeobotanists dealing with microfossils (e.g., Moore et al. 1991) and macrofossils (e.g., Lepofsky et al. 1996; Miksicek 1987). In recent discussions with various colleagues, we found that sampling to redundancy to determine sufficient sample size for taxonomic richness is standard practice among archaeobotanists in both European and North American laboratories. The reason the redundancy approach is more common in paleoethnobotany than in zooarchaeology may be that the high diversity of plant taxa and number of tiny particles that must be examined require researchers to use subsampling strategies. Like Lyman and Ames, we suggest that this method also has great utility for zooarchaeological assemblages and encourage more zooarchaeologists to peruse the paleoethnobotanical sampling literature.
Among paleoethnobotanists in the Pacific Northwest, the sampling to redundancy method is used in various contexts. It is used to assess sample adequacy when comparing relative richness (Lepofsky 2000a; Lepofsky et al. 1996; Lepofsky and Lyons 2003; Wollstonecroft 2000) or abundance among contexts (Lepofsky 2000b). For instance, we have used this approach to assess both when no additional flotation samples are needed to compare richness between assemblages and when, despite small samples, the differences in richness between assemblages are so large that relative richness can still be assessed (e.g., Lepofsky and Lyons 2003). For flotation samples with abundant plant remains, we have used this approach to decide how large a subsample should be identified to assess the sample's richness or the relative abundance of taxa within it (Lepofsky 2000b). This is particularly useful for deciding how many pieces of charcoal in a sample should be identified, given that analyzing charcoal can be very time-consuming (e.g., Lyons 2000).
Estimating Richness from Taxon Accumulation Curves
Two phenomena influence the effectiveness of the simple plots that Lyman and Ames and others use. The first is the effect of the sequence in which the samples or specimens are identified and plotted. Given stochastic factors, it is possible that new taxa will be encountered after the graph has apparently leveled off, simply because the samples containing those taxa were analyzed and plotted later in the sequence. Such is the case in Figure 1a, where the graph leveled off after 4,500 specimens were identified and 20 taxa found. However, after 12,500 additional specimens were examined, an additional taxon was added. It is not possible, in principle, to predict this sort of pattern in an accumulation curve. Taxa showing up early in the sequence will tend to be the most ubiquitous, while those appearing only after a large sample has been identified will generally be rarer.
In Lyman and Ames' Figure 4, the shape of the curve is likely an artifact of having plotted sites from most to least rich. Had they plotted the sites in a random order, the graph probably would not have leveled off where it did, and different conclusions might have been drawn about the adequacy of the sample size. The strength of this sequencing artifact is enhanced by the very small contribution of the last three sites in the sequence to their axis of Cumulative Taxonomic Richness (~1%). Indeed, dealing appropriately with eliminating artifacts of sample sequence is a topic which has received much attention in the ecological literature (e.g., resampling approaches or their analytical equivalents; Colwell et al. 2004).
A second, sometimes related reason for a species accumulation curve to rise abruptly after having leveled off is that the later samples could represent a different statistical population (e.g., a different ecological community) with a distinct species mix. This arises from the phenomenon described by ecologists as alpha, beta, and gamma “diversity” (in these discussions “diversity” refers to “richness”). Alpha diversity (or richness) refers to the taxa present within a single habitat or community type, while beta diversity reflects the addition or turnover of taxa as different habitats are encountered. Gamma diversity refers to diversity at the landscape or regional level (Peet 1974). As samples are added from beyond the initial habitat sampled, there will be an increase in the cumulative taxa represented because distinct communities tend to occupy different habitats.
The concepts of alpha, beta, and gamma diversity are also of relevance in determining taxonomic richness in an archaeobiological setting. We can think of alpha diversity as reflecting the richness of a specific archaeological context (e.g., samples from inside a house), whereas beta diversity could be the total richness obtained via samples from inside and outside the house, or from samples from various noncontemporaneous contexts within a site. Gamma diversity could be thought of as the cumulative richness represented across sites within a region. It is important to recognize that samples from temporally distinct assemblages within a specific archaeological context are most appropriately thought of as representing beta, or even gamma, not alpha diversity. These differences in alpha, beta, and gamma diversity measures observed in archaeological contexts could reflect access to different animals or ecosystems (because of sociopolitical, logistical, or ecological reasons). The second steep rise in Lyman and Ames' figure 4 may represent a shift from beta to gamma diversity. Lyman and Ames recognize this point in their discussion of possible reasons for this rise in the number of taxa (pp. 339–340). Besides the beta–gamma shift, the other reasons they suggest are the extreme rarity of the two taxa representing the rise, possible errors in taxonomic identification, and variable time and duration of site occupation.
It is important to distinguish between the use of the redundancy method as a tool for assessing the adequacy of a sample size and the analysis of that sample in a formal comparison of species richness. The graphical approach recommended by Lyman and Ames (and used in our own previous work) is an excellent way to determine whether or not more samples need to be identified prior to embarking on statistical analysis. However, for a formal comparison of richness among different archaeological contexts, other tools are needed both to account for sequencing artifacts and provide statistical support for the comparison. For instance, Colwell et al. (2004) report on the development and application of confidence intervals for sample-based rarefaction and provide free software which calculates them (Colwell 2004), and there is other software available which will provide confidence intervals for a randomized species accumulation curve (Gotelli and Entsminger 2001). If the redundancy criterion has been reached in two or more contexts, then, in principle, the richness estimates can be compared directly, though no statistical assessment of the difference or similarity can be made. If redundancy was not reached in a subset of the contexts one wishes to compare, then no comparison should be made without re-expressing the accumulation curve through a process such as rarefaction (Gotelli and Colwell 2001). In Lyman and Ames' case, as they recognize, they did not reach redundancy for the Cathlapotle site (their figure 2), preventing a valid comparison of richness with the Meier site. Application of rarefaction methods to both data sets would allow such a comparison.
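To illustrate what standardizing to an equivalent sample size involves, the following Python sketch uses the classical analytical formula for individual-based rarefaction (Hurlbert 1971), in which the expected number of taxa in a random subsample of n specimens drawn from N is the sum over taxa of 1 − C(N − N_i, n)/C(N, n). This is a simpler technique than the sample-based rarefaction with confidence intervals provided by the software cited above, and it yields no confidence intervals; the NISP counts and the function name are hypothetical.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of taxa in a random subsample of n specimens.

    counts: NISP per taxon for one assemblage.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total NISP")
    # comb(total - c, n) is 0 when the subsample cannot avoid taxon c.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)

# Hypothetical NISP counts per taxon for two contexts of unequal size.
context_a = [120, 60, 15, 3, 1, 1]   # NISP = 200, 6 taxa
context_b = [30, 10, 5, 4, 1]        # NISP = 50, 5 taxa

# Compare both contexts at the size of the smaller assemblage.
n = min(sum(context_a), sum(context_b))
richness_a = rarefied_richness(context_a, n)
richness_b = rarefied_richness(context_b, n)
print(n, richness_a, richness_b)
```

The larger assemblage is scaled down to the size of the smaller one, so any remaining difference in expected richness is no longer a simple consequence of unequal sampling effort.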
We applied Colwell's software (Colwell 2004) to the data in the example of Figure 1 in order to produce the sample-based rarefaction curves and confidence intervals of Figure 2. This figure illustrates these data with sequencing artifacts removed and they are now ready for comparison with other assemblages. As an aside, we can see from Figure 2 that the column samples are tremendously more efficient at estimating the number of taxa in this assemblage than are the unit samples representing an order of magnitude fewer specimens. Figure 2b illustrates the species accumulation curve we would have obtained had we applied the redundancy criterion to the unit sample data set shown in Figure 2a and stopped identification at around 7,000 specimens (Figure 1a).
Additional considerations in evaluating plots of cumulative taxonomic richness concern the scale of the x-axis and its influence on the apparent patterns as data accumulate. In the example shown in Figure 1, when we scale the x-axis to include all 22,000 specimens from the unit samples, it appears that we are not in a position to estimate taxonomic richness of the column samples (Figure 1a). But when we truncate the x-axis at 2,000 specimens (Figure 1b) and thus expand the resolution of the graph for the early part of the curve, we appear to be in a much better position to evaluate the pattern of species accumulation for the column samples. On the other hand, while truncating the axis in this way provides a clearer representation of the column data, it may lead to erroneous conclusions about the fauna from the unit samples. Truncated at 2,000 specimens, one might reasonably conclude that the true richness of the unit samples was approached at 15 taxa, rather than the 21 taxa represented in Figure 1a. These points are not profound: we present them to illustrate how strongly one's conclusions can be contingent on a series of small decisions made in the data analysis process.
A further issue regarding the x-axis is the choice of whether it is individual-based or sample-based, and thus whether it represents NISP, as we and most other practitioners have done, or a larger pooled unit, such as samples. In Lyman and Ames' case, the x-axis is sample-based and represents all the specimens collected for a year (as in their figures 2 and 3) or a site (as in their figure 4). Individual- and sample-based axes represent fundamentally different measures of species accumulation. Colwell et al. (2004) and Gotelli and Colwell (2001) argue convincingly that, while a plot of species accumulation versus number of identified specimens (NISP) represents a measure of species richness, a plot of species accumulation versus the number of samples represents species density. For instance, samples from different contexts may vary systematically in the number of identifiable specimens per sample, so that comparisons of species accumulation on a per sample basis will be strongly confounded by differences in the NISP per sample. Gotelli and Colwell (2001) show that using a measure of species density to make inferences about species richness among contexts can lead to erroneous conclusions.
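The confounding effect of NISP per sample can be seen in a small Python sketch. The two contexts, their taxa, and their counts are invented for illustration. The function records, for each successive sample, the number of samples examined, the cumulative NISP, and the cumulative number of taxa, so the same accumulation can be read against either axis.

```python
def accumulation_points(samples):
    """Return (samples so far, cumulative NISP, cumulative taxa) per sample.

    Each sample is a dict mapping taxon name to NISP.
    """
    seen = set()
    nisp = 0
    points = []
    for i, sample in enumerate(samples, start=1):
        seen.update(sample)            # adds the taxon names (dict keys)
        nisp += sum(sample.values())
        points.append((i, nisp, len(seen)))
    return points

# Hypothetical contexts that differ mainly in specimens per sample.
sparse = [{"salmon": 3}, {"herring": 2}, {"deer": 1, "salmon": 2}]
dense = [
    {"salmon": 30, "herring": 20},
    {"deer": 10, "salmon": 20},
    {"seal": 5, "herring": 25},
]

for label, context in (("sparse", sparse), ("dense", dense)):
    for n_samples, nisp, taxa in accumulation_points(context):
        print(label, n_samples, nisp, taxa)
```

Read per sample, the dense context appears richer after three samples; read per specimen, the sparse context has accumulated three taxa within its first eight specimens, far fewer than the dense context needed. Which reading is appropriate depends on whether the question concerns species density or species richness.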
Estimating Diversity from Cumulative Curves
Using the sampling-to-redundancy method, Lyman and Ames also explore when sample sizes are adequate to assess the “quantitative property of diversity” of their samples. Following “zooarchaeological tradition,” they measure diversity with the Shannon index. We recognize that the point of their article is not to assess the relative merits of diversity indices, but we question the utility of using a diversity index in this context at all. There is no singularly meaningful quantitative property of diversity, since different indices give differential weights to evenness and richness of assemblages (see Hurlbert 1971; Magurran 1988, 2004). Determining which index is appropriate depends on the particular research questions and appropriateness of the data for any given index. The choice of the diversity measure can radically influence when redundancy is reached (Figure 3)—and the redundancy criterion for diversity indices will not necessarily be the same as that for richness estimates. Zohar and Belmaker (2005) discuss the appropriateness of the Shannon index for archaeological faunal assemblages specifically, and Popper (1988) discusses concerns about the use of diversity indices and paleoethnobotanical assemblages more generally. In ecological discussions, there has been for some time a tendency to avoid the use of diversity indices which confound richness and evenness in favor of independent consideration of the two properties of communities.
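A brief Python sketch shows how two indices can respond differently to the same data. The Shannon index is computed as −Σ p_i ln p_i; the comparison index, Simpson's 1 − Σ p_i², is our choice for illustration and is not one that Lyman and Ames use. The counts are hypothetical: a base assemblage of three common taxa, and the same assemblage after five taxa represented by single specimens have been added.

```python
from math import log

def proportions(counts):
    """Relative abundance of each taxon, ignoring zero counts."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts):
    """Shannon index: -sum(p * ln p)."""
    return -sum(p * log(p) for p in proportions(counts))

def simpson(counts):
    """Simpson index in the form 1 - sum(p^2)."""
    return 1 - sum(p * p for p in proportions(counts))

# Hypothetical NISP counts: three common taxa, then five singletons added.
base = [500, 300, 200]
with_rare = base + [1, 1, 1, 1, 1]

for name, index in (("Shannon", shannon), ("Simpson", simpson)):
    before, after = index(base), index(with_rare)
    print(name, before, after, (after - before) / before)
```

In this example the relative change in the Shannon index is several times larger than that in the Simpson index, because the former gives more weight to rare taxa. A plot of either index against sample size would therefore level off at a different point.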
Diversity indices are notoriously hard to interpret. Since samples with different combinations of richness and evenness could have the same index value, it is difficult to make meaningful comparisons between assemblages (Hill 1973). In fact, we suspect that what archaeologists are really more often interested in is the relationship of richness and evenness in an assemblage. We agree with Lyman and Ames that graphical techniques are far better for assessing the richness and evenness than “long tables of NISP values” (p. 334), but we suggest that simple frequency distributions of species abundances are often a more effective way to explore diversity than plots of diversity indices. Such graphs, where the percent abundance of taxa is presented in decreasing abundance, provide an easily interpretable display of relative evenness and abundance (e.g., Betts and Friesen 2004; Gordon 1993; Lyman 1991:94). Among ecologists, the plotting of these “dominance-diversity” graphs is the first step in understanding species diversity (Krebs 1989:367).
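The data behind such a graph are simple to assemble, as the following Python sketch suggests. The taxa and NISP values are hypothetical, and the text-based bars stand in for a plotted figure; the function simply ranks taxa by percent abundance in decreasing order.

```python
def rank_abundance(nisp):
    """Return (taxon, percent abundance) pairs in decreasing abundance.

    nisp: dict mapping taxon name to NISP.
    """
    total = sum(nisp.values())
    ranked = sorted(nisp.items(), key=lambda item: item[1], reverse=True)
    return [(taxon, 100 * count / total) for taxon, count in ranked]

# Hypothetical assemblage.
assemblage = {"salmon": 250, "herring": 620, "seal": 10,
              "rockfish": 80, "deer": 40}

ranked = rank_abundance(assemblage)
for taxon, percent in ranked:
    # One '#' per two percentage points gives a crude bar chart.
    print(f"{taxon:<10}{percent:5.1f}% {'#' * round(percent / 2)}")
```

A steep drop from the first bars to the rest indicates an uneven assemblage dominated by a few taxa, while the number of bars gives the richness; the two properties can be read separately from the same display.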
Most archaeobiological assemblages, like ecological communities (Fisher et al. 1943), are characterized by a few common taxa and many more taxa that occur in low numbers. In many archaeological instances, we are most interested in comparing the relative abundance of the most common taxa. In such cases, a diversity measure, especially one like the Shannon index that weighs the rare taxa heavily (Krebs 1989:378; Peet 1974:296) may not be appropriate. However, in these cases, it would be useful to plot redundancy curves of the frequency of only the most common species to examine whether they have been adequately sampled to assess relative abundance (e.g., Fasham and Monk 1978; Miksicek 1987). Rare taxa, on the other hand, should be treated separately and strictly as nominal data in such cases.
We have characterized the relative abundance of the most common taxa in archaeobotanical and zooarchaeological assemblages by assessing the degree of “specialization” (Lepofsky and Lyons 2003; Trost 2005), but of course how this is defined is an analytical decision. In this context, we assessed specialization by summing the percent abundance of the three most common taxa in an assemblage. If the total percent of these three taxa is near 90% of the total assemblage, then we characterize the assemblage as “specialized.” This is essentially a form of the measures of dominance within a community used by ecologists.
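The calculation is straightforward, as this Python sketch indicates. The assemblages are hypothetical, and because "near 90%" is an analytical judgment rather than a fixed rule, the 90% cutoff is exposed as a parameter (an assumption of the sketch) instead of being built in.

```python
def top_three_share(nisp):
    """Percent of the assemblage made up by its three most common taxa.

    nisp: dict mapping taxon name to NISP.
    """
    counts = sorted(nisp.values(), reverse=True)
    return 100 * sum(counts[:3]) / sum(counts)

def is_specialized(nisp, threshold=90.0):
    """True if the top-three share meets the (adjustable) threshold."""
    return top_three_share(nisp) >= threshold

# Hypothetical assemblages: one dominated by a few taxa, one even.
dominated = {"herring": 620, "salmon": 250, "rockfish": 80,
             "deer": 40, "seal": 10}
even = {"herring": 10, "salmon": 10, "rockfish": 10,
        "deer": 10, "seal": 10}

for label, assemblage in (("dominated", dominated), ("even", even)):
    print(label, top_three_share(assemblage), is_specialized(assemblage))
```

Reporting the share itself alongside the label lets readers apply a different cutoff if their research question calls for one.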
Applying the “sampling to redundancy” method to species accumulation curves is an easy to understand and quick way to assess the adequacy of sampling in archaeobiological contexts. The method should be used judiciously, however. We make the following recommendations for using this approach to evaluate the richness and diversity of archaeobiological assemblages.
Be aware that the sequence of samples encountered can create artifacts in the shape of the species accumulation curve (e.g., false plateaus). When using the sampling to redundancy method, samples should be plotted in random order, to minimize sequencing effects. After completing laboratory analysis, use software that randomizes or resamples the data to summarize species accumulation curves and present them in an unbiased sequence.
Be aware that new taxa may be encountered even after an accumulation curve has apparently stabilized. Consider being conservative in declaring that redundancy has been reached—and if a plateau is terminated unexpectedly, think carefully about whether it represents unexpected heterogeneity within the same assemblage or the sampling scheme having encountered a new assemblage (beta diversity).
Be aware of the difference between the species density curve, with samples on the x-axis and the species richness curve, with NISP on the x-axis. Think about whether it is most appropriate for you to consider sample- or individual-based curves and examine the curve most meaningful for your application.
Unless two assemblages differ radically in richness and you are confident that both have been sampled to redundancy, their relative richness should be compared using rarefaction curves, ideally using software which provides confidence intervals.
Choose diversity indices based on a careful assessment of their relative strengths, their relative weighting of richness and evenness, and their weighting of common versus rare species. Be aware that different diversity measures may require very different sample sizes because they vary in how they are influenced by rarer taxa.
Consider separating your evaluation of richness and evenness by not using an index that combines them. Be clear regarding what metric is most appropriate for the questions you are asking.
Many thanks to Iain McKechnie for providing us with the Ts′ishaa archaeofaunal data, helping us produce the graphs of these data, and for his many helpful comments on this paper. We also appreciate the useful suggestions made by Rob Colwell, Tom Rocek, Bob Muir, Jon Driver, Charlie Schwegar, and Michele Wollstonecroft. We appreciate Lee Lyman and Ken Ames' input and the stimulus of their paper in moving us to write this one.