Neither individually distinctive songs nor “lek signatures” are demonstrated in suboscine Screaming Pihas.—Why vocal learning has evolved in some taxa (most notably, songbirds, parrots, and hummingbirds) but not others remains an unanswered evolutionary question. Recent evidence from suboscine bellbirds (Procnias spp.; Kroodsma 2005, Saranathan et al. 2007), however, reveals that song learning may be more common than previously thought, and so with considerable excitement I began to read Fitzsimmons et al.'s (2008) article about “lek signatures” and the possibility of song learning in the suboscine Screaming Piha (Lipaugus vociferans; hereafter “piha”). By the end of the article, however, I was sadly disappointed.
In short, the data used in the study consist of 256 songs recorded from 26 male pihas that were distributed among four “exploded leks” (with 3, 4, 6, and 13 males in the four leks). Birds were recorded sequentially between 0730 and 1300 hours over 5 days, with birds in the same lek usually recorded on the same day, and 13 frequency or time measurements were made or calculated from the 8 or 10 best recorded songs from each male.
The first major problem is with sampling. Because the reported song differences among males and among leks are so subtle and are not obvious in sonagrams (see Fitzsimmons et al. 2008: figure 3) but are instead measured by computer, to the nearest 22 Hz and 2.9 ms, considerable care must be taken to ensure that the results of the analyses are not artifacts of the sampling methods. If each male is sampled during a single half-hour period, for example, but different males are sampled over a significant portion of the day (from 0730 to 1300 hours), in a variety of unknown contexts (female present in or absent from the lek, male interacting with neighboring males or not), over a 5-day span, then it is imperative that each male also be sampled at a variety of times of day, in a variety of motivational contexts, and on different days. If, like Fitzsimmons et al., one chooses not to sample each male in multiple sessions, one cannot attribute the between-session variation to individuals, only to the different “recording sessions.” Nor can one convincingly demonstrate that songs vary among leks if one has sampled the leks in succession and each lek largely on a single day.
The biological reason for these statistical limitations is simple: songs of birds are highly expressive, and they can vary in subtle to striking ways with time of day and motivational context. Examples of such variation in song abound in the literature. Individual song phrases of a Scarlet Tanager (Piranga olivacea) can differ in duration from early morning to midmorning, for example, and some birds (e.g., the non-songbird Belted Kingfisher [Ceryle alcyon]) change frequency characteristics of their vocalizations from one context to another (Kroodsma 2009). For the pihas, as motivation to sing waned and males slowed their song rates from early morning to midday, or if a female appeared in the lek, even the most subtle changes in songs due to different contexts could have a marked effect on the outcome of the analyses. Given the non-obvious differences among the sonagrams, I would not be surprised if the reported results were even dictated, in part, by the relatively poor recordings used in the study: with three different shotgun microphones with different polar responses being used to record singing birds at various positions in the rainforest canopy, the resulting large and varying amounts of degradation in the recordings could have a significant effect on fine-scale frequency and time measurements (see reverberation lasting >100 ms in the sonagrams).
The second major problem is in analysis. One set of analysis problems arises directly from the sampling failures. For example, although it is reported that “all 13 song features were more variable among males than within males” (Fitzsimmons et al. 2008:911; my italics) and that canonical discriminant function analysis “assigned 93.2% of songs to the correct male” (p. 912; my italics)—the authors thereby claiming that males have individually distinctive songs—all that can really be claimed is that all 13 song features were more variable among recording sessions than within recording sessions and that the canonical discriminant function analysis assigned 93.2% of songs to the correct recording session (but see below for other problems with the discriminant analysis). Without sampling individual males in different recording sessions, thereby controlling for known and unknown sources of song variation, the observed song variation cannot be attributed to male distinctiveness, only to recording-session distinctiveness.
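The confound between male and recording session can be illustrated with a small synthetic example. The sketch below is not a reanalysis of the published data: the male labels, the frequency values, and the size of the session shifts are all hypothetical. Every simulated male has the same underlying song frequency, and only the recording session differs, yet the measurements still vary more among males than within males.

```python
from statistics import mean, pvariance

# Hypothetical values in Hz, invented for illustration; not data from the study.
# Every male shares the same underlying song frequency.
TRUE_FREQ = 4000.0

# Each male is recorded in exactly one session, so any session effect
# (time of day, context, recording conditions) is inseparable from "male".
session_shift = {"male_A": 0.0, "male_B": 30.0, "male_C": -20.0, "male_D": 50.0}

# Song-to-song scatter within a single recording session.
within_noise = [-10.0, -5.0, 0.0, 5.0, 10.0]

songs = {
    male: [TRUE_FREQ + shift + e for e in within_noise]
    for male, shift in session_shift.items()
}

# Variance of the per-male means versus the average variance within a male.
male_means = [mean(values) for values in songs.values()]
among_var = pvariance(male_means)
within_var = mean([pvariance(values) for values in songs.values()])

print("variance among males: ", among_var)
print("variance within males:", within_var)
```

With one session per male, the session shift and a true male effect yield the same pattern in the measurements. Only repeated sessions per male would allow the two to be separated.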
Most troublesome among the analysis problems are those that plague the discriminant function analysis that the authors rely on for their “lek signature” conclusion; I provide three examples. (1) Most glaring is the issue of “simple pseudoreplication” (Hurlbert 1984). The 10 (or 8) songs from each male are incorrectly used as if they are independent samples, as in the following statements: “canonical discriminant analysis assigned 76.4% of songs to the correct lek…[and the] discriminant analysis capably differentiated songs from the four different leks…suggesting that variation between males at different leks is small, but present” (Fitzsimmons et al. 2008:912–913; my italics). The sampling problem cripples this analysis from the outset, but, additionally, the desired test is whether males differ among leks (as stated in the conclusion of the above quotation), and to use the 256 songs as if they represented 256 different males is false replication (McGregor et al. 1992); this error is serious and not a mere technicality, because this particular kind of “pseudoreplication…tends to produce (sometimes grossly) incorrect results…and discriminability of…groups…can be drastically overestimated” (Mundry and Sommer 2007:965). (2) The authors appear to have used the discriminant function to classify the same songs that they used to compute the function; this kind of circular analysis vastly increases the likelihood that the function will seem to correctly classify the songs, and is simply bad practice (Tabachnick and Fidell 2007). (3) The function “assigned 76.4% of songs to the correct lek, well above the 25% level of correct assignment expected by chance”; statistical significance for the function is implied, yet no such test was done, and it seems incorrect to simply assert a 25% chance level of assignment when, to complicate matters, 50% of the songs come from one lek. Preferably, one determines a priori the chance classification probability for each category and then determines how close the classification comes to those probabilities (Tabachnick and Fidell 2007).
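A short calculation shows how much the chance benchmark depends on unequal group sizes. The sketch below assumes that each male contributed roughly the same number of songs (the study used 8 or 10 per male), so that each lek's share of songs is approximated by its share of the 26 males. The variable names are illustrative, and the two alternative benchmarks shown are common conventions rather than a procedure taken from any of the cited sources.

```python
from fractions import Fraction

# Males per lek as reported in the study (3, 4, 6, and 13 males).
# Assumption: each male contributed about the same number of songs,
# so a lek's share of songs is approximated by its share of males.
males_per_lek = [3, 4, 6, 13]
total_males = sum(males_per_lek)
shares = [Fraction(n, total_males) for n in males_per_lek]

# Benchmark 1: equal chance across the four leks (the 25% figure).
equal_chance = Fraction(1, len(males_per_lek))

# Benchmark 2: songs assigned at random in proportion to lek size;
# the expected proportion correct is the sum of squared shares.
proportional_chance = sum(p * p for p in shares)

# Benchmark 3: every song assigned to the largest lek.
majority_chance = max(shares)

for label, value in [
    ("equal chance", equal_chance),
    ("proportional chance", proportional_chance),
    ("largest-lek assignment", majority_chance),
]:
    print(f"{label}: {float(value):.3f}")
```

Under these assumptions the benchmark rises from 0.25 to about 0.34 for proportional assignment and to 0.50 for assignment to the largest lek. None of these benchmarks substitutes for a formal test, and none addresses the pseudoreplication or the circular classification described above.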
The third problem is in interpretation. I provide one example: “Our finding that Screaming Pihas sing individually distinctive songs adds to growing evidence that there may be a learned component to song in some suboscines” (Fitzsimmons et al. 2008:913). This statement is in the final sentence of the paper, the place where an author wants to leave the reader with a lasting impression about the significance of a study, yet the statement is nonsensical and, even worse, misleading, because songs in a wide range of species (most likely all species) are individually distinctive whether the songs are learned or not. In nonlearning flycatchers (Empidonax spp.), for example, songs are individually distinctive, perhaps best documented by the two papers the authors cite about the Alder Flycatcher (E. alnorum; Lovell and Lein 2004a, b); the birds even use the variation to discriminate among individuals. Even if individually distinctive songs had been demonstrated for the pihas, such a finding would have no bearing on whether the songs were learned or not.
Given the paper's problems in sampling, analysis, and interpretation, Fitzsimmons et al. (2008) cannot reach any valid conclusions about whether songs are individually distinctive. Nor do they present valid evidence of songs differing from lek to lek. Nor are the findings relevant to the question of vocal learning in suboscines.
When papers like this appear in print, authors rightly share blame with others who facilitate the publication process, including reviewers and editors. How this extended responsibility can fail is illustrated not only by the initial publication of Fitzsimmons et al. (2008) but also by the reluctance of those involved in the review process to share my desire that a severe, but fair, review be published. This cavalier attitude toward the design of research and the collection and analysis of numbers is unacceptable, because such permissiveness undermines the very science we claim to be doing. The present case is not unique, and such flawed papers can do considerable damage if they go unchallenged. If the research model is emulated by others and if the conclusions and logic are accepted as reported, progress in understanding birds is confused and stymied (for additional discussion, see Byers and Kroodsma 2009). We deserve better from each other, and we should hold each other to a higher standard.
I thank K. Yasukawa and editors S. Sealy and M. Murphy for their advice on the content of this letter, and two reviewers whose advice I sought but who chose to remain anonymous.