Geographic variation in acoustic signals can be important in species divergence, especially the maintenance of prezygotic barriers to gene flow. Furthermore, selective pressures on acoustic signals likely vary both across geographic distances and among vocalizations used in different behavioral contexts. We described the call repertoire of 5 subspecies of Marsh Wren (Cistothorus palustris) in eastern North America and tested for variation in both the acoustic structure and likelihood of production of each call type at 3 functional–ecological levels: subspecies identity, migratory pattern, and habitat type. Three of the 7 described call types exhibited acoustic variation best explained by either migratory pattern or habitat type. These calls were used principally in courtship–territorial patrol contexts, whereas 4 calls that did not exhibit geographic variation were used in agonistic interactions. How often a call is used in a population may be indicative of the behavior or breeding phenology associated with that vocalization. We found that 4 calls varied in how commonly they were produced among the subspecies and/or habitat types. We also described and quantified the degree to which these 5 subspecies produce calls in association with song—a little reported, but possibly more widespread, behavior in birds. Marsh Wrens commonly embedded 3 call types into song in a nonrandom pattern. This behavior was more common in freshwater-marsh populations than in saltmarsh populations, and we discuss several possible functions for call-song associations. Overall, when geographic variation in call structure occurred, it was most commonly explained by differences in habitat type and, therefore, may be indicative of local adaptations that could limit gene flow across environments.
Acoustic signals can play an important role in the divergence of populations and the reinforcement of past divergence events. Geographic variation in acoustic signals has been demonstrated in a wide variety of taxa, including insects, anurans, fishes, birds, and mammals (e.g., Baptista 1977, Ryan and Wilczynski 1991, Weilgart and Whitehead 1997, Irwin 2000, Parmentier et al. 2005, Stephan and Zuberbühler 2008, Marshall et al. 2011, Jiang et al. 2015). When acoustic signals play a role in mate selection, they have been shown to act as strong prezygotic barriers to gene flow (e.g., Ryan and Wilczynski 1991, Wells and Henry 1992, Mendelson and Shaw 2002, Danner et al. 2011), and hybridization among individuals with heterotypical vocalizations can carry demonstrated fitness costs (e.g., Grant and Grant 1996, Snowberg and Benkman 2007).
A great deal of what we know about geographic variation in acoustic signals has come from decades of study on birdsong and song dialects (reviewed in Podos and Warren 2007). Birdsong plays a well-known role in mate attraction and territory defense (Catchpole and Slater 2008). However, song is only a small portion of the repertoire of avian vocal signals (Krebs and Kroodsma 1980). Signaling behaviors in general are under myriad selective pressures that may vary in their strength and direction across a geographic range, including differences in the acoustic environment (Morton 1975, Nicholls et al. 2006), signaler physiology or morphology (Bertelli and Tubaro 2002, Ballentine 2006), genetic drift (Laiolo et al. 2001a, Nicholls et al. 2006, Roach and Phillmore 2017), cultural drift (Soha et al. 2004, Roach and Phillmore 2017), or sexual selection (Irwin et al. 2001). Similarly, within an individual, different vocalizations in the repertoire may be under different types or intensities of selection (Laiolo et al. 2001a, Sturge et al. 2016). In contrast to most songs, calls are used in broader functional contexts (not largely mate attraction or territory defense), and they are typically shorter in duration and less acoustically complex than song (Marler 2004a). Furthermore, unlike song, which involves learning and therefore includes a cultural component in its vertical and horizontal transmission within oscine passerines (“songbirds”), most calls are thought to be innate, though there are now numerous exceptions to this generality (see Groth 1993, Ficken and Popp 1995, Greenlaw et al. 1998, Hughes et al. 1998, Riebel and Slater 1998). Studying calls alongside song therefore affords the opportunity to understand patterns of divergence in functionally and acoustically distinct signals that may be subject to different types of selection. 
By exploring multiple signals within a species' vocal repertoire across a wide geographic area, we can build on the powerful framework of song dialects to create a more complete view of the evolution of vocal signal repertoires.
The Marsh Wren is a well-developed model for song learning and has an impressive and well-described song repertoire (Verner 1975, Kroodsma and Canady 1985, Kroodsma and Verner 1987, Luttrell et al. 2016). Although several early naturalists described Marsh Wren calling behavior anecdotally, there has been no comprehensive description of the call repertoire in this species (Townsend 1905, Allen 1923, Welter 1935, Verner 1975). Marsh Wrens comprise a species complex with 14 current subspecies (Kroodsma and Verner 2013) that vary in morphology, migratory pattern, and habitat type. Further, the Marsh Wren may represent 2 cryptic species divided into eastern and western groups in North America (Kroodsma and Canady 1985, Hebert et al. 2004). Here, we explore in detail the call repertoire of only the eastern group. We recorded 5 subspecies of Marsh Wren in eastern North America, including nonmigratory, partially migratory, and fully migratory populations that are endemic to 2 different habitat types, tidal saltmarsh and freshwater inland marsh (Kroodsma and Verner 2013). Partially migratory subspecies are short-distance, non-obligatory migrants that likely migrate along suitable habitat corridors; the fully migratory subspecies is an obligatory migrant that likely migrates over unsuitable habitat.
This study has 3 main goals. First, we describe the acoustic structure and behavioral context of the Marsh Wren call repertoire in qualitative and quantitative detail. We focus on breeding-season calls, because several Marsh Wren subspecies are sympatric during the nonbreeding season and discrimination among subspecies may be difficult during this time. Second, we test for variation in acoustic structure of each described call type at 3 functional–ecological levels: subspecies identity, migratory pattern, and habitat type. Third, we provide preliminary evidence of differences in the likelihood of call types being produced at each of the same 3 functional–ecological levels, and discuss the potential importance of those differences for inferring underlying behavioral and ecological causes.
Each of the functional–ecological categories we explore is likely to support a different regime of selective pressures. At the level of subspecies identity, we expect few vocal differences, especially if calls are innate, given that subspecies of Marsh Wren are likely quite young geologically. The current geographic range of suitable Marsh Wren habitat stabilized only in the past 5,000–21,000 yr (Malamud-Roam et al. 2006). Despite this recent range stabilization, Marsh Wrens demonstrate multiple migratory patterns among different populations. Mechanistically, differences in migratory patterns result from alterations in hormone expression that may also affect breeding phenology (Ramenofsky and Wingfield 2007). Thus, populations with different migratory patterns have already accumulated key physiological differences and are more likely than subspecies within a migratory pattern to demonstrate differences in vocal characteristics. Lastly, Marsh Wrens are endemic to 2 distinct habitat types, freshwater marsh and saltmarsh. Other subspecies pairs of birds that exhibit this habitat dichotomy have marked differences in physiology and morphology related to heat dispersal (Greenberg and Danner 2012) and osmoregulation (Goldstein 2006). Differences in bill morphology related to heat dispersal directly affect vocal signal production and receiver preference in other saltmarsh endemics (Ballentine 2006, Liu et al. 2008). We expect the largest degree of vocal differences to occur at the habitat level, whether that difference is due to divergence in morphology or to selection acting on reduced fitness of ecotype-hybrids (Maley 2012).
We collected recordings from 5 of the 14 subspecies of Marsh Wren (see Figure 1). Cistothorus palustris dissaeptus (hereafter dissaeptus) is the only fully migratory subspecies endemic to freshwater marshes in our sample. The remaining 4 subspecies are either partial migrants or nonmigrants and are found in coastal or estuarine marshes that vary in salinity and tidal action. Cistothorus p. palustris (hereafter palustris) and C. p. waynei (hereafter waynei) are considered partially migratory. Cistothorus p. griseus (hereafter griseus) and C. p. marianae (hereafter marianae) are considered nonmigratory (Kroodsma and Verner 2013).
We recorded Marsh Wrens throughout their distribution in eastern North America at 19 sites along the Gulf of Mexico, the Atlantic Coast, and the Great Lakes region during the breeding season (April 30–July 21, 2012–2014; see Figure 1). We focus on the breeding season because although Marsh Wren subspecies are geographically separate during the breeding season, the migratory and nonmigratory populations are sympatric during the nonbreeding season. While variation in plumage may allow a wintering individual to be correctly identified to its breeding population, focusing on breeding birds eliminates error in population assignment. For Atlantic Coast populations, we worked from south to north to minimize differences in breeding phenology among populations. For the Great Lakes and Gulf Coast birds, we collected recordings beginning on June 1 in each year to ensure that we recorded local birds on established territories. The number of individuals recorded at a site varied from 1 to 35, averaging 18.6 individuals per site.
We recorded individual Marsh Wrens on territory from sunrise (earliest time: 0525 hours EST) until midday (latest time: 1245 hours), using directional microphones (Sennheiser ME67 with K6 power module) and solid-state digital recorders (Marantz PMD 660) set to 24-bit sample depth and 48 kHz sample rate. Birds were not color banded for individual identification, but each territory was identified by GPS location and adjacent territories were recorded on the same day, and simultaneously when possible, by different observers to ensure recording of distinct individuals one time each. Recording sessions ranged from 20 to 60 min. We dictated the behavior of the bird during each observation onto the recording and included information about the presence of any conspecifics and whether we could locate an active nest on the territory. Sex was either assessed via behavior (female Marsh Wrens are not known to sing; Kroodsma and Verner 2013) or was already known for previously captured birds, which we had fit with a federal band as part of a related study, at which time we ascertained sex via visual inspection of cloacal protuberance or brood patch.
Acoustic Description of Song and Call Types
We annotated entire recordings (including songs, calls, and behavioral observations) for each individual, using Raven Pro 1.4 (Cornell Lab of Ornithology, Ithaca, New York, USA). From the recordings, we identified 8 qualitatively distinct vocalizations: songs (Figure 2) and 7 call types (hereafter “buzz,” “chuck,” “churr,” “rattle,” “scream,” “trill,” and “twitter”; Figure 3). We defined a call bout as a series of vocalizations separated from neighboring vocalizations by >0.5 s of silence. A preliminary analysis in 3 subspecies examining the spacing (time) between all vocal elements (i.e. traces on a spectrogram; excluding notes within songs) showed that most traces were within 100 ms of one another, representing notes within syllables. There was a second major drop-off between 300 and 500 ms with a long tail that continued out past 5 s, which we identified as intervals between call bouts. This suggested that 0.5 s was a conservative estimate for identifying a break in vocal bouts that would be reasonable for an observer to quickly visualize on a spectrogram. Each call bout was extracted from the original file, named individually, and high-pass filtered to reduce background noise (buzz = 500 Hz; chuck, churr, rattle, scream = 800 Hz; trill, twitter = 1,000 Hz). We standardized the amplitude of each recording and removed background noise between notes before making detailed measurements of each call.
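The 0.5 s bout criterion can be sketched in a few lines of Python. This is only an illustration of the grouping rule, not the annotation workflow carried out in Raven Pro; the function name `split_into_bouts`, the (onset, offset) tuple layout, and the example times are hypothetical.

```python
def split_into_bouts(elements, max_gap=0.5):
    """Group (onset, offset) times in seconds into bouts.

    A new bout starts whenever the silence between the end of one
    element and the start of the next exceeds max_gap seconds.
    """
    bouts = []
    for onset, offset in sorted(elements):
        if bouts and onset - bouts[-1][-1][1] <= max_gap:
            bouts[-1].append((onset, offset))
        else:
            bouts.append([(onset, offset)])
    return bouts


# Hypothetical element times (s): two close notes, then a third after 0.9 s.
example = [(0.00, 0.05), (0.12, 0.18), (1.08, 1.15)]
bouts = split_into_bouts(example)
```

With these made-up times, the first two elements fall in one bout and the third starts a new one, because the 0.9 s silence exceeds the threshold.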
We measured spectral and temporal parameters for each call type using SIGNAL/RTSD (Engineering Design 2015). In keeping with common practice, we defined a note as a single trace on a spectrogram, and a syllable as a series of consistently repeating notes that occur clustered together. For each call, we took the following acoustic measurements (using a 32,768-point fast Fourier transform [FFT] for frequency measures; spectral resolution = 0.73 Hz; time measurements taken from a time waveform aligned with a 256-point FFT; temporal resolution = 5.3 ms): total duration of the call, number of notes in the call (or, in the case of the churr call, number of syllables), duration of an individual note in the calling bout (for the churr call, the duty cycle of a single note), temporal spacing between notes in the bout (hereafter “inter-note interval”), peak frequency (frequency of greatest power), bandwidth at 15 dB down from peak, and lower and upper frequency limits 15 dB down from peak. For the buzz call, which consisted of a sequence of amplitude modulations (hereafter “pulses”) separated by silence, we also measured the duty cycle of a single pulse, as well as the average depth of amplitude modulation in the pulses. To measure depth of amplitude modulation, we tracked the amplitude envelope of the signal with a 2 ms exponential decay. We then calculated a long-term contour (running average width in milliseconds = duration of call divided by 10) of the envelope and subtracted this contour from the original amplitude envelope, removing any remaining DC offset, to create a modulated signal with an average value of zero. We then calculated the standard error of the resulting envelope over the duration of the call.
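The depth-of-modulation measure for the buzz call can be approximated as follows. This is a rough sketch under stated assumptions, not the SIGNAL/RTSD routine: the envelope is taken to be a peak follower with a 2 ms exponential time constant, the long-term contour is a centered running mean, and the final spread is computed as a population standard deviation. The function name `modulation_depth` and the synthetic test signals are hypothetical.

```python
import math


def modulation_depth(samples, sample_rate, decay_s=0.002):
    """Spread of the amplitude envelope around its long-term contour."""
    # Peak-following envelope with an exponential decay (time constant decay_s).
    decay = math.exp(-1.0 / (decay_s * sample_rate))
    envelope, level = [], 0.0
    for x in samples:
        level = max(abs(x), level * decay)
        envelope.append(level)

    # Long-term contour: centered running average, width = call duration / 10.
    n = len(envelope)
    half = max(1, n // 10) // 2
    prefix = [0.0]
    for v in envelope:
        prefix.append(prefix[-1] + v)
    residual = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        residual.append(envelope[i] - (prefix[hi] - prefix[lo]) / (hi - lo))

    # Remove any remaining DC offset, then summarize the spread
    # (standard deviation is an assumption of this sketch).
    mean = sum(residual) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residual) / n)


# Synthetic signals: a steady 3 kHz tone and the same tone pulsed at 50 Hz.
rate = 48000
t = [i / rate for i in range(int(0.2 * rate))]
steady = [math.sin(2 * math.pi * 3000 * s) for s in t]
pulsed = [0.5 * (1 + math.sin(2 * math.pi * 50 * s)) * c for s, c in zip(t, steady)]
depth_steady = modulation_depth(steady, rate)
depth_pulsed = modulation_depth(pulsed, rate)
```

The pulsed tone should yield a clearly larger value than the steady tone, which is the behavior expected of a modulation-depth index.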
We measured 1–20 high-quality call exemplars for each individual (when we recorded >20 calls of a given type, we measured the 20 highest-quality calls). Within an individual, we averaged all measures of a given call type. All subsequent analyses used these individual averages and included only individuals with ≥3 examples of the call type.
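The exemplar selection and averaging step might look like the sketch below. The numeric quality score, the record layout, and the function name `individual_means` are hypothetical; the text does not state how call quality was scored.

```python
from statistics import mean


def individual_means(calls, max_calls=20, min_calls=3):
    """Map individual -> mean measurement over its best-quality calls.

    calls: iterable of (individual_id, quality, value); higher quality is better.
    """
    by_bird = {}
    for bird, quality, value in calls:
        by_bird.setdefault(bird, []).append((quality, value))
    result = {}
    for bird, rows in by_bird.items():
        best = sorted(rows, reverse=True)[:max_calls]  # keep the top-quality calls
        if len(best) >= min_calls:                     # drop poorly sampled birds
            result[bird] = mean(v for _, v in best)
    return result


# Hypothetical records: bird A has 3 calls, bird B only 2 and is dropped.
records = [("A", 3, 4.0), ("A", 2, 5.0), ("A", 1, 6.0), ("B", 3, 4.5), ("B", 1, 5.5)]
means = individual_means(records)
```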
In early inspections of spectrograms of 2 call types (the trill and twitter), the frequency and bandwidth of notes appeared to change throughout the duration of the call. To quantify this pattern, we measured peak frequency, bandwidth, lower and upper frequency limits, note duration, and inter-note intervals for 3 groups of 4 consecutive notes representing the beginning, middle, and end of each call. Then we used repeated-measures analysis of variance (ANOVA) to test for variation in each measure across call duration. We corrected for multiple testing using a false-discovery-rate adjustment. The false discovery rate controls the proportion of type 1 errors accepted and is more powerful but less stringent than the Bonferroni correction. We ranked P values from smallest to largest and then accepted as significant only P values that did not exceed the value of family-wise alpha × rank / total number of tests (α = 0.05; Benjamini and Hochberg 1995). For each variable that was not significantly different among positions in the call, we chose only a single section of the call for further analyses.
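The false-discovery-rate adjustment can be illustrated with the step-up rule of Benjamini and Hochberg (1995), in which P values are ranked and a test is declared significant when its P value is at or below α × rank / number of tests, together with every test ranked below it. The function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose P value is at or below alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Four made-up P values; thresholds are 0.0125, 0.025, 0.0375, and 0.05.
flags = benjamini_hochberg([0.001, 0.02, 0.04, 0.30])
```

Here the two smallest P values fall at or below their rank-specific thresholds and the remaining two do not.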
Quantitative Discrimination of Call Types
We based our initial identification of call types on visual inspection of spectrograms and observations of Marsh Wren behavior. While most individual calls were visually and acoustically distinct and easily classified, several represented intermediate forms. We used multivariate tests and t-tests to analyze the chuck, churr, and rattle calls (broadband, short duration) and the trill and twitter calls (narrowband, longer duration) in greater detail to determine whether these call classes were statistically distinct. For these analyses, we used individual average measurements for each call type (any individual with ≥3 instances of a call type) regardless of subspecies designation.
Chuck, churr, rattle comparison. The chuck, churr, and rattle are all broadband signals with rapid onset and offset and are produced as a sequence of notes (chuck, rattle) or syllables (churr) (Figure 3C–3E). To establish the distinctiveness of these calls, we employed a linear-model-based categorization method. At this level of analysis, many of our acoustic measurements did not conform to multivariate normality, so we used a multinomial logistic regression with a logit link function (a nonparametric equivalent to linear discriminant analysis) to test for differences among call types (SPSS 24.0; IBM, Armonk, New York, USA). We used means of original variables, rather than a composite measure such as a principal component score, in the multinomial regression in order to interpret directly how the call types compared with one another. Several of our acoustic measures were tightly correlated, such as bandwidth, peak frequency, and upper frequency limit. In these cases, we retained only one measure in our model to avoid issues of collinearity. Because call duration and number of notes in a call can vary widely within each of the call types, we did not include them in this model, focusing instead on the properties of individual notes. Our final model included the 3 most normally distributed, non-collinear measures of individual notes: bandwidth, lower frequency limit, and note duration (or, for churr calls, pulse duration).
Trill and twitter comparison. The trill and twitter calls are narrowband calls consisting of a series of repeated notes (Figure 3F, 3G). Although the trill and twitter have visible differences in call structure based on the spectrogram, the 2 calls are used in similar contexts and are often observed with trill-like notes interspersed with twitter-like notes in calls that we categorized as “mixed.” Multivariate analyses using a binomial model and logit link function to differentiate the trill and twitter failed to converge. As a consequence, we tested trill and twitter calls using paired t-tests or Wilcoxon signed-rank tests (depending on whether the data met the assumptions of parametric tests) to determine whether any of the acoustic measurements from the 2 subjective categories were quantitatively different. We corrected significance using a false-discovery-rate error adjustment for a family-wise alpha (α = 0.05; Benjamini and Hochberg 1995).
Behavioral Association of Call Types
For each call type, we observed 47–171 individuals with sufficient quality to describe behavior associated with that call type. In each case, we noted the sex of the caller, whether there was another Marsh Wren detected on the territory at the time of the vocalization, and whether that territory contained an active nest that we could locate. When Marsh Wrens were visible through the dense vegetation, we also noted behaviors related to the following categories: courtship (including male courtship display, copulation solicitation, or rebuffing courtship; Welter 1935); territory patrol (moving frequently around the territory without engaging in any other identifiable behavior); agonistic interactions (pursuing another Marsh Wren or being pursued by another Marsh Wren); post-agonistic interaction (within 30 s of agonistic interaction); indeterminate pursuit (an adult bird following and closely associating with another Marsh Wren, but without engaging in any courtship or agonistic behaviors); nonspecific interaction (interaction with no clearly delineated function); interaction with young (including provisioning nestlings); preening; foraging; nest building; flight; a heterospecific interaction; and reaction related to researcher presence.
Geographic Call Variation
For each functional–ecological level of analysis (subspecies, migratory pattern, or habitat type), we performed a linear discriminant analysis (LDA) on the acoustic properties of each of the 7 call types (lda function, MASS package; Venables and Ripley 2002). LDA is a multivariate technique that fits orthogonal, linear functions from a series of predictor variables to divide individuals into assigned categorical groups with the least amount of error. It is a useful technique for determining whether proposed groups can be reliably identified on the basis of the simultaneous assessment of a small number of measurable traits. Various thresholds have been proposed regarding what constitutes a reliable, correct classification of groups, all based on some variation of a “75% rule” (75% of population A lies outside the range of population B; reviewed in Patten and Unitt 2002). Here, rather than designate a number as a critical threshold for diagnosability, we discuss instead the relative ability of each of our functional–ecological levels of analysis to explain variation in our data. All LDAs were performed using jackknife, leave-one-out cross-validation and group sizes proportional to original group membership. In each test, only variables that met the following assumptions of an LDA were included in the model: normality, equal variance, not covarying with any other variables.
Testing for variation among subspecies. We ran the LDA model with subspecies as the response variable and only the acoustic measures that conformed to the assumptions of the LDA as predictor variables. For all acoustic analyses examining subspecies' differences, we excluded waynei because we were only able to sample this population at a single site (24 individuals).
Testing for variation among migratory patterns. We categorized populations to migratory pattern on the basis of previous natural-history descriptions of the populations (Kroodsma and Verner 2013). We considered dissaeptus to be full migrants, palustris and waynei to be partial migrants, and griseus and marianae to be nonmigrants. We ran the LDA model with migratory pattern as the response variable and as many acoustic measures as conformed to the assumptions of the LDA as predictor variables.
Testing for variation among habitat types. For habitat-type analysis, all of the tidal marsh subspecies were treated together as a saltmarsh group (griseus, marianae, palustris, waynei) and compared with the freshwater-marsh subspecies (dissaeptus). We treated habitat type as the response variable and as many acoustic measures as conformed to the assumptions of the LDA as predictor variables.
To address the differences in sample sizes between the 2 habitat categories, we established a 95% confidence interval (CI) for the saltmarsh population's linear discriminant (LD) values and determined the number of freshwater-marsh individuals that fell outside the 95% CI. In order to establish an LD value for each individual bird, we reran the LDA for each call type without jackknifing, because the jackknifing procedure calculates probabilities of group membership, not explicit LD values. Then we bootstrapped the saltmarsh sample by sampling with replacement to simulate 1,000 new samples of saltmarsh LD values that were equal in size to the smaller freshwater-marsh sample. We calculated the mean ± SD for each of our 1,000 bootstrapped samples, and then used the mean-of-the-mean ± 1.96 times the mean-of-the-standard deviation to create a 95% CI for the saltmarsh LD values.
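The resampling procedure for the saltmarsh LD values can be sketched as follows, assuming that per-individual LD scores are already available as plain lists. The function names, the fixed random seed, and the example scores are hypothetical and serve only to show the mechanics: resample the saltmarsh scores at the freshwater sample size, average the resample means and SDs, and count freshwater individuals outside the resulting interval.

```python
import random
from statistics import mean, stdev


def bootstrap_interval(salt_ld, n_fresh, n_boot=1000, seed=1):
    """Mean-of-means +/- 1.96 * mean-of-SDs over bootstrap resamples."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_boot):
        resample = rng.choices(salt_ld, k=n_fresh)  # sampling with replacement
        means.append(mean(resample))
        sds.append(stdev(resample))
    center, spread = mean(means), mean(sds)
    return center - 1.96 * spread, center + 1.96 * spread


def count_outside(fresh_ld, interval):
    """Number of freshwater-marsh individuals outside the interval."""
    low, high = interval
    return sum(1 for v in fresh_ld if v < low or v > high)


# Hypothetical LD scores, for illustration only.
salt = [-1.2, -0.8, -0.5, -0.3, 0.0, 0.1, 0.4, 0.6, 0.9, 1.1]
fresh = [2.9, 3.4, 0.2, 3.1]
interval = bootstrap_interval(salt, n_fresh=len(fresh))
outside = count_outside(fresh, interval)
```

Drawing each resample at the size of the smaller freshwater-marsh group keeps the spread of the saltmarsh reference comparable to what a sample of that size would show.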
In addition to variation in acoustic qualities of calls, we were interested in the commonness of each call type within each of the 3 functional–ecological categories. We examined the commonness of call types by coding observations of individual birds as binary responses; a call was considered “present” for an individual (that individual produced the call type at least one time) or “absent” (that individual was never observed to produce the call type). Because all birds within a sample site share the same predictor categories (subspecies, migratory pattern, or habitat type), we used sample site as a unit of comparison by calculating the proportion of individuals with a call type “present” at each site (Crawley 2007).
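The presence–absence coding and the site-level proportions can be expressed compactly. The function name `site_proportions`, the site labels, and the observations below are hypothetical.

```python
def site_proportions(observations):
    """observations: iterable of (site, produced_call) with produced_call a bool.

    Returns site -> proportion of individuals scored "present" for the call.
    """
    counts = {}
    for site, present in observations:
        n_present, n_total = counts.get(site, (0, 0))
        counts[site] = (n_present + int(present), n_total + 1)
    return {site: p / n for site, (p, n) in counts.items()}


# One row per individual bird: (site, whether the call type was ever observed).
birds = [
    ("site_1", True), ("site_1", True), ("site_1", False), ("site_1", True),
    ("site_2", False), ("site_2", True),
]
proportions = site_proportions(birds)
```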
We included data from 17 sites representing 4 subspecies. We excluded waynei because we sampled it only at a single site, and one sample site for griseus because we were able to record only a single bird at that site. Sample sizes for the remaining sites varied from 4 to 35 individuals, with an average of 20 individuals observed per location. Excluding the sites with <14 individuals did not change the outcome of the analyses. For these analyses, we were able to include all occurrences of all call types regardless of quality, including the twitter from dissaeptus, because we were evaluating presence or absence and not taking detailed acoustic measurements.
We evaluated whether the log likelihood of a call type being present at a site could be predicted by a fixed effect of grouping strategy (subspecies, migratory pattern, or habitat type) and a random effect of ordinal date to account for seasonality using a generalized linear mixed model with a binomial distribution and logit link function (glmer, lme4 package; Bates et al. 2015). All models were ranked against the null model of intercept and the random effect of date. We used a combination of ΔAICc scores and P values from a likelihood ratio test to determine the model that best explained differences in probability of call observation (AICc(), qpcR package, Spiess 2014; anova(), stats package, R Core Team 2014). The best-fit model was considered to be a model that was significantly different from the null model under a likelihood ratio test and ≥2 ΔAICc values better than the next best model; when the top 2 models were ≤2 ΔAICc values apart, the model with the fewest parameters was selected. Call occurrence was coded as either “present” or “absent” for each bird, and individual responses were combined to find an overall probability of a call being present at each sampling location. For each call type, we ranked 4 binomial models using the probability of observing the call (response variable) at a given sampling location and predictor variables of either a null random intercept, subspecies designation, migratory pattern designation, or habitat-type designation while controlling for the random effect of sample date. When the highest-ranking model was significantly different from the null model, we performed a post hoc analysis using least-squares-means Tukey's pairwise independent contrasts (lsmeans package; Lenth 2016) to determine which comparisons contributed to the difference while accounting for unbalanced samples and the inclusion of a random effect in the model.
From the post hoc analysis, we found predictions of the group averages with 95% CIs and odds ratios (magnitude of difference) for the pairwise contrasts.
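The model-ranking rule can be written out explicitly. The sketch below assumes that the log-likelihood, parameter count, and likelihood-ratio P value against the null model have already been obtained from fitted models (in the study, via glmer and anova() in R); it does not fit any models. The function names, the tie threshold of ≤2 AICc units, and the numbers in the example are illustrative assumptions.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def choose_model(candidates, n, alpha=0.05):
    """candidates: list of (name, log_lik, k, lrt_p_vs_null).

    Returns the chosen model name, or None if the pick does not beat the null.
    """
    scored = sorted((aicc(ll, k, n), k, name, p) for name, ll, k, p in candidates)
    best = scored[0]
    # Models within 2 AICc units of the best are treated as tied;
    # the tie goes to the model with the fewest parameters.
    tied = [row for row in scored if row[0] - best[0] <= 2.0]
    pick = min(tied, key=lambda row: row[1])
    return pick[2] if pick[3] < alpha else None


# Made-up fits for one call type across 17 sites.
fits = [
    ("subspecies", -20.0, 5, 0.030),
    ("migratory pattern", -21.5, 4, 0.020),
    ("habitat type", -21.0, 3, 0.004),
]
chosen = choose_model(fits, n=17)
```

In this invented example the habitat-type model has the lowest AICc by more than 2 units and differs significantly from the null, so it is selected.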
Calls Embedded in Songs
Often, Marsh Wren songs have at least one embedded call element, including the buzz, trill, and/or twitter. This phenomenon was quite common in all of our observations, but since call elements may play an important role in courtship in the Marsh Wren, we wanted to know whether the inclusion of embedded call elements in songs varied at any of our 3 levels of functional–ecological analysis: subspecies, migratory pattern, or habitat type. We considered any vocalization with one or more calls embedded in a song as a subcategory of song, “call-song.” The commonness of a call-song can vary in 2 ways. First, some functional–ecological groups might be more or less likely to produce call-songs. We analyzed whether the probability of a call-song's presence at a sample site varied at each functional–ecological level using the “call presence–absence” model-ranking method described above. Second, individual birds within a site might be more or less likely to produce call-songs. To address whether the proportion of songs vs. call-songs that an individual produced varied within vs. among functional–ecological groups, we employed an ANOVA-like framework. We limited our analysis to include only individuals for which we recorded ≥30 songs. For each individual, we calculated the proportion of total songs recorded that contained embedded call elements. The distribution of proportions was non-normal, could not be transformed to approximate normality, and was bounded from 0 to 1. As a result, we used a nonparametric Kruskal-Wallis test to compare groups according to rank rather than raw proportion values. Lastly, in order to balance the sample design at the subspecies level, we randomly sampled 50 dissaeptus individuals to include in the analysis. Sample sizes for the subspecies analysis were as follows: dissaeptus = 50, griseus = 22, marianae = 50, and palustris = 44. All individuals with ≥30 songs were retained for the migratory-pattern and habitat-type analyses. 
Sample sizes for the migratory-pattern analysis were as follows: full migrants = 98, partial migrants = 54, and nonmigrants = 72. Sample sizes for the habitat-type analysis were as follows: freshwater-marsh = 98 and saltmarsh = 124.
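For readers unfamiliar with the rank-based comparison, the Kruskal-Wallis H statistic can be computed as in the sketch below, which uses midranks and the standard tie correction. It returns only H; the P value would come from a chi-square distribution with (number of groups − 1) degrees of freedom, which is not computed here. The function name and the example proportions are hypothetical, and the function assumes the pooled values are not all identical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with midranks and a tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))


# Made-up per-individual proportions of call-songs for three groups.
h_stat = kruskal_wallis_h([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
```

Because the test works on ranks, it is unaffected by the bounded, non-normal distribution of the raw proportions.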
Acoustic Description of Song and Call Types
Song. Marsh Wren songs begin with a series of introductory notes followed by a rapid trill of 4–16 syllables (groups of repeated notes) and end with 2–6 short pure-tone notes. Cistothorus p. dissaeptus typically produced 2 or 3 very short, broadband introductory notes (Figure 2A). The rest of the eastern subspecies produced introductory notes that consisted of a series of harmonic sweeps that increased in frequency and bandwidth (Figure 2B). Songs could be produced independently, but 85.3% of all songs observed were associated with one or more calls. If produced in the context of a song, the buzz call was added before the song or embedded in the introductory notes. Twitter calls were commonly added before the introductory notes in the saltmarsh populations, were rarely added in freshwater-marsh populations, and could transition into introductory notes of the song in some cases. The twitter call was rarely produced after the song but in that case was often followed immediately by another song. The trill call was typically added after the song was completed and, in some cases, bridged 2 songs together. The subspecies studied here have song repertoires of 30–60 song types (Kroodsma and Verner 1987, Luttrell et al. 2016).
Buzz. The buzz call is a low-amplitude, noisy sound composed of 7–59 broadband pulses separated by short periods of silence (Figure 3A and Table 1). Rarely, the buzz included some tonal harmonic elements produced at the end of the vocalization. Marsh Wrens may produce a single buzz call or a series of buzz calls that can vary in duration or intensity. We treated each individual buzz as a separate unit.
Table 1. Acoustic measurements of the 7 Marsh Wren call types. Grand mean ± mean SD is presented for all measures to show the overall mean for eastern Marsh Wrens. Coefficient of variation for within-individual variation is shown in parentheses. In the case of the buzz call, “note duration” represents the duty cycle of the pulse duration.
Chuck. The chuck call is a short, loud, broadband vocalization with rapid onset and offset (Figure 3C and Table 1). Chucks were produced in bouts of individual notes, though the temporal spacing of notes within a bout varied. We considered each note as an individual unit within a bout.
Churr. The churr call is a sequence of syllables that can vary in amplitude from quiet to loud, each consisting of several short, broadband notes with rapid onset and offset (Figure 3D and Table 1). Bouts varied in length from 1 to 217 syllables. The duration of the syllable could be increased or decreased by altering the number of notes within the syllable (the duty cycle of the notes was relatively invariant to the number of notes in the syllable; Table 1).
Rattle. The rattle call is a loud sequence of short-duration, broadband notes with rapid onset and offset that have a descending frequency sweep (Figure 3E and Table 1). It is produced as a series of individual notes.
Scream. The scream is a broadband call that contains both noisy and harmonic tonal elements (Figure 3B and Table 1). It is amplitude modulated, like the buzz, but the modulations are less regular in time and have a smaller dynamic range. The relative contribution of noise vs. tonal elements within a call varied within an individual and across bouts. It appears to be functionally and structurally analogous to the distress calls of many species of passerines (Jurisevic and Sanderson 1994).
Trill. The trill call consists of a series of short, narrowband notes with harmonics (and sometimes nonharmonic elements), produced as a sequence (Figure 3G and Table 1). Notes could be spaced evenly (single), could be pairs of notes spaced closely together with longer gaps between pairs (pairs), could be 3 quickly produced notes with longer gaps between these groups (triplets), or could be a mix of these rhythmic pattern types within a single bout. Note structure and duration were independent of rhythmic pattern. Within an individual bout, the bandwidth and peak frequency of notes were consistent over the duration of the call. Apparent small shifts in frequency during a call were not significant after correcting for multiple comparisons (Appendix Table 4). Although the trill was produced independently, it was also commonly associated with the buzz or the song. Only independently produced trill calls were considered for all further behavioral and acoustic analyses.
Twitter. Like the trill, the twitter call is a series of narrowband notes with harmonics. However, the twitter most commonly consisted of a 2-note syllable (a short-duration note followed by a longer-duration note with an upward frequency sweep), although there was some variation in this pattern (Figure 3F and Table 1). Multiple measures of note frequency increased from the beginning to the middle portion of the call, then remained invariant from the middle to the end of the call (Appendix Table 4). The lower frequency limit increased by a mean of 112 Hz, the upper frequency limit increased by a mean of 157 Hz, and the peak frequency increased by a mean of 190 Hz. Subtle changes in bandwidth were not significant after corrections for multiple testing (Appendix Table 4). Like the trill, the twitter was often produced as an independent vocalization but could be produced immediately following a buzz call or immediately prior to a song. Subsequent analyses included only twitter calls produced independently of other vocalizations.
Trill–twitter mix. Twitter notes are often interspersed with trill notes to produce a “mixed” call type, which we observed 1,311 times from 129 individuals (32% of all trill- or twitter-like calls were mixed call types, and 32% of all individuals that produced trill- or twitter-like calls produced mixed call types). In this mixed call type, birds transitioned between a series of trill-like and twitter-like notes, and the number of times the note types switched back and forth varied. All subsequent measures for trill and twitter calls came only from unmixed versions of each call.
Behavioral Association of Call Types
We observed both sexes producing all call types, but for every call type one sex was more likely to produce the call than the other. Behavioral observation data are summarized in Table 2. We observed chucks, churrs, rattles, and screams more often when a conspecific was detected on the territory. The remaining calls (buzz, trill, and twitter) and song were observed in similar proportions of recordings when conspecifics were either detected or not detected on the territory.
Table 2. Behavioral observations for each Marsh Wren call type reported as a proportion of total observations for each call type. First, second, and third most common behaviors associated with each call type are in bold and are indicated with 1, 2, and 3 asterisks, respectively.
All call types fell most commonly into 2 broad behavioral categories: territory patrol or agonistic interactions. The buzz, churr, trill, twitter, and song were all most commonly associated with territory-patrol behaviors, nest building, and courtship, and males were more likely to produce all of these calls than females (ranging from 58% to 100% of observations from known males). The chuck, rattle, and scream were most commonly observed in agonistic encounters, and the sex of the caller varied depending on the behavioral context of the calling event but was most often female or of unknown sex.
Quantitative Discrimination of Call Types
Chuck, churr, rattle comparison. The chuck, churr, and rattle were readily distinguishable on the basis of bandwidth and note duration. A multinomial logistic regression of call type using bandwidth, lower frequency limit, and note duration as predictor variables was significantly different from a null model of intercept only (χ2 = 331.35, df = 6, P < 0.001; Nagelkerke's R2 = 0.956). The lower frequency limit did not contribute significantly to the model in a likelihood ratio test (χ2 = 4.74, df = 2, P = 0.093) but was left in, because it may be biologically relevant. The full model separated the chuck, churr, and rattle with a high degree of accuracy (Figure 4A). The model identified 95.8% of chucks correctly, 93.0% of churrs correctly, and 96.4% of rattles correctly. Bandwidth differed significantly between the churr and chuck calls (churrs having narrower bandwidth; Wald statistic = 11.54, df = 1, P = 0.001), and between the churr and rattle calls (churrs having narrower bandwidth; Wald statistic = 17.58, df = 1, P < 0.001), but did not differentiate between the chuck and rattle calls (Wald statistic = 3.10, df = 1, P = 0.078). Note duration differed between the rattle and chuck calls (rattles having shorter duration; Wald statistic = 9.63, df = 1, P = 0.002) and between the rattle and churr calls (rattle having shorter note duration; Wald statistic = 22.41, df = 1, P < 0.001) but did not differentiate between the chuck and the churr calls (Wald statistic = 1.40, df = 1, P = 0.236).
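As a quick arithmetic check on the test statistics above, the P value of a chi-square statistic with 1 or 2 degrees of freedom has a closed form. The following is a minimal sketch, assuming the Wald and likelihood-ratio statistics are referred to a standard chi-square distribution; the function name is hypothetical and is not part of the original analysis.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail P value of a chi-square statistic for df = 1 or df = 2."""
    if df == 1:
        # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        # For 2 df, the chi-square distribution is exponential with mean 2.
        return math.exp(-statistic / 2.0)
    raise ValueError("closed form implemented only for df = 1 or df = 2")


# Statistics taken from the text above.
p_bandwidth_chuck_vs_rattle = chi_square_p_value(3.10, df=1)
p_lower_frequency_limit = chi_square_p_value(4.74, df=2)
print(round(p_bandwidth_chuck_vs_rattle, 3), round(p_lower_frequency_limit, 3))
```

Under that assumption, the computed values should agree closely with the reported P = 0.078 and P = 0.093.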
Trill and twitter comparison. We found many significant differences between the trill and the twitter, mostly related to the up-sweeping pure-tone note found in the twitter that was typically the second note in a 2-note repeated pattern (absent in the trill) (Figure 4B). Using pairwise comparisons of all calls designated by eye from spectrograms as pure trill or pure twitter, we found that 13 of 16 variables were significantly different after correcting for multiple testing via the Benjamini-Hochberg false discovery rate (Benjamini and Hochberg 1995; Appendix Table 5). Only the first inter-note interval, peak frequency of the beginning of the call, and peak frequency of the middle of the call did not differ significantly between the trill call and the twitter call.
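The Benjamini-Hochberg step-up procedure used for the multiple-testing correction can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the 0.05 false discovery rate, and the example P values are hypothetical and are not the values in Appendix Table 5.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff (step-up rule).
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical P values for 8 acoustic variables.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p_values))
```

Because the rule is step-up, a P value that misses its own rank threshold is still rejected when a larger P value meets a later threshold.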
Geographic Call Variation
Variation among subspecies. At the subspecific level as a whole, LDA did not reliably discriminate any call type. In all cases, we excluded waynei from our analyses because its sample size was too small in relation to the number of variables used in the LDA. Therefore, for each call type, 3 LD functions were fit to differentiate 4 subspecies based on a suite of normally distributed, noncorrelated acoustic measurements. Overall, the percentage of individuals assigned to the correct subspecies by each LDA ranged from 28% for the rattle to 65% for the buzz. Two call types, the buzz and twitter, had overall correct classifications >50%, but within-call-type correct classification among subspecies was highly variable (10–87% correct; Appendix Table 6). A 25% correct assignment is expected by chance alone. There was no consistent pattern across call types in which subspecies were best or worst classified.
Variation among migratory patterns. For the buzz, rattle, trill, and twitter, we found that LDA differentiated migratory pattern with greater accuracy than subspecies identity. The chuck, churr, and scream were slightly less well categorized by migratory pattern than by subspecies identity (Appendix Table 7). For each call type except the twitter, 2 LD functions were fit to differentiate 3 migratory patterns based on a suite of normally distributed, noncorrelated acoustic measurements. We had only 3 examples of poor-quality twitters from fully migratory birds (dissaeptus), and so full migrants were excluded from the migratory-pattern model for the twitter analysis, resulting in one LD function fit to differentiate between partial migrants and nonmigrants. The percent correct classifications for the chuck, churr, rattle, and scream were poor, ranging from 21% for the scream to 37% for the chuck and churr. A 33% correct assignment was expected by chance. The LDA models performed better for the buzz, trill, and twitter (69–71% overall correct classification; Appendix Table 7). For both the buzz and the trill, only the nonmigratory and fully migratory birds were well classified, whereas the partially migratory populations were correctly classified only 40% and 31% of the time, respectively (Appendix Table 7).
The final model for the twitter included peak frequency, lower frequency limit, bandwidth, note rate, length of the fourth note, duration of the first inter-note interval, and duration of the second inter-note interval. The most important variable of the LD function was note rate (LD coefficient = 0.50). Nonmigratory birds had a lower note rate (13.7 notes s−1) than partially migratory birds (14.8 notes s−1). Nonmigratory birds also had a shorter first-note duration in the 2-note series (39.5 vs. 43.0 ms) than partially migratory birds (LD coefficient = 0.07).
Variation among habitat types. Because freshwater-marsh birds (dissaeptus) rarely produced twitter-like calls (see above), we were unable to analyze the twitter at the habitat level. For each call type, one LD function was fit to differentiate 2 groups: freshwater-marsh birds and saltmarsh birds. For each call type, the habitat-type models performed best compared to the subspecies and migratory categorizations, classifying between 56% and 90% of individuals correctly. The chuck, rattle, and scream were still poorly categorized (56–61% correct; 50% correct classification expected by chance). However, the accuracies of all habitat-level LDAs were skewed as a result of sample-size differences (a much larger sample of saltmarsh birds as a whole compared with freshwater-marsh birds). Because our LDAs included prior probabilities of group membership based on original group sizes, they would, by design, assign more individuals to the saltmarsh category. In all calls except the buzz, we had a high percentage of correct classification for the saltmarsh birds and a low percentage of correct classification for the freshwater-marsh birds (Appendix Table 8; the buzz LDA had nearly equal sample sizes and assigned 89% of freshwater-marsh birds correctly and 82% of saltmarsh birds correctly).
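The effect of unequal prior probabilities on per-group classification can be illustrated with a one-dimensional, two-group discriminant. This is a minimal sketch, assuming normally distributed LD scores with a shared standard deviation; the means, standard deviation, priors, and function names are hypothetical and are not estimates from our data.

```python
import math
from statistics import NormalDist


def lda_threshold(mean_a, mean_b, sd, prior_a, prior_b):
    """Decision boundary between groups A and B (requires mean_b > mean_a)."""
    midpoint = (mean_a + mean_b) / 2.0
    # A larger prior for group A pushes the boundary toward group B.
    shift = sd ** 2 / (mean_b - mean_a) * math.log(prior_a / prior_b)
    return midpoint + shift


def group_accuracies(mean_a, mean_b, sd, threshold):
    """Expected proportion correctly classified in group A and in group B."""
    accuracy_a = NormalDist(mean_a, sd).cdf(threshold)
    accuracy_b = 1.0 - NormalDist(mean_b, sd).cdf(threshold)
    return accuracy_a, accuracy_b


# Hypothetical groups: A = saltmarsh (larger sample), B = freshwater marsh.
equal = lda_threshold(0.0, 2.0, 1.0, prior_a=0.5, prior_b=0.5)
skewed = lda_threshold(0.0, 2.0, 1.0, prior_a=0.8, prior_b=0.2)
print(group_accuracies(0.0, 2.0, 1.0, equal))
print(group_accuracies(0.0, 2.0, 1.0, skewed))
```

With the skewed priors, the boundary moves toward the smaller group, so accuracy rises for the larger group and falls for the smaller one, which is the pattern described above for the habitat-level models.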
To address the disparity in sample sizes, we created a 95% CI for the LD values for the saltmarsh subspecies using 1,000 bootstrap replicate samples equal in size to the freshwater sample (see Figure 5). We found that for the chuck, churr, rattle, and scream, fewer than a third of the freshwater-marsh individuals had LD values that fell outside of the saltmarsh 95% CI (15%, 32%, 29%, and 0%, respectively). Therefore, although there may be acoustic differences in the chuck, churr, and rattle between habitat types, the magnitude of those differences was small.
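One possible reading of this bootstrap comparison can be sketched as follows. This is an illustration only, under loud assumptions: the interval is taken as the mean of the 2.5th and 97.5th percentiles across resamples, which may differ from the procedure actually used, and all function names and LD scores are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, fraction):
    """Linear-interpolation percentile of an already sorted list."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_interval(reference_scores, sample_size, replicates=1000, seed=0):
    """Mean 2.5th and 97.5th percentiles over resamples of a fixed size."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(replicates):
        # Resample with replacement, matching the smaller group's sample size.
        resample = sorted(rng.choices(reference_scores, k=sample_size))
        lows.append(percentile(resample, 0.025))
        highs.append(percentile(resample, 0.975))
    return statistics.mean(lows), statistics.mean(highs)


def fraction_outside(scores, interval):
    """Proportion of scores falling below or above the interval."""
    low, high = interval
    return sum(1 for s in scores if s < low or s > high) / len(scores)


# Hypothetical LD scores: 60 saltmarsh birds and 15 freshwater-marsh birds.
data_rng = random.Random(1)
saltmarsh_scores = [data_rng.gauss(0.0, 1.0) for _ in range(60)]
freshwater_scores = [data_rng.gauss(1.5, 1.0) for _ in range(15)]
interval = bootstrap_interval(saltmarsh_scores, sample_size=len(freshwater_scores))
print(interval, fraction_outside(freshwater_scores, interval))
```

Drawing each resample at the size of the smaller group keeps the width of the reference interval comparable to what a sample of that size could produce.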
For the buzz, although the LDA discriminated readily between the 2 habitat types, only 39% of freshwater-marsh birds fell outside the range of 95% CIs of the saltmarsh birds. The model included peak frequency, pulse duration, and SE of pulse depth. The highest weighted coefficients in the LD function for the buzz were mean pulse duration (LD coefficient = 0.338) and SE of pulse depth (LD coefficient = −0.264). Freshwater-marsh birds had shorter pulse duration (5.5 vs. 6.1 ms) and a greater degree of amplitude modulation (pulse depth) (SE*1,000 = 17 vs. SE*1,000 = 14) than saltmarsh birds.
For the trill, we found that 72% of the freshwater-marsh birds fell outside of the 95% CI for saltmarsh birds, demonstrating that habitat-level differences in the trill call type are sufficiently large that we were able to detect them even with a skewed sample size. The final model included peak frequency, upper frequency limit, bandwidth, note rate, first and second inter-note duration, and total call duration. The highest weighted coefficients in the LDA were mean note rate (LD coefficient = 0.25) and total call duration (LD coefficient = 0.37). Freshwater-marsh birds had a lower note rate (17 vs. 22 notes s−1) and shorter total call duration (1.5 vs. 2.2 s) than saltmarsh birds.
Using a model selection approach, we asked whether the functional–ecological designation of a population predicted the likelihood of observing a call type. The results are summarized in Table 3. For the churr, rattle, scream, and trill, there were no significant differences in the likelihood of observing a call at any functional–ecological level. In these cases, we found that the null model was the best fit.
Table 3. Results of best-fit logistic models predicting the likelihood of observing each call type based on fixed effects of either a random intercept (null) model, subspecies model, migratory-pattern model, or habitat-type model, each with a random effect of ordinal date. Least-squares-mean post hoc analysis with Tukey contrasts demonstrates the magnitude of the likelihood difference as an odds ratio with the first pairwise object as the reference. The P value of the least-squares mean indicates the significance of the pairwise comparison. An odds ratio of 1.0 indicates equal likelihood. Bold indicates significant differences.
For the buzz call, the habitat model was the best-fit model and was 10.8 ΔAICc values better than the null model (χ2 = 12.77, df = 1, P = 0.001). We were more likely to observe a buzz from freshwater-marsh birds than from saltmarsh birds (P = 0.005; Figure 6A). For the chuck call, the subspecies model was the best-fit model and was 5.2 ΔAICc values better than the null model (χ2 = 5.66, df = 1, P = 0.017). In general, we were more likely to observe a chuck from an Atlantic coast subspecies (griseus and palustris) than from any other subspecies. Post hoc contrasts indicated several significant pairwise subspecies comparisons. We were significantly more likely to observe a chuck from palustris vs. dissaeptus (P = 0.018) and from palustris vs. marianae (P = 0.002). Additionally, we were more likely to observe a chuck from griseus vs. marianae, but this difference only approached significance (P = 0.061; Figure 6D).
For the twitter call, we found that habitat was the best-fit model and was 44.5 ΔAICc values lower than the null model (χ2 = 46.43, df = 1, P << 0.001). Post hoc analysis of the model indicated that we were much more likely to observe a twitter in saltmarsh birds than in freshwater-marsh birds (P < 0.001; Figure 6B). For the trill–twitter mix, the habitat model was the best-fit model, at 26.9 ΔAICc values lower than the null model (χ2 = 23.71, df = 1, P < 0.001). Similar to the twitter, we found that the trill–twitter mix was rare in freshwater-marsh birds compared to saltmarsh birds (P < 0.001; Figure 6C).
Calls Embedded in Songs
Within an individual, the proportion of songs produced that included one or more call elements was high in all subspecies, ranging from 72.5% to 96.7% of all songs produced. At each functional–ecological level of testing, we found significant differences in the proportion of songs produced that had embedded call elements (Kruskal-Wallis test; habitat level: χ2 = 29.95, df = 1, P << 0.001; migratory-pattern level: χ2 = 30.04, df = 2, P << 0.001; subspecies level: χ2 = 33.56, df = 3, P << 0.001; Figure 7). Post hoc Dunn's tests with Bonferroni adjustment revealed significant pairwise differences at the subspecies level between dissaeptus and marianae (P << 0.001; dissaeptus mean [± SD] = 0.93 ± 0.10, marianae mean = 0.70 ± 0.29), between dissaeptus and palustris (P < 0.001; palustris mean = 0.80 ± 0.22), and between griseus and marianae (P = 0.020; griseus mean = 0.85 ± 0.23). Post hoc tests also revealed differences at the migratory-pattern level between full migration and nonmigration (P << 0.001; full-migrant mean = 0.92 ± 0.12, nonmigrant mean = 0.75 ± 0.28), and between full migration and partial migration (P << 0.001; partial-migrant mean = 0.76 ± 0.24). Freshwater-marsh birds had a significantly higher proportion of call-songs in their repertoire (freshwater-marsh birds: mean = 0.92 ± 0.11; saltmarsh birds: mean = 0.75 ± 0.26). Although we were unable to determine which level of analysis was the best for explaining the variation in proportion of songs with embedded calls because the 3 Kruskal-Wallis tests are not directly comparable in all cases, the single freshwater, fully migratory subspecies, dissaeptus, had a higher proportion of call-songs in its repertoire than any other combination of habitat, subspecies, or migratory pattern, and the magnitude of this difference is most striking at the habitat level.
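The Kruskal-Wallis statistic used for these comparisons can be sketched from first principles. This is a minimal illustration, assuming average ranks for tied values and omitting the tie correction that statistical packages usually apply; the function names and the example proportions are hypothetical and are not our data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Hypothetical per-individual proportions of songs with embedded calls.
freshwater = [0.95, 0.90, 1.00, 0.88, 0.97]
saltmarsh = [0.70, 0.85, 0.60, 0.92, 0.75, 0.55]
print(kruskal_wallis_h([freshwater, saltmarsh]))
```

The statistic is compared against a chi-square distribution with degrees of freedom equal to the number of groups minus 1, which matches the df values reported above.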
Differences in acoustic signals that correspond to taxonomy, geography, or habitat type have been noted across a wide variety of taxa and may play an important role in understanding the mode and tempo of species diversification (e.g., Ryan and Wilczynski 1991, Wielgart and Whitehead 1997, Irwin 2000, Mendelson and Shaw 2002, Parmentier et al. 2005, Stephan and Zuberbühler 2008, Jiang et al. 2015). One way in which acoustic signals may facilitate the divergence of populations is by acting as barriers to hybridization, since acoustic signals can diverge faster than postzygotic barriers to reproduction (Grant and Grant 1996, Wilkins et al. 2013). As mate attraction signals, bird songs have a clear potential to act as prezygotic barriers to hybridization. Calls might also act as barriers to hybridization, particularly if they function in contexts related to mate attraction, though structural and functional differences in calls used in other contexts may still reflect the degree of underlying diversification (while not acting as barriers themselves). Barriers to hybridization could be especially important in recently diverged taxa that inhabit different physiologically challenging environments that favor local adaptation, such as saltmarsh endemics (Basham and Mewaldt 1987, Greenberg and Droege 1990, Greenberg and Danner 2012, Maley 2012, Danner et al. 2017). Signal divergence can occur through multiple mechanisms, including indirect selection on the morphology of the signal-production apparatus (e.g., Ballentine 2006), direct selection of the habitat on the signal (e.g., acoustic transmission properties; Morton 1975), cultural selection on the signal form (e.g., Price 1998, Soha et al. 2004), social or sexual selection on the signal form (e.g., Colbeck et al. 2010, Dingle et al. 2010), and the cultural or genetic drift of the signal (e.g., Irwin et al. 2008). 
One of the challenges in comparing acoustic signals among populations is that animals with complex acoustic repertoires may demonstrate different patterns of selection and differentiation on different calls within the repertoire, depending on their form and function (Laiolo et al. 2001a, Laiolo and Rolando 2002, Irwin et al. 2008, Sturge et al. 2016).
Acoustic Descriptions of Call Types
Marsh Wrens, like many birds, have complicated call repertoires that facilitate interactions among individuals. We described the acoustic properties and behavioral context of calling behavior in Marsh Wrens, identified 7 discrete call types, and quantified the acoustic properties of those calls across all eastern subspecies. Despite the fact that Marsh Wrens are models of vocal learning with well-studied song repertoires (Verner 1975, Kroodsma and Verner 1987, Luttrell et al. 2016), Marsh Wren calls have not been formally described prior to this work.
Quantitative Discrimination of Call Types
Multivariate analysis among similar calls confirmed our initial qualitative categorization of 7 call types. Intermediate call types may be produced but are uncommon. Variation within individuals ranged from stereotyped (most notably in note duration of alarm calls) to highly variable (e.g., total call duration), depending on the call type and measurement. Flexibility in delivery in each call type may allow for graded signal information that parallels underlying signaler motivation but falls within a defined acoustic space for each of the 7 call types (Morton 1977, Davis 1988, Ficken 1990; Figure 4). Graded signals are common in the alarm calls of some mammals (Fichtel and Kappeler 2002) and birds, including Phasianidae, Corvidae, and Paridae (Suzuki 2016). Within Troglodytidae, all species with at least partially described call repertoires have at least one call that could be considered graded (Hejl et al. 2002, Hamilton et al. 2011, Toews and Irwin 2012, Haggerty and Morton 2014, Johnson 2014).
Behavioral Association of Call Types
The behavioral contexts of Marsh Wren vocalizations fall into 2 broad categories: calls associated with mate attraction or territory patrol, and calls associated with alarm or distress. The buzz, churr, trill, and twitter calls, as well as Marsh Wren song, were all most commonly associated with territory-patrol behaviors, nest building, and courtship (Table 2). Males were more likely to produce all of these vocalizations than females. Our observations of Marsh Wren calling behavior were made during the breeding season. Nonmigratory populations of Marsh Wren do not defend territories in the nonbreeding season, when social interactions may differ (Kale 1965, Verner 1965). Future work during the nonbreeding season would help identify any behavioral differences in the described calls or additional call types not used during the breeding season (Marler 2004b).
The buzz call is an integral part of male Marsh Wren courtship. Welter (1935) described the buzz call as part of the song and reported that it was rarely heard after the beginning of June. By contrast, we observed the buzz call more often than any other call type, and it was produced throughout the entirety of our field seasons (May–July), both independently and in conjunction with song. Marsh Wrens have a distinctive courtship display during which the male raises the tail over the head and rocks forward and backward while primarily producing the buzz call (Welter 1935, Kale 1965). The buzz, churr, trill, and twitter calls were also used during pursuit of females around the territory (song is less commonly used than calls during these courtship pursuits). The buzz and associated display have also been reported during territory intrusions (Welter 1935, Kale 1965), but during simulated territory intrusions we found that the male typically produced the buzz and display when a female also approached the simulated intruder.
In addition to territory patrol, male Marsh Wrens also frequently produced the buzz call during nest building; and, to a lesser degree, they also produced the churr, trill, and twitter calls, as well as song, in this context. Eastern male Marsh Wrens build, on average, 5–12 nests on a single territory that become courtship display centers, although these nests likely serve multiple functions (Bent 1948, Kale 1965, Picman 1977). The trill and twitter calls were also often produced during flight while entering or leaving the territory. The trill, in particular, was associated with song during pop-up flight displays (as noted by Welter 1935, Kale 1965).
The remaining call types (chuck, rattle, and scream) were most commonly observed in agonistic encounters and could be produced by either the instigator or the recipient of conspecific agonistic encounters. The chuck is an alarm call used by both sexes, usually when a conspecific was present, and most commonly by females, confirming Welter's (1935) observations. Marsh Wrens may produce chuck calls in response to other Marsh Wren intrusions, during an unwanted courtship display, or in response to researcher presence. The rate of delivery and number of notes are variable within and among individuals, which suggests a graded structure to the signal (Davis 1988, Fichtel and Kappeler 2002). The rattle call was typically produced by females or by birds of unknown sex and was nearly always observed when conspecifics were detected. It is likely analogous to Welter's (1935) “kek,” which he described as a female-only alarm call. In keeping with Welter's observations, we observed the rattle most often immediately before or after an agonistic encounter, but also often during flight away from the same location on repeated observations, which suggests that this call may be associated with nest departure. Nest-departure calls are common in marsh-breeding species and are found frequently in species that breed in high densities (McDonald and Greenberg 1991). The broad-bandwidth, steeply modulated, repetitive structure of the rattle call is consistent with nest-departure vocalizations that are easily localizable and can be used to track flight trajectory (McDonald and Greenberg 1991, Haff et al. 2015). Marsh Wrens readily destroy conspecific nests, and females guard nests closely (Picman 1977). If females use rattle calls to alert males when the nest is left unattended, the call may function to increase vigilance against potentially agonistic interactions (Yasukawa 1989, Grunst et al. 2014).
The scream was most commonly observed from birds of unknown sex or females that were the recipients of aggressive encounters. It was also produced when birds were restrained for banding and measurements. The broadband, noisy, frequency- and amplitude-modulated nature of the scream was typical of distress calls used by a variety of passerines (Conover 1994, Jurisevic and Sanderson 1994).
Geographic Call Variation
We examined call variation in the context of 3 levels of functional–ecological categorization (subspecies-level grouping, migratory behavior, and habitat type) and found that 4 call types were not diagnosable by any level of functional–ecological categorization. That is, these calls were similar in acoustic structure across the range of Marsh Wren populations we examined. By contrast, another call, the twitter, was most accurately categorized according to migratory pattern, and 2 calls (buzz and trill) were most accurately categorized according to habitat type. A variety of evolutionary mechanisms could lead to variation in call structure among taxonomic groups, migratory patterns, or habitat types, including drift (Benedict and Krakauer 2013), indirect selection related to variation in body size or physiology (e.g., Laiolo et al. 2001a), selection via acoustic transmission properties of the habitat (Morton 1975), or social or sexual selection (West-Eberhard 1983). Although many studies exploring geographic variation in vocalizations focus on a single vocal signal, those examining multiple signals have found that different signals may differ in the amount of variation exhibited across geographic ranges (Laiolo et al. 2001a, 2001b, Irwin et al. 2008, Sturge et al. 2016). Varying patterns and degrees of divergence among vocalization types may be the rule, rather than the exception. Because different call types can have different anatomical or production constraints and different behavioral functions, they may also experience different selective pressures.
The 4 call types that did not exhibit any acoustic differences across the geographic range that we examined—the chuck, churr, rattle, and scream calls—share some similar acoustic features. All 4 calls are broadband vocalizations with rapid onset and offset. Alarm and distress calls of passerines such as the chuck, rattle, and scream, especially those that are highly localizable, often draw the attention of both conspecific and heterospecific individuals to investigate the source of the call and, in some cases, mob potential predators or other threats (Stefanski and Falls 1972a, 1972b, Hurd 1996, Chu 2001, Lee et al. 2015). As a result, these call types may be under convergent selection to be broadly recognizable and easy to localize for both conspecifics and heterospecifics. In contrast to these 3 calls, it is less clear from a functional perspective why the churr call did not show evidence of geographic variation. Like the other calls, the churr is a broadband call, though it is lower in amplitude and has a narrower bandwidth than any of the alarm or distress calls. The churr was used in territory-patrol and courtship contexts but was more stereotyped within an individual than other territory-patrol or courtship calls.
The twitter call was most divergent among migratory patterns. Nonmigratory populations produced a slower-paced twitter with a shorter first note in the 2-note delivery than partially migratory populations. Other call features were more similar; for example, there was little variation in frequency of the twitter call between the 2 groups. While slower note delivery may be advantageous in closed habitat types because of increased reverberation, there are no consistent differences in habitat acoustics among the nonmigratory or partially migratory populations (Wiley and Richards 1978). In addition, there are no reported body-size differences among the nonmigratory or partially migratory populations (Pyle 1997, S. A. M. Luttrell personal observation), which suggests that pleiotropic effects of body size on signal structure are also unlikely. The nonmigratory and partially migratory populations of Marsh Wren are parapatric, distributed in a wide latitudinal gradient, and isolated during the breeding season, meeting several key criteria for drift. Despite differences between nonmigratory and partially migratory twitter calls, we found no evidence of differences in the twitter between the 2 nonmigratory subspecies that also meet key criteria for drift, griseus and marianae, which are isolated during both the breeding and nonbreeding seasons, though divergence times between these 2 subspecies are unknown in comparison with the divergence between dissaeptus and the saltmarsh subspecies as a whole. Because freshwater-marsh populations and saltmarsh populations occur over disjunct rather than continuous geographic ranges, it is difficult to apply an isolation-by-distance approach in this instance. More detailed sampling between the freshwater-marsh subspecies dissaeptus and the parapatric saltmarsh subspecies palustris could be used to further test an isolation-by-distance hypothesis in this call type. 
The lack of divergence in the twitter call among allopatric nonmigratory populations suggests a minor role for drift in this signal but does not rule out drift as an important force in this system.
The twitter call is produced primarily by males and is used in territory patrol and courtship, behaviors related to either intrasexual or intersexual communication. Partially migratory populations winter within the breeding range of nonmigratory populations, and it is possible that nonmigrants begin breeding before all partial migrants have left the wintering range. Subspecies of birds are capable of interbreeding, though hybrid migratory patterns have been shown to have direct fitness costs in other species (Helbig 1991, Delmore and Irwin 2014). Given that variation in the twitter is best explained by migratory pattern, it is possible that sexual selection on the twitter call could reinforce selective breeding within individuals that have a similar migratory pattern. Although it is not known whether coastal Marsh Wren populations with different migratory patterns hybridize, the genetically and vocally divergent subspecies of the Great Plains, C. p. iliacus and C. p. laingi, do occasionally hybridize, which suggests that such hybridization across migratory patterns is possible (Kroodsma and Verner 2013). Female-preference tests with twitter calls from birds of different migratory populations are needed to confirm that birds can perceive these differences, and that these differences are meaningful in a context analogous to mate selection. Although these tests would not determine whether drift preceded preferences for local signals or whether selection against hybrid migratory patterns drove divergence, they would provide support for the idea that the observed differences are reinforced through sexual selection between birds with different migratory patterns.
Two call types, the buzz and the trill, were most accurately grouped according to habitat type, rather than subspecies or migratory pattern. Freshwater-marsh birds produced buzzes with higher peak frequency, faster pulse rates, and greater amplitude variation between pulses than saltmarsh birds. Freshwater-marsh birds also produced trills that were shorter in duration, higher in frequency, and slower in pace of delivery than those of saltmarsh birds. Marsh Wrens from freshwater marshes are larger than saltmarsh birds (S. A. M. Luttrell personal observation). However, increased body size is more often correlated with decreased than with increased fundamental frequencies (Wallschläger 1980, Ryan and Brenowitz 1985, Martin et al. 2011), so body size is unlikely to explain the higher frequencies of freshwater-marsh calls. Both freshwater marshes and saltmarshes are open habitats over water. Given that the structural components of these habitats are quite similar, direct selection on the frequency elements of calls as a result of habitat differences seems unlikely in these 2 habitats (Wiley and Richards 1978, Cosens and Falls 1984).
Given that neither direct selection by the habitat nor pleiotropic effects of body size are likely to explain differences in the buzz and trill calls between freshwater-marsh and saltmarsh populations, we are left with neutral isolation by distance (drift) and/or direct effects of sexual selection to explain the observed differences. The effect of drift may be enhanced if calls or call components are learned, given that cultural changes may proceed more rapidly than genetic variation over time (Lynch 1996). Examples of drift or isolation by distance are common among species in which geographic variation in calling behavior has been documented, and many of these cases involve or are suspected to involve vocalizations that are learned (Miyasato and Baker 1999, Laiolo et al. 2001a, Wright et al. 2005, Mulard et al. 2009). We do not know whether any acoustic components of the buzz or trill calls involve learning. In the absence of learning, the observed variation in these calls may be indicative of underlying genetic drift (Benedict and Krakauer 2013). As in the prior comparison with the twitter call, however, we found no discernible evidence of drift or variation in the buzz or trill calls between the 2 nonmigratory saltmarsh subspecies, griseus and marianae.
Lastly, whether divergence in the buzz and trill calls was initiated via neutral or selective processes, both calls are used in the context of intrasexual and intersexual communication, and, as a consequence, they are candidates for ongoing sexual selection. Although freshwater-marsh and saltmarsh populations are allopatric during the bulk of the breeding season, saltmarsh populations may begin breeding before freshwater-marsh populations have completely left the wintering grounds (Kroodsma and Verner 2013). If hybrids exist, as they do at the boundary between eastern and western subspecies-complexes (Kroodsma and Verner 2013), sexual selection could facilitate or reinforce divergence in these call types. Playback tests with same-ecotype vs. different-ecotype songs have demonstrated discrimination in freshwater-marsh vs. saltmarsh subspecies of the Swamp Sparrow (Melospiza georgiana; Liu et al. 2008). Similar tests with Marsh Wrens could identify whether the differences we measured are perceptible and meaningful to birds in a territorial context, and confirm whether vocal variation in saltmarsh endemic populations is a more widespread phenomenon in birds.
In addition to geographic variation in acoustic structure in the calls of Marsh Wrens, we found that some call types varied in how commonly they occurred across subspecies and habitat types. The freshwater-marsh, fully migratory subspecies, dissaeptus, produced the twitter and trill–twitter mixed calls rarely compared with all other subspecies and habitat types but produced the buzz call more frequently than saltmarsh populations. Although both the buzz and twitter are used during territory patrol and courtship, the buzz seems to play an integral role in courtship behavior over short distances, whereas the twitter seems to occur primarily in a long-distance context as a broadcast signal similar to song. This preliminary analysis suggests that there could be behavioral differences between freshwater-marsh and saltmarsh populations with respect to courtship or advertising patterns. Although we did not evaluate the breeding density or degree of polygyny among our populations, both density and polygyny vary among populations of Marsh Wrens and could contribute to differences in mate-advertisement and territory-defense patterns (Welter 1935, Kale 1965, Verner 1965, Leonard 1986). For example, if territories are densely distributed in suitable habitat, we might expect higher proportions of the low-amplitude, short-distance buzz call in the repertoire than of the trill or twitter calls, which are broadcast signals. Furthermore, since the buzz call is predominantly used in courtship or during territory intrusions when a female is present, populations with higher levels of polygyny might be expected to have higher proportions of the buzz call. Lastly, because the buzz and twitter seem to play different roles in courtship and territory patrol, differences in production of the buzz and twitter calls could result from differences in breeding stage. We controlled for the effect of date in our analyses, but we did not systematically quantify the breeding stage of individuals. 
Differences in the occurrence of the buzz and twitter could be related to differences in breeding phenology between saltmarsh and freshwater-marsh Marsh Wrens, or some combination of the effects of breeding density, breeding phenology, and levels of polygyny.
We also found differences in the likelihood of observing a chuck call at the subspecies level, with individuals of 2 Atlantic coast subspecies (griseus and palustris) producing the chuck call more often than individuals of other subspecies. The chuck call is primarily an alarm call. If the number of agonistic encounters or risk of predation is greater in Atlantic coast populations than in other populations, then the propensity to use this call type may be increased. Increased breeding density could also result in increased alarm calling, given that Marsh Wrens commonly destroy conspecific nests and females may aggressively defend nest areas from conspecifics (Picman 1977, Leonard and Picman 1987). The propensity to produce alarm calls, and the composition of alarm calls in response to threats, have been shown to vary among individuals and sex classes in another North American passerine, the Tufted Titmouse (Baeolophus bicolor; Freeberg and Branch 2013), further suggesting that the age or sex composition, or the dominance dynamics of a population as a whole, could influence how common a vocalization is within a population. Freeberg (2012) has demonstrated variation in both the relative proportion and context of components of the chick-a-dee call in Carolina Chickadees (Poecile carolinensis) between 2 populations in central North America, which suggests a functional divergence in these calls despite a lack of variation in the acoustic structure of those components. Future studies on geographic variation in complex calling behavior, coupled with detailed demographic information, could clarify whether geographic differences in how commonly a call is produced, or the context of its production, are more widespread in passerines and perhaps act as additional mechanisms by which vocal divergence can occur.
Calls Embedded in Songs
Eastern Marsh Wrens commonly embed 3 call types (the buzz, trill, and twitter calls) into the song. The incidence of such “call-songs” may be lower in western populations (Verner 1975). Calls can be included or excluded in different renditions of the same song type, and they can be inserted between multiple songs within a single, continuous song bout. When calls are embedded in songs, they follow a nonrandom organizational pattern. Twitter calls typically precede introductory notes, the buzz call is often embedded among introductory notes, and the trill call usually follows the terminal notes of the song. Each of these call types is regularly produced independent of song as well, and all 3 calls appear to function primarily in territory patrol and courtship. Although the proportion of total songs produced that included one or more call elements was high in all populations, the freshwater-marsh, fully migratory subspecies, dissaeptus, consistently had the highest proportion of songs with embedded call elements. Systematic differences in breeding density or breeding phenology between freshwater-marsh and saltmarsh populations of Marsh Wren could contribute to our observed differences. For example, if call elements are directed at different individuals than song elements, or function as local vs. broadcast signals, increased breeding density may result in an increase in the use of call-songs. Welter (1935) suggested that males reduce the incidence of buzz calls produced in conjunction with the song as the breeding season progresses. Although we did not observe an effect of date on the occurrence of songs with embedded calls, Welter's observation could be a consequence of a difference in the likelihood of call-songs at different stages in the breeding cycle. Thus, a greater proportion of embedded calls in the freshwater populations could be indicative of differences in breeding phenology. We did not quantify breeding density or breeding stage in our analysis.
Alternatively, embedded calls may function to increase the diversity of the already complex song repertoires of eastern Marsh Wrens. Prior work on Marsh Wren repertoires has demonstrated that all eastern subspecies have similar repertoire sizes regardless of habitat type or migratory pattern, in particular when only the main portion of the song is considered (Kroodsma and Verner 1987, Luttrell et al. 2016). If embedded calls serve to increase song repertoire sizes in fully migratory, freshwater-marsh populations, then this would support the hypothesis that migratory populations of birds should have larger song repertoire sizes due to increased intensity of sexual selection pressures during shorter breeding seasons (Read and Weary 1992).
The use of call notes embedded in songs may be common in other species, but if so it is underreported. In most cases where this behavior has been reported, the calls of heterospecifics are included in a small proportion of overall songs produced and are hypothesized to increase the repertoire size of the singer (Howard 1974, Hindmarsh 1984, Greenlaw et al. 1998). Although it is not a perfectly analogous case, we could find only one other reported bird species, the Zebra Finch (Taeniopygia guttata), that consistently embeds conspecific call types between song syllables (e.g., Zann 1993). The function of call syllables within Zebra Finch songs is unknown. As in Marsh Wrens, the calls appear in a nonrandom order within the song; but unlike in Marsh Wrens, call notes in Zebra Finch song represent a crystallized song motif rather than a syllable that may be flexibly added or removed. More detailed observations of when and how calls are embedded in bird songs generally, and in Marsh Wren songs specifically, or the response of Marsh Wrens to songs with and without calls, are needed to determine their function in this context.
We describe the call repertoire of Marsh Wrens and report evidence of geographic variation in the acoustic structure of 3 of 7 call types, as well as geographic and taxonomic variation in how commonly 4 call types are produced. Marsh Wrens are semicolonial, gregarious birds that have complex vocal and social behavior. While it may not be surprising that calls show varying levels of divergence in this species, it is nonetheless interesting that the signals demonstrating the greatest degree of divergence are those most likely to be under sexual selection pressures based on the behavioral context of the calls. The effects of drift cannot be explicitly ruled out in these circumstances, but if drift is playing a role in the divergence of calls involved in intrasexual and intersexual communication, we expect that sexual selection may reinforce or strengthen the effects of neutral divergence when birds from different populations are sympatric. The variation that we observed is not well explained by selective effects of the habitat for sound transmission or by pleiotropic effects on acoustics due to body size. Because within-species divergence is likely recent and ongoing, focusing on within-species rather than between-species variation in behavior is useful for inferring the mode and tempo of divergence in populations. In particular, such a focus avoids the confounding effects of the accumulation of additional differences over long time spans (Foster 1999). Extensive work on song dialects within species of birds has provided us with ample evidence of geographic divergence in vocal signals (reviewed in Podos and Warren 2007), in particular as a consequence of cultural evolution. However, the vocal repertoires of birds are typically complex, and song is only one component of the overall repertoire (Marler 2004a). 
Geographic variation in calls has been demonstrated in many species, but in such studies often only a single call type is examined (e.g., Rothstein and Fleischer 1987, Miyasato and Baker 1999, Wright et al. 2005, Nicholls et al. 2006, Snowberg and Benkman 2007; but see Laiolo et al. 2001a, 2001b). By comparing multiple vocal signals across a wide geographic range, we can help build a more complete picture of the overall processes of selection and divergence involved in speciation.
This work was completed under funding to S.A.M.L. from the Maryland Ornithological Society, the Carolina Bird Club, the Washington Biologists' Field Club, and the University of Maryland Baltimore County Graduate Student Association, and with the support of the University of Maryland Baltimore County Biological Sciences Department. We are especially grateful to C. Power, C. Redmon, and N. Sotnychuk for their assistance in the field gathering recordings, and to S. Perry for assistance in preliminary recording analysis. This work would not have been possible without the permission and assistance of staff at the 19 local, state, and federal wildlife preserves at which we collected the data, many of whom went above and beyond to make this research successful—thank you!
Ethics statement: The research was conducted in compliance with the Institutional Animal Care and Use Committee of the University of Maryland Baltimore County (IACUC project no. BL01771417) and in accordance with the AAALAC Guidelines to the Use of Wild Birds in Research.
Author contributions: S.A.M.L. conceived the idea for the study and collected and analyzed the data. S.A.M.L. and B.L. developed and designed methods and wrote and edited the manuscript. B.L. contributed substantial resources.
APPENDIX TABLE 4.
Repeated-measures ANOVA (twitter: df = 2 and 72; trill: df = 2 and 130) demonstrated that several measures of the Marsh Wren twitter and trill calls vary significantly throughout the duration of the call after correcting for multiple testing (Benjamini-Hochberg false discovery rate, α = 0.05). Variables are ranked from lowest to highest P value; values in bold are significant.
APPENDIX TABLE 5.
Comparison of acoustic measures for trill vs. twitter calls. Thirteen of 16 pairwise tests (Welch's t-test or Wilcoxon signed-rank test) showed significant differences after correcting for multiple testing (Benjamini-Hochberg false discovery rate, α = 0.05). Variables are ranked from lowest to highest P value; values in bold are significant.
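The Benjamini-Hochberg correction named in the captions of Appendix Tables 4 and 5 can be illustrated with a minimal Python sketch. It assumes only the standard step-up procedure; the function name and the P values are hypothetical and are not taken from the study, and the software used for the original analysis is not stated here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests that remain significant under Benjamini-Hochberg FDR control.

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Rank tests from lowest to highest P value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant,
    # even if it missed its own rank-specific threshold.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical P values for illustration only (not results from this study).
example_p = [0.001, 0.008, 0.039, 0.035, 0.20]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

In this made-up example the third-ranked value (0.035) exceeds its own threshold of 0.03, but it is still flagged because the fourth-ranked value (0.039) falls under its threshold of 0.04. This is why the tables rank variables by P value before marking significance.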
APPENDIX TABLE 6.
Percent correct classification at the subspecies level using linear discriminant analysis (LDA).
APPENDIX TABLE 7.
Percent correct classification at the migratory-pattern level (full migrants, partial migrants, and nonmigrants) using linear discriminant analysis (LDA).
APPENDIX TABLE 8.
Percent correct classification at the habitat-type level using linear discriminant analysis (LDA). Sample-size differences make the percent correct classification data difficult to interpret. In place of overall correct classification, we show the percentage of freshwater-marsh birds that fall outside a 95% confidence interval (CI) for bootstrapped saltmarsh-bird linear discriminant values. Twitter could not be analyzed at this level (see text).
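One way to read the habitat-level summary in Appendix Table 8 is sketched below in Python. The details are assumptions made for illustration: the sketch uses a percentile bootstrap, takes the interval as the mean of the 2.5th and 97.5th percentiles across resamples of saltmarsh discriminant scores, and uses invented scores. The exact resampling scheme used in the study is not specified in this caption and may differ.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_interval(scores, n_boot=2000, seed=1):
    """Assumed scheme: average the 2.5th and 97.5th percentiles over resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    lowers, uppers = [], []
    for _ in range(n_boot):
        # Resample the saltmarsh scores with replacement.
        resample = sorted(rng.choices(scores, k=len(scores)))
        lowers.append(percentile(resample, 2.5))
        uppers.append(percentile(resample, 97.5))
    return sum(lowers) / n_boot, sum(uppers) / n_boot


def percent_outside(scores, lower, upper):
    """Percentage of scores falling below lower or above upper."""
    outside = sum(1 for s in scores if s < lower or s > upper)
    return 100.0 * outside / len(scores)


# Hypothetical discriminant scores (not data from this study).
saltmarsh_scores = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.3, 0.4, 0.9, 1.1]
freshwater_scores = [0.2, 1.5, 2.1, 2.6, 3.0]

lower, upper = bootstrap_interval(saltmarsh_scores)
pct = percent_outside(freshwater_scores, lower, upper)
```

Reporting the share of freshwater-marsh birds outside the saltmarsh interval avoids the problem noted in the caption: the measure depends on where each freshwater-marsh bird falls relative to the saltmarsh distribution, not on the relative sample sizes of the 2 groups.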