Molecular markers can be used to infer the demographic history of a given species, but many historic processes simultaneously impact multiple species. Thus, comparative historical demography has the potential to provide insight into drivers of evolution. In this study, we used nuclear DNA (nDNA) sequences to corroborate (or refute) demographic inferences based on earlier mitochondrial DNA (mtDNA) data from 16 species of Hispaniola birds. Our previously published analysis suggested population expansion in five of six migratory species (following glacial retreat in North America), with less evidence of expansion in non-migratory species. Additional molecular markers should reduce locus-specific bias, and so we generated sequence data for several nuclear loci. Test statistics associated with the nDNA provided only equivocal evidence for population expansion in 10 of the 16 species. Discordance between mtDNA and nDNA is not uncommon because the two genomes are exposed to different selective pressures and have different effective population sizes and modes of inheritance. The nDNA analyses reported here cast some doubt on our earlier mtDNA inferences. They also suggest that the signal to noise ratio of demographic statistics is typically low because of the inherent variability in selective regimes and coalescence across loci.
The island of Hispaniola harbors more biodiversity than any other island in the West Indies, and the evolution of its fauna is of considerable interest because of the complex geologic and climatic history of the Antillean archipelago (Donnelly 1989, Perfit & Williams 1989, Higuera-Gundy et al. 1999, Iturralde-Vinent & MacPhee 1999). Hispaniola consists of two paleo-islands (North and South), which are still partially sundered by a former marine channel that acts as a distribution barrier for some species (Mann et al. 1991, Townsend et al. 2007). The island as a whole contains three parallel mountain ranges separated by deep valleys, and it is associated with ten nearby offshore islands. Its topology varies dramatically, resulting in an ecologically heterogeneous landscape that includes flooded grasslands, savannahs, wet forests, dry forests, and row-crop agriculture (Latta et al. 2006). During glacial periods, the climate was cooler and drier than at present (Bonatti & Gartner 1973, Curtis et al. 2001), which presumably increased the extent of suitable habitat for montane species and restricted the extent of tropical forest species (Higuera-Gundy et al. 1999). The varied topography and associated vegetation no doubt contribute to the considerable biodiversity found on the island.
Geographic barriers to dispersal on Hispaniola are reflected in the gene pools among the most vagile vertebrates, such as the birds. Molecular studies have revealed pronounced genetic structure of some resident species across the island, and coalescent approaches indicate that divergence times between isolated populations differ dramatically among species (roughly 0.5–10 MYA, Sly et al. 2010, 2011). These and other studies suggest a complex evolutionary history of Hispaniola's avifauna.
One factor contributing to the complex evolutionary histories of Hispaniolan birds may be the migratory tendencies and associated population dynamics of many species. The island is close enough to continental North America to support large numbers of wintering migrants, yet isolated enough to have evolved endemic species (Ricklefs & Bermingham 2008). Some of the bird species that live year-round and breed on Hispaniola are endemic to the island, whereas other non-migratory species have ranges that extend beyond Hispaniola (we refer to these as “resident” in contrast to “endemic” species). The migratory species that overwinter on the island but breed in the United States and Canada should have experienced population expansion following the Last Glacial Maximum (LGM) due to the increased breeding area associated with the retreat of the ice sheets, unless populations of these species are limited in their wintering ranges. In contrast to migrant species, we expect non-migrants to exhibit little evidence of population expansion because the melting of ice sheets at the end of the Pleistocene caused a concomitant rise in sea level of ∼120 meters that reduced somewhat the area of the island as a whole (Bonatti & Gartner 1973, Fleming et al. 1998, Higuera-Gundy et al. 1999, Curtis et al. 2001, Clark & Mix 2002).
Contemporary gene pools harbor signals of historic population demography including longterm equilibrium, growth, and population decline (Harpending et al. 1998). We recently used mitochondrial DNA (mtDNA) sequences to infer the demographic history of 16 avian species sampled on Hispaniola. The mtDNA data suggested that migratory species experienced more pronounced population growth in their recent evolutionary history than did non-migratory species, perhaps because continental breeding habitat for migratory birds increased as glaciers retreated, whereas changes to the insular breeding habitat of non-migratory birds occurred on a more modest scale, and rising sea levels reduced the total land area (Fahey et al. 2012).
These inferences relied solely on mtDNA, a common molecular marker of choice because of its inherent nucleotide variability, conserved genomic organization, uniparental mode of inheritance, and haploid condition. The latter two attributes account for the 4-fold smaller effective population size (Ne) of mtDNA relative to nuclear genes, of which each individual has two copies, one inherited from each parent. Thus, mtDNA lineages are expected to undergo lineage sorting more rapidly than nuclear genes (Avise 2000, Zink & Barrowclough 2008).
Such lineage sorting may be stochastic (as expected for genes behaving in a neutral manner), deterministic (as expected for genes under strong selection), or some combination thereof. Multiple (independent) genetic markers are likely to produce more robust demographic inferences and help to overcome the idiosyncratic histories of individual loci (Hare 2001, Dupuis et al. 2012).
Beyond multiple loci, studies of multiple taxa often reveal more robust insights than single-taxon studies, in part because similar patterns across species imply common evolutionary processes (e.g. vicariance, Knowlton et al. 1993). For example, phylogeographic patterns of individual species can be informative, but data from multiple sympatric species often provide important context for the evolutionary process (e.g. geologic or climatic events as drivers, Avise 2000). In terms of demography, congruent histories among taxa suggest that common environmental factors may be responsible (Bos et al. 2008, Qu et al. 2010). In contrast, disparate histories among sympatric taxa could point to species-specific demographic drivers (e.g. novel predators or pathogens, Ricklefs 2010). Herein, we reevaluate Fahey et al. (2012) mtDNA findings of better support for population expansion in migratory birds than in resident birds in light of new data from six nuclear introns.
Material and Methods
Field sampling and laboratory sequencing techniques
Avian blood samples were collected via venipuncture from the Aceitillar sector of the Dominican Republic's Sierra de Bahoruco National Park, under permits issued by the government; a description of the field work can be found in Latta & Ricklefs (2010). Ten non-migratory species of birds were used in these analyses, including six resident species ‒ Coereba flaveola (CFA), Columbina passerina (CPA), Elaenia fallax (EFA), Loxigilla violacea (LVI), Myiarchus stolidus (MST), Turdus plumbeus (TPL) ‒ and four endemic species ‒ Microligea palustris (MPA), Phaenicophilus palmarum (PPA), Spindalis dominicensis (SDO), and Todus subulatus (TSU). Six migratory species, all warblers (Parulidae), were also included in the analysis: Setophaga caerulescens (SCR), Setophaga discolor (SDI), Setophaga palmarum (SPA), Setophaga tigrina (STI), Mniotilta varia (MVA), and Seiurus aurocapilla (SAU). Species descriptions are provided in Latta et al. (2006). Note the genus Setophaga was still referred to as Dendroica in Fahey et al. (2012), and thus DCR = SCR, DDI = SDI, DPA = SPA, and DTI = STI.
Details of our DNA extraction procedures can be found in Fahey et al. (2012). We used avian PCR primers described in Kimball et al. (2009) to amplify nuclear gene fragments. Primer pairs were tested in a subset of individuals from each of the 16 species, and those that produced amplicons in all 16 species were further evaluated by dideoxy sequencing. Of the > 20 nuclear loci evaluated, six amplified well in most of our study species and harbored sequence variants; these six markers were then amplified and sequenced in a larger subset of individuals.
The nuclear loci used in this study included hmgn2, irf2, mb, nat15, pcbd, and rho. None of these loci are syntenic in the chicken genome and, because of the conserved chromosome structure in bird genomes, they presumably are unlinked (Derjusheva et al. 2004, Griffin et al. 2007). PCRs were conducted in 20μl total volumes using ∼40 ng template DNA, 1U NEB Taq polymerase, 0.3 μM of each primer, MgCl2 (as listed in Table 1), 10 mM Tris-HCl, 50 mM KCl, 0.5 mg/ml BSA, and 0.2 mM of each dNTP. PCRs were performed in Eppendorf MasterCyclers under the following conditions: 94 °C for two minutes; 31 cycles of 94 °C for 30 seconds, primer specific annealing temperature (Table 1) for 30 seconds, and 72 °C for 30 seconds; and concluding with a five minute extension at 72 °C. Touchdown annealing protocols were used to amplify pcbd in PPA and MVA (30 cycles at 68°, decreasing by 0.5° each cycle, then 58° for 30 more cycles) and mb in MVA individuals (30 cycles at 60°, decreasing by 0.5° each cycle, then 48° for 30 more cycles). For pcbd amplifications in some species, two sets of primers (internal and external) were used to generate better coverage of this long sequence. A low sodium acetate precipitation was used for PCR purification prior to sequencing. Amplicons were checked for size via gel electrophoresis and subsequently quantified using a Nanodrop8000 (Thermo Scientific) spectrophotometer. Dideoxy sequencing reactions employed BigDye version 3.1 chemistry and a Prism 3730XL sequencer (Applied Biosystems, Foster City, CA, U.S.A.) using ∼50–100 ng of PCR template. Sequencher 4.7 (Gene Codes Corp., Ann Arbor, MI) was used for alignment of sequences into contigs. All candidate single nucleotide polymorphisms (SNPs) were carefully inspected by eye; when necessary, poor quality sequence reads were discarded and individuals were resequenced. Gametic phase was addressed computationally with PHASE (version 2.1, Stephens et al. 2001, Stephens & Scheet 2005).
We used PHASE to assign haplotypes to individual birds because this algorithm is generally robust even when underlying assumptions are violated (e.g. selective neutrality, Bos et al. 2008). We first used a probability threshold of 0.60 to reduce the number of unresolved haplotypes with a minimal increase in false positives (Harrigan et al. 2008, Garrick et al. 2010). For PHASE runs where site probabilities were less than 0.60 (excluding polymorphic sites unique to an individual), then cloning techniques were used to help resolve gametic phase by priming the PHASE algorithm. In those cases, a Promega T/A cloning kit (pGEM-T Easy Vector System II) was used and amplicons were sequenced directly from individual colonies; these cloned sequences were then used as “known” haplotypes in subsequent PHASE analyses. When PHASE probabilities remained less than 0.60, we then removed individual polymorphic sites for all individuals of that species. This approach is imperfect because overall haplotype diversity is reduced (i.e. some information is discarded), but culling sites is preferable to culling unresolved individuals, as the latter approach more egregiously reduces overall phylogenetic diversity and biases population genetic parameters (Garrick et al. 2010).
Because cloning and traditional dideoxy sequencing are expensive and time consuming, we also used second-generation sequencing techniques to identify haplotypes that could prime the PHASE algorithm. We used the Nextera XT DNA sample preparation kit (Illumina) to simultaneously fragment and tag the six nuclear amplicons for subsequent sequencing using the Illumina MiSeq platform. Individual samples from each species were assigned a unique index/tag and six amplicons (one for each gene) were pooled for each species. The Nextera XT kit provides 24 tags, so we used the remaining eight tags to help further resolve highly polymorphic genes/individuals by again combining amplicons, but also having only one gene per species per tag (Table 2). We were able to tag multiple individuals with one index tag because we only had one representative for each of the six genes, and the genes could easily be sorted out computationally after the Illumina run (i.e. reads from the same amplicons were assembled into the same contigs).
Illumina sequences were trimmed with fastx_clipper to remove adapters and clip poor quality bases. Filtered reads were sorted by index tags into BAM files and viewed using the program Integrative Genomic Viewer (IGV, Robinson et al. 2011). Gametic phase of polymorphic sites was determined using a sliding window of ∼100 nt because this corresponds to the MiSeq read length following processing (i.e. trimming primers and tags). Sequences from Illumina contigs were used to identify haplotypes in corroboration with the Sanger sequencing efforts and PHASE analyses. If polymorphic sites in a haplotype were separated by more than 100 nt, then only partial haplotype fragments could be determined. In such cases, we used the fragment with either the greater number of polymorphic sites or the more informative polymorphic sites for assigning “known” haplotypes in PHASE.
Analyses of historical demography
We used Arlequin 3.1 (Excoffier et al. 2005) to compute standard population genetic statistics, including nucleotide diversity (average number of nucleotide differences between two sequences in a population, π), haplotype diversity (measure of the distinctiveness of a haplotype in a population, hd), mean number of pairwise differences (MPD), number of haplotypes (h), and number of polymorphic sites (s). We also used Arlequin to quantify characteristics of haplotype mismatch distributions for each locus and species individually. This includes the raggedness index (r), which quantifies the smoothness of the mismatch distributions (Harpending et al. 1993). Low values (< 0.05) of r indicate the mismatch distribution is smooth and thus does not differ significantly from the null model of population expansion.
To test the hypothesis that migratory bird populations have recently expanded to a greater extent than nonmigratory populations, we employed Arlequin and DnaSP (version 5.1, Rozas et al. 2003) to estimate neutrality test statistics. Tajima's D (Tajima 1989) and Fu's F (Fu 1997) were calculated in Arlequin. Fu's F uses information from the distribution of haplotypes to detect population growth, whereas Tajima's D uses the frequency of segregating sites to test for deviations from neutrality. DnaSP was used to determine Fu and Li's D* and F* (Fu & Li 1993) statistics, which are similar to Tajima's D in that they use the frequency of nucleotide substitutions to test for deviations from neutrality. Significantly negative values (P < 0.05) of Tajima's D, Fu's F, and Fu and Li's D* and F* indicate population expansion or positive selection. Fu's F has been found to be the most powerful test statistic for detecting population growth and selection (Fu 1997, Ramos-Onsins & Rozas 2002, Ramírez-Soriano et al. 2008) except in the case of recombination, Fu and Li's D* and F* are the most powerful at detecting background selection (Fu 1997). We used general linear models (SAS GLM procedure) to estimate the effects on r, D, and F statistics of population status (endemic, resident, migrant), genome (nuclear versus mitochondrial), and individual genes.
The six nuclear loci used for this study with the PCR amplification details and references. See Kimball et al. (2009) for more details.
Illumina sequencing templates and the number of reads per tag (after trimming). Individuals in bolded italics (n = 3) had read depths < 100×, all others (n = 129) had read depths > 100×. CFA = Coereba flaveola, CPA = Columbina passerina, SCR = Setophaga caerulescens, SDI = Setophaga discolor, SPA = Setophaga palmarum, STI = Setophaga tigrina, EFA = Elaenia fallax, LVI = Loxigilla violacea, MPA = Microligea palustris, MVA = Mniotilta varia, MST = Myiarchus stolidus, PPA = Phaenicophilus palmarum, SAU = Seiurus aurocapilla, SDO = Spindalis dominicensis, TPL = Turdus plumbeus, and TSU = Todus subulatus.
We sequenced a mean of 18.3 (range 4–32) individuals per species (n = 16) across a mean of 5.4 nuclear genes (Table 3). We successfully cloned and sequenced 47 amplicons using conventional (dideoxy) approaches. We also sequenced 132 amplicons using the Illumina MiSeq (Table 2); the mean number of trimmed Illumina reads for each tag/sample was 265, 116 (range 164, 562–612, 302). For most of the genes, the mean coverage exceeded 1000× and 98 % of the amplicons (129 of 132) yielded > 100× coverage. This depth of coverage helped us decipher the gametic phase of complex haplotypes. The mean and median PHASE probabilities for each gene and species are listed in Table 4.
Genetic diversity and neutrality test statistics. C = category of species [ R = regional residents ( species that breed on the island but are found on other nearby islands), E = endemics ( endemic to Hispaniola), and M = migrants (breed in continental North America)], length of the sequence (nt), sample size (n), the number of haplotypes (h), and the number of polymorphic sites (s). Genetic diversity and neutrality statistics for each species: π = nucleotide diversity, hd = haplotype diversity, MPD = mean number of pairwise differences between haplotypes, r = raggedness index. Italicized and bolded values are indicative of population expansion. CFA = Coereba flaveola, CPA = Columbina passerina, SCR = Setophaga caerulescens, SDI = Setophaga discolor, SPA = Setophaga palmarum, STI = Setophaga tigrina, EFA = Elaenia fallax, LVI = Loxigilla violacea, MPA = Microligea palustris, MST = Myiarchus stolidus, MVA = Mniotilta varia, PPA = Phaenicophilus palmarum, SAU = Seiurus aurocapilla, SDO = Spindalis dominicensis, TPL = Turdus plumbeus, and TSU = Todus subulatus. The ND2 mitochondrial data are from Fahey et al. (2012). * = p < 0.05; ** = p < 0.01.
Across species, the intron hmgn2 (mean length of 347 nt) had a mean of 15.4 haplotypes and 17.4 polymorphic sites; irf2 (540 nt) averaged 12.2 haplotypes and 15.5 polymorphic sites; mb (689 nt) had the smallest number of haplotypes (7.4) and polymorphic sites (6.5); nat15 (528 nt) had a mean of 14.1 haplotypes and 17.5 polymorphic sites; pcbd (538 nt) had the highest mean number of haplotypes (15.7) and polymorphic sites (19.5); and finally, rho (434 nt) had a mean of 9.7 haplotypes and 9.6 polymorphic sites. The mean haplotype diversity for each gene across the 16 species ranged from 0.533–0.909, with mb having the lowest value and hmgn2 the highest. The average nucleotide diversity for each gene over all species ranged from 0.0016–0.013, again with mb having the lowest value and hmgn2 as the highest (Table 3).
Nearly half (41 of 87) of the raggedness index (r) values for each species and gene were < 0.05, consistent with population expansion (Table 3). Nearly all F values (83 of 87) were negative, and 57 of 87 (66 %) were significantly so. Most Tajima's D values were also negative (78 out of 87), but only 16 of 87 values (18.3 %) were significantly negative and thus consistent with population expansion. As in Fahey et al. (2012), Fu and Li's D* and F* were highly correlated and thus for simplicity we refer to them as a unit (D*F*). We computed D*F* for only 74 genes instead of 87 because at least four haplotypes are required for this assessment. No D*F* values were significantly negative, and 12 out of 74 (16.2 %) had positive values (though none significantly positive). This was surprising, as Fahey et al. (2012) found that all mtDNA values of Tajima's D and D*F* were negative and approximately half were significantly negative. When comparing nDNA values (averaged across all loci) against the mtDNA results for any given test statistic, 87.5 % of the mtDNA statistics were less than (more negative) than the nDNA averages (Fig. 1A–C). Lower figures in these test statistics are more likely to be significant, thus with the mtDNA statistics lower than the average nDNA statistics it is not surprising that we did not see as many significant values for the nDNA test statistics.
The mean and median PHASE reconstruction probabilities for each gene and species. Bolded entries have four sites with probabilities < 0.60, italicized entries are those with 1–3 sites with probabilities < 0.60 and a mean or median not above 0.90.
Our mtDNA dataset suggested that migratory species in our sample exhibited more pronounced population expansion than did resident or endemic species (Fahey et al. 2012). Mismatch distributions based on nuclear introns revealed some evidence of population expansion in three resident (LVI, MST, and TPL), two endemic (MPA and SDO), and two migrant (SPA, and MVA) species. Each of these species had 4–6 independent r values < 0.05, but the differences between species in each of the population status categories were not significant. That said, mismatch distribution statistics are not especially powerful (Ramos-Onsins & Rozas 2002) and thus we concentrate our discussion on the neutrality statistics. For any given nDNA locus, no species had significantly negative values of all neutrality test statistics (i.e. Tajima's D, Fu's F, Fu and Li's D*, and Fu and Li's F*). As previously indicated, none of the nuclear genes had significantly negative values of D*F*. Furthermore, no species had more than three significantly negative Tajima's D values, and only four species (SPA, STI, PPA, and TSU) had two or three significantly negative Tajima's D values. Significantly negative Fu's F values at four or more genes were observed in three resident (CFA, LVI, and TPL), three endemic (PPA, SDO, and TSU), and four migrant (SPA, STI, MVA, and SAU) species. Thus 10 out of 16 species (with equal representation in migrant and non-migrants) showed evidence of population expansion in most of their Fu's F statistics, one of the most powerful tests for detecting population expansion (Fu 1997, Ramos-Onsins & Rozas 2002, Ramírez-Soriano et al. 2008).
General linear model results for differences between genomes, genes, and population status with respect to several indicators of historic demographic change. Significant F statistics and Least Squares Means (LSMeans) are indicated in bold type. Superscript letters indicate significant differences between mean values according to Duncan's multiple range test.
If we compare these nuclear data to our published mtDNA data, we see concordant mtDNA and nDNA evidence for historical population expansion in one resident (CFA), three endemics (PPA, SDO, and TSU), and four migrants (SPA, STI, MVA and SAU). In five cases, however, there were discordances between the nDNA and mtDNA results. Nuclear genes of CPA, SCR, and SDI showed little sign of population growth and thus do not support the mtDNA data. Similarly, LVI and TPL also had discordances among nuclear and mitochondrial genomes: the nuclear intron sequences generally supported population expansion, yet there was no such evidence in the mtDNA sequences. Using mtDNA and nDNA, a greater proportion of migratory species (4/6) exhibited evidence of population expansion than the non-migratory species (4/10), but the difference was not significant (G-test: P = 0.3).
The general linear model revealed different patterns of variation for each of the indices of demographic change (Table 5). The raggedness index (r) differed significantly among genes, but did not differ between genomes or with respect to population status (resident, endemic, migratory). The remaining statistics (D, Fu's F, and Fu and Li's D* and F*) differed approximately two-fold between the nuclear and mitochondrial genomes, but they did not differ with respect to population status or among nuclear genes within genome. Thus, although some species show genetic evidence of recent population expansion, this cannot be tied to commonalities in distribution or migratory status.
The distribution of pairwise nucleotide differences between gene sequences can highlight the evolutionary demography of a species, revealing population growth trends (Rogers & Harpending 1992). Fluctuations in population size reflect the idiosyncratic life-history characteristics of a given population, but concordant demographic fluctuations among species could also reflect long-term trends in environmental conditions. Many studies have considered the historical demography of a single species with a single marker (usually mtDNA, Merilä et al. 1997, Zink et al. 2006, Hawley et al. 2008, Lerner et al. 2009, Norgate et al. 2009, Brace et al. 2012). More recently, authors have used multiple markers to study individual species (Firestone et al. 1999, Tchaicka et al. 2007, Bos et al. 2008, Reding et al. 2010, Nietlisbach et al. 2012). With technological advances in sequencing, more studies in the future will be able to analyze more than one marker to study multiple species (Qu et al. 2010, Sonsthagen et al. 2012), providing the most comprehensive reconstruction of evolutionary demography.
We investigated the comparative historical demography of Hispanolian birds (n = 16 species). We surveyed molecular variation at six nuclear genes to determine whether nDNA would corroborate our mtDNA results (Fahey et al. 2012). Overall, the average haplotype diversities of mtDNA and nDNA were similar (mtDNA hd = 0.724, nDNA hd = 0.755) but average nucleotide diversity was lower for mtDNA (mtDNA π = 0.0026, nDNA π = 0.0067). Although haplotype diversity for nDNA and mtDNA might be similarly impacted by population bottlenecks, nucleotide diversity might be lower for mtDNA because contemporary nDNA haplotypes represent older allelic lineages with deeper coalescences that have had more time to accumulate nucleotide substitutions. We note that the three-fold difference in nucleotide diversity between mtDNA and nDNA observed in our empirical data is reasonably consistent with theoretical expectations which predict a 4-fold difference in Ne between mtDNA and nDNA (Hare 2001).
Haplotype and nucleotide diversities did not differ between migrants, regional residents, and endemics (though endemics did have the lowest values in the mtDNA data). We did not find concordance between nDNA and mtDNA for each species, and we found relatively little nDNA support for recent population expansion in migratory species. Instead, (nonsignificant) positive values of Tajima's D and Fu and Li's D* and F* were scattered among 14 of the 16 species in our matrix of species/loci/test statistics, which could point towards stable population sizes. All other neutrality test statistics were negative, but few (∼20 %) nDNA values were statistically significant. In contrast, all of the mtDNA neutrality test statistics were negative and > 50 % were significantly negative. Fu's F is generally considered the most powerful such statistic, so we identified species where the majority of the Fu's F statistics were significantly negative across nDNA loci. Using this approach, values for three residents (CFA, LVI, and TPL), three endemics (PPA, SDO, and TSU), and four migrants (SPA, STI, MVA, and SAU) were consistent with population expansion. Eight of these ten species (CFA, SPA, STI, MVA, PPA, SAU, SDO, and TSU) exhibited significantly negative neutrality test statistics in the mtDNA, but these include both migratory and nonmigratory species in similar proportions.
Disparities among mtDNA and nDNA within a species are not surprising as nuclear markers are often idiosyncratic due to selection or drift (Toews & Brelsford 2012). For example, selective sweeps of beneficial mutations will carry nearby linked loci to fixation (Galtier et al. 2009, Karl et al. 2012). This reduces genetic variation and causes demographic inferences to be based upon the last selective sweep rather than the most common recent ancestor. Fu and Li's D*F* statistics are more powerful at detecting background selection than Tajima's D (Fu 1997), and thus significant values of D*F* in conjunction with non-significant values of D suggests selection at a given locus. None of the genes in our datasets exhibited this pattern, and thus we assume that background selection has had relatively little impact on the standing levels of genetic variation at these loci.
Selection aside, the mtDNA data generally supported expanding migratory populations whereas the nDNA data did not. We see three possible explanations for this incongruity: technical artifacts, incomplete lineage sorting, and statistical issues. One possible source of technical noise is haplotype inference. Our nuclear sequences were highly polymorphic, but such variability is a double-edged sword in that it provides increased power for inferring past selection and population change, but makes haplotype phase inference more difficult. We inferred phase computationally from direct sequencing of PCR products, from TA cloning, and from Illumina sequencing. We used PHASE probability thresholds to trim ambiguous regions; this eliminates some polymorphic sites, but it is preferable to removing the unresolved haplotype because removal can bias population genetic statistics. For example, removing unresolved haplotypes can result in overestimation of neutrality test statistics (because this removes rare alleles and the excess of rare alleles leads to more negative values, which is indicative of population expansion) or underestimation of Θw and π (Garrick et al. 2010). Trimming can also lead to overestimation of neutrality test statistics if the deleted polymorphism(s) is from a novel haplotype and incorrect haplotype reconstructions could cause overestimation of neutrality test statistics. Thus it is possible that the relatively few nDNA neutrality test statistics with significantly negative values could be an artifact of genotype reconstruction. Haplotype reconstruction was more difficult for some genes (pcbd, nat15, and hmgn2), but the number of significantly negative Fu's F values did not differ in these genes compared to the other three genes. Therefore, there was no obvious difference between genes that we might expect to have been biased due to reconstruction errors and genes that were more confidently deduced.
The lack of a clear signal in our data might also reflect biological factors such as incomplete lineage sorting and the longer coalescence times associated with diploid nDNA when compared to mtDNA genes (Hare & Avise 1996). The maternal inheritance and haploid nature of mtDNA means that its effective population size is ¼ that of nDNA, which means that coalescence of mitochondrial markers requires ¼ of the time for nDNA; thus, mtDNA is more informative of the recent past (Moore 1995, Zink & Barrowclough 2008). The 4× larger effective population size of the nDNA makes it more informative in the distant past, and nDNA gene genealogies are expected to be unresolved over intermediate and very recent timescales (Zink & Barrowclough 2008). Thus, the presumptive population expansions of migratory species that have occurred since the last glacial maximum may be so recent that insufficient time has elapsed for coalescence of nDNA genes. The average haplotype diversities for the mitochondrial and nuclear data were similar and moderately high, and therefore did not indicate recent bottlenecks (average overall hd: mtDNA = 0.72, nDNA = 0.75).
Finally, our results could also reflect the inherent noise associated with neutrality test statistics. This may be a plausible explanation, as other studies have also reported inconsistencies between nDNA and mtDNA data in terms of neutrality test statistics (Bensch et al. 2006, Eytan & Hellberg 2010, D'Horta et al. 2011, Lim & Sheldon 2011). If commonly employed demographic metrics are sound in the face of complex evolutionary realities, we would expect that a large enough suite of markers should overcome the inherent statistical noise and eventually converge. The six nuclear markers we evaluated no doubt paint an incomplete picture of the whole genome, and underscore the idea that the most complete studies of demographic history will evaluate many more markers. Fortunately, advances in genome sequencing and associated technologies suggest the number of markers will soon cease to be a significant limitation.
The aim of this study was to corroborate inferences from our mtDNA analysis with data from independent nuclear genes, in part to distinguish genome-wide effects (e.g. genetic drift) from selection on individual genes/genomes. Our data reveal very limited support for our hypothesis that migratory species would exhibit more pronounced indicators of population growth, and the nDNA data were more equivocal than the mtDNA data. The disparities among genomes/ loci may reflect systematic (technical), stochastic (statistical), or biological (incomplete lineage sorting) noise. Ultimately, the disparities between mtDNA and nDNA probably reflect a) the deeper coalescence of nDNA and its inability to accurately capture recent demographic history and b) the noise inherent to neutrality test statistics. More empirical data and/or more extensive computer simulations are required to distinguish among these scenarios.
Samples were collected in the field by Steve Latta at the National Aviary, funded by NSF Grants DEB-0089226 and DEB-0542390 to RER. Permission to work in the Sierra de Bahoruco was provided by the Dirección Nacional de Parques and the Departamento de Vida Silvestre, República Dominicana. Field assistance in the Dominican Republic was provided by Danilo Mejía, Vinicio Mejía, Mamonides Heredia, Miguel Angel Landestoy, Marie Abbott, and Elvis Cuevas Mendoza. Chris Rimmer and Kent McFarland of the Vermont Center for Ecostudies provided blood samples from montane broadleaf forest sites in Hispaniola. We also thank B. Dunning, K. Nichols, and members of the DeWoody lab group for their insights and comments on the paper. Finally, we thank the Provost's Office at Purdue University for funding through the University Faculty Scholar program (JAD) and the Curators of the University of Missouri and the Alexander von Humboldt Foundation for financial support (RER).