We present the results of a study that implements a recently developed phylogenetic algorithm that combines fixed-states nucleotide optimization with breakpoint analysis to identify and examine the evolution of a mitochondrial intergenic spacer between the tRNAVal and 16S rRNA loci in a clade of fairy basslets (Serranidae: Anthiinae). The results of the analysis indicate that this spacer evolved once and that it may be increasing in size through evolutionary time. The resulting molecular hypothesis corroborates much of the previous morphological phylogenetic work.
INTRODUCTION
During recent investigations focusing on the relationships of “mail-cheeked” fishes and sea basses (Smith and Wheeler, 2004, 2006; Smith and Craig, 2007), we discovered a mitochondrial intergenic spacer in a clade of anthiine sea basses. After surveying the analytical literature, we were concerned about the subjectivity of the methods that have been previously used to identify and analyze novel genetic elements and genomic rearrangement data. Therefore, we have used a recently developed genomic algorithm (Wheeler, 2007) to examine the evolution of this anthiine intergenic spacer and the relationships within anthiine serranids. We then discuss how this method can be expanded beyond this comparatively simple example to simultaneously analyze genomic and sequence data in a phylogenetic framework. During the last 20 years, evolutionary biologists have made tremendous advances by combining innovations in automated DNA sequencing and computational biology. This progress has provided the data necessary to test evolutionary hypotheses using complete genomic sequences. Animal mitochondrial genomes (mitogenomes) represent the vast majority of complete genomic data. As the number of sequenced mitogenomes increases, so does our understanding of their organization and function. These studies confirm previous suggestions that animal mitogenomes are largely conserved in terms of gene order and content (Boore and Brown, 1998; Miya et al., 2005), but they also challenge previous assumptions of mitochondrial genome evolution (e.g., the lack of recombination; Piganeau et al., 2004). Although most sequenced animal mitogenomes (primarily vertebrate) share a particular gene order, rearrangements and other transformations have been described in a variety of animal groups (e.g., amphibians, squamates, teleosts, hymenopterans [Macey et al., 1997; Dowton et al., 2003; Miya et al., 2005]). These transformations include dramatic reductions in genomic size in chaetognaths (Helfenbein et al., 2004) to increases in size ranging from small tandem repeats (Moritz et al., 1987) to large-scale duplications (McKnight and Shaffer, 1997). The consequences of these duplications range from multiple functional copies of loci undergoing concerted evolution (Kumazawa et al., 1996) to rearrangements (Macey et al., 1997; Dowton et al., 2003; Miya et al., 2005) to intergenic spacers (McKnight and Shaffer, 1997; Bakke et al., 1999; Mabuchi et al., 2004). Many authors have argued that the assumed uniqueness and rarity of these genome-level transformations make them more reliable than nucleotide transformations for inferring deeper phylogenetic relationships (Moritz et al., 1987; Curole and Kocher, 1999). However, as additional mitogenomes are published, it is clear that these transformations are more common than once believed (Curole and Kocher, 1999; San Mauro et al., 2006), and, more importantly, it is striking how frequently these rearrangements occur in “hotspots” between transfer RNAs, often evolving repeatedly within large clades (Dowton et al., 2003; San Mauro et al., 2006). It seems clear that as the number of sequenced mitogenomes passes 2000, there will be continued discovery of novel and convergent rearrangements; these mitogenomic rearrangements foreshadow the complications that will be encountered as researchers begin to sequence the larger prokaryotic genomes and eukaryotic plastid and nuclear genomes. The realization that these rare mitogenomic rearrangements may arise through convergent evolution (Mindell et al., 1998; Curole and Kocher, 1999; Dowton et al., 2003) or be complicated by recombination (Piganeau et al., 2004) challenges the assumptions upon which previous analyses of these genomic characters were based. Most previous studies made a priori assumptions of locus homology based on sequence similarity, perceived secondary structure, or tRNA anticodon sequences (Dowton et al., 2003; Mabuchi et al., 2004; Miya et al., 2005). None of these methods use a historical, evolutionary framework for inferring or testing homology. Investigators using these nonphylogenetic approaches for identifying homologous loci will be misled by convergent evolution. For example, Rawlings et al. (2003) showed that tRNALeu had convergently altered its anticodon sequence at least seven times within metazoans. Therefore, it is clear that homology assessment should be explicitly tested—not assumed—in a comparative, evolutionary framework using all available evidence. When these scenarios are not quantitatively tested, they are prone to investigator bias or suboptimal explanations. This homology-assessment problem will only be exacerbated as we move from the simple animal mitogenomes to the larger prokaryotic genomes and eukaryotic plastid and nuclear genomes. Sankoff and Blanchette (1998), recognizing the need for explicit methods to infer phylogenies using genomic rearrangement data, developed breakpoint analysis, which infers phylogenetic relationships from locus order data. Unfortunately, breakpoint analysis ignores sequence variation and requires that locus homology be established prior to the analysis. As mentioned above, this locus homology assessment is difficult for all but the simplest cases. Recently, Wheeler (2007) described and implemented an algorithm in POY (Wheeler et al., 2003) that combines a breakpoint algorithm with a fixed-states nucleotide algorithm (Wheeler, 1999) to optimize chromosomal and nucleotide data simultaneously for inferring locus and nucleotide homology during phylogenetic analysis. This method does not assume locus homology or nucleotide alignment prior to analysis; it requires only that the boundaries of the loci be assigned (e.g., using start and stop codons for protein-coding sequences). Using evidence from the sequence data, this algorithm will infer homologous loci and simultaneously analyze both the nucleotide and genomic variation to recover the optimal phylogenetic relationships and hypothesize both nucleotide and locus homology in an evolutionary framework. Furthermore, this method is an improvement over current similarity-based or function-based approaches because it is automated, internally consistent, repeatable, explicit, does not presuppose a mechanism for resulting rearrangements, allows evaluation of competing hypotheses, and is grounded in a phylogenetic framework.
Herein, we use this new method to examine the evolution of this novel seabass intergenic spacer. Typically, vertebrate mitogenomes have two large intergenic spacers: the origin of light-strand replication, which has been lost in some snakes, crocodiles, and birds (Macey et al., 1997), and the control region (Fernández-Silva et al., 2003). Additional intergenic spacers are rare in vertebrate mitogenomes, but have been described between tRNAThr and tRNAPro in a variety of vertebrate clades, e.g., Xenopus (Roe et al., 1985), Struthio (Härlid et al., 1997), ambystomatid salamanders (McKnight and Shaffer, 1997), and gadiform fishes (Bakke et al., 1999). McKnight and Shaffer (1997) showed that the salamander intergenic spacer had the highest substitution rate among sequenced regions of the salamander mitogenome (including the control region). Furthermore, these authors noted that the persistence of this intergenic spacer contradicted Rand's (1993, 2001) hypotheses that mitogenomic size is minimized over time. Additional mitogenomic modifications and rearrangements have also been described and discussed by Mueller and Boore (2005) for other salamander taxa. The intergenic spacer described in this study is restricted to a subset of sea basses in the subfamily Anthiinae. The relationships of the sea basses (Serranidae and Epinephelidae) have received substantial attention (Johnson, 1983; Baldwin and Smith, 1998; Craig and Hastings, 2007; Smith and Craig, 2007), but the only explicit phylogenetic study, to date, of the Anthiinae is that of Baldwin (1990), who examined the relationships among American anthiines. Here, we report and characterize a vertebrate intergenic spacer discovered in H-strand transcription unit one (H1). Although comparatively rare, an H1 rearrangement has been noted in some toadfishes (Miya et al., 2005). The H1 includes tRNAPhe, tRNAVal, and both ribosomal RNAs (Fernández-Silva et al., 2003). Within H1, this anthiine intergenic spacer evolved between tRNAVal and the 16S large ribosomal subunit (fig. 1). The remaining 24 heavy-strand loci are transcribed in H-strand transcription unit two (H2), along with some transcriptional overlap with H1. Given the smaller size and 20-fold greater transcription rate of H1 (Fernández-Silva et al., 2003), this first described H1 intergenic spacer may affect its posttranscriptional processing.
MATERIALS AND METHODS
Acquisition of nucleotide sequences and taxon sampling
Fish tissues were preserved in 80% ethanol or were used fresh prior to extraction of DNA. Genomic DNA was extracted from muscle using a DNeasy Tissue Extraction Kit (Qiagen) following the manufacturer's protocol. Polymerase chain reaction (PCR) was used to amplify three overlapping fragments crossing three rDNA fragments (12S, tRNAVal, and 16S) and the intergenic spacer, when present (fig. 1). Multiple overlapping primer pairs were used to ensure that we were amplifying the intergenic spacer region in the mitochondrial genome instead of amplifying a nuclear pseudogene. Double-stranded amplifications were performed in a 25 µL volume containing one Ready-To-Go PCR bead (Amersham Biosciences), 1.25 µL of each primer (10 µM) and 2–5 µL of DNA. To amplify and sequence the tRNAVal, intergenic spacer (when present), and 5′ end of 16S fragment, the primers 12SL13-L 5′-TTAGAAGAGGCAAGTCGTAACATGGTA-3′ and TitusI-H 5′-GGTGGCTGCTTTTAGGCC-3′ (Titus, 1992; Feller and Hedges, 1998) were used. To amplify the middle section of the 16S fragment, the primers 16SL2A-L 5′-CCAAACGAGCCTAGTGATAGCTGGTT-3′ and 16SH10-H 5′-TGATTACGCTACCTTTGCACGGT-3′ (Hedges, 1994) were used. To amplify the final section of the 16S fragment, the primers 16S ar-L 5′-CGCCTGTTTATCAAAAACAT-3′ and 16S br-H 5′-CCGGTCTGAACTCAGATCACGT-3′ (Kocher et al., 1989; Palumbi, 1996) were used. Amplifications for all fragments were carried out in 36 cycles using the following temperature profile: initial denaturation for 360 sec. at 94° C, denaturation for 60 sec. at 94° C, annealing for 60 sec. at 48° C, and extension for 75 sec. at 72° C, with an additional terminal extension at 72° C for 360 sec. The double-stranded amplification products were desalted and concentrated using AMPure (Agencourt® Bioscience Corporation) following the manufacturer's protocol. Both strands of the purified PCR fragments were used as templates and directly cycle-sequenced using the original amplification primers and Prism Dye Terminator Reaction Kit v1.1 (ABI). The sequencing reactions were cleaned and desalted using CleanSEQ (Agencourt® Bioscience Corporation) following the manufacturer's protocol. The nucleotides were sequenced on an ABI 3730×l automated DNA sequencer. Contigs were built in Sequencher (GeneCodes) using DNA sequences from the complementary heavy and light strands. Sequences were edited in Sequencher and Bioedit (Hall, 1999). All novel sequences were deposited in GenBank under accession numbers FJ548764–FJ548784. The remaining three sequences (Myripristis berndti [NC 003189], Caranx melampygus [NC 004406], and Holanthias chrysostictus [AY141436]) were taken from GenBank and were derived from studies by Miya et al. (2005) and Chen et al. (2003). Mitochondrial DNA sequences were analyzed for 18 serranid and epinephelid species and three outgroups (Myripristis, Caranx, and Acanthistius). We focused our taxon sampling on representatives of the smaller fairy basslets (e.g., Luzonichthys, Nemanthias, Pseudanthias, Tosana) in the subfamily Anthiinae because these genera were initially found to posses the intergenic spacer in H1. We also included representatives of four additional anthiine genera (Hemanthias, Holanthias, Plectranthias, and Pronotogrammus), three serranine genera (Diplectrum, Paralabrax, and Zalanthias), two epinephelid genera (Epinephelus and Grammistes), and the former anthiine, now percoid, genus Acanthistius to examine the phylogenetic distribution of the intergenic spacer. Furthermore, we included four specimens of the species Pseudanthias tuka to assess haplotype diversity for the intergenic spacer region. All four of these specimens were procured at one time from the aquarium trade, so their collection localies are unknown.
Analysis
For the phylogenetic analysis, 17,285 aligned base pairs (based on the implied alignment; Wheeler, 2003) were analyzed. This large number of base pairs resulted from our inclusion of the complete mitogenomes for two taxa (Myripristis and Caranx). The remaining terminals sequenced in this study included the sequenced regions of the tRNAVal, intergenic spacer (when present), and 16S, encompassing 2180 aligned base pairs (fig. 1). These sequences were initially analyzed simultaneously under the optimality criterion of parsimony with nucleotide transformations (e.g., transversions, transitions, indels) equally weighted with a cost of one, a breakpoint cost (the cost for separating adjacent loci) of 100, a locus gap cost (the constant fraction of the cost of locus origin-loss) of 200, and a locus-size gap cost (the variable fraction of the cost of locus origin-loss based on the size of the locus) of one. This analysis was repeated with a variety of other plausible weighting schemes with weights from one to 500 for the various breakpoint parameters. The limited changes in results that we recovered will be discussed below, but all inferences must be based on the explicit costs.
This analysis was conducted in POY (Wheeler et al., 2003, 2006) and run on the American Museum of Natural History parallel computing cluster. The analysis began with 500 random addition sequences that were improved with TBR branch swapping and 150 parsimony ratchet replicates (Nixon, 1999). The results of these analyses were submitted to a final round of tree fusing (Goloboff, 1999a) and TBR branch swapping using fixed-states character optimization (Wheeler, 1999) and breakpoint analysis (Sankoff and Blanchette, 1998). To estimate the robustness of the clades recovered in the phylogenetic hypotheses, Bremer supports (Bremer, 1994) were calculated using TreeRot (Sorenson, 1999) in conjunction with PAUP* 4.0b10 (Swofford, 2002), and jackknife resampling analyses were performed in NONA (Goloboff, 1999b; 1,000 replications, heuristic searches, 10 random additions per replication via the WinClada interface [Nixon, 2002]) using the implied alignment, which does not include the breakpoint costs. In addition to the POY analysis, we manipulated and compared sequence data using ClustalX (Thompson et al., 1997) using default values, and we examined possible secondary structures of the intergenic spacer transcript using the program MFOLD (Zuker et al., 1999) using default values.
RESULTS
The POY analysis results in 15 equally most parsimonious trees (strict consensus shown in fig. 2). Each of the resulting optimal hypotheses has a cost of 24,036 weighted steps, which includes the traditional nucleotide transformation costs as well as POY's rearrangements costs. Based on this analysis, we hypothesize a single origin of the intergenic spacer in the ancestor of the anthiine genera Pseudanthias, Luzonichthys, Tosana, and Nemanthias. Because the analysis includes two complete mitogenomes, it explicitly tests whether the intergenic spacer evolved de novo (e.g., duplication, insertion) or is a rearrangement from elsewhere in the mitogenome. The analysis indicates that the intergenic spacer is a novel element because it was not aligned (i.e., inferred to be homologous) with any mitogenomic element in the analyses that included the entire ∼2100 base-pair region that was sequenced. Subsequent POY and ClustalX analyses that compared the intergenic spacer sequences individually to both included mitogenomes consistently aligned it with tRNAVal despite the inclusion of all mitochondrial transfer RNAs for Myripristis and Caranx. This suggests that the intergenic spacer is a duplication of this adjacent tRNA. Using the resulting phylogenetic framework, we describe the general features of and discuss the evolution of this novel mitochondrial element as well as compare our phylogenetic results to those of Baldwin (1990). The tRNAVal and 16S fragments that flank the intergenic spacer, when present, align well with the homologous regions in the taxa that lack the intergenic spacer and were identified as homologs in the POY analysis. The intergenic spacers of all 10 terminals have unique sequences with high sequence variation (uncorrected “p” distances): 0.025–0.618 (mean: 0.409 ± σ: 0.149) between species. This variation is approximately three times that found in the flanking tRNAVal and 16S fragments for the same species: 0.010–0.181 (0.140 ± 0.041). All four Pseudanthias tuka haplotypes have distinct intergenic spacer sequences (two to nine variable sites) but identical tRNAVal and partial 16S sequences. Additionally, the intergenic spacer sequences are highly length variable, ranging from 77 to 259 bps (174 ± 62) between species (fig. 2). Despite the high variability, there is a region of ∼25 base pairs at the 3′ end of the intergenic spacer sequences that is highly conserved among these 10 terminals.
DISCUSSION
As Wheeler (1995) described for nucleotide sequences, the results of this and all other numerical phylogenetic methods are affected by explicit assumptions (e.g., costs of transitions, transversions, indels, breakpoint cost, locus gap cost, locus-size gap cost) and implicit assumptions. Different costs for these various parameters may result in different hypotheses of relationships and evolutionary scenarios, as is seen with different weighting schemes in parsimony analyses or different evolutionary models in model-based phylogenetic approaches (i.e., MrBayes or maximum likelihood). Our resulting phylogeny was recovered in almost all breakpoint cost regimes that were examined. The specific breakpoint costs that we present for the calculations were chosen because they are roughly equivalent to the range of sizes for the intergenic spacer sequences. They were chosen so that the breakpoint costs would be equivalent to the number of independent substitutions and insertions required to explain the evolution of a single intergenic spacer sequence of observed length. In our diversity of analyses, the only topological changes that we found occurred when the breakpoint costs approached 1 (i.e., evidentially equivalent to a single substitution or indel). These results suggest that the intergenic spacer evolved independently (i.e., it was not homologous) in all species (except Pseudanthias tuka and Nemanthias carberryi) due to the extreme sequence variability in the intergenic spacer sequences found among species. These scenarios, as well as others, should be examined further as additional anthiine taxa are sequenced for the intergenic spacer region. Our discovery of this large, highly variable, yet persistent, intergenic spacer has implications for future studies examining the systematics and population genetics of sea basses, as well as for the evolutionary dynamics of the mitogenome. Our phylogenetic hypothesis recovered a monophyletic Anthiinae (sensu Smith and Craig, 2007) and corroborates Baldwin's (1990) and Smith and Craig's (2007) hypothesis that Plectranthias (sensu Eschmeyer, 1998) is not monophyletic and Baldwin's (1990) hypothesis that among included taxa, Hemanthias, Holanthias, and Pronotogrammus form a monophyletic group. Clearly, additional work on anthiine relationships is still desperately needed, but our study clearly traces and documents the evolution and persistence of this intergenic spacer in a diverse clade of anthiine serranids (presumably 65+ species in Luzonichthys, Nemanthias, Pseudanthias, and Tosana [Eschmeyer, 1998]). The persistence of this large intergenic spacer is as surprising as its increase in size through evolutionary time (fig. 2). Among the taxa sequenced in this study, the most basal taxon with an intergenic spacer, Pseudanthias lori, has the smallest spacer (77 bp), roughly the size of a tRNA. At each successive node, there is an increase in the size of the intergenic spacer, culminating in more highly derived taxa with spacers as large as 259 bp (fig. 2). This trend of insertions within the intergenic spacer region is surprising considering current opinions about mitochondrial DNA evolution that argue for mitogenome size reduction through evolutionary time (Rand, 1993, 2001). Rand (1993, 2001) has argued that the lack of introns and intergenic spacers in animal mitochondrial DNA is due to strong purifying selection for small mitogenomic size (and thus rapid replication time). Given this argument, one would expect large, evolutionarily stable intergenic spacers to serve a function. None of the sea bass intergenic spacer sequences possess an open reading frame of realistic size, suggesting that they are not protein coding. As a further exploration of its function, we examined the predicted secondary structures of their RNA transcripts using a free energy minimization model in MFOLD. Although the intergenic spacer sequences often form complex two-dimensional stem and loop structures (fig. 2), these structures differ between species and haplotypes. Furthermore, they are not particularly stable, with free energies ranging from −4.0 to −20.3 kcal/mol, making it unlikely that they form functional secondary structures in organello. Additionally, we compared this tRNAVal-16S region with more than 600 acanthomorph species from more than 250 families, and none of these taxa had a similar sequence between the tRNAVal and 16S. Finally, we are unable to find any sequences similar to the intergenic spacer sequences by performing BLAST searches in GenBank. Therefore, it seems likely that the only functional role the intergenic spacer may play is in posttranscriptional processing. We speculate that this conserved 25-bp region is important for accurate cleavage at the 5′ end of the adjacent 16S rRNA, a function that would have to have been retained from the ancestral tRNAVal following the hypothesized duplication to ensure proper posttranscriptional processing of the 16S rRNA. Ultimately, we agree with Curole and Kocher's (1999) assessment that “the possibility of convergent evolution of gene order can no longer be discounted.” Therefore, we have analyzed our data using POY's new algorithm (Wheeler, 2007) that combines Sankoff and Blanchette's (1998) chromosomal rearrangement (breakpoint analysis) and Wheeler's (1999) fixed-states nucleotide optimizations to examine simultaneously the identity and evolution of this novel genomic locus as well as examine relationships among a subset of sea basses. This method for simultaneously combining chromosomal order and nucleotide data to assess locus homology (i.e., genomic annotation) and nucleotide alignment is preferred over other methods because it is automated, internally consistent, repeatable, explicit, does not presuppose a mechanism for resulting rearrangements, allows evaluation of competing hypotheses, and is grounded in a phylogenetic framework. Our results suggest a single evolution of an intergenic spacer in the highly transcribed H-strand transcription unit one, presumably through duplication and subsequent evolution of tRNAVal. Finally, contrary to current mitochondrial genomic selection theory, this intergenic spacer is not only persistent, but seems to be increasing in size as it evolves.
Acknowledgments
We would like to thank J. Smith (Los Alamos National Laboratory) and J. Faivovich, T. Grant, K. Pickett, J. Sparks, M. Stiassny, and K. Tang (all at or formerly at the American Museum of Natural History [AMNH]) for discussing aspects of this project with us. We are grateful to H. Endo (Kochi University), the Gahan Family, J. Leis and M. McGrouther (Australian Museum), Reef and Fin (Stamford, CT), and H. Walker (Scripps Institution of Oceanography) for providing specimens used in this study. This project was supported by funding from the AMNH Lerner-Gray Program for Marine Research, the NASA–Ames Fundamental Space Biology Program, the Field Museum of Natural History, and the National Science Foundation (DEB-0405246 and DEB-0732642).