Chloroplast DNA (cpDNA) sequences have historically been used for both phylogenetic and phylogeographic studies in plants (Soltis et al., 1997; Schaal et al., 1998; Schaal and Olsen, 2000; Shaw et al., 2005; Avise, 2009). Despite some advantages (i.e., uniparental inheritance, genome stability, and rare recombination; Clegg et al., 1994; Ravi et al., 2007; Pleines et al., 2008; Borsch and Quandt, 2009), there are also limitations in using cpDNA at the shallowest taxonomic level (i.e., low genetic variability and evolutionary rate heterogeneity between lineages; Korotkova et al., 2014; Shaw et al., 2014). Nuclear regions are usually combined with cpDNA data, considering that these regions usually have a higher evolutionary rate in these organisms (Wolfe et al., 1987). Low-copy or single-copy nuclear genes (LCGs) are promising markers to perform comparative studies in plants and, frequently, LCGs present higher levels of variability than the traditional plastid markers at the shallowest taxonomic level (e.g., Sang, 2002; Small et al., 2004; Naumann et al., 2011).
Cereus Mill. (Cactaceae) represents a Neotropical, long-lived, succulent taxon comprising approximately 30 species mainly distributed in South America and currently subdivided into four subgenera: Cereus, Ebneria (Backeb.) D. R. Hunt, Mirabella (F. Ritter) N. P. Taylor, and Oblongicarpi (Croizat) D. R. Hunt & N. P. Taylor (Hunt et al., 2006). Cereus presents some characteristics of Cactaceae taxa that make the genus a candidate for conducting evolutionary studies in relation to: (1) their specialization for xeric habitats (Arakaki et al., 2011; Hernández-Hemández et al., 2011; Bonatelli et al., 2014), being potentially informative on the causal effects of Pleistocene paleoclimatic changes under Neotropical biota (Majure et al., 2012; Bonatelli et al., 2014); (2) the diversity of environments where species occur, facilitating adaptation studies; and (3) their broad distribution across South American biomes (Taylor and Zappi, 2004; Hunt et al., 2006), permitting biogeographical studies on both continental and subcontinental scales.
As screening for variable markers has become the first mandatory step to perform phylogeographic studies in targeted species due to the high evolutionary rate heterogeneity between plant lineages (Shaw et al., 2014; Korotkova et al., 2014), our aim was to evaluate whether regions that are likely variable among closely allied species (i.e., those described by Shaw et al., 2007, 2014) could similarly be predicted as variable in intraspecific studies. We focused on three biological hierarchical levels to screen for potentially informative markers for Cereus: distinct clades in the phylogeny, different species of the same clade, and allopatric populations of the same species. We selected 16 plastid segments to collect empirical data, including those more likely to be variable as proposed by Shaw et al. (2005, 2007, 2014): rpS16, rpL16, trnL-trnF, trnT-trnL, 3′trnK-matK partial, 3′rps16-5′trnK(UUU), trnS-trnG, trnH-psbA, trnQ(UUG)-5′rps16, rpl32-trnL(UAG), psbJ-petA, atpI-atpH, petL-psbE, psbD-trnT(GGU), 3′trnV(UAC)-ndhC, and ndhF-rpl32. Furthermore, we included in our screening intronic regions of two nuclear genes: the impaired sucrose induction gene (isi1; Rook et al., 2006) and the Na+/H+ vacuolar antiporter gene (nhx1; Gaxiola et al., 1999), because these genes have been used successfully in comparative studies within Cactaceae (Franck et al., 2012, 2013). With this approach, we were able to identify regions suitable for intra- and interspecific studies in Cereus. In general, we found more variability in nuclear than in plastid regions. Moreover, our study suggests no correspondence between variation of plastid regions at the interspecific and intraspecific level, probably due to lineage-specific variation in cpDNA. This trend also appears to a lesser degree in nuclear sequences. The causes of lineage-specific evolutionary rates, especially in cpDNA, are discussed.
MATERIALS AND METHODS
Sampling and DNA extraction —We screened for molecular variability across 12 species of three subgenera of Cereus, according to phylogenetic information currently available for the group (Table 1; Appendix S1 (apps.1500074_s1.pdf)). Genomic DNA from each sample was extracted from root tissue using the QIAGEN DNeasy Plant Mini Kit (Hilden, Nordrhein-Westfalen, Germany) and maintained at −20°C.
DNA amplification and sequencing —Many relevant papers have contributed to the development of primers for the shallowest taxonomic level studies in plants (e.g., Taberlet et al., 1991; Savolainen et al., 1994; Hamilton, 1999; Muller and Borsch, 2005). However, efforts to identify and compare additional and predictive variable regions of the chloroplast genome throughout angiosperm groups were primarily undertaken by Shaw et al. (2005, 2007, 2014).
We selected 16 plastid molecular markers: 14 intergenic spacers (trnT-trnL, 3′trnK-matK partial, trnL-trnF, 3′rps16-5′trnK(UUU), trnS-trnG, trnH-psbA, trnQ(UUG)-5′rps16, rpl32-trnL(UAG), psbJ-petA, atpI-atpH, petL-psbE, psbD-trnT(GGU), 3′trnV(UAC)-ndhC, and ndhF-rpl32) and two introns (rpL16 and rpS16), which are considered the most variable regions on average for angiosperms as a whole in both Shaw et al. publications (Shaw et al., 2007, 2014: 3′rps16-5′trnK(UUU), trnQ(UUG)-5′rps16, rpl32-trnL(UAG), psbJ-petA, atpI-atpH, petL-psbE, psbD-trnT(GGU), 3′trnV(UAC)-ndhC, ndhF-rpl32), and also other variable markers previously used in Cactaceae (trnT-trnL, 3′trnK-matK partial, trnL-trnF, trnS-trnG, trnH-psbA, and rpL16; Nyffeler, 2002; Bonatelli et al., 2013). The rpS16 intron was included because it is among the most used in species-level studies due to the early development of universal primers for this region (Shneyer, 2009). The selected nuclear regions (nhx1 and isi1) were previously used in Cactaceae (Franck et al., 2012, 2013), being potential regions for screening for variability. Each plastid and nuclear region was amplified using universal primers ( Appendix S2 (apps.1500074_s2.docx)).
Table 1.
Species of Cereus and number of samples selected in the current study. Clades were delimited based on phylogenetic ( Appendix S1 (apps.1500074_s1.pdf)) and taxonomic information. See Appendix 1 for GenBank accessions.
PCR reactions for plastid regions were performed in a total volume of 15 µL containing 1 µL of genomic DNA (10–40 ng), 1× reaction buffer, 0.5–1 unit of Promega Taq DNA polymerase (Promega Corporation, Madison, Wisconsin, USA), and 200 µM dNTPs. The MgCl2 and primer concentrations as well as the temperature conditions used are described in Appendix S3 (apps.1500074_s3.doc). For nuclear regions, PCR reactions were performed in a total volume of 30 µL containing 2 µL of genomic DNA, 1× reaction buffer containing MgCl2 to 2 mM, and 1 unit of Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs, Ipswich, Massachusetts, USA). Amplification was performed using an Eppendorf Mastercycler (Eppendorf, Hamburg, Germany). PCR products were isolated on a 1% agarose gel stained with SYBR Safe (Invitrogen, Carlsbad, California, USA) using the BioDoc-It 220 Imaging System (UVP, Upland, California, USA) and purified with ExoSap-IT PCR Product Cleanup (Affymetrix, Santa Clara, California, USA) or the illustra GFX Gel Band Purification Kit (GE Healthcare, Piscataway, New Jersey, USA). Sequencing was performed using an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, Foster City, California, USA) with the BigDye Terminator version 3.1 Cycle Sequencing Kit (Applied Biosystems). Forward and reverse sequences were compared and edited in Chromas Lite 2.0 software (Technelysium Pty Ltd., South Brisbane, Australia; www.technelysium.com.au). The sequences were aligned using ClustalW 1.8 software (Thompson et al., 1994) available in BioEdit (Hall, 1999).
Descriptive analysis —We adopted three levels of variability assessment, with a minimum threshold of four potentially informative characters (PIC; Shaw et al., 2005) to progress to the next level of screening. We followed the respective levels of analyses according to phylogenetic information available for the group ( Appendix S1 (apps.1500074_s1.pdf)):
Clade level: species allocated into distinct clades of the Cereus phylogeny ( Appendix S1 (apps.1500074_s1.pdf)), which roughly correspond to a comparison of distinctive subgenera: clade A (C. fernambucensis Lem., subgenus Cereus), clade C (C. mirabella N. P. Taylor, C. albicaulis (Britton & Rose) Luetzelb., subgenus Mirabella), clade B (C. hankeanus F. A. C. Weber ex K. Schum., subgenus Ebneria), and clade D (C. saddianus (Rizzini & Mattos) P. J. Braun, subgenus Ebnerid), totaling five sequences analyzed in this level;
Subclade level: species allocated in clade A ( Appendix S1 (apps.1500074_s1.pdf)) and belonging to subgenus Cereus (C. fernambucensis subsp. fernambucensis, C. fernambucensis subsp. sericifer (Ritter) N. P. Taylor & Zappi, C. hildmannianus K. Schum., C. insularis Hemsl.), totaling four sequences analyzed in this level. This level roughly corresponds to a comparison within the same subgenus;
Species level: allopatric populations of C. fernambucensis (subspecies: four samples from different populations of C. fernambucensis subsp. fernambucensis, two samples from different populations of C. fernambucensis subsp. sericifer), totaling six sequences analyzed in this level.
Standard indices of variability were considered with all sequences obtained for each level of analysis. Nucleotide diversity (π), number of haplotypes (h), polymorphic sites (S), and average number of nucleotide differences (k) were calculated for each level of analysis in DnaSP version 5 (Librado and Rozas, 2009). The percentage of variability was calculated using the formula (PIC/L) × 100, where L is the total length of the sequence. We used a simple model of genetic distance (average p-distance and between group p-distance) among the levels calculated in MEGA 5.1 (Tamura et al., 2011). More complex models of genetic distance were also tested in our sampling and presented similar results to the simple model used in this study (data not shown).
RESULTS
Three of the 16 segments screened in Cereus (3′trnV(UAC)ndhC, ndhF-rpl32, and rpl32-trnU(UAG)) could not be amplified, even after several attempts to modify the PCRs ( Appendix S3 (apps.1500074_s3.doc)). This is not an uncommon result for universal primers such as ndhF-rpl32 and rpl32-trnL(UAG) (Prince, 2015) due the high variability of these regions, which prevents the design of universal primers (Shaw et al., 2014). For some Cactaceae, for example, to use this segment it has been necessary to design a new primer set for focal taxa (Calvente et al., 2011; Majure et al., 2012).
The results for the remaining 13 cpDNA segments are as follows: (1) the trnL-trnF, trnT-trnL, and 3′trnK-matK partial showed no or low variation at the clade level; (2) the segments trnH-psbA, rpS16, and 3′rps16-5′trnK(UUU) were variable only at the clade level; (3) the segment atpI-atpH was variable only at the clade and subclade levels; and (4) the rpL16 intron and the segments trnQ(UUG)-5′rps16, petL-psbE, psbD-trnT(GGU), psbJ-petA, and trnS-trnG were variable at all three levels of analysis (Table 2). The most variable segment at the clade and subclade levels was the trnQ(UUG)-5′rps16 segment, whereas trnS-trnG showed the highest PIC value among plastid regions at the species level (Table 2, Fig. 1).
The main sources of variation in cpDNA markers were insertion-deletions and nucleotide substitutions. The exception was the rpS16 intron, which showed an inversion in the C. saddianus sample. Inversions are difficult to analyze because they are not well recognized in the alignment, especially in small sample sizes (Borsch and Quandt, 2009). Recent studies have discovered that this mutation is usually common in noncoding regions (Borsch and Quandt, 2009), even at population level or in lineage-specific analyses (Quandt et al., 2003; Borsch and Quandt, 2009; Korotkova et al., 2014). Inversions are usually associated with inverted repeat sequences, yielding a hairpin secondary structure such as predicted in our sampling ( Appendix S4 (apps.1500074_s4.pdf)). To investigate the presence of this mutation in C. saddianus populations, we increased the number of samples. We observed some intraspecific variation in C. saddianus populations, detecting a 35-bp inversion as well as other variable characters ( Appendix S4 (apps.1500074_s4.pdf)).
The nuclear regions were variable on all three levels of analysis, being potentially informative for phylogenetic and phylogeographic studies, at least in the genus Cereus (Fig. 1, Table 2). Furthermore, one of the nuclear regions (isi1) presented higher variability than plastid data in almost all levels of analysis (except for the first level of analysis where trnQ(UUG)-5′rps16 was the most variable segment).
Genetic distance data calculated for those segments that were variable at the three levels of analysis suggest heterogeneity across levels and occasionally presented higher genetic distance within species than among species. This result clearly suggests that evolutionary rates vary among closely related species as well as within population units (Fig. 2). Furthermore, the plastid markers presented a higher evolutionary rate heterogeneity than nuclear regions, suggesting that plastid regions may likely be more affected than nuclear regions.
DISCUSSION
The three levels of analysis established in this study allowed us to analyze the potential utility of each region according to the distinct hierarchical levels defined in the Cereus phylogeny. From the 13 screened plastid regions successfully amplified in our sample, we identified three potentially informative markers for interspecific studies in Cereus (atpI-atpH, trnH-psbA, and 3′rps16-5′trnK(UUU)) and eight potentially informative markers for interspecific and intraspecific studies (six plastid regions: trnS-trnG, psbD-trnT(GGU), petL-psbE, trnQ(UUG)-5′rps16, psbJ-petA, and rpL16; and two nuclear regions: nhx1 and isi1).
Most of these regions were previously used in higher taxonomic studies in Cactaceae (e.g., Nyffeler, 2002; Calvente et al., 2011; Hernández-Hernández et al., 2011). Although most of the segments that revealed variability in the three levels of analysis were previously used in cactus studies to establish evolutionary relationships for phylogeographic analyses (trnQ(UUG)-5′rps16, psbJ-petA, and rpL16 [Korotkova et al., 2011]; trnS-trnG [Bonatelli et al., 2013]), some of the plastid regions previously used at the intraspecific level did not present enough variation in this work (trnT-trnL and psbA-trnH [Korotkova et al., 2011; Bonatelli et al., 2013]).
The segments rpl32-trnD(UAG), ndhF-rpl32, and trnV-ndhC were considered among the best ranked for angiosperms in general, but could not be amplified in this work. Segment rpl32-trnL(UAG) has been successfully used in evolutionary studies with Cactaceae (Larridon et al., 2015), but due to its high variability in the flanking region (Shaw et al., 2014) it is usually necessary for a new set of primers to be developed for a respective target group (e.g., Calvente et al., 2011; Majure et al., 2012). It is likely that rpl32-trnU(UAG) should be highly variable and useful for comparative studies, such as shallow phylogenetic (Miller et al., 2009; Calvente et al., 2011; Ornelas and Rodriguez-Gomez, 2015) and phylogeographic studies (Jiménez-Mejías et al., 2012; Aguirre-Liguori et al., 2014; Sramkó et al., 2014). However, due to the lack of suitable universal primers (Prince, 2015) its potential has not been explored.
It is worth noting that nuclear regions presented high variability when compared with plastid DNA (Fig. 1), thus becoming promising regions to perform inter- and intraspecific studies in Cereus. This is in agreement with the premise that nuclear markers frequently show more variation in plants than plastid markers (Sang, 2002; Small et al., 2004; Zimmer and Wen, 2012). For most of the cpDNA variable regions, we did not observe the expected increase in genetic differentiation among higher taxonomic evolutionary levels. In contrast, for the rpS16 intron, for example, we detected higher variation at the population level than in subclade or clade levels (Fig. 2). These results agree with the high heterogeneity in molecular evolutionary rates in plants (Korotkova et al., 2014; Shaw et al., 2014); however, they also suggest that lineage-specific variation in cpDNA seems to be more accentuated than in nuclear DNA. Furthermore, these data indicate that rate heterogeneity in cpDNA evolution appears to increase in early stages of population differentiation (Duchene and Bromham, 2013; Bromham et al., 2015).
Table 2.
Variation in plastid and nuclear regions at each level of analysis.
Non-clock events and punctuated molecular evolution occur widely in nature (Clegg et al., 1994; Pagel et al., 2006) and are extremely common in plants (Pagel et al., 2006), most likely due to patterns of hybridization, polyploidy, and gene duplication in these organisms (Pagel et al., 2006; Duchene and Bromham, 2013). Other explanations have been proposed for punctuated effects, including ecological traits such as life history (i.e., generation time associated with life forms such as shrubs, trees, and herbaceous plants; annual and perennial life cycles [Bromham, 2009]), environmental variables (i.e., temperature and UV radiation [Lancaster, 2010; Gaut et al., 2011]), and microevolutionary processes that occur during population differentiation (i.e., the founder effect and natural selection [Barraclough and Savolainen, 2001; Pagel et al., 2006; Duchene and Bromham, 2013; see also Lancaster, 2010; Pennell et al., 2014]).
Recently, Duchene and Bromham (2013) and Bromham et al. (2015) have reported faster rates of molecular evolution in plastid genes in species-rich lineages. Such observations suggest that cytonuclear interactions among plastid and nuclear genes promote hybrid incompatibility and sterility and possibly accelerate the number of substitutions in a single lineage, thus promoting hybrid incompatibility (Bromham et al., 2015). These interactions may indirectly drive the divergence of neutral chloroplast regions, resulting in heterogeneity among lineages (e.g., Shaw et al., 2014).
Although the causes of lineage-specific rates are under discussion, the possibility that rate heterogeneity in molecular markers in plants, especially those from cpDNA, may initiate early during diversification should be considered in the experimental design used to perform empirical screening for variation. In general, most screenings first compare allied species and then suggest the use of the best markers for the intraspecific level (e.g., Miller et al., 2009; Dong et al., 2012). For example, this is the design used in the publications by Shaw et al. (2005, 2007, 2014). Undoubtedly, this strategy may work, but our data agree that such interspecific comparisons do not ensure variability at the intraspecific level. For this reason, we recommend that the experimental design of the initial screening should consider the biological unit(s) (i.e., species, subspecies, and populations) that is as close as possible to the target unit(s) to be studied.
In summary, our data indicate that intronic regions of nuclear genes isi1 and nhx1 are candidate markers for comparative studies in cacti, in concordance with previous data (Franck et al., 2012, 2013). The plastid segments trnQ(UUG)-5′rps16, rpL16, and trnS-trnG showed higher levels of variability among the cpDNA markers tested at the population level for Cereus, becoming candidate markers for further phylogeographic studies in this genus, which is broadly distributed in South America. It is worth noting that despite the lack of universality in cpDNA variability, these three markers were considered potentially variable regions as reviewed by Shaw et al. (2005, 2007, 2014), and when these three regions were screened together, at least one of them showed useful information (e.g., Byrne and Hankinson, 2012; Martínez-Nieto et al., 2013). Thus, we suggest that these three segments are candidate regions to be included in initial variation screening for plant phylogeographic studies.
LITERATURE CITED
Appendices
Appendix 1.
Origin of the material used in this analysis and GenBank accession numbers. NS = not sampled. Taxon; population code; GenBank accessions: trnS-trnG, rpL16, psbJ-petA, atpI-atpH, psbD-trnT(GGU), petL-psbE, trnQ(UUG)-5′rps16, trnH-psbA, rpS16, 3′rps16-5′trnK(UUU), trnT-trnL, trnL-trnF, 3′trnK-matK, nhx1, isi1.
Notes
[1] We are particularly grateful to G. Olsthoorn for supplying some of the samples of taxa used in this study. We are also grateful to H. S. M. Utsunomiya for technical assistance and to the anonymous referees and I. A. S. Bonatelli for critical suggestions. This work was supported by grants from the Fundação de Amparo à Pesquisa do Estado da São Paulo (FAPESP) to M.R.B. (2013/07211-7), F.F.F. (2010/19557-7). and to E.M.M. (2005/55200-8).