Primers for the amplification and sequencing of DNA fragments from chloroplast genes and non-coding regions are provided to facilitate molecular phylogenetic studies aimed at building a tree-of-life for the Asteraceae. The primers reported here have been extensively tested and some empirical guidelines are included to facilitate their use.
This paper provides sequencing primer information useful for molecular phylogenetic studies of the family Asteraceae. The initial impetus to develop primers for new markers of the chloroplast was to elucidate phylogenetic relationships of the Heliantheae sensu lato (Panero et al., in prep.). These primers are the products of extensive testing and were used to produce DNA sequence data by the first author in recent comprehensive phylogenetic studies of the family (Panero & Funk, 2002; Panero & Funk, in prep.). We provide these primers to expand the suite of chloroplast markers now in use for phylogenetic studies, and expect that the Asteraceae research community will use these markers and expedite, through collaboration, the creation of a tree-of-life of sunflowers. Below we provide some empirical suggestions that can facilitate the use of these markers for phylogenetic purposes.
Molecular markers were chosen from several areas of the chloroplast genome (Fig. 1) and for the most part amplify DNA of all lineages of the family and some related families in the Asterales (sensu the Angiosperm Phylogeny Group, see http://www.mobot.org/MOBOT/Research/APweb/welcome.html; Stevens, 2003). Exceptions exist however, and are noted below. Markers were originally chosen to elucidate phylogenetic relationships in the Heliantheae.
Subsequently, the study was expanded to include representatives of most lineages of the Asteraceae including many members of subfamilies Mutisioideae, Gochnatioideae, Hecastocleioideae, Carduoideae, and Pertyoideae (see Panero & Funk, 2002 for latest tribal classification). The Heliantheae primers worked efficiently for most lineages of the Asteraceae, but in several cases additional primers were developed to obtain amplifications adequate for sequencing reactions.
We developed primers for markers not previously used in phylogenetic studies such as accD, ndhC, ndhD, ndhI, ndhJ, ndhK, rpoB, the ndhI-ndhG and 23S-trnA intergenic spacers, and the trnA, petB and petD introns. We provide new primers for matK, rbcL, and ndhF to accomplish the longer runs (600 bp on average) possible using current sequencing technologies. All the primers that we report here were developed by using the Nicotiana tabacum chloroplast map of Shinozaki et al. (1986) as an initial source of sequence information and complemented with the Nicotiana genome sequence available in GenBank (http://www.ncbi.nlm.nih.gov:80/entrez/viewer.fcgi?val=NC_001879) to provide exact coordinates for most primers (indicated at the end of the primer information in Figs. 2–5). Several primers are the result of extensive cloning of Asteraceae products from poor amplifications based on Nicotiana primer sequences. This is especially true for the petB and petD introns, and the rpoB and accD genes.
In general we found that data from coding regions are easier to align than data from noncoding regions and therefore have lower levels of homoplasy caused by ambiguous alignment. However, we have refrained, for the most part, from ranking specific regions of the chloroplast on their perceived usefulness in uncovering phylogenetic signal or ease of alignment or a combination of both. We do not endorse the use of any single marker to obtain quick, credible phylogenetic hypotheses but rather we advocate the use of multiple markers to produce well-supported tree topologies to address outstanding questions in Asteraceae systematics. Because they are physically linked and recombination is minimal, chloroplast genes likely share the same branching histories, so multiple gene studies can result in more robust estimates of phylogeny when several markers are combined (Weins, 1998; de Queiroz, 2002). Indeed, we found that data partitions for large numbers of taxa sampled across the Asteraceae and across the tribe Heliantheae differed only in minor parts of their histories (Panero and Funk, in prep.; Panero et al., in prep). However, sparse taxon sampling may increase phylogenetic estimation error (see Pollack et al., 2002; Hillis et al., 2003; Graybeal, 1998). A large number of sequences for representatives of many lineages of the Asteraceae using the primers reported herein are already at the research community's disposal (i.e., GenBank). By publishing these primers we hope to accelerate the production of a robust molecular phylogeny for the entire Asteraceae through the collaborative efforts of sunflower researchers across the world.
Amplification and Primer Construction
DNA for all markers was amplified using the following PCR protocol: 5 min denaturation at 95°C, 45 sec annealing at 48°C, and 1 min primer extension at 72°C, followed by 35 cycles with 1 min denaturation at 95°C, 45 sec annealing at 48°C (2 sec were added in each subsequent cycle), and 1 min primer extension at 72°C. The reaction was terminated with a 10 min final extension at 72°C and then held at 10°C. In some cases, non-specific amplification (multiple band pattern) was eliminated by increasing the annealing temperature by 2–5°C. PCR products were purified using filter centrifugation (Ultrafree®-MC centrifugal filter units, 30 000 nominal molecular weight limit [NMWL], Millipore Corporation, Bedford, Massachusetts, USA, or PCR cleaning columns from Qiagen). Cloning of PCR products was performed using the TOPO TA Cloning® kit from Invitrogen. Ten transformed colonies were used as templates in PCR reactions using the same specifications as outlined above. Colonies having an insert of approximately the same size as the original cloning template were identified by gel electrophoresis of PCR products. One of the colonies containing the right insert was picked at random for plasmid harvest. A sample of the chosen colony was used to inoculate 5 ml of LB growing medium. The liquid medium was incubated at 37°C and shaken at 250 rpm. After 6 hours the cells were lysed and plasmids removed using a Promega Wizard® Plus Miniprep kit. The purified plasmids were used as template in a new PCR reaction whose product was prepared for sequencing.
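The thermal profile above can be written out step by step to check the programmed run time. The following Python sketch is only an illustration: the function and class names are hypothetical, and it assumes that the first of the 35 cycles anneals for 45 sec and that each later cycle adds 2 sec to the previous one. Ramp times and the final 10°C hold are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


def build_program(cycles=35, anneal_start_s=45, anneal_increment_s=2,
                  anneal_temp_c=48):
    """List every timed step of the PCR profile described in the text.

    Assumption: the annealing time in cycle i (0-based) is
    anneal_start_s + i * anneal_increment_s.
    """
    steps = [
        Step("initial denaturation", 95, 5 * 60),
        Step("initial annealing", anneal_temp_c, anneal_start_s),
        Step("initial extension", 72, 60),
    ]
    for i in range(cycles):
        anneal_s = anneal_start_s + i * anneal_increment_s
        steps.append(Step(f"cycle {i + 1} denaturation", 95, 60))
        steps.append(Step(f"cycle {i + 1} annealing", anneal_temp_c, anneal_s))
        steps.append(Step(f"cycle {i + 1} extension", 72, 60))
    steps.append(Step("final extension", 72, 10 * 60))
    return steps


def total_seconds(steps):
    """Sum of programmed step times, excluding ramping and the 10°C hold."""
    return sum(step.seconds for step in steps)


if __name__ == "__main__":
    program = build_program()
    print(f"{len(program)} steps, {total_seconds(program) / 60:.1f} min programmed")
    # Raising the annealing temperature to suppress multiple bands:
    stricter = build_program(anneal_temp_c=50)
    print(stricter[1])
```

Under these assumptions the annealing step of the last cycle lasts 113 sec, and the programmed time before the hold is a little under 2 h 15 min.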
The program MacVector 5.0 (originally from the Eastman Kodak Company, newer versions sold by Accelrys, Inc.) was used to construct primers with melting temperatures between 50 and 60°C and G–C content above 45%. When it was not possible to meet these requirements for a specific DNA region, primers with lower G–C contents and melting temperatures were accepted. The program excludes primer designs with hairpins and self-duplexes. Primers were obtained from Integrated DNA Technologies of Coralville, IA 52241, USA (http://www.idtdna.com).
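The two numerical design criteria (melting temperature of 50–60°C and G–C content above 45%) can be screened with a few lines of Python. This is a rough sketch, not a reproduction of MacVector: the melting temperature is estimated with the simple Wallace rule (2°C per A/T, 4°C per G/C), which is an assumption made here and will differ from the program's own calculation, and hairpins or self-duplexes are not checked. The example oligo is hypothetical and is not one of the primers reported in this paper.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature estimate in °C (a rough approximation)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def meets_criteria(primer: str, tm_range=(50, 60), min_gc=45.0) -> bool:
    """True if the estimated Tm is within tm_range and G-C content exceeds min_gc."""
    low, high = tm_range
    return low <= wallace_tm(primer) <= high and gc_percent(primer) > min_gc


if __name__ == "__main__":
    example = "ATGGCTCGTACCTTGACGA"  # hypothetical 19-mer, for illustration only
    print(f"GC = {gc_percent(example):.1f}%, Tm = {wallace_tm(example)}°C, "
          f"passes = {meets_criteria(example)}")
```

A primer that fails this screen is not necessarily unusable; as noted above, lower G–C contents and melting temperatures were accepted where a region left no alternative.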
As a general rule, we have designed a primer overlap of approximately 30–100 bp for the longer regions that required amplification and sequencing in multiple segments (see maps in Figs. 2–5 for ndhF, ndhC, matK, rpoB, rpoC, ndhK and rbcL). This also provides some sequence editing flexibility when determining where to trim overlapping contig sequence files. We never amplify an entire gene of more than 2000 bp in length using outside primers; instead, we develop internal primers to divide the sequencing products into smaller regions. This strategy provides cleaner sequencing results. Below we provide a discussion of our amplification protocols and guidelines for each marker.
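The overlap guideline can be checked for any planned set of amplicons. The sketch below is a minimal illustration with hypothetical function names and hypothetical coordinates; it treats each segment as an inclusive (start, end) pair on a common reference and reports the overlap between neighbouring segments. The 30–100 bp window is applied as a hard limit here, although the text describes it only as approximate.

```python
def segment_overlaps(segments):
    """Overlap in bp between each pair of neighbouring segments.

    Each segment is an inclusive (start, end) tuple on the same reference.
    """
    ordered = sorted(segments)
    return [
        max(0, prev_end - next_start + 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def check_tiling(segments, min_overlap=30, max_overlap=100):
    """True if every neighbouring pair overlaps by min_overlap..max_overlap bp."""
    return all(min_overlap <= o <= max_overlap
               for o in segment_overlaps(segments))


if __name__ == "__main__":
    # Hypothetical coordinates for a gene amplified in two segments.
    plan = [(1, 1200), (1150, 2300)]
    print(segment_overlaps(plan), check_tiling(plan))
```

Segments that merely abut, such as (1, 1000) and (1001, 2000), report an overlap of 0 bp and fail the check, which flags a junction with no shared sequence for contig assembly.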
ndhF. We amplified the ndhF gene in two segments using 4 primers, primer pairs 52–1212R and 972F-607 (primers 52 and 607 reported in Jansen, 1992). The internal primer 1587F and derivatives are used to read the final 3′ end of the gene. The 1587F region of ndhF is very variable and placing primers that would produce high quality sequences is challenging. The 1587 family of primers was initially developed for the Heliantheae sensu lato and allies and works well for any other sunflower except Inula, which has an insertion of 3 bp at the end of the priming region. Therefore, it is advisable to sequence the 972F-607 amplification product with primer 972F first, read the 1587F primer region, and then ascertain which primer to use. A map is presented in Fig. 2.
ndhD. This gene is difficult to amplify, especially when DNA templates are derived from herbarium specimens. We initially amplified the gene using primers derived from the Nicotiana sequence and then used the cloned product to design primer pairs ndhDF-672R and 732F-ndhDR (Fig. 2). As we expanded the survey of sunflowers to cover most lineages of the family, we realized that amplification was no longer efficient at the basal lineages of the family. We then created two outside primers ycs5F and psaCR to replace ndhDF and ndhDR respectively. These have the added advantage of producing an amplification product containing the entire gene.
ndhI. This is one of the most reliable markers to use for its ease of amplification. When amplifying DNA of genera from different lineages, PCR products using primers ndhGF and ndhAexon2R can be variable in size due to variation in the ndhG-ndhI spacer. This region has a higher number of base pair changes and can be useful in studies aimed at elucidating phylogenetic relationships among closely related species (Fig. 2). The DNA of some taxa (e.g., Doronicum and Dimorphotheca) did not produce amplification product for this marker using the primer pair reported above. We solved this primer-template mismatch by replacing the ndhGF primer with ndhD's 732R and sequencing with primer ndhAexon2R. This region may also be useful at the order and subclass levels as we recently amplified DNA of the monocot genus Nolina using the primers reported here.
The 23S-trnA spacer, trnA intron, and trnA-trnI spacer. This marker is the only one we developed from the inverted repeat region (Figs. 1 and 3). The primer pair 23SF-trnIR produces a strong amplification product of equivalent size for most genera of the Asteraceae we sampled. The region is not as generally useful for phylogenetic purposes because mutations are mostly autapomorphic and synapomorphies are minimal. However, this region contains a synapomorphy for the Senecioneae and an important synapomorphy shared by the Polymnieae and lineages above it (Panero et al., in prep).
The matK gene and trnK introns. This region is easy to amplify and provides many mutations useful in phylogenetic studies of Asteraceae (Bayer et al., 2002). In some species of tribe Heliantheae and allies, the 3′ end of the trnK split intron contains a polyT region followed by a polyA region that makes its sequencing challenging, so this region was omitted from our phylogenetic studies. We amplify the matK region using the primer pairs 3914F–884R, 816F–1857R, and 1755F–trnK2R (Fig. 3). For certain members of the basal lineages of the Asteraceae, we developed primer 1254R because 884R was providing bad sequences, good amplification product notwithstanding.
rpoB-rpoC1 genes. These genes provide a wealth of base substitution mutations that can be useful in phylogenetic studies. The primers reported here have been used to amplify all lineages of the Asteraceae as well as representatives of the Goodeniaceae and Calyceraceae. This region is not easy to amplify in the Asteraceae and it is even more difficult when the DNA is obtained from herbarium specimens. Furthermore, many of the sequences have ambiguous outputs for the first 30–50 bases. The primers reported here are products of extensive cloning of the PCR products for several sunflower genera, and still they are not the best or final primers for this region. The 3′ end of rpoB is difficult to amplify for most members of Asteroideae, probably because the gene is longer than in members of basal lineages. We amplify the rpoB gene in three segments using the following primer pairs: rpoCF-1394R, 1270F-2503R, and rpoB1R or rpoB2R-2426F. To amplify the 3′ end of rpoC1exon2 through rpoC1exon1, we use rpoC904F, 917F, or 952F with rpoBS1R or rpoBS2R. If the aim is to sequence only exon1 of rpoC1, the primers rpoC1462CF or 1485CF with rpoBS1R or rpoBS2R can be used (Fig. 4). For those interested in the rpoC1 intron region, our preliminary data (i.e., PCR band size) show that the intron is fairly similar in size across the family.
trnL intron. The trnL intron and the trnL-trnF intergenic spacer were amplified using the primers C and F of Taberlet et al. (1991). We cloned the PCR product of primers C and F and found the primer C sequence to have changed by one base in the Asteraceae (primer C-ASTER in Fig. 4). This region is quite popular, judging by the number of molecular phylogenetic studies based on it; it amplifies readily and sequences are long and of good quality. As mentioned by Taberlet et al. (1991), primer F is not a sequencing primer. Contrary to their advice, we have sequenced PCR product using the F primer and have obtained mixed results. The best sequencing reactions were obtained using primer C-ASTER.
ndhJ-ndhK-ndhC region. The ndhJ-K-C region amplifies reliably but we have not yet used these primers to amplify DNA of members of the basal lineages of the family. Primer pairs ndhJF-ndhK2R and ndhK1F-ndhCR amplify in two segments the entire ndhK gene and partial amplifications for genes ndhJ and ndhC (Fig. 4). Primers for this marker need further refinement to obtain amplification products that contain the three genes in their entirety.
rbcL gene. We amplified the rbcL gene using primers widely used in previous studies, including studies of members of the Asteraceae, here labeled rbcL1 and rbcL2 (Hillis et al., 1996, p. 240; Olmstead, 1992; see Fig. 5). However, sequencing of amplification products using these primers produces mediocre results in the Asteraceae, especially for the initial 30–40 bp of each sequence. Reverse primer rbcL2 is located 103 bp downstream of the coding region in tobacco but contains a stem-loop structure that may interfere with good sequencing of all taxa. Forward primer rbcL1 is located exactly at the first 26 bases at the start of the gene so that part of the 5′ end of the gene cannot be sequenced using that primer (Fig. 5). We developed internal primers 876F and 911R to be used with rbcL2 and rbcL1, respectively. Amplification of the gene in two segments using these internal primers yields products of high quality for sequencing and the ease in obtaining PCR product is second only to ndhI. Our sequencing primers are primers 876F and 911R and read through the priming sites of rbcL1 and 2 when sequencing runs of 750 bp or more can be obtained. In the future we will develop primers in atpB and accD to amplify the complete rbcL gene.
rbcL-accD spacer. The intergenic spacer between rbcL and accD is highly variable in size and may be of use to those researchers interested in DNA variation among closely related taxa. We amplified this region along with the 5′ end of the accD gene and sequence the spacer portion using our primer rbcL 1581F anchored in the rbcL gene.
accD. The accD gene was sequenced for the Heliantheae and related tribes only. Amplification of basal lineages of the Asteraceae has provided mixed success to date. We amplified this gene in two segments using primer pairs rbcL1581F-accD1091R and rbcL912F-accD1481R. Internal primer accDR556 was used to sequence the 5′ end of the gene. However, we normally sequenced some taxa using accD1091R first because the 5′ end of the gene has multiple insertions and deletions that make sequencing this area challenging when they coincide with the 556R primer site. The 3′ end of the gene is amplified using the primer pair accD912F-accD1481R. This area is easy to align because there is little variation in sequences; primer accD912F is used to sequence the 3′ end. Primers for accD and the rbcL-accD spacer were obtained after extensive cloning of several taxa of the Heliantheae and thus the primers are efficient at amplifying DNA only of taxa in this and related tribes. Further experimentation including cloning may be required to produce primers that can be viewed as truly universal for the family.
petB and petD introns. Primers for the petB and petD introns amplify very efficiently DNA of members of subfamily Asteroideae (Fig. 5). The primers were developed after extensive cloning of members of tribe Heliantheae. Both introns show little length variation across the taxa sampled and are relatively easy to align when compared to other introns commonly used in phylogenetic studies such as the rpl16 intron. We amplified these regions by using the primer pairs psbHF-petB2R and petB2F-petD2R.
The pace and ease in obtaining molecular data for systematic studies continue to accelerate and it is plausible that in the near future such studies will rely more heavily on sequence comparisons of whole chloroplast genomes, or of a specific nuclear chromosome, rather than on studies limited to one or a few genes. At present, however, partial or complete chloroplast genome studies for most genera of Asteraceae or flowering plants are still not economically feasible and most systematists continue to sequence region by region. As mentioned above, sequences generated with these primers are already available through GenBank for many genera of the family. By publishing a detailed description of the primers and protocols used to generate those sequences, we hope to facilitate the Asteraceae research community's efforts to produce a tree-of-life of sunflowers, a comprehensive phylogenetic hypothesis that will stimulate posing more interesting questions about the evolution and radiation of sunflowers across most of our planet.
We wish to thank K. Sata Sathasivan of the School of Biological Sciences, University of Texas for allowing us to use his copy of the program MacVector. We thank the employees of the University of Texas Sequencing Facility, Ningna Xiao, Indu Gosh, and Cecil Harkey for their support and help during our sequencing studies. We thank Robert K. Jansen, Beryl Simpson and an anonymous reviewer for helpful comments on the manuscript. Financial support for these studies was provided by NSF.