Melissopalynology, the identification of pollen collected by bees, has proven to be an indispensable tool in fields such as pollination biology (Kearns and Inouye, 1993; Cusser and Goodell, 2013), pollinator foraging behavior (Louveaux, 1959; Wilson et al., 2010; Baum et al., 2011), sourcing and authentication of apicultural products (Louveaux et al., 1978; Jones and Bryant, 1992; Dimou et al., 2007), and honey bee (Apis mellifera L.) nutritional biology (Severson and Parry, 1981; Forcone et al., 2011; Girard et al., 2012). The lattermost has taken on particular urgency in recent years due to alarming and largely unexplained regional declines in honey bee populations (vanEngelsdorp et al., 2009). Pollen is the main source of amino acids, lipids, sterols, vitamins, and minerals in the honey bee diet (Winston, 1987), and malnutrition, caused by the displacement of natural floral communities by agricultural development, may contribute to recent patterns of honey bee decline (Naug, 2009; vanEngelsdorp and Meixner, 2010; Huang, 2012). Melissopalynology allows for direct observation of the honey bee pollen diet, providing a unique means for identifying nutritional deficits and informing potential efforts to improve foraging habitat.
Traditional melissopalynology involves the careful preparation of pollen samples for microscopic inspection followed by the identification of individual pollen grains by comparison to a similarly prepared reference collection of pollen from local taxa (Erdtman, 1943; Kearns and Inouye, 1993). When pollen collected by honey bees is used, the steps above are normally preceded by the laborious sorting of bulk pollen samples into groups of pellets having similar color and texture (Dimou and Thrasyvoulou, 2007). To the extent that this sorting succeeds in discriminating true taxonomic groups, it provides a preliminary estimate of sample diversity and streamlines the process of microscopic examination for large volumes of pollen. These methods can provide reliable data when properly executed, but they suffer from being (1) highly dependent on human expertise and vulnerable to human error, (2) limited in taxonomic precision, as many taxa are only identifiable to family, and (3) prohibitively time-consuming for large-scale studies. A reliable and efficient alternative would be of immense value for many melissopalynological applications, especially those that require the analysis of large and taxonomically diverse pollen samples, such as are commonly collected by honey bees (Wilson et al., 2010).
We present a novel approach to melissopalynology using DNA metabarcoding. We used previously validated primers specific for the second internal transcribed spacer (ITS2) in the plant ribosomal sequence (Chen et al., 2010) and performed amplicon sequencing on DNA isolated from four samples of pollen collected by honey bees over a two-week period in spring 2013. We chose the nuclear ITS2 region because prior investigation of ITS2 as a barcoding locus supports its suitability for differentiating plant taxa at the genus and, in some cases, the species level (Chen et al., 2010; Han et al., 2013; Pang et al., 2013; Tripathi et al., 2013). Wilson et al. (2010) successfully used the internal transcribed spacer (ITS), as well as other ribosomal sequences, as a genetic barcode to identify monospecific pollen collected by a specialist bee. Our primary goal was to obtain taxonomic identification at the genus level, which made the ITS2 region suitable for this barcoding project.
Here, we apply our method to determine the pollen taxa collected by honey bee colonies located in an agroecosystem devoted to corn and soybean cultivation, a context in which nutritional stress due to agricultural development would be plausible. We then compare the results of our metabarcoding approach to those obtained by traditional microscopic analysis of the same pollen samples, and discuss the relevance of our findings to both melissopalynological methodology and honey bee foraging in agricultural landscapes.
MATERIALS AND METHODS
Sample collection—Samples from this study were obtained from an apiary managed by The Ohio State University located in Madison County, Ohio (39.96095, −83.43215), between April 23 and May 6, 2013. The apiary is located in an agricultural landscape dominated by corn and soybean fields (>60% land cover), with a small amount of forest, residential area, and uncultivated fields. Four healthy, actively foraging colonies in 8-frame Langstroth hives were fitted with Sundance I bottom-mounted pollen traps (Ross Rounds, Albany, New York, USA) to collect pollen samples. Pollen traps were emptied and turned on and off on a twice weekly cycle, alternating between colonies, so that pollen was always sampled from two colonies while the remaining colonies were allowed pollen for sustenance. Samples were pooled across colonies for each collection date for all analyses.
Pollen identification by microscopy—We sorted pollen pellets from 10% (15–68 g) of the total sample for each collection date into distinct color categories and weighed each category. A 10% subsample from each color category, or up to 10 pellets if a category contained fewer than 100 pellets, was then blended in a few drops of water to achieve a spreadable consistency, and four drops of the homogenized suspension were mounted separately in basic fuchsin jelly (Kearns and Inouye, 1993) on glass slides for microscopic examination at 400×–1000× magnification.
Pollen was identified to the lowest possible taxonomic level by comparison with reference pollens collected from fresh flowers found in the study area during April–May. Reference pollens were permanently mounted in basic fuchsin jelly on glass slides. Representative pollen grains on the reference slides were photographed at 400×–1000× magnification. Reference slides and digital images of the pollens were stored at the Ohio State University Bee Laboratory (1680 Madison Avenue, Wooster, Ohio, USA). Published pollen guides and image databases were used to identify pollens that did not match any of the reference collections. Reference pollen taxa and published references used for pollen identification in this study are listed in Appendix S1 (apps.1400066_s1.docx).
The sum of weight for color categories belonging to the same taxon was recorded. If a color category contained pollens of multiple taxa, we examined 1000 pollen grains on slides prepared for that color category and estimated the proportional abundance by counting pollen grains for each taxon. We chose to exclude any plant taxa constituting less than 1% of the pollen sample. We then measured the grain size for each taxon and estimated the proportional abundance using the volumetric method described in O'Rourke and Buchmann (1991). The weight of pollen from each taxon was summed to calculate the final proportion in the total sample for each collection date.
Pollens from plants in the family Rosaceae often share similar pellet colors and morphological characteristics, making them difficult to distinguish at the generic level by visual sorting and microscopy (Moore et al., 1991). Genera within Brassicaceae were also difficult to identify using light microscopy. Improving the accuracy of pollen identification to the genus level in these families would require scanning electron microscopy (SEM) and was beyond the scope of this study. Therefore, the abundance of Rosaceae and Brassicaceae estimated by microscopy is only presented at the family level.
Pollen identification by ITS2 metabarcoding—For metabarcoding analysis, 5% (8–34 g) of the original pooled sample was mixed in a solution of 75% ethanol in deionized water at roughly a 4:1 ratio of solvent volume to sample volume. The volume of the pollen sample was estimated using a 50-mL graduated cylinder. The pollen and solvent were then mixed in a glass beaker and stirred on a magnetic stir plate for 25 min at room temperature. Additional solvent was added as necessary to ensure optimal sample mixing. Buchner funnel vacuum filtration was performed, and the resulting sample was transferred to a weigh boat to air dry in a flow hood.
We used a bead-beater pulverization method adapted from Simel et al. (1997) to free the genomic DNA from within the protective exine coat of the nongerminated pollen. Fifty milligrams of dry, homogenized pollen from each sample was placed in a 2.0-mL microcentrifuge tube with 600 µL of QIAGEN DNeasy Plant Mini Kit lysis buffer (QIAGEN, Venlo, The Netherlands). Zirconium/silica beads (0.5 mm diameter) were added until the total contents of each tube reached 1.5 mL. The pollen was then pulverized in a bead-beater (Mini-BeadBeater-1; BioSpec Products, Bartlesville, Oklahoma, USA) for 2 min. After pulverization, 300 µL of deionized water was transferred to each tube and mixed with the contents. A 300-µL portion of the resulting lysate was then transferred from each tube to a sterile 1.5-mL microcentrifuge tube. Pollen DNA was then isolated using the QIAGEN DNeasy Plant MiniKit.
After DNA isolation, the ITS2 region was amplified via PCR. We used two primers (forward: 5′-ATGCGATACTTGGTGTGAAT-3′; reverse: 5′-GACGCTTCTCCAGACTACAAT-3′) designed and validated by Chen et al. (2010), but we used alternative PCR conditions to accommodate our use of the Phusion High-Fidelity PCR Kit (New England Biolabs, Ipswich, Massachusetts, USA). Parameters of the thermocycler (Mastercycler ep gradient; Eppendorf AG, Hamburg, Germany) were set as follows: initial denaturation at 98°C for 30 s, 30 cycles of 98°C denaturation for 10 s, annealing at 59°C for 30 s, and extension at 72°C for 30 s, and a final extension step at 72°C for 10 min. These PCR conditions were calculated with the use of the NEB Tm Calculator (New England Biolabs; http://tmcalculator.neb.com/).
Following PCR, ITS2 amplicons were purified using the PureLink PCR Purification Kit (Life Technologies, Carlsbad, California, USA). At this point, 500 ng of purified PCR product from each respective sample was indexed independently using the NEBNext Ultra DNA Library Prep Kit for Illumina and NEBNext Multiplex Oligos for Illumina (New England Biolabs). After multiplexing, samples were subjected to a final Agencourt AMPure XP purification step before being pooled (Beckman Coulter catalog no. A63880 [Brea, California, USA]). A final nine-cycle library amplification PCR step was performed and samples were then analyzed on a Qubit 2.0 Fluorometer (Life Technologies), as well as an Agilent 2100 Bioanalyzer (DNA 1000 kit; Agilent Technologies, Santa Clara, California, USA) to ensure sample quality before sequencing. Paired-end sequencing was performed with the Illumina MiSeq platform using the TruSeq LT assay (Illumina Inc., San Diego, California, USA). All sequence information has been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession code SRP044703).
To analyze the sequencing data for our samples, we adapted previously described metabarcoding analysis pipelines (Huson et al., 2011; Mitra et al., 2011). Illumina MiSeq reads were demultiplexed using CASAVA (version 1.9; Illumina Inc.) and preprocessed using the artifacts filtering tool from the FASTX-Toolkit (version 0.0.13; http://hannonlab.cshl.edu/fastx_toolkit/). Paired-end read merging was then accomplished using PEAR (version 0.9.1), a software package that calculates highly accurate paired-end mergers of forward and reverse reads (Zhang et al., 2014). PEAR was used with default parameter settings with the exception of the Phred scale quality score trimming threshold, which was set to 20. The resulting FASTQ files were dereplicated and converted to FASTA format using the FASTX-Toolkit Collapser tool. After generation of the FASTA files, the resulting sequences were compared against a reference library of 2628 plant ribosomal sequences downloaded from GenBank on 11 March 2014. This library represents approximately half of the 4918 plant species potentially present in Ohio and surrounding states (USDA PLANTS database; http://plants.usda.gov/). This library is available in FASTA format from the Dryad Digital Repository ( http://dx.doi.org/10.5061/dryad.64b5p: Richardson et al., 2015). The BLASTN algorithm (blast-2.2.17) (Altschul et al., 1997) was used to align reads with the GenBank-derived sequence library using the following settings: E-value cutoff 1e-125, number of alignments 1, output format 0, number of descriptions 1, percent identity threshold 95%. Alignment outputs were then annotated in MEGAN (version 5.1.5) (Huson et al., 2011) using the lowest common ancestor (LCA) algorithm with the following parameter settings: min support 1, min support percent 0.0 (off), min score 50.0, max expected IE-125, top percent 100.0, min complexity 0.00.
Analysis of results—We used two statistical tests to assess whether the metabarcoding approach could produce reliable quantitative measures of relative abundance for the plant taxa in our samples. The Spearman's ρ statistic, a rank-based measurement, was employed to test the association between the number of paired-end reads aligned and the amount of pollen in grams per plant taxon identified by microscopy. Additionally, average R coefficients, analogous to those described by Bryant and Jones (2001) in relation to honey authentication, characterizing the ratio of the number of aligned paired-end reads relative to the abundance of pollen determined by microscopy for taxa found in at least three of the four samples, were calculated using the equation below. This coefficient can provide information as to whether certain taxa are over-represented or under-represented in the metabarcoding results relative to the microscopic results, as well as the variance in this over- and under-representation across samples. With further development, the R coefficient may allow the quantitative estimation of the taxa present within pollen samples by enabling the calculation of taxon-specific correction coefficients.
Pollen identification by microscopy—During the sampling period (April 23–May 6, 2013), a large proportion of bee-collected pollen originated from mass-flowering woody species, ranging from 57% to 95% of the total sample weight per collection date (Fig. 1; identified taxa are listed in Appendix S2 (apps.1400066_s2.docx)). Pollens of trees in the genera Acer L. and Fraxinus L., both anemophilous, were the most abundant pollens on the three earliest sampling dates. On May 6, more than 90% of the total sample originated from entomophilous woody species, corresponding with the full bloom of rosaceous trees (e.g., Malus Mill., Prunus L., Crataegus L., and Amelanchier Medik.) and honeysuckles (Lonicera morrowii A. Gray and L. maackii (Rupr.) Maxim.) that were common in the surrounding landscape in early May. Pollen from herbaceous weeds, predominantly dandelion (Taraxacum officinale F. H. Wigg.) followed by crucifers (Brassicaceae), comprised 43% of the total sample on April 23 and declined progressively to 5% on May 6, when bees were collecting pollen from rosaceous trees and honeysuckles predominantly.
Pollen identification by ITS2 metabarcoding—After Illumina MiSeq sequencing and paired-end merging of the forward and reverse reads, we obtained between 593,478 and 1,062,208 paired-end reads from the samples. The mean paired-end read lengths for all sequencing runs were between 461 bp and 469 bp. Upon annotating the BLAST output files with MEGAN 5, between 71,125 and 340,606 reads were taxonomically assigned. In total, we detected 42 distinct plant genera across the four samples. However, upon cross-validating the list of genera with the known floral phenology calendar for Ohio (Ohio State University, Growing Degree Days and Phenology for Ohio; http://www.oardc.ohio-state.edu/gdd/) and herbarium records (Ohio State University Herbarium; https://herbarium.osu.edu/), we found alignments to two plant genera (Lactuca L. and Glycine Willd.) that were not likely to be flowering during our sampling period. For a complete list of the plant genera detected as well as their respective paired-end alignment coverage, see Appendix S3 (apps.1400066_s3.docx).
Comparison of metabarcoding and microscopic techniques— To compare the results of ITS2 barcoding with those of traditional microscopy, we calculated the percentage of mapped reads per plant taxon in comparison to the percentage by weight of plant taxa estimated using microscopic methods (Table 1). Using the barcoding approach, we were able to detect the majority of the plant taxa observed using the microscopic approach. Pollen from Salicaceae, Lamiaceae, and Lonicera were detected by microscopy but not by metabarcoding analysis.
To test the ability of the metabarcoding method to accurately determine the relative abundance of different pollen types, we compared paired-end alignment coverage with grams of pollen collected as determined by color sorting and microscopic analysis (Spearman's correlation test; P > 0.05 for all samples; Table 2). We found no significant associations between the relative abundances of pollen types identified using microscopic-based and metabarcoding-based methods. Furthermore, upon calculating R coefficients across all taxa, specific taxa were overrepresented or underrepresented consistently in the metabarcoding results relative to the microscopic results, although the variation in the degree of misrepresentation was large, with relative standard deviations being between 39% and 84% RSD (Table 3). These results indicate that ITS2 sequencing data alone are not sufficient for quantitative measurement of relative abundance of pollens within bee-collected samples.
Comparison of percent weight by microscopy to percent pairedend read alignments per taxon for each sampling date.
Spearman's rank-based correlation test between the number of paired-end alignments and the number of grams per plant taxon.
Metabarcoding proved useful for identifying the diversity of pollen collected by honey bees. However, metabarcoding also produced a relatively longer list of plant genera than the microscopic method. This may reflect the genuinely large diversity of plant taxa visited by many thousands of foraging honey bees over a large foraging range. Some genera identified through metabarcoding appear implausible given known floral phenology. The identification of these plant genera may be explained by (1) spurious false-positive BLAST alignments; (2) bees regurgitating honey stomach contents, which could contain stored honey or particles from bee bread collected and stored in the colony for months (Vasquez and Olofsson, 2009); or (3) contact between pollen foragers and stored bee bread within the hive prior to foraging. Additionally, in the case of spurious false-positive alignments, it is important to note that the GenBank-derived BLAST library used in this study lacked ITS2 reference sequences from approximately half of the 4918 plant species potentially present in Ohio and surrounding states (USDA PLANTS database; http://plants.usda.gov/). In the absence of a true reference sequence, the BLASTN algorithm could produce a misalignment, assuming quality control cutoffs were satisfied. Lastly, some unexpected grasses identified through metabarcoding, such as Poa L., Anthoxanthum L., and Triticum L. (family Poaceae) may reflect nonforaging contact with windborne pollens, which are usually in very low abundance when present in the bee-collected sample.
The metabarcoding approach does not provide quantitative estimates of the relative abundance of pollen in polyfloral samples. The nature of ribosomal and plant genetics likely skew quantitative calculations in a species-specific or even intraspecies-specific manner. The number of ribosomal DNA cassette copies varies widely within and between species (Long and Dawid, 1980) and could influence the number of reads generated for each taxon. Furthermore, variation in DNA extraction efficiency, primer annealing, and genome copy number between plant species may also affect the number of taxon-specific reads.
The average R coefficients and standard deviations per taxon across all samples.
Metabarcoding failed to detect pollen from plants in the genus Lonicera and families Lamiaceae and Salicaceae, despite the fact that pollens from these taxa were identified using microscopy. This may be explained in several ways, such as sequence divergence in the PCR priming site or current sequencing length limitations for the Illumina MiSeq platfom used (∼300 bp). Given that the length of the ITS2 region ranges from 100 bp to 700 bp in plants (Yao et al., 2010), the failure of metabarcoding to detect these taxa suggests that improvements to this method are necessary to expand the scope of detection. This may be accomplished using different loci, multiple loci, or through the development of ITS2 primers specific to problematic plant clades. Recent work by Galimberti et al. (2014) applied rbcL and trnH-psbA amplicon cloning in conjunction with capillary sequencing to characterize the taxonomic composition of bee-collected pollen samples from Italian Alpine habitats. With the use of these loci, the investigators were able to detect Lonicera and Lamiaceae; however, given the absence of morphological validation of the amplicon clone-based method, it is difficult to infer whether the primers used for these loci were relatively more or less expansive than the primers used in this study.
Both methods of pollen identification were in agreement with the conclusion that honey bees collect pollen from a wide variety of flowers in the agricultural landscape and the taxa visited varied over the weeks of our study. What was apparent from our microscopic results, but less clear through metabarcoding, was a dramatic shift in the assemblage of bee-collected pollen (Fig. 1). Early samples (April 23 and 29) were dominated by nonrosaceous woody taxa (Acer, Fraxinus) and dandelion (71 officinale). The later samples (May 2 and 6), however, saw a sudden increase in the prevalence of rosaceous woody taxa and a corresponding decline in Acer, Fraxinus, and T. officinale. This shift in principal pollen sources represents not only a phenological-taxonomic transition but also a change in spatial foraging patterns between different floral habitats. Acer and Fraxinus were restricted to small tracts of forest and scattered residential areas within the field-crop matrix, while T. officinale occurred abundantly in residential areas, along roadsides, and in agricultural fields that had not been recently tilled or treated with herbicide. The rosaceous woody taxa that became important in the latter half of our samples consisted of a combination of cultivated trees (e.g., Malus) from residential properties and wild taxa such as Crataegus and Prunus that occurred mainly in forest edge and understory habitat.
Our data underscore the dependence of honey bees upon landscape features that are systematically marginalized by agricultural intensification, namely field edges, forest, and residential habitats. Furthermore, the prevalence of herbaceous weeds, such as T. officinale, is highly subject to weed control practices. Thus, intensive field crop agroecosystems are not necessarily nutritional deserts for honey bees in the springtime, but nutrition could become limiting if aggressive land conversion and weed control practices progress unchecked.
In addition to using this methodological pipeline for apicultural applications, our approach may provide a useful tool for many other areas of scientific research. Honey bees collect pollen from a diverse array of plant taxa, possess a near-global geographic distribution, and a single colony can forage over an area of more than 100 km2 (Beekman and Ratnieks, 2000; Moritz et al., 2005). As such, honey bees may be seen as a large-scale sampling tool. This trait of honey bees has already been explored for monitoring industrial pollution, airborne bacteria and viruses, and even explosives (Bromenshenk et al., 1985; Lighthart et al., 2000, 2005). Furthermore, given a reliable, high-throughput metabarcoding approach to melissopalynology, honey bees could be employed to sample a regional flora rapidly, inexpensively, and with an intensity unapproachable by human investigators. Moreover, honey bees can easily sample environments that are not amenable to conventional floral sampling, such as dense forests or urban centers. This unique utility of honey bees could be broadly applicable in areas of research such as biogeography, biodiversity assessment, population genetics, floral phenology tracking, and invasive species monitoring.
In this study, we applied high-throughput ITS2 metabarcoding in conjunction with traditional microscopic melissopalynology to determine the pollen foraging habits of honey bees in an agroecosystem dominated by field crops. We demonstrate that the metabarcoding approach exhibits strengths in terms of ease of implementation and its ability to detect pollens that are rare or difficult to identify by microscopy. This approach also exhibits weakness due to the nonquantitative nature of the method. As such, we suggest this method is useful for qualitative applications such as large-scale melissopalynology-based floral surveying and pollinator habitat preservation. Melissopalynology has previously been difficult to apply in such applications due to the time, expense, and expertise required for microscopic pollen identification. The development and further refinement of molecular melissopalynological techniques may be a suitable research avenue for overcoming these difficulties.
 The authors thank J. Wenger, E. E. Wilson, C. S. Sidhu, J. Wallace, B. Klips, A. D. Wolfe, and anonymous reviewers for discussion; N. Douridas for site access; G. Cobb for help with sample processing; M. E. Hernandez-Gonzalez and the Ohio Agricultural Research and Development Center Molecular and Cellular Imaging Center staff for technical support; and the Ohio Supercomputing Center for computing time. This study was funded by the Pollinator Partnership's Corn Dust Research Consortium grant to R.M.J. and an Ohio State University–Newark Scholarly Activity Grant to K.G.