The Antarctic marine environment, although rich in life, is predicted to experience rapid and significant effects from climate change. Despite a revolution in the approaches used to document biodiversity, less than one percent of Antarctic marine invertebrates are represented by DNA barcodes and we are at risk of losing biodiversity before discovery. The ease of sequencing mitochondrial DNA barcodes has promoted this relatively ‘universal’ species identification system across most metazoan phyla and barcode datasets are currently readily used for exploring questions of species-level taxonomy. Here we present the most well-sampled phylogeny of the direct-developing, Southern Ocean nudibranch mollusc, Doris kerguelenensis to date. This study sampled over 1000 new Doris kerguelenensis specimens spanning the Southern Ocean and sequenced the mitochondrial COI gene. Results of a maximum likelihood phylogeny and multiple subsequent species delimitation analyses identified 27 new species in this complex (now 59 in total). Using rarefaction techniques, we infer more species are yet to be discovered. Some species were only collected from southern South America or the sub-Antarctic islands, while at least four species were found spanning the Polar Front. This is contrary to dispersal predictions for species without a larval stage such as Doris kerguelenensis. Our work demonstrates the value of increasing geographic scope in sampling and highlights what could be lost given the current global biodiversity crisis.
Introduction
Biodiversity and natural ecosystems provide humankind with significant economic benefits (e.g. Wallace 1997; McClintock and Baker 2001; Pertierra et al. 2021) by way of indirect essential services (e.g. maintaining water cycles – McNeely et al. 1990; CO2 emission mitigation – Domke et al. 2020). Despite all known benefits, the rapid destruction of the world’s most diverse ecosystems has led most experts to conclude that the earth’s biological diversity is in danger (Singh 2002). Although the total number of extant species is currently estimated to be between 5.3 million and 1 trillion, only 1.85 million species have been formally described (Mora et al. 2011; Locey and Lennon 2016). We also know that our oceans house a large array of marine species (146 969 accepted species; Ocean Biodiversity Information System, see https://obis.org/), with another 1.4–1.6 million species hypothesised to exist, and are awaiting discovery and description (Bouchet 2006). Concurrent with the evolution of human societies (c. 11 000 years ago) in the form of advanced infrastructure, farming and transport, there has been continual biodiversity loss, both of species, and wider ecosystem integrity and functionality (e.g. Rogers-Bennett and Catton 2019; Prates and Perez 2021). This loss is underestimated due to taxonomic uncertainty and data gaps. Throughout geological time, at least five putative mass extinctions have occurred (Barnosky et al. 2011). Unlike past natural events, however, that spanned millions of years, the present anthropogenically driven mass extinction is likely to occur over only hundreds of years (as few as 200 years) (see: Ceballos et al. 2015). This loss of biodiversity has significant effects on ecosystem function that has implications for human populations and these may still be vastly underestimated.
Despite being the most isolated continent on Earth, Antarctica has not escaped the negative effects of human activity (Vaughan et al. 2003; Aronson et al. 2011; Chown et al. 2015; Stephens 2018). Such impacts include pollution, overfishing, increased melting of ice and the introduction of invasive or alien species (Tin et al. 2009; Aronson et al. 2011; Stark et al. 2019; Avila et al. 2020; De Castro-Fernández et al. 2021). Globally, the Antarctic region is known to act as a significant carbon sink and surrounding ecosystems are experiencing temperature increases due to rising global atmospheric carbon dioxide concentrations (Ito et al. 2010). Parts of the Antarctic Peninsula, including the west Antarctic Peninsula for example, are particularly vulnerable to these human-induced ecological disasters, as these are already experiencing the greatest increases in mean annual atmospheric temperatures on Earth (Chapman and Walsh 2007; Clarke et al. 2007; Ingels et al. 2012; Torre et al. 2017). Failure to address these global threats will not only result in the degradation of Antarctic marine ecosystems but will potentially lead to enormous losses of global biodiversity.
Technical advances in molecular phylogenetics over the past several decades have resulted in the development of many more sensitive rapid tools for detecting new species (Féral 2002; Goetze 2003; Baird et al. 2011). The ease of sequencing mitochondrial DNA barcodes has promoted a relatively ‘universal’ species identification system across most metazoan phyla and now barcode datasets are more readily used for exploring questions of species-level taxonomy (Hebert et al. 2003; Puillandre et al. 2012; Taberlet et al. 2012; Eberle et al. 2020). In current phylogenetic studies the types of molecular information extracted for rapid assessments of animal diversity have evolved from interpreting single-locus (e.g. Hebert et al. 2003) to whole-genome datasets (e.g. Jensen et al. 2021). This expansion of available sequences, facilitated by the automation of sequencing and decrease in sequencing costs, has accelerated phylogenetic studies and whole-genome-based research (e.g. Johnson et al. 2008; Layton et al. 2018; Cai et al. 2019). This is particularly significant when barcoding cryptic and pseudocryptic species (organisms that are first discerned by non-morphological methods, e.g. Brasier et al. 2016; Matsuda and Gosliner 2018; Tyagi et al. 2019). When divergent interspecific traits are not morphologically obvious, traditional taxonomic methods fail to detect speciation events and biodiversity will remain under-reported (Knowlton 1993; Baird et al. 2011).
The Antarctic continental shelf is one marine realm that has recently revealed apparent high levels of cryptic species (e.g. Linse et al. 2007; Wilson et al. 2009; Baird et al. 2011; Brasier et al. 2016), specifically in organisms with poor dispersal abilities (Griffiths 2010; De Broyer and Danis 2011; Grant et al. 2011; Neusser et al. 2011; Brandt et al. 2012). The growth and decay of ice sheets during Antarctica’s history has been one of the most significant disturbances acting at an ecosystem level across the shelf and underpins one of the fundamental frameworks for understanding Antarctic diversity, the ‘Antarctic Biodiversity Pump’ hypothesis (Clarke and Crame 1989, 2010; Gutt and Starmans 2002; Thatje et al. 2005). This hypothesis proposes that overall, increased speciation rates are the result of Milankovitch-cycle driven glacial oscillations that drive cryptic and allopatric speciation by gene flow inhibition and speciation events (Clarke and Crame 1989; Crame 1997; Griffiths 2010). A substantial amount of Antarctic species diversity is hypothesised to be the result of species flocks that have been identified as monophyletic taxa displaying high levels of endemic species that are ecologically diverse and abundant in relation to the surrounding habitat (Ribbink 1984; Eastman and McCune 2000; Lecointre et al. 2013; Chenuil et al. 2018). During these glacial oscillations and ensuing periodic habitat disruptions, ice-free refugia are thought to have sustained reduced populations of once widely distributed species (e.g. Smith et al. 2010; Lau et al. 2020), allowing many to diverge during this time.
One Southern Ocean marine invertebrate that is emerging as a case study in cryptic speciation is Doris kerguelenensis (Bergh, 1884). Recent studies on this nominal species have highlighted either new genetic lineages (Wilson et al. 2009, 2013) or new chemical compounds (Iken et al. 2002; Maschek et al. 2012; Avila 2020). Doris kerguelenensis is a direct-developing sea slug with limited dispersal potential and long generation times (Hain and Arnaud 1992; Moles et al. 2017a ). Belonging to the Dorididea, this nudibranch is a simultaneous hermaphrodite that feeds exclusively on sponges and synthesises secondary metabolites de novo (Maschek et al. 2012). Wägele (1990) reviewed and ultimately synonymised ten Southern Ocean dorid nudibranch species within the single, morphologically variable species Austrodoris kerguelenensis (Bergh, 1884). That work also designated another two species as nomina dubia due to misplaced holotypes and inadequate descriptions. The genus Austrodoris Odhner, 1926 was subsequently revised and synonymised along with five other cryptobranch dorid nudibranch genera into Doris Linnaeus, 1758 (Valdes 2001). Wilson et al. (2009) examined the mitochondrial protein-coding gene Cytochrome Oxidase I (COI) and revealed 29 putative lineages within this nominal species. These lineages were subsequently corroborated with nuclear and metabolomic trait data to infer biological species (Wilson et al. 2013); therefore, this sets a sound basis for using COI clades as a proxy for species in this study. Interestingly, three new species were recovered by resampling in the same geographic regions (Wilson et al. 2013), hinting at further undetected diversity. In this study, we explored the identity of over 1000 new Doris kerguelenensis specimens using mitochondrial DNA sequence data, with the aim of (i) creating an expanded phylogenetic hypothesis for the group, (ii) testing for additional cryptic species within this species complex and (iii) exploring the distributional patterns of these species.
Methods
Specimen collection and preservation
In this study, 1275 individuals of Doris kerguelenensis were included from 146 sites from depths between the intertidal and 798 m (Fig. 1, Supplementary Table S1 (IS21073_AC.zip)). Our specimens were collected during various Antarctic field expeditions using a Blake trawl, Smith-McIntyre grab, Agassiz trawl, wire dredge, epibenthic sled or hand collected by SCUBA diving. Samples were collected from various locations in the Southern Ocean between 2006 and 2018 (Fig. 1). Nineteen geographical regions were defined a priori (Table 1). Owing to differing depths, distances between regions, coastal currents and ocean circulation patterns (Smith et al. 1999) we separated the Antarctic Peninsula region into four regions: (i) Palmer Archipelago, (ii) Bransfield Strait, (iii) South Shetland Islands and (iv) Elephant Island. Tissue subsamples were taken from specimens preserved in 96–100% ethanol or frozen. Sequenced specimens were housed in the Western Australian Museum (WAM), Scripps Institution of Oceanography, Benthic Invertebrates Collection (SIO-BIC), Yale Peabody Museum (YPM), California Academy of Sciences (CAS), National Museum of Natural History, Smithsonian Institution (USNM), University of Barcelona and the Baker Laboratory, University of South Florida. Data for all specimens sequenced in this study are available in Supplementary Table S1 (IS21073_AC.zip) including GenBank accession numbers for outgroup taxa.
Table 1.
Table of COI diversity of Doris kerguelenensis individuals and species by region.
In addition to the ingroup specimens, a range of outgroup sequences were selected from GenBank (Wollscheid-Lengeling et al. 2001; Shields 2009; Pola and Gosliner 2010; Jung et al. 2014; Palomar et al. 2014; Hulett et al. 2015; Mahguib and Valdés 2015; Goodheart et al. 2018) (see Supplementary Table S1 (IS21073_AC.zip)) along with other available sequences for Doris kerguelenensis (Wilson et al. 2009, 2013). This was done as a robust sister group for Dorididae, across the Nudibranchia phylogeny is not yet certain (e.g. Wollscheid-Lengeling et al. 2001; Shields 2009; Mahguib and Valdés 2015; Korshunova et al. 2020); however, the tree was ultimately rooted with Prodoris clavigera.
DNA extraction, amplification and DNA sequencing
Total genomic DNA was extracted from 1077 ethanol-fixed or frozen samples using a DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer’s instructions (excluding the optional repeat of a final elution step). For analysis, an additional 198 previously published sequences were also incorporated (see Wilson et al. 2009, 2013). The same primers were used for PCR and sequencing (LCO1490/HCO2198, Folmer et al. 1994; or modified degenerate versions of the former, jgLCO1490/jgHCO2198, Geller et al. 2013). These primers amplified a fragment of the cytochrome oxidase I gene (COI). Each PCR reaction included: 16.8 μL of molecular grade (deionised) water, 5.0 μL 5 × MyTaq PCR buffer (1 × = 1 mM of dNTPs, 3 mM of MgC12) (Bioline), 0.8 μL of forward primer, 0.8 μL of reverse primer, 0.2 μL of Platinum Taq polymerase (Invitrogen), with 1.5 μL of DNA template added. The thermocycling protocol for COI using primers LCO1490/HCO2198 followed the procedure: 5 min. at 95°C, 7 cycles 30 s at 95°C, 30 s at 45°C, 1 min. at 72°C followed by 35 cycles of 30 s at 95°C, 30 s at 50°C, 1 min. at 72°C with a final extension time of 10 min. with a temperature of 72°C. The thermocycling protocol using primers jgLCO1490/jgHCO2198 was as follows: 3 min. at 95°C, 8 cycles of 30 s at 95°C, 30 s at 50°C, 45 s at 72°C followed by 32 cycles of 30 s at 95°C, 30 s at 48°C and 45 s at 72°C with a final extension time of 5 min at 72°C. Amplicons were screened on E-gels (Invitrogen) and the positive reactions, determined through Bio-Rad Image Laboratory software, were outsourced to the Australian Genome Research Facility (Perth) for enzyme purification and Sanger sequencing using an Applied Biosystems 3730 capillary sequencer. All sequences were assembled and edited in Geneious Prime 2020.1 (Kearse et al. 2012). In cases where amplicon bands were deemed weak or if amplicons failed to sequence, experiments were run to improve sequencing success either by increasing DNA or DNA dilutions (1:10, 1:20, 1:50 and 1:100).
Phylogenetic reconstruction and primary species hypotheses (PSH)
COI sequences from Doris kerguelenensis and outgroups were aligned with the MAFFT plugin for Geneious (ver. 1.4.0, see https://www.geneious.com/plugins/mafft-plugin/; Katoh and Standley 2013) using default settings. The sequences were used to create a single gene, maximum likelihood (ML) phylogeny using IQ-TREE (ver. 1.6.12, see http://www.iqtree.org/; Nguyen et al. 2015; Trifinopoulos et al. 2016), herein referred to as the Primary Species Hypotheses (PSH) (Fig. 2, Supplementary Fig. S1 (IS21073_AC.zip)) that will be further tested with delimitation methods. The –m TEST (Kalyaanamoorthy et al. 2017) option in IQ-TREE identified TN + F + I + G4 as the best-fit model chosen according to Bayesian Information Criterion (BIC). Nodal support was assessed with 1000 ultrafast bootstrap replicates (Hoang et al. 2018). This phylogenetic tree was used to make semi-arbitrary decisions about inferred species clades, much in the same way that taxonomists intuitvely assign names based on morphological variation (Dayrat 2005). Some of the ‘arbitrary’ species-level clades had already been corroborated with nuclear and trait data (see Wilson et al. 2013).
In previous publications assessing diversity in D. kerguelenensis, species-level clades were numbered (Wilson et al. 2009, 2013) from 1 to 32. These original numbers have been incorporated into this new work and we have continued numbering new clades from 33 onwards, with one exception. The original clade 12, originally consisting of a single specimen (USNM1120712) has now been synonymised within clade 11. The newly named clade 12 in this study comprises a previously unknown clade.
Pairwise distances (uncorrected p-distances) or Tamura and Nei (1993) (TN93) corrected genetic distances were both calculated from the mtDNA sequences using the ‘dist.dna’ function from the R.cran (ver. 4.2.0, R Foundation for Statistical Computing, Vienna, Austria, see https://cran.r-project.org/) package Ape (ver. 5.6, see https://cran.r-project.org/web/packages/ape/index.html; Paradis and Schliep 2019) with either ‘raw’ or ‘TN93’ selected as the evolutionary model ( Supplementary Fig. S2 (IS21073_AC.zip)). The gamma shape α and base frequencies were also parameters specified for the distance calculations. Intra- and interspecific (PSH clades) p-distances and TN93 corrected genetic distances were also calculated in MEGA X (ver. 10.2.6, see https://www.megasoftware.net/; Kumar et al. 2018) ( Supplementary Table S2 (IS21073_AC.zip)).
Species delimitation
The process of delimitation was carried out in a series of steps. We tested our PSH through a range of species delimitation methods that included the Multi-rate Poisson Tree Process analysis ((m)PTP, Kapli et al. 2017), statistical parsimony networks (TCS Java program, ver. 1.21, see https://bio.tools/tcs; Clement et al. 2000; Hart and Sunday 2007), the Automatic Barcode Gap Definition (ABGD, Puillandre et al. 2012) and Assemble Species by Automatic Partitioning (ASAP, Puillandre et al. 2021). All delimitation analyses were run without outgroups included in the input files.
Multi-rate PTP is designed to better accommodate sampling and population specific characteristics of a broad range of datasets as this incorporates different levels of intraspecific genetic diversity (based on specific branching events or species-specific sampling) (Kapli et al. 2017). This analysis has been shown to outperform PTP (Zhang et al. 2013) in yielding more accurate delimitations with respect to taxonomy (i.e. identifies more taxonomically accepted species).
TCS (Clement et al. 2000) bins sequences into haplotypes and calculates the frequencies of the haplotypes in the sample (Clement et al. 2000). Relationships among haplotype networks were explored using this method using ‘parsimnet’, a function in the R.cran package Haplotypes (ver. 1.1.2, C. Aktas, see https://cran.r-project.org/web/packages/haplotypes/index.html) that finds the most parsimonious networks and is an implementation of the TCS methods proposed by Templeton et al. (1992). TCS (Clement et al. 2000) has been shown to be useful for species delimitation as the parsimony connection limit (95%) successfully delimits the same number of subnetworks as taxa (see Hart and Sunday 2007).
ABGD and ASAP are both ascending hierarchal clustering programs that merge sequences into groups defined as partitions. The partitions are defined by probability and barcode gap width however, unlike ABGD that is solely based on pairwise distances (Puillandre et al. 2012), there is no need for an a priori defined P (when P is the prior maximum divergence of intraspecific diversity) within ASAP’s input functions. ASAP produces a scoring system that ranks the partitions based on a combination of both probability and gap width (‘asap-score’) (Puillandre et al. 2021). In ABGD and ASAP analyses, there are two available substitution models, Jukes and Cantor (1969) (JC69) and Kimura (1980) (K80). However, the evolutionary model selected by IQ-TREE (a variant of Tamura and Nei 1993) is not an available model option when partitioning sequences in these programs, so Tamura and Nei (1993) corrected genetic distances were imported into these delimitation analyses for comparison. We used ABGD alongside ASAP because although ASAP has been developed to replace ABGD, comparisons between the two programs will allow for direct comparision to results from previously published literature. The default parameters employed within ABGD were P min = 0.001, P max = 0.10, 10 steps, X = 1.5, Nb bins of 20. Only the initial partitions were examined from the ABGD algorithm outputs (following Puillandre et al. 2021). In ASAP, the default parameters were also used.
When assessing ABGD and ASAP partitions, we examined a range of partitioning definitions. This was done to encapsulate all inter- and intraspecific variability and accommodate for over-splitting or coalescing of recently radiated taxa, rather than only selecting a single partition to represent species boundaries. ABGD initially divides the data into groups based on a range of prior intraspecific divergences and a statistically inferred barcode gap, and subsequently recursively applies the same procedure to the initial groups (Puillandre et al. 2012). For this reason, the initial partitions generally represent species or operational taxonomic units (OTUs) whereas the recursive partitions generally reflect population level diversity. Here we only examined the initial partition outputs and only considered ABGD partition 2 (P = 0.001; to highlight the intraspecific genetic variability) and partition 6 (P = 0.01), as defining species based on less than 1% sequence divergence is not appropriate. Partition 1 was omitted from reports as the primary partition assumes that a single gap can be defined for the entire dataset however the gap distances are highly likely to differ between groups within a dataset (Puillandre et al. 2012). We considered these two partitions from both substitution model outputs (JC69 and K80) ( Supplementary Fig. S3 (IS21073_AC.zip)). ASAP-scores are the average of the P-value and relative barcode gap width of the partitions. In terms of the ASAP partition outputs, three input files were examined. The original, uncorrected alignment file was entered into the webserver ( https://bioinfo.mnhn.fr/abi/public/asap/asapweb.html#) and both substitution models were applied to the file (=uncorrected p-distances). The Tamura and Nei (1993) corrected MEGA X CSV genetic distance file was imported and tested against both JC69 and K80. A MEGA X CSV TN93 corrected distance file with the gamma shape α (0.692) and base frequencies (A = 0.245, C = 0.174, G = 0.189 and t = 0.391) specified, was also tested with both substitution models. ASAP 1st and ASAP 2nd partitions were considered due to the low (and thus better supported) ASAP scores of between 5.00 and 8.00 for all input files and substitution models (see Supplementary Fig. S4 (IS21073_AC.zip)).
Species diversity estimates
Considering the large percentage of species clades consisting of single specimens (Fig. 3), we assessed whether further diversity remained hidden due to undersampling, by generating species rarefaction–extrapolation (R–E) curves and the associated 95% confidence intervals, using iNEXT (ver. 2.0.2, see https://cran.r-project.org/web/packages/iNEXT/index.html; Hsieh et al. 2016). Through iNEXT (interpolation and extrapolation), an R.cran package (Hsieh et al. 2016), our diversity estimates used abundance-based data (Gotelli and Colwell 2011) that tally the abundance of each species (SSH results were utilised as input) across pre-determined geographic regions (Table 1). A sample-size R–E curve was produced to determine whether the rate of discovery of new species slowed down or reached saturation with an increase in sample size. To avoid discarding data, a sample-size R–E curve that rarefies to smaller sample sizes or extrapolates to larger sample sizes (i.e. plots with respect to sample size) was utilised (Colwell et al. 2012; Hsieh et al. 2016). We conducted both whole dataset and region-based R–E curves that are represented by the means of repeated resampling (1000 bootstrap replicates). Examining the slope of these curves allowed us to assess the impact of undersampling on estimating diversity.
Results
Species delimitation
A total of 1275 COI sequences (433 haplotypes) was generated and the final alignment constituted 658 base pairs (bp) in length. The primary species hypotheses (PSH) generated with a maximum likelihood (ML) phylogeny recovered 59 species-level clades within the monophyletic Doris kerguelenensis species complex, 27 more than previously known. The ML phylogeny generated through IQ-TREE showed low bootstrap support at interior nodes but mostly high support for terminal clusters (ultrafast bootstrap support >95–100), herein referred to as species. The only two species-level clades that were not supported by high bootstrap values were species 15 and 59 (bootstrap support values 69 and 58 respectively) (Fig. 2, Supplementary Fig. S1 (IS21073_AC.zip)). The proportion of invariable sites was 0.475, the gamma shape α distribution parameter was 0.692 and the base frequencies were A = 0.245, C = 0.174, G = 0.189, t = 0.391 (unequal base frequencies). Rates for the six substitution types estimated from the dataset were AC = 1.0000, AG = 10.6386, AT = 1.0000, CG = 1.0000, CT = 7.1609 and GT = 1.0000 (unequal transition rates, equal transversion rates and unequal purine and pyrimidine rates).
To assess the robustness of the PSH, results were compared to those from four delimitation methods (mPTP, TCS, ABGD and ASAP) (Fig. 2, 4, Supplementary Table S1, Supplementary Fig. S1–S4 (IS21073_AC.zip)). The Multi-rate Poisson Tree Process (mPTP) yielded 62 taxonomic units and TCS recovered 56 groups. Species delimitation based on genetic distance using ABGD analysis detected between 49 and 52 groups respectively (initial partitions 6 (P = 0.01) and 2 (P = 0.001)) using JC69 and K80 models ( Supplementary Fig. S3 (IS21073_AC.zip)). All ABGD results were identical between both substitution models (JC69 and K80, TS/TV = 2.0) (see Supplementary Fig. S3 (IS21073_AC.zip)). The ASAP species hypothesis partition, ranked by best ASAP score (ASAP 1st) delimited 59 (JC69) and 61 (K80) species, and ASAP 2nd delimited 57 (JC69) and 63 (K80) species. When the TN93 corrected genetic distance files were incorporated into analyses and a model subsequently applied, 59 species were delimited (for both JC69 or K80) in ASAP 1st, and 48 or 60 (JC69 and K80 respectively) in ASAP 2nd. When the additional gamma shape α and base frequency parameters were defined, ASAP 1st and 2nd (JC69) delimited 59 and 60 species respectively, and ASAP 1st and 2nd (K80) delimited 59 and 48 species (see all ASAP results: Supplementary Fig. S4 (IS21073_AC.zip)).
The species delimitation results for mPTP, TCS, ABGD (initial partitions 2 and 6) and ASAP (ASAP 1st and 2nd (all input file structures)); are plotted against the ML phylogeny (Fig. 2, 4, Supplementary Fig. S1 (IS21073_AC.zip)). Overall, for the Secondary Species Hypotheses (SSH), we chose to accept 59 species (Fig. 4), as this number was recovered by the best scoring ASAP partition in five of six conditions. This number of partitions was always recovered by ASAP 1st when the input alignment was adjusted with corrected distances, and for gamma shape and base frequencies.
Intraspecific colour variation (and to a lesser extent, dorsal tubercle exaggeration) was assessed to determine whether colour could be utilised as a taxonomically distinguishing characteristic. Live specimen photos ( Supplementary Fig. S5 (IS21073_AC.zip)) showed substantial colour variation within an example species (24) and was therefore deemed an uninformative trait for taxonomic purposes.
Sampling effort and geographic distributions
Most clades contained between 1 and 40 samples (mean = 21) and overall, 14 species were represented by single specimens (Fig. 3). The slope of the R–E curve approached saturation (an asymptotic shape) at around 68 species (Fig. 5), indicating that our sampling coverage was relatively high (accurately sampled enough of the landscape to capture species diversity) and we have therefore approached a realistic estimate for the number of species in this complex. The R–E curves reflecting sampling effort for specific locations such as the Palmer Archipelago (PAL), Elephant Island (EL) (Fig. 6a ), Shag Rocks (SR) and Herdman Bank (HB) (Fig. 6b ) reached an asymptote. However, the opposite result was detected for regions such as Discovery Bank (DB), the Weddell Sea (WS), Davis Station (DS), Casey Station and others.
One of the most well-sampled species, species 29 (n = 280), has a circum-Antarctic distribution (collected from over 11 000 km) and contains samples collected from Prydz Bay, the Ross Sea and the Antarctic Peninsula. An additional six species have very large distributions (defined here as >2000 km) and data show that at least four species distributions span the Polar Front (clades 14, 24, 26 and 42) (grey highlighted boxes within the PSH, Fig. 3). Thirteen clades, including the aforementioned four species (1, 2, 12, 28, 35, 44, 47, 48 and 50) have been collected above the Polar Front (Falkland Islands, Burdwood Bank, Bouvet Island and Kerguelen Plateau) and three of these species have an Antarctic continental shelf sister-species (1 with 10 + 55, 2 with 29 and 44 with 30, Supplementary Table S1 (IS21073_AC.zip)). The deepest depth at which a specimen was collected was 798 m.
Many species occurred in sympatry. Twenty-one species were collected from the Bransfield Strait, nineteen on the South Orkney microcontinent and thirteen from the Burdwood Bank, Palmer Archipelago and South Shetland Islands (Table 1). Overall, species richness remains a direct reflection of sampling intensity within these results, e.g. seven ‘well sampled’ regions (each with over 40 samples) had more than 10 species within that corresponding location (Table 1).
Discussion
Doris kerguelenensis was long thought to consist of a single, widespread species (since Wägele 1990). With the application of molecular data, multiple species delimitation algorithms and increased sampling, the species complex is currently recognised to comprise at least 59 geographically overlapping species. Here, we show: (i) 27 previously undiscovered species, (ii) further evidence that more species remain to be discovered within previously sampled regions of the Southern Ocean and (iii) new insights into the distributions of these animals.
Even more new species delimited
The primary species hypotheses (PSH) outlined 59 species within the Doris kerguelenensi s complex, of which 27 were new species-level clades. Subsequent species delimitation analyses showed highly congruent results (48 – 63 total range), with the best supported ASAP scores converging on 59 species. Our results corroborated the D. kerguelenensis clades that were previously detected through mitochondrial, nuclear and metabolomic datasets (Wilson et al. 2009, 2013), with one exception (the original clade 12, Wilson et al. 2009).
As expected, species delimitation methods based on different underlying assumptions rendered varying species hypotheses. Overall, ASAP 2nd (K80) showed the highest evidence for over-splitting (63 species), relative to the clades recognised in the PSH and all other partition analyses. We suspected that ASAP 2nd (K80) may have been over-splitting clades as mPTP also suggested there were 62 species and mPTP performed poorly when the number of species was small or consisted of a small sample size (Puillandre et al. 2021). Also, many of these new D. kerguelenensis species comprised one or only a few individuals and this may explain the similar result of 62 species delimited through this method. Our TCS analyses recovered 56 species and this was ultimately identical to the PSH with the single exception that clades 15 + 31 + 46 + 60 were merged into one. The ABGD initial partitions delimited between 49 and 52 species (JC69 and K80). This outcome was generally expected as the initial partitions of ABGD results will delimit a more conservative number of taxa, while recursive have been suggested to represent population level diversity rather than interspecific diversity (Puillandre et al. 2012). Overall, ASAP's 2nd partitions delimited between 48 and 63 species. Finally, ASAP 1st partition results (from all input file structures, except K80) all converged on 59 species and this was accepted as the most well-supported delimitation estimation.
Despite some colour variations between specimens collected (ranging from white, yellow, orange, pink, including white with pink colouration around the mouth; N. G. Wilson, pers. obs. and Supplementary Fig. S5 (IS21073_AC.zip)), most were morphologically similar. This means that gill or body colour, or tubercle structure could not be used to visually differentiate species. Additionally, when preserved, the colour of specimens disappears and the animals become a uniform off-white colour (first noted in Wägele 1990). Specimens also ranged considerably in length, from ~1 to 20 cm. Morphological studies mention that external traits such as the size and density of the dorsal papillae should only be tentatively relied on for species determination, because these change in appearance when preserved (Wägele 1990; Cattaneo-Vietti 1991). Wägele (1990) highlighted considerable morphological variation in the nervous system, the digestive system, and the colour and shape of the radula. None of these variations were consistent within those species concepts, therefore all animals examined were considered to be one species. Morphology was, therefore, not further considered in species delimitation efforts throughout this study. A phylogenetic framework does offer an opportunity to re-examine morpho-anatomical systems for additional characters, to correlate traits with molecular clades. However, given the relatively uninformative radular and reproductive systems, finding enough informative characters that could act as synapomorphies to resolve more than 59 species is unlikely (Fišer et al. 2018). Also, although synonyms proposed by Wägele (1990) may represent genetically distinguishable units, these are mostly preserved in ways that are not conducive to connecting specimens to these numbered species clades. Possibilities may however, still lie in sequencing holotypes and fixing those names to clades.
Still more species to be discovered
To date, expansion in either sampling or geographic scope (Wilson et al. 2009, 2013) has increased the number of species known in this complex. Here, we assessed the impact of potential undersampling by extrapolating known species records and found that the true number of cryptic species within this complex may be even greater, as the R–E curves illustrate undersampling of genetic diversity and helped identify geographic regions that require more investigation. The R–E curve results did not reach an asymptotic shape, but ~90% of D. kerguelenensis species expected in the sampled areas appeared to be recovered for this study. At a smaller scale, the Palmer Archipelago R–E curve reaches an asymptote at ~12 species and 250 sampling units. Other well-sampled locations that are approaching an asymptote include the Bransfield Strait and Elephant Island. Even locations such as Shag Rocks, Herdman Bank and the South Sandwich Islands, that consist of 50 specimens or less, still approach asymptotic levels, and likely reflect lower diversity in these areas, perhaps due to size or oceanic isolation (separated from other shallow subsea land masses by deep sea or abyssal plains).
This is interesting compared to regions that are well sampled (<100 specimens collected) but that do not approach an asymptotic shape within the R–E curves, such as Burdwood Bank, the South Orkney Islands or the South Shetland Islands. This likely indicates that high levels of D. kerguelenensis species occur in these three regions. Owing to the location (and ancient supercontinent link), the Burdwood Bank plays an important role in diverting circumpolar ocean flow (Fraysse et al. 2018), and is associated with the uplift of nutrient rich bottom water that in turn supports an abundant production of phytoplankton and subsequently rich levels of biodiversity (Schejter et al. 2020). The South Orkney Islands are bordered by two current regimes (the ACC and Weddell Sea Gyre) and have been recorded to host approximately one-fifth (Barnes et al. 2009; Brasier et al. 2018) of all benthic marine species recorded for the entire Southern Ocean (as estimated by Clarke and Johnston 2003). The South Shetland Islands are considered a transitional ecosystem from which new species are still regularly recorded despite being documented as one of the most well-sampled regions across the Antarctic (Barnes et al. 2008). Given the indications that further Doris species diversity is to be found in these areas, these are likely to also be very rich in general benthic marine fauna and should be considered for future biodiversity work (e.g. Barnes et al. 2009). For D. kerguelenensis, areas that should be of interest for future exploration (due to the low sample numbers but high levels of species diversity detected) include Herdman Bank, the Falkland Islands, Bouvet Island, Kerguelen Plateau, Prydz Bay, and the areas surrounding Davis and Casey stations. Totally unexplored locations for this species also include the south-west of South America where the Humboldt Current comes into contact with this coast upon breaking from the northern ACC.
Large distributions for a directly developing slug
Some of the cryptic species in the D. kerguelenensis complex documented here spanned thousands of kilometres. The Southern Ocean, while connected by major currents, still contains numerous potential dispersal barriers for benthic invertebrates, even for those that do have planktonic larvae. Barriers can include temperature and salinity changes, plankton availability, potentially uninhabitable deep sea areas, and the strength and direction of currents and gyres. These large-scale distributions therefore raise many questions regarding the mechanisms of dispersal in D. kerguelenensis.
In aquatic ecosystems, connectivity can be related to dispersal ability between populations (Jablonski 1991; Palumbi 1994; Hellberg et al. 2002; Shanks et al. 2003; Cowen and Sponaugle 2009; González-Wevar et al. 2021). Broadcast spawners and species with pelagic larval development should exhibit higher levels of connectivity and less genetic structure when compared to benthic or direct developing organisms (Ronce 2007; Gillespie et al. 2012). However, there are examples that contradict this prediction (e.g. Marko 2004; Weersing and Toonen 2009; Mercier et al. 2013; Segovia et al. 2017). There are also many benthic Southern Ocean species with large distributions and benthic development (typically showing small-scale and geographically structured distributions) that have been subject to phylogeographic studies. These include gastropods (Nikula et al. 2011a ; Cumming et al. 2014; González-Wevar et al. 2021), chitons (Nikula et al. 2011b ) and crustaceans (Nikula et al. 2010).
Dayton et al. (1970) proposed several dispersal mechanisms for benthic organisms including (i) adults rafting on detached benthic organisms, (ii) rafting on mobile organisms, either as egg masses or adults, or (iii) anchor ice removing organisms from the benthos and depositing these elsewhere. Alternatively, egg masses could be laid on sessile organisms that could be subsequently dislodged (see Wilson et al. 2009). All of these aforementioned examples are based on dispersal by forms of rafting and rafting appears to be unlikely to be the major method of dispersal for adult individuals in the D. kerguelenensis complex due to weak adhesive ability of the foot (N. G. Wilson, pers. obs.). Doris kerguelenensis species are direct developers (embryonic period recorded up to 21 months; Hain 1989; Moles et al. 2017a ) that feed on sedentary organisms such as demosponges and hexactinellid sponges (reviewed by McDonald and Nybakken 1997; Iken et al. 2002). Although there is a small possibility that some species in the D. kerguelenensis complex have planktonic larvae, there is very little evidence in support of this. If rafting is the method of dispersal, this most likely involves egg masses or juveniles.
Current geological evidence shows that small, transient, ice-free areas with reduced levels of primary production did persist, even through glacial maxima across the Southern Ocean continental shelves (including during the Last Glacial Maxima, Poulin et al. 2002; Thatje et al. 2005; Convey et al. 2009; Pearse et al. 2009). Direct development appears to have experienced strong positive selection within these refugia due to low food conditions, long developmental times and speciation driven by allopatry (Poulin et al. 2002; Pearse et al. 2009; Lau et al. 2020). As ice-free areas on the continental shelf could have enabled in situ survival of benthic fauna, populations may have experienced demographic bottlenecks within these refugia instead of complete eradication (Allcock and Strugnell 2012). If secondary contact of populations occurred before reproductive isolation and post glacial expansion, species would demonstrate widespread distributions (Allcock and Strugnell 2012).
In addition to the exentsive geographic distributions, these nudibranchs have been widely reported from depths of up to 1550 m (Iken et al. 2002). The morphologically identified sample from the deepest depth was collected from Halley Bay, in 1998 from 1549 m (Avila et al. 1999). This sample is stored at the University of Barcelona and was fixed in formalin (C. Avila, pers. obs.) and was therefore not used in this study. Here we provide the deepest barcoded record of D. kerguelenensis (798 m). Wägele (1987) similarly reports specimens from 788 m from the Weddell Sea. Although one D. kerguelenensis specimen was reported from New Caledonia from a depth of 680 m (Valdes 2001), given the large geographic gap between this single specimen and all other collected records of this species complex, this specimen likely represents a different taxon. The general absence of D. kerguelenensis from the deep sea areas surrounding Antarctica is worth noting; throughout the ANDEEP expeditions (I, II and III: Brandt et al. 2004, 2007), no D. kerguelenensis were collected from abyssal areas. The ANDEEP project set out to understand the abyssal diversity and faunal exchange between South America and Antarctica (Brandt et al. 2004). This question remains unanswered for this group of organisms and how these benthic organisms disperse across the abyssal plains of the Southern Ocean is currently unknown.
Evidence for dispersal across the Polar Front
Despite all records of D. kerguelenensis being restricted to the continental shelf (see above), some species within this complex have clearly dispersed long distances, including across the APF. These trans-APF species appear in our COI tree in four separate places and nine species were restricted to the Southern American continental shelf. The Antarctic Polar Front represents a strong geographic and oceanographic divide that across time, has split evolutionary lineages between the Antarctic and northern oceans (Page and Linse 2002; Lee et al. 2004; Hunter and Halanych 2008; Thornhill et al. 2008; Wilson et al. 2009, 2013; Krabbe et al. 2010). Very few benthic marine taxa have been recorded to span this region except a sea star (Moore et al. 2018), a brittle star (Galaska et al. 2017), sea spiders (e.g. Linse et al. 2006; Munilla and Membrives 2009; Dietz et al. 2019), an isopod species (Leese et al. 2010) and a tritoniid nudibranch (Moles et al. 2021). Most of these species have a dispersive larval stage except for the isopod that has had long-distance dispersal linked to rafting. Additionally, the only directly developing nudibranch example listed is Tritonia vorax, a sub-Antarctic species that has been recorded from the Southern South American continental shelf and South Georgia in the Scotia Arc (Moles et al. 2021). All of these individuals or populations, unless already in deep water, also overcame the temperature gradient at the Antarctic Polar Front (APF) (3–4°C) that differentiates the SO from more northerly oceans (see Park et al. 2014). The APF is one of the world’s strongest open ocean barriers that is known to isolate the Southern Ocean from warmer waters at lower latitudes and was formed after the opening of a deep-water passage (the Drake Passage) between South America and Antarctica c. 35 Ma (Livermore et al. 2004; Pfuhl and McCave 2005; Barker et al. 2007). The Antarctic Circumpolar Current (ACC) and major Antarctic gyres (in the Weddell and Ross Seas), however, can act as large-scale dispersal vectors that either jet eastward around Antarctica and meet the southernmost region of Southern America (ACC) or rotate clockwise (gyres) due to the interaction between the APF and ACC. In principle, these currents may have driven the dispersal of D. kerguelenensis (Barker and Thomas 2004), however this mechanism remains unclear.
Future directions
Antarctica has long been associated with the diversification and speciation of many benthic marine taxa (Briggs 2003; Rogers 2007; Strugnell et al. 2008), including heterobranch gastropods (Wägele et al. 2008; Martynov and Schrödl 2009; Moles et al. 2017b ). In support of this, we found that there is a multitude of undescribed species within the Antarctic species complex D. kerguelenensis (Bergh, 1884) (59 detected and more than 68 estimated). Although the monophyly of the complex is well supported, the interspecific relationships remain unresolved. Genetic barcoding of the complex, that is characterised by intra- and interspecific morphological similarities, appears effective in delimiting species and highlights how molecular phylogenetics can uncover cryptic diversity. To gain phylogenetic resolution within the complex, we now need large quantities of sequence data from across the genome. Future work could incorporate transcriptomes, exon capture or ultra-conserved elements (UCEs). Several recent studies have shown the power of these methods for marine invertebrate taxa (e.g. Horowitz et al. 2020; Layton et al. 2020; Richards et al. 2020). Better knowledge of this species complex will help us to understand the evolutionary dynamics of benthic taxa in Antarctica and provide a phylogenetic scaffold for tracing the evolution of secondary metabolites (chemical defence compounds) through time and across species.
Also, very few Antarctic marine studies have sampled widely enough to assess whether benthic Antarctic invertebrates are truly circumpolar (but see Allcock et al. 2011; Arango et al. 2011; Hemery et al. 2012; Moore et al. 2018; Moles et al. 2021). Most studies are restricted by the logistical challenges that occur when sampling such a vast and remote ecosystem. Future research should have a strategic focus on exploring distributions and understudied regions, specifically the Kerguelen Plateau and East Antarctica as these are likely to contain additional undiscovered diversity. Understanding the extent of diversity is especially important because marine systems worldwide are being affected by climate change, pollution, biological invasions and a host of other anthropogenic disturbances. If no immediate action is taken to address the increasing loss to biodiversity, Antarctic marine systems, along with other unique marine environments spanning the globe, will be at risk of degradation and homogenisation.
Data availability
Supplementary data for this article can be accessed in the Supplementary material. COI sequences are available through GenBank ( https://www.ncbi.nlm.nih.gov/): ON419127-ON419135.
Conflicts of interest
The authors declare that Dr Nerida Wilson and Dr Greg Rouse are both Editors of Invertebrate Systematics, but did not at any stage have editor-level access to this manuscript while in peer review, as is the standard practice when handling manuscripts submitted by an editor to this journal. Invertebrate Systematics encourages editors to publish in the journal and are kept totally separate from the decision-making processes for manuscripts. The authors have no further conflicts of interest to declare.
Declaration of funding
The work was supported by the Antarctic Circumnavigation Expedition (carried out by the Swiss Polar Institute, supported by the ACE Foundation and Ferring Pharmaceuticals), the French Polar Institute and LTSER ZATA (#1044), the US National Science Foundation (PLR-1341485, ANT-0551969 to A. L. Moran, ANT-0440577 to H. A. Woods, and DEB-1846174 to K. M. Kocot), BLUEBIO (CTM2016–78901/ANT), the Society of Australian Systematic Biologists (SASB), the University of Western Australia Oceans Institute (UWA-OI) Robson and Robertson award, the Malacological Society of Australasia (MSA) and the Antarctic Science Foundation (ASF). This work was also supported by ARC SRIEAS Grant SR200100005 Securing Antarctica’s Environmental Future.
Acknowledgements
We thank staff and students at these institutions for helping make samples available: Western Australian Museum (WAM), the Baker Laboratory; University of South Florida (USF), Scripps Institution of Oceanography (SIO-BIC), the Australian Antarctic Division (AAD), the Alfred Wegener Institute (AWI), the Yale Peabody Museum (YPM), the California Academy of Sciences (CAS) and the National Museum of Natural History, Smithsonian Institution (USNM) for making samples available. In particular, we appreciate the efforts of Charlotte Seid (SIO-BIC), Sarah Dietrick and Nicole Avalon (USF), Elie Poulin (UCHILE), Javier Naretto (Costa Humboldt) and Christoph Held (AWI) in supporting this work. We are grateful to Mia Hillyer, Alex Hickling and Linette Umbrello (all WAM) for extraction and sequencing assistance, and also to Linette Umbrello for analytical advice and assistance. We thank all expedition participants for help with collecting samples. Finally, we thank Dr Ana Riesgo, the Invertebrate Systematics publishing team and the two anonymous reviewers for their insight and advice.