Pollen development is a biological process involving many events, including anther cell division and differentiation, microsporocyte meiosis, tetrad microspore release, microspore mitosis, and pollen coat development. These events rely on the functions of numerous genes from the microspore, tapetum, and other sporophytic anther tissues (Ma, 2005; Zhu et al., 2011); therefore, dysfunction of these genes may lead to male sterility. Global transcriptome profiling can provide insights into the functional elements of the genome and reveal the molecular constituents of male sterility–related tissues. Thus, thousands of transcripts expressed in flowers and anthers have been identified in brassica species such as Brassica oleracea L. (Kim et al., 2014; Ma et al., 2015), B. rapa L. (Tong et al., 2013), B. juncea (L.) Czern. (Paritosh et al., 2014), and B. napus L. (An et al., 2014; Qu et al., 2015). Currently, high-throughput RNA sequencing (RNA-Seq) is a powerful and cost-effective tool for transcription profiling (Bancroft et al., 2011; Tong et al., 2013; An et al., 2014; Zhao et al., 2014; Ma et al., 2015; Qu et al., 2015). RNA-Seq is superior to other technologies in detecting low-abundance transcripts, differentiating biologically critical isoforms, and identifying genetic variants (Zhao et al., 2014), including alternative splicing (Marioni et al., 2008), gene fusion (Ozsolak and Milos, 2011), simple sequence repeats (SSRs), and single-nucleotide polymorphisms (SNPs).
Male sterility in plants is widely studied because it can facilitate the manipulation of hybrid vigor (heterosis). Among various male sterility types, environment-sensitive male sterility has been considered a special model for examining the interactions between genes and temperature or photoperiod (Ding et al., 2012; Pan et al., 2014; Zhou et al., 2014). Male sterility contributes greatly to sustainable increases of rapeseed production to meet the growing demands for both edible and biofuel oils. A new type of rapeseed thermo-sensitive genic male sterility (TGMS) mutation, SP2S, was discovered by us in 2007 (Yu et al., 2015). SP2S is male fertile under cool conditions (temperature <18°C) but sterile under warm conditions (>20°C). TGMS SP2S is controlled by at least two pairs of recessive genes, and all other cultivars can restore SP2S fertility. The combining ability of SP2S on seed yield and seed quality was satisfactory, and the male sterility genes in SP2S showed no adverse effect on the performance of F1 hybrids (Yu et al., 2015). Thus, SP2S can be used as a promising pollination control system for hybrid production. Some structural abnormalities in SP2S were observed, including the premature breakdown of an extremely vacuolated tapetum and a delayed dissolution of the tetrad wall covering the microspore (Yu et al., 2015). We attempted to find some molecular clues to the TGMS system using a proteomic comparison of the floral buds of SP2S with its near-isogenic line (NIL) SP2F, but we identified only 28 differentially accumulated peptides (Zhang et al., 2016). Transcriptomic analysis of a physiological process may obtain additional information that differs from proteomic analysis due to the very high resolution of transcriptome profiling and a limited correspondence between the mRNA level and protein abundance (Kubala et al., 2015). Therefore, in this study, we performed RNA-Seq and de novo transcriptome assembly to generate a flower bud transcriptome of rapeseed TGMS line SP2S. Numerous SNPs, SSRs, and differentially expressed transcripts (DETs) were identified in the transcriptome. The transcriptome assembly provides a platform for our future study of reproductive responses of rapeseed TGMS to temperature changes.
Primers designed for the selected differentially expressed transcripts.
MATERIALS AND METHODS
Plant materials —The plants of the TGMS line SP2S and its NIL SP2F, originating from same genetic background (Yu et al., 2015), were grown in the experimental field of Northwest A&F University (Yangling, China) in September 2013 and transplanted into a greenhouse in December 2013. When the plants were bolting, both SP2S and SP2F plants were divided into two groups and assigned to cool (16°C) or warm (22°C) treatments in temperature-controlled rooms where the photoperiod was longer than 14 h. The four treatments were designated S16 (SP2S at 16°C), S22 (SP2S at 22°C), F16 (SP2F at 16°C), and F22 (SP2F at 22°C). The plants of S22 would be male sterile when blossoming, but the other three treatments would be fertile. Young floral buds 1–3 mm long (from microsporocyte to uninucleate microspore stage) were excised from a minimum of 10 plants from each treatment. The samples were preserved in RNAlater (LC-Bio, Hangzhou, China) for later use.
RNA extraction and RNA-Seq —Samples were crushed in liquid nitrogen, and the total RNA was extracted using TRIzol Reagent (Invitrogen, Carlsbad, California, USA). After purification using a TRK1001 Kit (LC Science, Houston, Texas, USA), the quantity and purity of the total RNA were analyzed using a Bioanalyzer 2100 (Agilent, Palo Alto, California, USA). Then, 10 µg RNA of each sample with an RNA integrity number (RIN) ≥7.0 was used to isolate mRNA using poly(T) oligo-attached magnetic beads. The mRNA was fragmented into small pieces using divalent cations, and the cleaved mRNA fragments were reverse-transcribed to produce a normalized cDNA library using the mRNA-Seq Sample Preparation Kit (Illumina, San Diego, California, USA). The average insert size for libraries was 300 ± 50 bp. The cDNAs were sequenced on an Illumina HiSeq 2500 platform as 150-bp paired-end reads, with at least five technical replications. The generated raw data were trimmed to remove nonsense sequences including adapters, low-quality reads, vectors, and very short sequences. The clean reads of four samples were pooled together to assemble a transcriptome in the Trinity software package (Haas et al., 2013) using the default parameters ( http://trinitymaseq.sourceforge.net/). Trinity is a well-known method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. After assembly, the longest transcript at each locus (designed as comp*_c*_) was considered the unigene for subsequent annotation.
Summary of the transcriptomic data of the four samples.
Analysis of sequence variations—To obtain information about gene mutants and RNA editing sites, SNPs from sample S22, compared to S16 and F22, were screened using Isaac Variant Caller (Starling2) in the Bowtie package (Langmead et al., 2009), and highly repeated loci were further designated using SAMtools (Li et al., 2009). We also searched for cDNA-derived SSRs in the assembled transcripts using the MIcroSAtellite Identification Tool (MISA) (Thiel et al., 2003).
Functional annotation and classification —We searched the assembled transcripts in the databases of the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/) and the Joint Genome Institute (JGI; U.S. Department of Energy; https://phytozome.jgi.doe.gov/pz/portal.html#) using Bowtie (Langmead et al., 2009) to find the best hits in Brassica genomes. In addition, the coding sequences (CDS) of the assembled transcripts were identified using GENSCAN (Burge and Karlin, 1997). Then, the translated amino acid sequences were compared with the proteins in the NCBI nonredundant (nr), Swiss-Prot, Pfam, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups (COG) protein databases using Blast2GO software (Götz et al., 2008) with an E-value threshold of 10-5. The expression of each transcript was measured by the FPKM (fragments per kilobase of exon model per million mapped reads) value, which was calculated using Cufflinks (Trapnell et al., 2010), and the statistical analyses were performed using Cuffdiff program in Cufflinks. Functional categorization of Gene Ontology (GO) terms corresponding to the selected genes was performed by searching the best protein match against the GO database ( http://www.geneontology.org/) according to their molecular function, biological process, and cellular component ontologies. The KEGG pathway assignments were carried out by sequence searching against the database retrieved from http://www.genome.jp/kegg/. The GOs and KEGGs among the four samples were collected and the significances of changes were tested using a hyper-geometric test (one-tailed Fisher's exact test). A hierarchical clustering analysis of selected DETs across the four samples was performed based on their FPKM values using Cluster 3.0 software ( http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm).
Statistics of transcript length of the transcriptome assembly.
Validation of gene expression level by real-time PCR —To validate the gene expression estimated by FPKM value in RNA-Seq, specific primers (Table 1) were designed for several DETs using Beacon Designer 7.0 (Bio-Rad Laboratories, Hercules, California, USA). Quantitative real-time PCR assays were performed in triplicate using SYBR Green PCR Master Mix (Applied Biosystems, Waltham, Massachusetts, USA) on a QuantStudio 3 thermal cycler (Thermo Fisher Scientific, Waltham, Massachusetts, USA). The B. napus beta-actin7 gene was used as the internal control, and the variations in the four samples were calculated using a well-known delta-delta threshold cycle relative quantification method 2−ΔCT (Livak and Schmittgen, 2001), where ΔCT = (CTgene − CTactin).
RNA-Seq and de novo transcriptome assembly —The earliest cytological abnormalities in the SP2S anther occurred during the stage from microsporocyte meiosis to tetrad microspore (Yu et al., 2015), and the length of the corresponding young floral buds was ≤3 mm. Thus, the young floral buds of the four treatments, which were designated S16 (SP2S at 16°C), S22 (SP2S at 22°C), F16 (SP2F at 16°C), and F22 (SP2F at 22°C), were collected simultaneously to extract mRNA. The four transcribed cDNA libraries were sequenced using an Illumina HiSeq 2500 system. We assessed the quality of the raw reads and trimmed them to form four sets of data, which had an average length of 95.16 bp. In total, we obtained 239.7 million clean reads, corresponding to 19.95 Gbp of sequences (Table 2). The sequencing depth of the four samples with an average of 4.98 Gbp data was approximately 49× because it was presumed that the total coding nucleotides of the 101,040 predicted genes in B. napus was 101.157 million bp (Chalhoub et al., 2014). The raw sequencing data of the above samples were submitted to NCBI (Sequence Read Archive [SRA] accessions SRR2052475, SRR2052499, SRR2052502, and SRR2052505).
The reads from all four samples were pooled together for de novo assembly using Trinity, and 322,134 fragments corresponding to 135,702 unigenes were produced (Table 3). The sequences of all 135,702 transcripts have been deposited in the NCBI Transcriptome Shotgun Assembly (TSA) database (accession GDFQ00000000). The total length was 107,029,193 bp, with an average length of 784 bp (201–11871 bp). There were 37,350 short fragments 200–300 bp in length and 9070 long transcripts (≥2000 bp), as shown in a vertical histogram (Fig. 1) of the size distribution of the transcripts.
Identification of cDNA-derived SSRs —We searched the de novo transcriptome assembly for SSRs that were defined as monomer, dimer, trimer, tetramer, pentamer, and hexamer repeats, with at least five repeats and being at least 18 bp in length. The results revealed a total of 24,157 potential cDNA-derived SSRs that were distributed in 22,927 unigenes (Fig. 2; Appendix S1 (apps.1700077_s1.xlsx)). Most of these SSRs were dimers (9312), trimers (7792), and monomers (6274), but only a small portion of them were tetramers (191), pentamers (329), and hexamers (259). There were 200 transcripts containing two or more SSRs, and 1141 SSRs represented compound formation. Among all SSRs, the motif AG/CT had the highest frequency (792), followed by motifs GA/TC (715), A/T (426), CTC/GAG (271), AAG/CTT (218), and GAA/TTC (190). The SSRs identified in this study are valuable resources for identifying polymorphic SSR markers for the genetic analysis of rapeseed.
Analysis of SNPs in the transcriptome —We searched for SNPs between the alignments of S22 vs. S16 and S22 vs. F22 using Bowtie software (Langmead et al., 2009). In total, 137 and 195 SNPs were identified in the comparisons of S22 vs. S16 and S22 vs. F22 ( Appendix S2 (apps.1700077_s2.xlsx)), respectively. Some SNPs found among the four samples may be produced by DNA mutation, alternative splicing, or RNA editing.
Functional annotation of the assembled transcripts —We searched the 135,702 transcripts in the databases of NCBI and JGI using Bowtie (Langmead et al., 2009) and obtained 117,366 and 98,087 positive hits ( Appendix S3 (apps.1700077_s3.xlsx)), respectively. The remaining 13.51% and 27.72% of the 135,702 contigs had no matches in the NCBI and JGI Brassica databases. More than 90% of the identified transcripts had the highest homologies with A. thaliana (L.) Heynh. or A. lyrata (L.) O'Kane & Al-Shehbaz (Fig. 3), and fewer than 5% of the top matches hit B. rapa, B. napus, or B. oleracea due to the limited number of the Brassica protein sequences available in the NCBI database. We also searched the putative proteins transformed from the CDS of the 135,702 transcripts in various databases, and 33.77–55.59% of them had positive hits in the NCBI nr protein, Swiss-Prot, Pfam, KEGG, and COG databases (Table 4). Moreover, the 43,680 identified proteins in COG were categorized into 25 clusters. Among these categories, the cluster for “general function prediction only” was the largest group, followed by the categories “posttranslational modification, protein turnover, chaperones” and “signal transduction mechanisms.” The categories “cell motility” and “nuclear structures” had the fewest corresponding genes (Fig. 4).
Percentage of BLAST hits of the 135,702 transcripts in the databases of NCBI (nucleotide and nonredundant protein), JGI, Swiss-Prot, Pfam, KEGG, and COG.
Analysis of differentially expressed transcripts and KEGG pathways—The expression level of each transcript in the four samples was estimated by FPKM and the significant DETs; the expression patterns were different among the four samples, with P value ≤0.05 and fold change ≥2 extracted ( Appendix S4 (apps.1700077_s4.xlsx)). When the transcript expressions of the same line treated under warm (22°C) and cool (16°C) conditions were compared, we found 2532 DETs in S22 vs. S16 (286 downregulated and 2235 upregulated in S22), whereas in the NIL SP2F, there were only 1224 DETs in F22 vs. F16 (221 downregulated and 988 upregulated in F22) (Table 5). There were 2320 DETs between SP2S (S22) and SP2F (F22) at 22°C (448 upregulated and 1782 downregulated in S22), whereas at 16°C, there were only 576 DETs between samples S16 and F16 (237 downregulated and 306 upregulated in S16) (Table 5). These DET numbers suggest that the transcriptional profile in TGMS SP2S was more affected by temperature than it was in SP2F and that warm conditions resulted in more DETs than cool temperatures. The comparisons of S22 vs. F22 and S22 vs. S16 indicated that 1013 DETs were shared (Fig. 5). In the sample of SP2S at 22°C, we found an abnormal regulation of some interesting genes that are important for cell wall construction, cell division and growth, lipid metabolism, hormone and temperature response, autophagy, RNA splicing, and nuclease activity (Fig. 6). For example, three genes encoding callose synthase GSL2, GSL10, and GSL12 were upregulated in SP2S at 22°C. Of these, GSL2 is required for the formation of the cell wall surrounding the microsporocyte and GSL10 is involved in pollen development at the mitotic division stage (Shi et al., 2015). Moreover, the genes encoding A6 and other beta-glucanases involved in callose dissolution were also highly upregulated. Some genes encoding the proteins that control meiosis and the cell cycle, including cyclin-dependent kinase inhibitor 7, ribonuclease H2 subunit B, S-phase kinase-associated protein 1, ribosomal protein, sister chromatid cohesion protein PDS, and cohesin complex subunit SA-1/2, were downregulated. In contrast, cyclin A and cell division cycle 20 were upregulated in the male sterile plants. These transcriptional regulations corresponded well to the cytological abnormalities, including asynchronous microsporocyte meiosis (Liu et al., unpublished data) and a delayed degradation of the tetrad wall, which led to aborted microspores with defective exine (Yu et al., 2015).
The number of differentially expressed transcripts (DETs) among the four samples.
Because the DETs were identified without biological replication, these results are only preliminary and require additional verification. For example, the expression of 10 selected genes involved in callose degradation, transcription regulation, pollen development, and signal transduction were further assayed by quantitative real-time PCR. The change of FPKM values of each selected gene across the four samples (Fig. 7) had a similar trend to the relative fold change estimated by the ΔCt method. Thus, we believe that the RNA-Seq experiment can produce a useful gene expression profile.
The further evaluation of each of the numerous DETs individually would be challenging. Alternatively, we considered a rough comparison of GO and KEGG among those samples because GO and KEGG enrichment can reflect the overall changes of most transcripts. There were a total of 270 GOs ( Appendix S5 (apps.1700077_s5.xlsx)) that were influenced in SP2S under warm vs. cool temperature conditions. A comparison between SP2S and SP2F under the warm temperature condition also revealed 357 GOs being disturbed ( Appendix S5 (apps.1700077_s5.xlsx)). There were 51 and 49 plant KOs (KEGG orthology) showing significant differences between S22 and F22 and between S22 and S16 (Table 6, Appendix S6 (apps.1700077_s6.xlsx)), respectively, including cell division (meiosis, circadian rhythm, cell cycle, and MAPK signaling pathway), transcription and translation related (ribosome, spliceosome, RNA polymerase, basal transcription factors), and energy supply related (oxidative phosphorylation, CO2 fixation, citrate cycle, pyruvate metabolism, photosynthesis, and sugar metabolism). Some pathways related to amino acid metabolism, especially proline metabolism, which is important for pollen development (Mattioli et al., 2012), and protein modification and degradation-related KOs (ubiquitin-mediated proteolysis, proteasome, endocytosis, and lysosome), were also influenced. Additionally, lipid metabolism and the biosynthesis of flavonoids and carotenoids were disturbed.
De novo transcriptome assembly revealed new information for future study —De novo assembly of B. napus transcriptome data has not been previously reported, with the exception of some related species such as B. juncea and B. oleracea, which have transcriptome assemblies containing 133,641 and 205,046 transcripts, respectively (Kim et al., 2014; Sinha et al., 2015). Alternatively, alignment of shotgun sequencing RNAs against a reference genome (reference-based strategy) is facilitated by the establishment of reference genome sequences of some Brassica species ( http://brassicadb.org/brad/blastPage.php?, http://www.genoscope.cns.fr/brassicanapus/, and https://phytozome.jgi.doe.gov/pz/portal.html#). Although the reference-based strategy is powerful, it is unable to detect structural alterations that are not present in the reference sequence data; this is particularly true when the read lengths are short (Birol et al., 2009). Spliced reads that span large introns can be missed because aligners often allow only a fixed length of introns (Martin and Wang, 2011). The success of the reference-based strategy depends on the quality of the reference genome, but many genome assemblies contain misassemblies and large deletions (Martin and Wang, 2011). Thus, even if a reference genome is available for Brassica plants, de novo transcriptome assembly should be performed (Xu et al., 2016). Moreover, transcriptome sequencing of different genotypes can provide novel insights into the diverse rearrangements of chromosome fragments, which may be caused by gene translocation, deletion, fusion, and/or recombination, underlying the genetic diversity among various genetic resources. In addition, the sequencing data obtained in this study can be used to develop cDNA-derived molecular markers, including SSRs and SNPs.
De novo transcriptome assembly can recover transcripts that are transcribed from segments of the genome that are missing from the genome assembly (Martin and Wang, 2011). Our assembly, containing 135,702 contigs, was larger than a previous estimation of approximately 101,040 gene models in the B. napus genome (Chalhoub et al., 2014), as were the assemblies of B. juncea and B. oleracea (Kim et al., 2014; Sinha et al., 2015). In addition to redundant information, overlapping of contigs, alternative splicing, and mRNA edition, we believed that the assembly should also contain some novel genes and/or transcripts because it was unlikely that the rest of the unmatched contigs (13.51% in the NCBI nr protein database and 27.72% in JGI) in the 135,702 transcripts were false assemblies. Although some unavoidable mistakes might exist in our assembly, it should be useful as a reference for the further analysis of gene expression and functional genomics studies. For instance, in the transcriptomic profiling comparison between the male sterile plants induced by the herbicide amidosulfuron and the control using a digital gene expression tag method, we identified 516 DETs that aligned well in the de novo transcriptome assembly (Liu et al., 2017). Of those, 486 DETs had perfect match in the NCBI database. Moreover, our transcriptome sequences will enhance the quality of gene annotation and functional analysis of the rapeseed genome because most genes in the shotgun sequencing assemblies were not yet annotated.
The aberrant transcriptional regulations corresponded to the abnormal phenotypes —Although it is difficult to anchor the male sterility gene itself from analysis of global differential expression between the male sterile mutant and wild type, this approach is useful to unveil some subsequent genes involved in anther abortion (i.e., male sterility—associated genes). The results of this study suggest that although male sterility genes are the trigger, the subsequent pollen abortion often involves many genes and multilevel biological pathways, as most male sterilities do (Ma, 2005; Zhu et al., 2011; An et al., 2014; Kim et al., 2014; Ma et al., 2015; Qu et al., 2015). In other studies, some common pathways were disrupted, such as the ubiquitin-proteasome system in the TGMS system considered in the current study, photoperiod-sensitive male sterility cotton (Ma et al., 2013), and rice tms5 mutation (Zhou et al., 2014). This finding suggests that the pollen-abortion patterns of different species could share some common steps (Parish et al., 2012; De Storme and Geelen, 2014). Moreover, microsporogenesis of plants undergoing abiotic stress typically shows a reduced level of energy supply in the anthers (Kubala et al., 2015). The differentially expressed transcripts in such pathways as citrate cycle, pyruvate metabolism, oxidative phosphorylation, photosynthesis, CO2 fixation, and sucrose metabolism will affect pollen development in SP2S because carbohydrates and energy supply are crucial to anther and pollen development. Environmental stress and, more specifically, temperature stress impact the subsequent processes in meiotic cell division (De Storme and Geelen, 2014). The impact of temperature stress on meiosis, circadian rhythm, cell cycle, basal transcription factors, RNA polymerase, and purine and pyrimidine metabolism will arrest the anther and pollen development of TGMS SP2S. Moreover, the biosynthesis of flavonoids and carotenoids, as well as fatty acid metabolism, which were impaired in SP2S, are also thought to be crucial for pollen wall construction (Parish et al., 2012; Ma et al., 2013). These aberrant genetic regulations in SP2S corresponded well to abnormal phenotypes such as an asynchronous microsporocyte meiosis (Liu et al., unpublished data), delayed dissolution of the tetrad wall, exine fusion, and premature breakdown of the tapetum (Yu et al., 2015).
Plant KEGG Orthologies (KOs) that were significantly influenced in the comparisons between S22 and S16 and between S22 and F22.
In summary, a total of 135,702 transcripts were revealed in the young flower buds after RNA-Seq and de novo assembly using Trinity software. To our knowledge, this is the first largescale transcriptome sequencing of a rapeseed TGMS line. These transcriptome data may serve as a reference and provide important insights for future studies of genetic regulation of rapeseed microsporogenesis.
This work was financially supported by the National Transgenic Research Projects of China (2018ZX08020001) and Yangling Sci-Tech Plan Program (2016NY-04). The authors thank Mr. Zhiyuan Huang at LC-Bio, Hangzhou, China, for his assistance with data analysis.