A new member of the family of growth-promoting glycoproteins known as imaginal disc growth factors (IDGFs) was identified from the Diaprepes root weevil, Diaprepes abbreviatus (L.) (Coleoptera: Curculionidae). The full-length imaginal disc growth factor cDNA transcript, designated idgf-DRW, was cloned from tissue of teneral adult DRW females. Sequencing and subsequent homology comparisons of the nucleotide sequence (GenBank accession no. AY821658) indicated that the open reading frame (ORF) consisted of 1329 bases and encoded a putative protein of 442 amino acid residues with a calculated molecular weight of 49.5 kDa and a pI of 6.68. BLASTX comparisons of the idgf-DRW cDNA sequence showed that the deduced amino acid sequence, designated IDGF-DRW (AAV68692.1), had 43% to 51% similarity with IDGF1-5 and DS47 from Drosophila melanogaster, D. simulans, and D. yakuba, 51% similarity with IDGF in Pieris rapae, and 51% similarity with an IDGF-like protein in Bombyx mori. SignalP analysis revealed that the predicted IDGF-DRW contained an N-terminal signal peptide of 23 amino acid residues, similar to other known IDGF proteins. The structure of IDGF-DRW was predicted using the characterized IDGF2 from D. melanogaster as the model; the deduced IDGF-DRW amino acid sequence had 48% similarity with D. melanogaster IDGF2. The predicted IDGF-DRW displays the characteristic fold of family 18 glycosyl hydrolases found in Drosophila IDGFs, with an insertion (Gly304 to Phe392) in the β barrel between strand β7 and helix α7 that forms an additional α+β domain similar to that of Serratia marcescens chitinases A and B. A nucleotide change that results in an amino acid change within the putative binding site of IDGF-DRW was also observed.
The significant similarities of IDGF-DRW to other members of the IDGF family support its classification as a new member of this family of invertebrate growth factors and as the first IDGF to be identified from a coleopteran.
Abbreviations: EST, expressed sequence tag; DS47, chitinase-like protein precursor; CHIT, chitinase; IDGF, imaginal disc growth factor protein; DRW, Diaprepes root weevil; idgf, imaginal disc growth factor transcript.
Information about the action of insect growth-regulating hormones is of interest for two reasons. First, the functions and cellular actions of invertebrate and vertebrate hormones have been shown to be remarkably conserved, so what we learn about insects may also enhance our understanding of hormonal regulatory processes in vertebrates. Second, by understanding how insect hormones function and interact in the regulation of insect development, we may be able to devise safe and specific agents that disrupt the insect life cycle, increasing the efficiency of efforts to manage agricultural pests and disease vectors. The pest system studied here is the combination of the Diaprepes root weevil (DRW), Diaprepes abbreviatus (L.), and Phytophthora, which together can cause severe tree decline and destroy groves within a few years (Graham et al. 1996, 2003). DRW continues to be a major concern of citrus growers in Florida because larval infestations in the soil are difficult to detect in time and effective management options are scarce (Lapointe 2000). To identify genes that regulate DRW development, we constructed a cDNA expression library from DRW.
Little information on insect growth factors was available until 1996, when an insect growth factor was purified from the conditioned medium of an embryonic cell line, NIH-Sape-4, of Sarcophaga peregrina (flesh fly) (Homma et al. 1996). In 1999, a new developmental gene family of growth-promoting glycoproteins, the imaginal disc growth factors (idgf), was identified in Drosophila melanogaster (Kawamura et al. 1999). The IDGF proteins promote cell proliferation in the imaginal discs and were the first polypeptide growth factors to be reported from invertebrates. IDGF proteins are expressed in variable patterns throughout all developmental stages, suggesting that idgf genes have multiple functions during the entire life cycle (Kawamura et al. 1999). Although little is known about their precise mode of action, IDGFs appear to cooperate with insulin to stimulate the growth of imaginal disc cells. IDGF proteins are structurally related to chitinases, from which they may have evolved, rather than to known growth factors, but they have no known catalytic activity. Several nonenzymatic proteins with sequence homology to chitinases have been described in vertebrates, indicating that the typical chitinase-like fold may occur in proteins with a wider range of biological functions than chitin degradation (Hakala et al. 1993; Morrison & Leder 1994; Shackelton et al. 1995; Hu et al. 1996; Owhashi et al. 2000; Chang et al. 2001; Varela et al. 2002). The idgf gene family in Drosophila currently comprises six members (idgf1-5 and ds47), which encode proteins sharing 40-50% similarity to one another in amino acid sequence. Importantly for structural comparisons, the crystal structure of IDGF2 has been determined by X-ray crystallography (Varela et al. 2002).
Computational prediction of protein structure, as a means of elucidating function and interactions, has reached an acceptable level of accuracy, especially when related proteins have been well characterized (Martelli et al. 2003; Aloy & Russell 2003; Bell & Ben-Tal 2003; Valencia 2003). So far, IDGF proteins have been reported from the lepidopterans Bombyx mori (Tsuzuki et al. 2001) and Pieris rapae (Asgari & Schmidt 2004), and from the dipterans Drosophila melanogaster, D. simulans, and D. yakuba (Zurovcova & Ayala 2002). Rigorous evaluation of current hypotheses about how IDGFs activate growth still awaits biochemical studies that define the ligand-binding specificity of these novel invertebrate growth factors; however, Varela et al. (2002) have postulated that the observed stimulation of imaginal disc cell proliferation by IDGFs acting together with insulin may be required to achieve optimal signaling through the insulin receptor. Invertebrate imaginal discs express an insulin receptor that is homologous to that of vertebrates (Garofalo & Rosen 1988; Fernandez et al. 1995; Ruan et al. 1995) and that is required for normal growth (Chen et al. 1996). An in-depth review is provided by Held (2002).
Herein we report the discovery of a coleopteran IDGF and its full-length idgf transcript, which was isolated and cloned from the Diaprepes root weevil, DRW. This genetic information will aid our understanding of the developmental and biological pathways important to DRW survival and, by identifying genes critical in developmental pathways, will support the development of new management strategies against DRW.
Materials and Methods
Diaprepes Root Weevil Rearing and Collection
Larvae of DRW were obtained from a colony maintained at the U.S. Horticultural Research Laboratory (USHRL), Ft. Pierce, Florida. Individuals were reared in cups containing artificial diet at 26°C as described by Lapointe & Shapiro (1999). Callow (teneral) adults that had recently emerged from the pupal exuvium were selected and separated by sex for processing.
Library Construction
Seventeen whole teneral female DRW were used to construct an expression library. The insects were ground in liquid nitrogen, and total RNA was extracted by the guanidinium salt-phenol-chloroform procedure described by Strommer et al. (1993). Poly(A)+ RNA was purified with Micropoly(A) Pure™ according to the manufacturer's instructions (Ambion, Austin, TX, USA). A directional cDNA library was constructed in the Lambda Uni-ZAP® XR vector with Stratagene's ZAP-cDNA Synthesis Kit (Stratagene, CA, USA). The resulting DNA was packaged into lambda particles with Gigapack® III Gold Packaging Extract (Stratagene, CA, USA). An amplified library was generated with a titer of 1.0 × 10⁹ plaque-forming units per mL. Mass excision of the amplified library was accomplished with Ex-Assist® helper phage (Stratagene, CA, USA). An aliquot of the excised, amplified library was used to infect XL1-Blue MRF' cells, which were then plated on LB agar containing 100 μg/mL ampicillin. Bacterial clones containing excised pBluescript SK(+) phagemids were recovered by random colony selection.
Sequencing of Clones and Computer Analysis
pBluescript SK(+) phagemids were grown overnight at 37°C and 240 rpm in 96-well culture plates containing 1.7 mL of LB broth per well, supplemented with 100 μg/mL ampicillin. Archived stocks were prepared by mixing 75 μL of each cell culture with 75 μL of an LB-ampicillin-glycerol mixture; these stocks are held at the USHRL in an ultra-low temperature freezer set at -80°C. Plasmid DNA was extracted with the Qiagen 9600 liquid handling robot and the QIAprep 96 Turbo miniprep kit according to the recommended protocol (QIAGEN, Inc., Valencia, CA, USA). Sequencing reactions were performed with the ABI PRISM® BigDye™ Primer Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) and a universal T3 primer. Reactions were prepared in 96-well format with the Biomek2000™ liquid handling robot (Beckman Coulter, Inc., USA). Sequencing reaction products were precipitated with 70% isopropanol, resuspended in 15 μL sterile water, and loaded onto an ABI 3700 DNA Analyzer (Applied Biosystems, Foster City, CA, USA).
Base confidence scores were calculated by TraceTuner® (Paracel, Pasadena, CA, USA). Low-quality bases (confidence score <20) were trimmed from both ends of sequences. Quality trimming, vector trimming, and sequence fragment alignments were executed by Sequencher® software (Gene Codes, Ann Arbor, MI, USA). cDNA sequences arising from rRNA and mitochondrial DNA were identified with BLASTN and were excluded from analysis along with sequences less than 200 nucleotides in length after both vector and quality trimming. Additional ESTs that corresponded to vector contaminants were removed from the dataset. To estimate the number of genes represented in the library and the redundancy of specific genes, ESTs were assembled into “contigs” by Sequencher®. Contig assembly parameters were set with a minimum overlap of 50 bases and 95% identity match.
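The read clean-up rules above (trim bases with confidence below 20 from both ends, discard reads shorter than 200 nucleotides, and join reads only when they overlap by at least 50 bases at 95% identity) can be sketched as follows. This is a minimal illustration that assumes per-base confidence scores are already available as integers; it does not reproduce TraceTuner or Sequencher, and the ungapped suffix/prefix overlap test is a simplification of real assembly, which also tolerates gaps and checks both strand orientations. All function names are hypothetical.

```python
# Hypothetical sketch of the EST clean-up and overlap criteria described above.
MIN_QUALITY = 20      # bases with confidence < 20 are trimmed from the ends
MIN_LENGTH = 200      # reads shorter than this after trimming are discarded
MIN_OVERLAP = 50      # minimum overlap (bases) required to join two reads
MIN_IDENTITY = 0.95   # minimum fraction of matching bases within the overlap


def trim_ends(seq: str, quals: list[int], min_q: int = MIN_QUALITY) -> str:
    """Remove low-confidence bases from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def passes_length(seq: str, min_len: int = MIN_LENGTH) -> bool:
    """Keep only reads that are still long enough after trimming."""
    return len(seq) >= min_len


def best_overlap(a: str, b: str, min_overlap: int = MIN_OVERLAP,
                 min_identity: float = MIN_IDENTITY) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    (ungapped, identity >= min_identity), or 0 if none reaches min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        matches = sum(x == y for x, y in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length
    return 0
```

Scanning from the longest possible overlap downward returns the largest qualifying overlap first, which is the one an assembler would normally prefer when merging two reads.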
Sequence Analysis and Structure Prediction
The idgf cDNA sequence was covered by eight overlapping clones, which were then bi-directionally sequenced five more times for sequence validation. The identity of the gene was assessed by TBLASTX homology searches on the National Center for Biotechnology Information BLAST server (http://www.ncbi.nlm.nih.gov), and protein sequence comparisons were made against protein databases with BLASTP. CLUSTALX was used for multiple alignment of the amino acid sequences of the homologous proteins (Thompson et al. 1994), and an unrooted phylogenetic tree was constructed from this alignment with PAUP 4.0 and 1,000 bootstrap replicates (Swofford 2002). The theoretical molecular weight and pI of the predicted protein were calculated with ExPASy (http://au.expasy.org). The SignalP 3.0 server (http://www.cbs.dtu.dk/services/SignalP) was used to analyze the presence and location of potential signal peptide cleavage sites in the amino acid sequence. Structural modeling of the IDGF protein was performed with ROBETTA (http://www.robetta.com), a full-chain protein structure prediction server that also performs domain parsing, 3-D modeling, fragment library generation, and protein-protein interaction analysis by an interface alanine scanning method. ROBETTA uses the Ginzu domain parsing and fold detection method developed by Dylan Chivian, David Kim, Lars Malmstrom, and David Baker, and the ROSETTA fragment insertion method (Simons et al. 1997). It scans protein chains for homologs with PDB-BLAST, FFAS03, 3D-Jury, and the Pfam-A protein family database, and assigns z-scores with MAMMOTH, a program that provides consistent ranking of protein model quality by comparing modeled structures with their experimental counterparts (Ortiz et al. 2002). A PSI-BLAST multiple sequence alignment is then used to assign, from sequence clusters, regions with an increased likelihood of containing an adjoining domain (Bowers et al. 2000; Rohl & Baker 2002).
Loop regions are assembled from fragments and optimized to fit the aligned template structure. Protein domains were predicted with the full prediction protocol (Kim et al. 2004). The predicted 3-D structures were rendered from the PDB file generated by ROBETTA with Protein Explorer (http://www.molvis.sdsc.edu/protexpl/frntdoor.htm).
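The molecular weight and pI values in this study were computed with ExPASy. The sketch below only illustrates the underlying arithmetic: molecular weight as the sum of average residue masses plus one water molecule, and pI as the pH at which the net charge given by the Henderson-Hasselbalch equation is zero, found by bisection. The pKa table is an illustrative assumption; ExPASy uses a different (Bjellqvist) pK set, so pI values from this sketch will not exactly match those reported here. Function names are hypothetical.

```python
# Average residue masses (Da); a chain's mass is the sum of its residue
# masses plus one water for the free N- and C-termini.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative pKa values (an assumption, not the set used by ExPASy).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts.get(k, 0) / (1 + 10 ** (ph - pka))
                   for k, pka in PKA_POSITIVE.items())
    negative = sum(counts.get(k, 0) / (1 + 10 ** (pka - ph))
                   for k, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where the net charge crosses zero (charge falls with pH)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection works here because the net charge decreases monotonically as pH rises, so exactly one zero crossing lies between pH 0 and 14.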
A phylogenetic tree was generated by neighbor joining with 1000 bootstrap replicates in PAUP 4.0 and left unrooted (Swofford 2002). The topology was similar when analyzed by maximum parsimony (branch-and-bound search); data not shown. The phylogenetic tree of amino acid sequences included five members of the Drosophila IDGF family and a bacterial chitinase.
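The trees in this study were built with PAUP 4.0. To show what the neighbor-joining criterion computes, here is a minimal sketch of the standard algorithm on a precomputed distance matrix. It assumes pairwise distances have already been estimated from the CLUSTALX alignment, and it omits distance correction and the 1000 bootstrap replicates (which would repeat the whole procedure on alignments resampled by column). The function name and the five-taxon distances in the usage example are hypothetical.

```python
def neighbor_joining(dist: dict[str, dict[str, float]]):
    """Build an unrooted neighbor-joining tree from a symmetric distance matrix.

    Returns (joins, final_edge): each join is (child_a, child_b, branch_a,
    branch_b, new_node); final_edge links the last two remaining nodes."""
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    nodes = list(d)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i: summed distance from i to every other node.
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{len(joins) + 1}"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, li, lj, u))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Usage with hypothetical distances among five taxa a-e.
toy = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
joins, final_edge = neighbor_joining(toy)
```

For this toy matrix the first join pairs a and b with branch lengths 2 and 3. Because neighbor joining produces an unrooted tree, a root (for example, an outgroup such as the bacterial chitinase) must be chosen separately if a rooted view is wanted.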
Results
Isolation of IDGF-DRW
An initial 10,000 clones were sequenced from the 5' end. These sequences were trimmed of vector and low-quality sequence and filtered for minimum length (200 contiguous bp, Phred quality score 20), producing a final set of 8,480 high-quality ESTs. These ESTs were assembled with Sequencher® (Gene Codes, Ann Arbor, MI 48105) into 5,508 contiguous sequences (contigs), with 1,240 ESTs remaining as singlets. Of the assembled sequences analyzed, one contig (representing eight ESTs) was similar to known IDGF protein sequences based on TBLASTX results. The nucleotide sequence of this contig contained an intact ORF and was bi-directionally sequenced for validation, providing an average 5× coverage for 90% of the transcript sequence, which was deposited in GenBank (accession no. AY821658). The mRNA transcript was designated idgf-DRW. A second cDNA library produced from field-caught DRW adults also contained the idgf-DRW transcript (Hunter, data not shown).
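A coverage figure like the one above (about 5× over 90% of the transcript) summarizes per-base read depth. The following is a minimal sketch of one way to quantify it, assuming each validation read has already been placed on transcript coordinates. It reports the fraction of positions covered by at least five reads, which is only one reasonable reading of the figure. The function name and interval format are hypothetical.

```python
def fraction_at_depth(length: int, reads: list[tuple[int, int]],
                      min_depth: int = 5) -> float:
    """Fraction of transcript positions covered by at least `min_depth` reads.

    Reads are half-open, 0-based (start, end) intervals on the transcript."""
    delta = [0] * (length + 1)  # difference array: +1 at read start, -1 at end
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    depth = covered = 0
    for pos in range(length):
        depth += delta[pos]  # running sum gives the depth at this position
        if depth >= min_depth:
            covered += 1
    return covered / length
```

The difference-array approach touches each read only twice, so its cost depends on the number of reads plus the transcript length, not on their product.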
Sequence Analysis of IDGF-DRW
Analysis of the nucleotide sequence revealed an ORF in idgf-DRW consisting of 1329 bases and encoding a putative protein of 442 amino acid residues (Fig. 1), with a predicted molecular weight of 49.5 kDa and a pI of 6.68. TBLASTX comparison and multiple sequence alignment of the idgf-DRW cDNA sequence (Fig. 2) indicated that the deduced amino acid sequence of the IDGF-DRW protein (accession no. AAV68692) had 43% similarity with Drosophila melanogaster IDGF1 (accession no. AAC99417), 48% with D. melanogaster IDGF2 (accession no. AAC99418), 43% with D. melanogaster IDGF3 (accession no. AAC99419), 51% with D. melanogaster IDGF4 (accession no. AAC99420), 44% with D. melanogaster IDGF5 (accession no. AAF57703), 51% with D. melanogaster DS47 (accession no. AAC48306), 51% with Pieris rapae IDGF (accession no. AAT36640), and 51% with the Bombyx mori IDGF-like protein (accession no. BAB16695). SignalP analysis of the predicted IDGF-DRW showed that the most likely signal sequence cleavage site was between Ser23 and Ala24 (Fig. 1, •), which would generate a signal peptide of 23 amino acid residues and a predicted mature protein with a calculated molecular weight of 47.1 kDa and a pI of 6.76. A signal peptide is present in all IDGF proteins described to date, although its length varies (Fig. 2; for variation in signal sequence length, see Chou 2002; Martoglio & Dobberstein 1998). A single consensus motif for N-linked glycosylation (Asn227), comparable to that previously reported for DS47 (Asn233), was also identified (Fig. 1, •).
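The sequence features reported above can be located with simple string operations. The sketch below assumes a protein given as a one-letter string. It finds N-linked glycosylation sequons using the common N-X-S/T consensus (X being any residue except proline) and removes a 23-residue signal peptide. It is illustrative only: SignalP predicts cleavage sites with trained models rather than a fixed length, and PROSITE's glycosylation pattern is slightly stricter than the regular expression used here. The ORF arithmetic assumes that the 1329 bases include the stop codon. All names are hypothetical.

```python
import re

# N-X-[S/T] with X != Pro; the zero-width lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")

# 1329 nt = 443 codons; 442 residues if the stop codon is counted in the ORF.
ORF_LENGTH = 1329
RESIDUES = ORF_LENGTH // 3 - 1


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def mature_protein(protein: str, signal_length: int = 23) -> str:
    """Drop an N-terminal signal peptide; cleavage between residues 23 and 24
    (Ser23-Ala24 in IDGF-DRW) leaves a mature chain that starts at residue 24."""
    return protein[signal_length:]
```

For a 442-residue precursor, removing 23 residues leaves a 419-residue mature chain.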
Phylogenetic Relationship Between IDGF-DRW and Other Members of the IDGF Family
Analysis of the sequence and predicted protein structure of IDGF-DRW strongly supports its classification as a new member of the IDGF family of growth-promoting glycoproteins. The IDGF proteins, encoded by members of the idgf gene family, were the first soluble polypeptide growth factors to be reported from invertebrates. These proteins are structurally related to chitinases rather than to known growth factors, and have been shown to possess no catalytic activity (Kirkpatrick et al. 1995; Kawamura et al. 1999). A phylogenetic tree based on the multiple amino acid sequence alignment of IDGF-DRW, the Drosophila IDGFs, P. rapae IDGF, and the B. mori IDGF-like protein is shown in Fig. 3. The tree topology showed that Drosophila IDGF1, IDGF2, and IDGF3 form a clade separate from the other insect IDGFs (including IDGF4 and IDGF5) and chitinase (Zurovcova & Ayala 2002) (Fig. 3). In pairwise comparisons, these IDGF proteins showed 38% to 79% similarity, and the three ortholog groups (IDGF1, IDGF2, and IDGF3) were more similar to each other than to the other six.
The structure of IDGF-DRW was predicted with ROBETTA, a full-chain protein structure prediction server (Fig. 4). Conserved residues predicted to be essential for maintaining the barrel fold are shown in green space-fill: Gly109, Gly110, Asp152, Gly153, Leu218, Asp242, Lys295, Gly412, and Asp421. The residue at position 159 is Glu (brown space-fill), a position commonly occupied by Gln in other known IDGF proteins. The predicted IDGF-DRW structure contains two disulfide bridges, Cys31-Cys58 and Cys345-Cys427, which are conserved in all known IDGF family members (Fig. 2), in human chitotriosidase, and in mammalian chitinase-like proteins with no chitinase activity, but are not found in family 18 glycosyl hydrolases from plants or bacteria (Varela et al. 2002). The putative binding site of IDGF2 (Varela et al. 2002) comprises ten residues: Tyr65, Asp111, His112, Gln159, Phe160, Lys162, Asp250, Tyr303, Phe416, and Tyr420. These residues are not strictly conserved among known IDGF family members; however, the predicted binding site of IDGF-DRW contains a subset of them (Tyr65, Phe160, Asp250, and Tyr303), and the remaining residues are replaced by amino acids of the same or a related category.
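Whether a substituted binding-site residue belongs to the same or a related category can be made concrete with the residue groups that Clustal programs use for their alignment conservation symbols: ':' for strongly similar and '.' for weakly similar. The sketch below assumes those Clustal groups as the definition of similarity. The text does not state which scheme was used, so this is only one possible yardstick. The function name is hypothetical.

```python
# Clustal residue groups: a column whose residues all fall in one "strong"
# group is marked ':', and one whose residues fall in one "weak" group is '.'.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(residues: str) -> str:
    """Clustal-style symbol for one alignment column (one-letter codes, no gaps)."""
    unique = set(residues)
    if len(unique) == 1:
        return "*"  # fully conserved
    if any(unique <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(unique <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "  # no shared group
```

Applied to residue pairs named in this paper, the Gln/Glu pair at position 159 scores ':' (strongly similar), while the Phe416/Val416 change discussed below scores only '.' (weakly similar).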
Discussion
Overall Structure of IDGF-DRW
The structure of IDGF-DRW was modeled on Drosophila IDGF2 (Varela et al. 2002). The grouping of the Drosophila IDGF members into a clade separate from those of the other insects implies that all the Drosophila sequences diverged after the divergence of the insect lineages for which IDGF sequence data are presented. This would suggest a very recent expansion of the IDGF gene family. Further examination is needed before the evolutionary divergence among the remaining insect species (DRW, P. rapae, and B. mori) can be ascertained.
The structure of IDGF-DRW displays the characteristic fold of the family 18 glycosyl hydrolases. An insertion (Gly304 to Phe392) in the β barrel between strand β7 and helix α7 forms an additional α+β domain similar to that of Serratia marcescens chitinases A and B (Perrakis et al. 1994; Van Aalten et al. 2000). This feature is common to other IDGF proteins and to most chitinase-like proteins described to date, although the insertion length varies (Varela et al. 2002). A few residues conserved in family 18 glycosyl hydrolases and other IDGF proteins were also found in IDGF-DRW; these residues, shown in red (Fig. 2), are apparently essential for maintaining the barrel fold. IDGF2 characteristically contains three cis peptide bonds: Gly64-Tyr65, Pro319-Val320, and Phe416-Asp417. The first and third are conserved in all family 18 members and appear to be necessary for correct folding of the barrel. The second is located in the inserted α+β domain and is not conserved in S. marcescens chitinase A or B (Perrakis et al. 1994; Varela et al. 2002). The corresponding peptide bonds in IDGF-DRW are Gly64-Tyr65, Pro319-Pro320, and Val416-Asp417. The amino acid change in the third peptide bond, Phe416→Val416, is conservative and may preserve the bond's function, because valine is an aliphatic, hydrophobic amino acid capable of binding substrate. In addition, the aromatic residues (Tyr65 and Phe416) involved in the two conserved cis peptide bonds have been reported to be important for substrate binding in all glycosyl hydrolases with triosephosphate isomerase barrel folds (Jabs et al. 1999). Thus, these amino acid changes were predicted not to affect the structure of the binding site.
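Cis and trans peptide bonds are distinguished by the backbone torsion angle omega, defined over the atoms CA(i), C(i), N(i+1), and CA(i+1). Omega is close to 0° for a cis bond and close to 180° for a trans bond. The sketch below shows how the three cis bonds could be checked in the ROBETTA model. It assumes that backbone coordinates have already been read from the PDB file and computes the signed dihedral angle from four atom positions. The 30° cutoff is an illustrative assumption, not a value used in this study, and the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def is_cis(ca_i: Vec, c_i: Vec, n_next: Vec, ca_next: Vec,
           cutoff: float = 30.0) -> bool:
    """Treat a peptide bond as cis when |omega| is within `cutoff` degrees of 0."""
    return abs(dihedral(ca_i, c_i, n_next, ca_next)) < cutoff
```

With real coordinates, the Gly64-Tyr65 and Val416-Asp417 bonds of the model would be expected to return True if the model reproduces the conserved cis geometry.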
The cooperation between IDGFs and insulin in promoting cell proliferation makes these interactions a possible target for genetic disruption, especially in a long-lived insect such as DRW, which has a subterranean larval stage. Disrupting critical developmental pathways, for example by reducing signaling through the insulin receptor, may provide a basis for novel management methods that reduce DRW growth and/or survival.
Our work has identified a new member of the growth-promoting glycoprotein IDGF family, IDGF-DRW. The idgf-DRW sequence was amplified from both laboratory-reared and field-caught adult DRW. Further investigations are needed to examine the expression of these proteins during development, to identify when they are maximally expressed, and to define their interactions with downstream effector signals in developmental pathways. Identifying the genes and proteins that function in developmental pathways increases our understanding and aids the development of essential tools for future experiments on the functions and interactions of growth-factor-like receptors in DRW.
Acknowledgments
We gratefully thank Laura E. Hunnicutt, Biological Science Technician, for library construction and Anna Sarah Hill, Biological Science Technician, for technical assistance, USDA, ARS, U.S. Horticultural Research Lab, Fort Pierce, Florida, U.S.A.; Phat Dang for sequencing and critical comments on the manuscript, Genomic Laboratory, USDA, ARS, U.S. Horticultural Research Lab, Fort Pierce, Florida, U.S.A.; Dr. Catherine Katsar for critical review; and Dr. Xiomara Sinisterra for critical review and Spanish translation. The use or mention of a trademark or proprietary product does not constitute an endorsement, guarantee, or warranty of the product by the U.S. Department of Agriculture and does not imply its approval to the exclusion of other suitable products.