This fully integrated effort resulted in the first large-scale prediction of function for chain-elongation members of the IS superfamily and demonstrated the EFI pipeline in full operation. The Superfamily/Genome Core carried out large-scale sequence analysis to select IS targets. Proteins produced in the Protein Core were provided to the Structure Core and the IS Bridging Project for experimental characterization. Their results enabled benchmarking of the methodology and validation of blind predictions made by the Computation Core.
The number of available protein sequences has increased exponentially with the advent of high-throughput genomic sequencing, creating a significant challenge for functional annotation. Here, we describe a large-scale study on assigning function to unknown members of the trans-polyprenyl transferase (E-PTS) subgroup in the isoprenoid synthase superfamily, which provides substrates for the biosynthesis of the more than 55,000 isoprenoid metabolites. Although the mechanism for determining the product chain length for these enzymes is known, there is no simple relationship between function and primary sequence, which makes assigning function challenging. We addressed this challenge through large-scale bioinformatics analysis of >5,000 putative polyprenyl transferases; experimental characterization of the chain-length specificity of 79 diverse members of this group; determination of 27 structures of 19 of these enzymes, including seven cocrystallized with substrate analogs or products; and the development and successful application of a computational approach to predict function that leverages available structural data through homology modeling and docking of possible products into the active site. The crystallographic structures and computational structural models of the enzyme–ligand complexes elucidate the structural basis of specificity. As a result of this study, the percentage of E-PTS sequences similar to functionally annotated ones (BLAST e-value ≤ 1e−70) increased from 40.6 to 68.8%, and the percentage of sequences similar to available crystal structures increased from 28.9 to 47.4%. The high accuracy of our blind prediction of newly characterized enzymes indicates the potential to predict function computationally for the complete polyprenyl transferase subgroup of the isoprenoid synthase superfamily.
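The coverage percentages quoted above (the share of sequences with a BLAST hit at e-value ≤ 1e−70 to a functionally annotated sequence) can be illustrated with a minimal sketch. The function name, the `(query, subject, e_value)` tuple layout, and the decision to count annotated sequences themselves as covered are assumptions made for illustration; they are not taken from the study's actual pipeline.

```python
from typing import Iterable, Tuple

# Hypothetical hit record: (query_id, subject_id, e_value)
Hit = Tuple[str, str, float]


def coverage_percent(
    all_ids: Iterable[str],
    reference_ids: Iterable[str],
    hits: Iterable[Hit],
    cutoff: float = 1e-70,
) -> float:
    """Percentage of sequences that are in the reference set or have a
    pairwise hit to a reference sequence with e-value <= cutoff."""
    all_set = set(all_ids)
    if not all_set:
        raise ValueError("all_ids must not be empty")
    reference = set(reference_ids) & all_set
    # Assumption: annotated (reference) sequences count as covered.
    covered = set(reference)
    for query, subject, e_value in hits:
        if e_value > cutoff:
            continue  # hit is weaker than the similarity threshold
        # Treat hits as symmetric: either end may be the reference.
        if subject in reference and query in all_set:
            covered.add(query)
        if query in reference and subject in all_set:
            covered.add(subject)
    return 100.0 * len(covered) / len(all_set)


# Toy data, not from the study.
toy_ids = ["a", "b", "c", "d"]
toy_reference = ["a"]
toy_hits = [("b", "a", 1e-80), ("c", "a", 1e-60), ("d", "b", 1e-90)]
toy_coverage = coverage_percent(toy_ids, toy_reference, toy_hits)
```

In this toy example only `b` passes the cutoff against the annotated sequence `a`; `d` is similar to `b`, but `b` is not itself annotated, so `d` stays uncovered. Adding newly characterized sequences to `reference_ids` is what would raise the percentage in this simplified picture.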
Figure 1. Crystal structure of GGPP synthase (PDB ID 1RQI). (A) Active site S1 with DMAPP, Mg²⁺ ions, and Asp-rich motifs and active site S2 with IPP are highlighted. The electrophilic attack of the C1 atom of DMAPP on the double bond of IPP after cleavage of diphosphate is indicated by the black arrow. (B and C) Side view (B) and top view (C) of the bioactive dimer with the active site and elongation cavity displayed. Helices D–H are identified by capital letters.
Figure 2. Sequence similarity map of the E-PTS subgroup with (A) BLAST e-value cutoff = 1e−50 and (B) zoom at cutoff = 1e−70. Template sequences are tagged by PDB identifiers, and colored sequence nodes indicate experimentally assigned product chain length determined either in this study (large nodes) or previously, based on GOA (small nodes).
Figure 3. Error, in C5 units, of computationally predicted vs. experimentally determined product chain length for (A) the training set and targets_known and (B) targets_blind. Circles represent predictions using homology models constructed based on holo crystal structure templates; triangles represent apo structures or homology models based on apo structures. The larger symbols indicate multiple predictions that have the same sequence identity and prediction error.
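As a small illustration of what an error "in C5 units" means, the sketch below converts a predicted and an experimental chain length (in carbon atoms) into a signed number of five-carbon isoprene units. The function name and the sign convention (predicted minus experimental) are assumptions for illustration; the caption does not state which convention the figure uses.

```python
def chain_length_error_c5(predicted_carbons: int, experimental_carbons: int) -> int:
    """Signed prediction error in C5 (isoprene) units.

    Assumed convention: positive means the predicted chain is longer
    than the experimentally determined one.
    """
    for carbons in (predicted_carbons, experimental_carbons):
        if carbons <= 0 or carbons % 5 != 0:
            raise ValueError("chain length must be a positive multiple of 5")
    return (predicted_carbons - experimental_carbons) // 5


# Illustrative values only: a C50 prediction for a C40 product, and
# a C15 prediction for a C20 product.
overshoot = chain_length_error_c5(50, 40)
undershoot = chain_length_error_c5(15, 20)
```

Under this convention, `overshoot` is 2 (two C5 units too long) and `undershoot` is -1 (one C5 unit too short).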
Figure 4. Superposition of the 4FP4 crystal structure and the homology model of the same protein based on PDB 3AQ0 with 29% sequence identity, created before the structure was available. The computationally predicted ligand conformation is shown in red, and side chains of the elongation cavity are in orange. The partial ligand observed in the crystal structure is shown in green, and the elongation-cavity side chains are in blue.
Figure 6. (A) Computational model of a C50 ligand (red) in the elongation channel of polyprenyl transferase 3OYR. The cavity volume is colored according to partial charges of surrounding residues, with neutral (hydrophobic) shown in green, negative in red, and positive in blue. (B) Conformational changes of residues in the elongation channel through displacement by the ligand (chains A and B of crystal structure 3OYR are shown in blue and maroon, respectively; chains A and B of the long-chain model of 3OYR are shown in orange and yellow, respectively; C50 is shown in red). (C) Superposition of the predicted binding modes of C25 in 3OYR (orange) and C15 in a structural model of protein GI 126458776 (green).
Copyright (2013) National Academy of Sciences, USA