Active Site Profiling to Identify Protein Functional Sites in Sequences and Structures Using the Deacon Active Site Profiler (DASP)

Internet 2013-12-31

Abstract

Methods for annotating and analyzing functional sites in proteins are an area of active research, and methods that allow detailed characterization of functional site features are much needed. A Web application, DASP, is described that implements a previously published method (Cammer et al., 2003) and allows users to create an active site profile for any protein family. Two protocols for functional site analysis of protein families using DASP are presented: (1) creation of functional site signatures and a profile from proteins of known structure, and (2) use of the active site profile to search for sequences that contain fragments similar to those found in the functional site signatures. The active site profile produced by Basic Protocol 1 allows the user to analyze the features of the functional site, i.e., those characteristics that are common across the family and those that are unique to one or several members of the family. The characteristics that are unique to a subfamily might be described as specificity determinants, i.e., features that impart specificity to a particular function. Basic Protocol 2 provides instructions for searching for sequences that might contain a similar functional site.

Keywords: active site profiling; fuzzy functional form; protein function prediction; active site; functional site; functional specificity determinants

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Basic Protocol 1: Construction of the Active Site Profile for a Functional Site
Basic Protocol 2: Use of the Functional Site Profile to Search the Sequence Database
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 8.10.1 Schematic representation of the user and algorithm steps in Basic Protocols 1 and 2. Pink boxes and arrows indicate steps performed by the program algorithm; blue boxes and arrows indicate steps performed by the user. Gray boxes indicate two ways to use DASP: (1) searching for signatures in known structures, as in Basic Protocol 1, and (2) using an ASP to search sequences for similar signatures, as in Basic Protocol 2. The green and yellow boxes on the right illustrate some steps applied to the mandelate racemase protein family (see Fig. 8.10.5 for the ASPs identified). Protein structures for three mandelate racemases are shown at the upper right, labeled with their PDB file names 1mns, 2mnr, and 1mdr. A closer view of the active site for 1mns is shown underneath.

Figure 8.10.2 Screenshot of the DASP Web site and data input page. On the left is the Web page that the user should see upon going to the Web site http://dasp.deac.wfu.edu. The data input page that the user sees upon clicking the “Continue to DASP” button is shown on the right. The data needed to apply Basic Protocol 1 to the mandelate racemases are shown in the input fields. These input data were used to obtain the ASP shown in Figure 8.10.5A (top).

Figure 8.10.3 Examples of files that are e‐mailed to the user as a result of Basic Protocols 1 and 2. The files containing the results of applying the protocols to the mandelate racemase protein family are: MR.dnd (upper left), MR.fasta.cw (upper right), MR.aln (lower left), and MR.fasta (lower right). Contents of the files are described in the text.

Figure 8.10.4 Examples of files that are e‐mailed to the user as a result of Basic Protocol 2 only. The extra files containing the results of applying Basic Protocol 2 to the mandelate racemase protein family are: MR_0.0010_PDB_search.out (left) and MR_newsigs.fasta (right). Contents of the files are described in the text. In the MR_0.0010_PDB_search.out file, each sequence identifier is marked with “>” at the beginning of the line, and the p‐value appears at the end of that line. Note: in GenBank sequence files, all sequences that share 100% sequence identity are listed together in the output, so their score is listed only once and their names are concatenated.
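
The header layout described in this caption (a “>” at the start of the line, the p‐value at the end) can be read with a few lines of Python. The following is only a minimal sketch, not part of DASP: it assumes the p‐value is the last whitespace‐separated token on the header line and that everything between “>” and that token is the identifier. The function names, the cutoff parameter, and the sample text are hypothetical, and the exact layout of a real search output file may differ.

```python
from typing import List, Tuple


def parse_search_hits(text: str) -> List[Tuple[str, float]]:
    """Collect (identifier, p_value) pairs from header lines starting with '>'.

    Assumption: the p-value is the last whitespace-separated token on the
    header line. Lines that do not fit this layout are skipped.
    """
    hits = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith(">"):
            continue  # sequence line or blank line
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # no room for both an identifier and a p-value
        try:
            p_value = float(fields[-1])
        except ValueError:
            continue  # header without a trailing numeric p-value
        hits.append((" ".join(fields[:-1]), p_value))
    return hits


def filter_hits(hits: List[Tuple[str, float]], cutoff: float) -> List[Tuple[str, float]]:
    """Keep hits with p-value at or below the cutoff, most significant first."""
    return sorted((h for h in hits if h[1] <= cutoff), key=lambda h: h[1])


# Hypothetical input for illustration only; these are not real DASP results.
sample = ">hypothetical_seq_A 1.0e-30\nACDE\n>hypothetical_seq_B 0.5\nFGHI\n"
hits = parse_search_hits(sample)
significant = filter_hits(hits, cutoff=1e-3)
```

Because identical sequences are reported once with concatenated names, the identifier returned here may stand for several database entries. The sketch does not attempt to split them, since the separator is not specified in this caption.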

Figure 8.10.5 Example of applying Basic Protocols 1 and 2 to the mandelate racemase protein family. (A) Mandelate racemase active site profiles: original set with three PDBs, with ASP score of 0.86 (top); complete profile identified after the bootstrap procedure described in the text, with ASP score of 0.56 (middle); and profile resulting from sequence search of GenBankNR, with ASP score of 0.27 (bottom). The known key residues are identified from structural information and are shown as red letters, while the hypothesized key residues identified from the sequence searches (no known structure) are shown as blue letters. (B) Distributions of the p‐values for the mandelate racemase searches of the PDB sequences (top) and GenBankNR sequences (bottom). The x‐axis represents the negative of the exponent of the p‐value (e.g., 30 represents a p‐value of 10⁻³⁰).
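
The x‐axis convention in panel B amounts to plotting −log₁₀(p). As an illustration only, and not the code used to produce the figure, the sketch below converts a list of p‐values to that scale and counts them in fixed‐width bins. The function name, the bin width, and the input values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def neg_log10_bins(p_values: Iterable[float], bin_width: int = 10) -> Dict[int, int]:
    """Count p-values per bin of -log10(p).

    Each key is the lower edge of a bin, so with bin_width=10 the key 30
    covers p-values whose negative exponent lies in [30, 40).
    """
    counts: Counter = Counter()
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p!r}")
        exponent = -math.log10(p)  # e.g. roughly 31.5 for p = 3e-32
        counts[int(exponent // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical p-values for illustration only; not taken from the figure.
example_counts = neg_log10_bins([3e-32, 2e-5, 0.5, 1e-12])
```

Working on the −log₁₀ scale keeps very small p‐values distinguishable, since on a linear axis they would all collapse toward zero.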

Figure 8.10.6 Part of the glutaredoxin/thioredoxin superfamily active site profile showing different subfamilies within this large superfamily. From the top, the functional site signatures are as follows (listed as pdb filename, protein name): 1aaz, T4 glutaredoxin; 1aba, T4 glutaredoxin; 1ac1, DsbA (disulphide bond forming protein); 1acv, DsbA; 1dsb, DsbA; 1auc, thioredoxin; 2trx, thioredoxin; 1ego, glutaredoxin. The four subfamilies that are visible by eye (from the overall sequence similarity and the alignment of the key residue proline, shown in red) correlate with the biologically relevant subfamilies in this superfamily.

Literature Cited

	Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389‐3402.
	Attwood, T.K., Beck, M.E., Flower, D.R., Scordis, P., and Selly, J. 1998. The PRINTS protein fingerprints database in its fifth year. Nucl. Acids Res. 26:304‐308.
	Bailey, T.L. and Gribskov, M. 1998a. Combining evidence using p‐values: Application to sequence homology searches. Bioinformatics 14:48‐54.
	Bailey, T.L. and Gribskov, M. 1998b. Methods and statistics for combining motif match scores. J. Comput. Biol. 5:211‐221.
	Baxter, S.M., Rosenblum, J.S., Knutson, S., Nelson, M.R., Montimurro, J.S., Di Gennaro, J.A., Speir, J.A., Burbaum, J.J., and Fetrow, J.S. 2004. Synergistic computational and experimental proteomics approaches for more accurate detection of active serine hydrolases in yeast. Mol. Cell. Proteomics 3:209‐225.
	Cammer, S.A., Hoffman, B.T., Speir, J.A., Canady, M.A., Nelson, M.R., Knutson, S., Gallina, M., Baxter, S.M., and Fetrow, J.S. 2003. Structure‐based active site profiles for genome analysis and functional family subclassification. J. Mol. Biol. 334:387‐401.
	Fetrow, J.S. and Skolnick, J. 1998. Method for prediction of protein function from sequence using the sequence‐to‐structure‐to‐function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J. Mol. Biol. 281:949‐968.
	Fetrow, J.S., Godzik, A., and Skolnick, J. 1998. Functional analysis of the Escherichia coli genome using the sequence‐to‐structure‐to‐function paradigm: Identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J. Mol. Biol. 282:703‐711.
	Gerlt, J.A. and Babbitt, P.C. 2001. Divergent evolution of enzymatic function: Mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu. Rev. Biochem. 70:209‐246.
	Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84:4355‐4358.
	Hegyi, H. and Gerstein, M. 2001. Annotation transfer for genomics: Measuring functional divergence in multi‐domain proteins. Genome Res. 11:1632‐1640.
	Henikoff, S., Henikoff, J.G., and Pietrokovski, S. 1999. Blocks+: A non‐redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics 15:471‐479.
	Higgins, D.G., Thompson, J.D., and Gibson, T.J. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383‐402.
	Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A. 1999. The Prosite database, its status in 1999. Nucl. Acids Res. 27:215‐219.
	Huff, R.G. 2005. DASP. Active Site Profiling for Identification of Functional Sites in Protein Sequences and Structures. Thesis, Wake Forest University, Winston‐Salem, N.C.
	Huff, R.G., Bayram, E., Tan, H., Knutson, S.T., Knaggs, M.H., Richon, A.B., Santago, P., II, and Fetrow, J.S. 2005. Chemical and structural diversity in cyclooxygenase protein active sites. Chem. and Biodiversity 2:1533‐1552.
	Rost, B. 2002. Enzyme function less conserved than anticipated. J. Mol. Biol. 318:595‐608.
	Siddiqi, F., Bourque, J.R., Jiang, H., Gardner, M., St. Maurice, M., Blouin, C., and Bearne, S.L. 2005. Perturbing the hydrophobic pocket of mandelate racemase to probe phenyl motion during catalysis. Biochemistry 44:9013‐9021.
	St. Maurice, M. and Bearne, S.L. 2004. Hydrophobic nature of the active site of mandelate racemase. Biochemistry 43:2524‐2532.

Key References

	Cammer et al., 2003. See above.
	Describes original research leading to the development of the active site profiling method. Details are given about scoring and validation.
	Baxter et al., 2004. See above.
	Describes the combination of a computational method for profiling sequences with an experimental proteomics method. Detailed analysis of serine hydrolases in yeast is presented.

Internet Resources

	http://dasp.deac.wfu.edu
	This DASP Web site allows access to the active site profiling software.