Using GFS to Identify Encoding Genomic Loci from Protein Mass Spectral Data
Abstract
Genome-based peptide fingerprint scanning (GFS) maps several types of protein mass spectral (MS) data directly to the loci in the genome that may have encoded the protein. This process can be used either for protein identification or for proteogenomic mapping, which is gene finding and annotation based on proteomic data. Inputs to the program are one or more mass spectrometry files from peptide mass fingerprinting (PMF) and/or tandem MS (MS/MS), along with one or more sequences to search them against; the output is the coordinates of any matches found. This unit describes the use of GFS and the subsequent analysis of results. Curr. Protoc. Bioinform. 21:13.9.1-13.9.20. © 2008 by John Wiley & Sons, Inc.
Keywords: mass spectrometry; protein identification; proteogenomics
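The general idea described in the abstract (translate the genome, digest it in silico, and report the coordinates of peptides whose masses match observed peaks) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the GFS implementation: the function names (`scan`, `tryptic_peptides`, `peptide_mass`, `revcomp`) are hypothetical, digestion is trypsin-like with no missed cleavages, no modifications are considered, observed values are assumed to be neutral monoisotopic peptide masses, and matching is a plain tolerance window rather than any statistical scoring.

```python
"""Toy genome-to-peptide mass matcher; an illustration, not the GFS code."""
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tryptic_peptides(segment):
    """Yield (offset, peptide); cleave after K or R unless followed by P."""
    start = 0
    for i, aa in enumerate(segment):
        last = i == len(segment) - 1
        if last or (aa in "KR" and segment[i + 1] != "P"):
            yield start, segment[start:i + 1]
            start = i + 1


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def scan(genome, observed_masses, tol=0.5, min_len=5):
    """Return (strand, frame, start, end, peptide, mass, observed) tuples.

    start/end are 0-based, half-open coordinates on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            # Codons with unknown bases (e.g. N) are treated like stops.
            protein = "".join(
                CODON_TABLE.get(seq[i:i + 3], "*")
                for i in range(frame, n - 2, 3)
            )
            # Peptides never span a stop codon.
            for seg in re.finditer(r"[^*]+", protein):
                for off, pep in tryptic_peptides(seg.group()):
                    if len(pep) < min_len:
                        continue
                    mass = peptide_mass(pep)
                    s = frame + 3 * (seg.start() + off)
                    e = s + 3 * len(pep)
                    if strand == "-":
                        # Map reverse-strand positions back to the forward strand.
                        s, e = n - e, n - s
                    for obs in observed_masses:
                        if abs(mass - obs) <= tol:
                            hits.append(
                                (strand, frame, s, e, pep, round(mass, 4), obs)
                            )
    return hits


if __name__ == "__main__":
    # Made-up sequence that encodes the peptide MAGICK in forward frame 2.
    demo = "CCATGGCTGGTATTTGTAAATAAGG"
    for hit in scan(demo, [621.30], tol=0.01):
        print(hit)
```

A real search would also need missed cleavages, fixed and variable modifications, charge-state handling, and a significance score for clusters of hits; those are deliberately left out of this sketch.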
Table of Contents
- Introduction
- Basic Protocol 1: Using GFS on a Local Machine with PMF and Optional MS/MS Data
- Basic Protocol 2: Using GFS on a Local Machine with Shotgun MS Data
- Alternate Protocol 1: Using the GFS Website
- Support Protocol 1: Obtaining and Installing GFS on a Local Machine
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
- Tables
Figures
Figure 13.9.1 Screenshot of the GFS Website layout, taken from the GFS home page. Note the tabs on the left. In other screenshots the banner and tabs are not shown, for brevity; only the main data frame is shown in those figures.
Figure 13.9.2 Inputting PMF/tandem data prior to a run. Most values have been left at their defaults, which are also key values to be monitored when using GFS from the command line. Acetylation has been chosen as a single fixed modification, though multiple fixed and variable modifications can be specified. For demonstration purposes, a peptide mass list has been both pasted from the clipboard and uploaded prior to clicking the GFS button (a warning notice is issued, and the job can be continued with a second click). Note that a tandem data file is also selected for upload.
Figure 13.9.3 Partial display output while GFS is running. This output is similar regardless of whether you are running GFS from the command line or the Website. This figure shows the first main step of the analysis, wherein the genomic sequence is expressed and proteolytically digested in silico into fragments whose locations and masses are stored into a temporary database.
Figure 13.9.4 Search results summary in PMF/tandem mode. The major input parameters are shown for safety and reproducibility.
Figure 13.9.5 GFS hit summary in PMF/tandem mode. This section gives overview information about each significant hit cluster, and a hyperlink to match details.
Figure 13.9.6 Start of a GFS genomic hit region in PMF/tandem mode. This begins the detail for a given match region. See details in the Understanding Results section.
Figure 13.9.7 End of a GFS genomic hit region in PMF/tandem mode. Further information about an individual hit cluster, notably including ORF coverage. See details in the Understanding Results section.
Figure 13.9.8 Inputting shotgun data on the GFS Website. The .pkl or .dta shotgun file is selected for upload; tolerances, intensity threshold and other parameters are chosen.
Figure 13.9.9 Partial display output during processing in shotgun mode. This is a portion of output unique to shotgun processing. Digestion occurs first in silico, as shown in Figure .
Figure 13.9.10 GFS shotgun results. These are the fields and a few sample hits present in shotgun output, regardless of whether the command line or the Website is used.
Figure 13.9.11 A sample entry from the genomeDefinitions.plist file. The top-level key name, in this case "ecoli-mono-seqs," is an arbitrary string that can be used with the "-genome" command line switch to provide the program with many search options using one short tag.
Literature Cited
The ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636-640.
Fenyö, D. and Beavis, R.C. 2003. A method for assessing the statistical significance of mass spectrometry based protein identifications using general scoring schemes. Anal. Chem. 75: 768-774.
Khatun, J., Hamlett, E., and Giddings, M.C. 2008. Incorporating sequence information into the scoring function: A hidden Markov model for improved peptide identification. Bioinformatics [Advance Access]. In press.
Mann, M., Hojrup, P., and Roepstorff, P. 1993. Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 22: 338-345.
Smith, T.F., Waterman, M.S., and Fitch, W.M. 1981. Comparative biosequence metrics. J. Mol. Evol. 18: 38-46.
Washburn, M.P., Wolters, D., and Yates, J.R. 3rd. 2001. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotech. 19: 242-247.
Yates, J.R. 3rd, Eng, J.K., and McCormack, A.L. 1995. Mining genomes: Correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67: 3202-3210.
Internet Resources
http://gfs.unc.edu
The Website for GFS executables, documentation, source code, and Web-based peptide searching.
http://gnustep.org
If using the GFS application locally on Windows or Linux, the Website to obtain the GNUstep runtime libraries. Not needed for Mac OS X or if searching through the GFS Website.