Using GFS to Identify Encoding Genomic Loci from Protein Mass Spectral Data

Source: Internet, 2013-12-31

Abstract

Genome-based peptide fingerprint scanning (GFS) maps several types of protein mass spectral (MS) data directly to the loci in the genome that may have encoded the protein. This process can be used either for protein identification or for proteogenomic mapping, that is, gene-finding and annotation based on proteomic data. Inputs to the program are one or more mass spectrometry files from peptide mass fingerprinting (PMF) and/or tandem MS (MS/MS), along with one or more sequences to search them against; the output is the coordinates of any matches found. This unit describes the use of GFS and the subsequent analysis of results. Curr. Protoc. Bioinform. 21:13.9.1-13.9.20. © 2008 by John Wiley & Sons, Inc.

Keywords: mass spectrometry; protein identification; proteogenomics
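
The mapping idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the GFS implementation: it assumes a small genome containing only A, C, G, and T, the standard genetic code, a simple trypsin rule (cleave after K or R unless followed by P), no missed cleavages or modifications, neutral monoisotopic peptide masses, and a fixed mass tolerance in daltons. The function names (build_index, find_loci) and the tolerance default are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons; one water is added per peptide.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def revcomp(dna):
    """Reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop."""
    return "".join(CODON[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def digest(protein):
    """Yield (start, peptide): cut after K/R unless P follows; break at stops."""
    start = 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1:i + 2]
        if aa == "*":
            if i > start:
                yield start, protein[start:i]
            start = i + 1
        elif aa in "KR" and nxt != "P":
            yield start, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield start, protein[start:]


def build_index(genome):
    """Six-frame translate and digest; record mass and forward-strand coordinates."""
    genome = genome.upper()
    n = len(genome)
    index = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            for pos, pep in digest(translate(seq[frame:])):
                lo = frame + 3 * pos
                hi = lo + 3 * len(pep)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    lo, hi = n - hi, n - lo
                mass = sum(RESIDUE[a] for a in pep) + WATER
                index.append((mass, strand, lo, hi, pep))
    return index


def find_loci(index, observed_mass, tol_da=0.01):
    """Return (strand, start, end, peptide) for peptides within the tolerance."""
    return [(s, lo, hi, pep) for m, s, lo, hi, pep in index
            if abs(m - observed_mass) <= tol_da]


# Toy example: a 15-base sequence and one observed neutral peptide mass.
index = build_index("ATGGCTGGTAAATAA")
print(find_loci(index, 405.2046))
```

Coordinates are 0-based, half-open intervals on the forward strand. A real search would also score clusters of nearby peptide matches rather than report single-mass hits.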

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Using GFS on a Local Machine with PMF and Optional MS/MS Data
Basic Protocol 2: Using GFS on a Local Machine with Shotgun MS Data
Alternate Protocol 1: Using the GFS Website
Support Protocol 1: Obtaining and Installing GFS on a Local Machine
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 13.9.1 Screenshot of the GFS Website layout, taken from the GFS home page. Note the tabs on the left. In other screenshots the banner and tabs are not shown, for brevity; only the main data frame is shown in those figures.

Figure 13.9.2 Inputting PMF/tandem data prior to a run. Most values have been left at their defaults, which are also key values to be monitored when using GFS from the command line. Acetylation has been chosen as a single fixed modification, though multiple fixed and variable modifications can be specified. For demonstration purposes, a peptide mass list has been both pasted from the clipboard and uploaded prior to clicking the GFS button (a warning notice is issued, and the job can be continued with a second click). Note that a tandem data file is also selected for upload.

Figure 13.9.3 Partial display output while GFS is running. This output is similar regardless of whether you are running GFS from the command line or the Website. This figure shows the first main step of the analysis, wherein the genomic sequence is expressed and proteolytically digested in silico into fragments whose locations and masses are stored into a temporary database.

Figure 13.9.4 Search results summary in PMF/tandem mode. The major input parameters are shown for safety and reproducibility.

Figure 13.9.5 GFS hit summary in PMF/tandem mode. This section gives overview information about each significant hit cluster, and a hyperlink to match details.

Figure 13.9.6 Start of a GFS genomic hit region in PMF/tandem mode. This begins the detail for a given match region. See details in the Understanding Results section.

Figure 13.9.7 End of a GFS genomic hit region in PMF/tandem mode. Further information about an individual hit cluster, notably including ORF coverage. See details in the Understanding Results section.

Figure 13.9.8 Inputting shotgun data on the GFS Website. The .pkl or .dta shotgun file is selected for upload; tolerances, intensity threshold and other parameters are chosen.

Figure 13.9.9 Partial display output during processing in shotgun mode. This is a portion of output unique to shotgun processing. Digestion occurs first in silico, as shown in Figure .

Figure 13.9.10 GFS shotgun results. These are the fields and a few sample hits present in shotgun output, regardless of whether the command line or the Website is used.

Figure 13.9.11 A sample entry from the genomeDefinitions.plist file. The top‐level key name, in this case “ecoli‐mono‐seqs,” is an arbitrary string that can be used with the “‐genome” command line switch to provide the program with many search options using one short tag.
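
The tag mechanism described for Figure 13.9.11 can be sketched in Python as a lookup from one short name to a bundle of search options. The sketch makes several assumptions: it uses an XML property list (Python's plistlib reads XML and binary plists only, and the real genomeDefinitions.plist may use a different plist flavor), and the option names inside the entry ("sequenceFiles", "massType") are hypothetical placeholders, not keys documented for GFS. Only the tag name "ecoli-mono-seqs" comes from the caption, written here with ASCII hyphens.

```python
import plistlib

# Hypothetical definitions table: one top-level tag maps to many options.
definitions = {
    "ecoli-mono-seqs": {
        "sequenceFiles": ["ecoli.fasta"],  # hypothetical option name and value
        "massType": "monoisotopic",        # hypothetical option name and value
    }
}

# Serialize to an XML plist, standing in for a definitions file on disk.
xml_bytes = plistlib.dumps(definitions)


def options_for(tag, plist_bytes):
    """Return the option dictionary stored under a top-level tag."""
    table = plistlib.loads(plist_bytes)
    if tag not in table:
        raise KeyError(f"no genome definition named {tag!r}")
    return table[tag]


print(options_for("ecoli-mono-seqs", xml_bytes))
```

The point of the design is that a long set of search options can be selected with a single short string on the command line.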

Literature Cited

	The ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636‐640.
	Fenyö, D. and Beavis, R.C. 2003. A method for assessing the statistical significance of mass spectrometry based protein identifications using general scoring schemes. Anal. Chem. 75: 768‐774.
	Khatun, J., Hamlett, E., and Giddings, M.C. 2008. Incorporating sequence information into the scoring function: A hidden Markov model for improved peptide identification. Bioinformatics [Advance Access]. In press.
	Mann, M., Hojrup, P., and Roepstorff, P. 1993. Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 22: 338‐345.
	Smith, T.F., Waterman, M.S., and Fitch, W.M. 1981. Comparative biosequence metrics. J. Mol. Evol. 18: 38‐46.
	Washburn, M.P., Wolters, D., and Yates, J.R. 3rd. 2001. Large‐scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotech. 19: 242‐247.
	Yates, J.R. 3rd, Eng, J.K., and McCormack, A.L. 1995. Mining genomes: Correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67: 3202‐3210.
Internet Resources
	http://gfs.unc.edu
	The Website for GFS executables, documentation, source code, and Web‐based peptide searching.
	http://gnustep.org
	If using the GFS application locally on Windows or Linux, the Website to obtain the GNUstep runtime libraries. Not needed for Mac OS X or if searching through the GFS Website.