Using the Saccharomyces Genome Database (SGD) for Analysis of Genomic Information

Source: Internet, 2013-12-31

Abstract

Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets. Curr. Protoc. Bioinform. 35:1.20.1-1.20.23. © 2011 by John Wiley & Sons, Inc.

Keywords: genome database; gene expression; gene ontology; InterMine; SPELL; GBrowse; high-throughput data

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Table of Contents

Introduction
Basic Protocol 1: Exploring the SGD Pages
Basic Protocol 2: Using YeastMine to Retrieve and Analyze Multiple Data Types for Sets of Genes
Basic Protocol 3: Exploring Genome Features with GBrowse
Basic Protocol 4: Using SPELL to Analyze Microarray Gene Expression Data
Basic Protocol 5: Using the GO Slim Mapper to Group Sets of Genes According to Their Function or Location in the Cell
Guidelines for Understanding Results
Commentary
Literature Cited
Figures

Figures

Figure 1.20.1 The SGD home page (http://www.yeastgenome.org) provides access to most features of the Web site through the gene or keyword search box at the top of the page, as well as from links to data analysis tools, download data directories, and community information.

Figure 1.20.2 The Locus Summary page displays the outline of the current knowledge about the gene. More detailed information is available by selecting one of the tabs near the top of the page.

Figure 1.20.3 Data retrieval with YeastMine. (A) Results table obtained after creating a sample gene list. The primary identifier is the SGDID for each gene, followed by the systematic name (referred to as the secondary identifier), followed by the standard gene name (called the symbol). Each of these is linked to the Locus Summary page for that gene. The fourth column, name, explains the 3‐letter acronym used for the standard name. This is called the Name Description on the Locus Summary page. The fifth column, Alias names, contains other names used for the gene, besides the standard name. Additional rows of the table can be viewed by clicking on the “Next >” link below the table. (B) Partial figure of the interactive widgets displaying properties of the sample list (note that the Publication Enrichment list is truncated). These displays were generated by YeastMine during creation of the list, by automatic running of several premade template queries. The Chromosome Distribution graph shows the number of genes in the list found on each chromosome (Actual) compared to the number expected to be found on each chromosome (Expected). Clicking on a bar in the graph generates the list of Actual genes on that chromosome. The Gene Ontology Enrichment table displays the BP, MF, or CC GO terms annotated multiple times for genes in the list, the number of times that term appears, and the p value, which is the probability that that count occurs by chance. GO term IDs are linked to term pages of the AmiGO database. The Publication Enrichment table displays a similar arrangement of columns, except that publications used for annotation of the genes in the list, rather than GO terms, are counted. PubMed IDs are linked to the abstract of the paper from NCBI's PubMed database.
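
The p value in the Gene Ontology Enrichment table can be illustrated with a short sketch. The caption does not say which statistic YeastMine uses or whether it corrects for multiple testing, so the one-sided hypergeometric test below is an assumption, a common choice for enrichment analysis. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, genome_with_term, list_size, list_with_term):
    """Probability of seeing at least `list_with_term` annotated genes by chance.

    Models the gene list as `list_size` genes drawn without replacement from a
    genome of `genome_size` genes, of which `genome_with_term` carry the GO term.
    ASSUMPTION: a one-sided hypergeometric test; the statistic actually used by
    YeastMine is not stated in the caption.
    """
    upper = min(genome_with_term, list_size)
    total = comb(genome_size, list_size)
    # Sum the probability mass for k = observed count up to the maximum possible.
    tail = sum(
        comb(genome_with_term, k) * comb(genome_size - genome_with_term, list_size - k)
        for k in range(list_with_term, upper + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 45 listed genes carry a term found on 120 of 6000 genes.
p_value = hypergeom_enrichment_p(6000, 120, 45, 8)
print(f"p = {p_value:.3g}")
```

A small p value means that the observed count would rarely arise in a randomly chosen list of the same size. When many terms are tested at once, a multiple-testing correction is normally applied as well; that step is omitted here.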

Figure 1.20.4 “Gene → Phenotype” query displayed in the QueryBuilder tool. (A) The Model Browser displays the options of classes and attributes for building a query. Click on the “+” sign next to a class (such as “chromosome”) to display the attributes pertaining to that class. Clicking “summary” adds all the attributes of that class to the query. A single attribute can be restricted from the query by clicking “constrain.” Single attributes can be added by clicking “show.” (B) The Query Overview displays the query as you build it. To remove an item, click on the red “X”. (C) The “Columns to Display” section allows you to choose the content and order of the columns for the results table. Click on the red “X” to remove a column; drag the box to a different position to determine the order of the columns. Note that in this figure the Columns to Display section from the Gene → Phenotype query has been truncated.

Figure 1.20.5 GBrowse main window displaying a 20‐kbp fragment of Chromosome 11 centered on the BUD2 gene (highlighted in yellow). The browser display is divided into several panels that provide (a) search and navigation tools, (b) overview of the region landmarks, and (c) details for selected tracks, including annotated chromosomal features, unannotated transcripts, and a graph depicting Pol II occupancy data.

Figure 1.20.6 SPELL Results page showing expression patterns for the top 20 genes and the top 10 datasets for the two‐gene (CDC48, UFD1) query set. The arrows point to the elements of the display that are explained in the text: (a) rank of datasets, (b) ACS and Contribution, (c) rank of genes, (d) expression data as a “heat map” (red for increased, green for decreased), (e) the GO Term Enrichment table (truncated for simplicity).

Figure 1.20.7 GO Slim Mapper results table for a sample set of 45 genes analyzed for Cellular Component GO Slim annotations. Cluster frequency refers to the frequency with which each term is associated with the genes in the list, either directly, or indirectly through the relationship of that term with more granular terms actually used for annotation. Genome frequency refers to the frequency with which that term is associated with genes in the entire genome, directly or indirectly.
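
The two frequencies described in this caption can be sketched as follows. This is a minimal illustration, not the GO Slim Mapper's implementation: the gene names, term names, and the `term_to_slim` lookup (which maps each granular term to the GO Slim terms it rolls up to) are hypothetical, and the genome is assumed to consist only of the genes listed in `annotations`.

```python
def genes_per_slim_term(genes, annotations, term_to_slim):
    """Count, for each GO Slim term, the genes annotated to it directly or indirectly."""
    counts = {}
    for gene in genes:
        slim_hits = set()
        for term in annotations.get(gene, ()):
            # A granular term counts toward every slim term it maps up to.
            slim_hits |= term_to_slim.get(term, set())
        for slim in slim_hits:
            # Each gene is counted once per slim term, however many annotations it has.
            counts[slim] = counts.get(slim, 0) + 1
    return counts


def slim_frequencies(gene_list, annotations, term_to_slim):
    """Return {slim term: (cluster frequency, genome frequency)}."""
    genome = list(annotations)
    cluster_counts = genes_per_slim_term(gene_list, annotations, term_to_slim)
    genome_counts = genes_per_slim_term(genome, annotations, term_to_slim)
    return {
        slim: (cluster_counts.get(slim, 0) / len(gene_list), count / len(genome))
        for slim, count in genome_counts.items()
    }


# Hypothetical toy data: four genes, two slim terms.
annotations = {
    "geneA": {"nucleolus"},             # indirect hit for "nucleus"
    "geneB": {"nucleus"},               # direct hit for "nucleus"
    "geneC": {"mitochondrial matrix"},  # indirect hit for "mitochondrion"
    "geneD": {"mitochondrion"},
}
term_to_slim = {
    "nucleolus": {"nucleus"},
    "nucleus": {"nucleus"},
    "mitochondrial matrix": {"mitochondrion"},
    "mitochondrion": {"mitochondrion"},
}
freqs = slim_frequencies(["geneA", "geneB", "geneC"], annotations, term_to_slim)
for slim, (cluster, genome) in sorted(freqs.items()):
    print(f"{slim}: cluster {cluster:.2f}, genome {genome:.2f}")
```

In this toy set, two of the three listed genes map to "nucleus" (one directly, one through "nucleolus"), while two of the four genes in the genome do, so the cluster frequency exceeds the genome frequency for that term.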
