DRAGON and DRAGON View: Information Annotation and Visualization Tools for Large‐Scale Expression Data
Abstract
The Database Referencing of Array Genes ONline (DRAGON) database system consists of information derived from publicly available databases including UniGene, SWISS‐PROT, Pfam, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Through a Web‐accessible interface, the DRAGON Annotate tool rapidly supplies information pertaining to a range of biological characteristics of all the genes in any large‐scale gene expression data set. The subsequent inclusion of this information during data analysis and visualization allows for deeper insight into gene expression patterns. The set of DRAGON View tools provides methods for the analysis and visualization of expression patterns in relation to annotated information. Instead of incorporating the standard set of clustering and graphing tools available in many large‐scale expression data analysis software packages, DRAGON View has been specifically designed to allow for the analysis of expression data in relation to the biological characteristics of gene sets.
Table of Contents
- Basic Protocol 1: Preparing Data for Use with the DRAGON Database and Analyzing Data with DRAGON View
- Support Protocol 1: Analyzing Data with the DRAGON Families Tool
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
Materials
Basic Protocol 1: Preparing Data for Use with the DRAGON Database and Analyzing Data with DRAGON View
Necessary Resources
Support Protocol 1: Analyzing Data with the DRAGON Families Tool
Necessary Resources
Figures
Figure 7.4.1 The DRAGON home page provides links to all available tools and data sources contained in DRAGON and DRAGON View. The page also contains links to all of the public data files that are used by DRAGON to generate its database.
Figure 7.4.2 The DRAGON Annotate page. (A) The user is allowed to input data into a dialog box, or a tab‐delimited text file can be uploaded from a local file. (B) The user selects options, then sends a request for annotation to the DRAGON database. Results may be returned as an HTML table, as a tab‐delimited text file (suitable for import into a spreadsheet such as Microsoft Excel), or as an E‐mail.
Figure 7.4.3 The DRAGON Families page.
Figure 7.4.4 As the final step in the analysis of the demonstration data, each time point contained in the Iyer et al. (1999) data set, after having been associated with SWISS‐PROT keyword information by DRAGON Annotate, is analyzed using the DRAGON Families tool. The most coordinately up‐regulated gene families are shown here for three time points (15 min, 6 hr, and 24 hr). Each gene is represented in its corresponding family as a clickable box that is hyperlinked to the NCBI LocusLink entry for that gene. Across each row, all the boxes correspond to genes in a given family. Each box is also color‐coded on a scale from red (up‐regulated) to green (down‐regulated). A scale at the top of the analysis page (not shown) gives the association of colors with ratio values. For all the functional families that are annotated, the program returns the families ranked according to the average ratio expression value for all of the genes in that group. Note that overall there is less differential regulation at the 15‐min time point, as indicated by the absence of bright red squares. By 6 hr certain gene families, particularly those associated with inflammatory responses, are coordinately up‐regulated. Finally, by 24 hr, cell cycle and mitotic gene families are coordinately differentially regulated, indicating that the cells are progressing through the cell cycle.
Figure 7.4.5 Examples of the graphical outputs of the three types of DRAGON View tools. (A) DRAGON Families produces rows of green (down‐regulated), red (up‐regulated), and gray (unchanged) boxes (see scale for the range of ratio values represented by each color). Each box represents one gene and is hyperlinked to its corresponding UniGene entry. Each row has a type identifier to its right that is hyperlinked to its description. To the far right is the average ratio expression value for all of the genes in that family. All rows are sorted from the most up‐regulated family to the most down‐regulated family. (B) DRAGON Order produces rows of black lines. Each line represents one gene, and its location in the row represents its position on a gene list sorted by ratio expression values. Lines at the far left represent the most up‐regulated genes (+) and lines at the far right represent the most down‐regulated genes (–). Each row's type (e.g., SWISS‐PROT keywords) is listed to the right. (C) DRAGON Paths maps the location and ratio expression value of genes from the submitted gene list onto KEGG cellular pathway diagrams. A green (down‐regulated), red (up‐regulated), or gray (unchanged) circle followed by the ratio expression value is mapped to the upper left corner of each corresponding protein box. Each protein box is hyperlinked to its corresponding LocusLink entry.
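The ranking behavior described for DRAGON Families (families sorted by the average ratio expression value of their member genes) can be illustrated with a minimal Python sketch. This is not the DRAGON implementation, which the figure legends describe as Perl scripts; the function names `rank_families` and `color_for`, the input layout, and the two‐fold color threshold are all assumptions made for illustration, and the gene identifiers are placeholders.

```python
from collections import defaultdict
from statistics import mean


def rank_families(genes):
    """Group genes by annotation keyword and rank families by mean ratio.

    `genes` is an iterable of (gene_id, ratio, keywords) tuples; this
    layout is a hypothetical stand-in for annotated expression data.
    Returns a list of (keyword, mean_ratio, members), most up-regulated
    family first.
    """
    members = defaultdict(list)
    for gene_id, ratio, keywords in genes:
        for keyword in keywords:
            # A gene with several keywords appears in several families.
            members[keyword].append((gene_id, ratio))
    ranked = [
        (keyword, mean(ratio for _, ratio in rows), rows)
        for keyword, rows in members.items()
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def color_for(ratio, threshold=2.0):
    """Map a ratio to a box color; the two-fold threshold is an assumption."""
    if ratio >= threshold:
        return "red"      # up-regulated
    if ratio <= 1.0 / threshold:
        return "green"    # down-regulated
    return "gray"         # unchanged


if __name__ == "__main__":
    demo = [
        ("gene1", 4.0, ["Cell cycle", "Mitosis"]),
        ("gene2", 2.0, ["Cell cycle"]),
        ("gene3", 0.5, ["Transport"]),
        ("gene4", 1.0, ["Mitosis"]),
    ]
    for keyword, avg, rows in rank_families(demo):
        boxes = " ".join(color_for(ratio) for _, ratio in rows)
        print(f"{boxes:<12} {keyword:<12} {avg:.2f}")
```

Each printed row mirrors the layout in panel A: colored boxes, then the family identifier, then the family average. A gene annotated with more than one keyword contributes to every family it belongs to, which is one plausible reading of the legend rather than a documented rule.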
Figure 7.4.6 Database architecture for DRAGON. The data contained in the DRAGON database is derived from Web‐accessible databases that are downloaded by FTP, parsed using Perl scripts, and stored in tables in the MySQL relational database management system. The DRAGON database is housed on a Dell PowerEdge 6300 dual processor server. The front end consists of a Web site that is searched using Perl (.cgi) scripts to allow for user‐defined queries of the database.
Figure 7.4.7 Overview of the information in DRAGON. This diagram represents a subset of the tables now available in DRAGON and the possible connections between them. Depending on the type of information desired, different sets of tables are joined with the tables containing microarray gene expression data (in this diagram, for example, “Incyte Array Data” and “Incyte Numbers”). Two “UniGene Human Numbers” tables are used to expand the “GenBank #s” from the “Incyte Numbers” table into all “GenBank #s” associated with each “UniGene ID”, thereby providing a bridge between “GenBank #s” from the “Incyte Numbers” table and the “Swissprot Numbers”, “TrEMBL Numbers”, “Transfac Factors”, and “Transfac Sites” tables. Further characterization of the proteins encoded by genes on the microarray occurs by joining with tables derived from the SWISS‐PROT, Pfam, Interpro, and OMIM databases.
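The UniGene bridging step in this legend can be sketched as a double self‐join. The example below uses Python's built‐in sqlite3 module as a stand‐in for the MySQL system named in Figure 7.4.6; the table names, column names, and every identifier in the demonstration rows are invented placeholders, not the actual DRAGON schema or real accession numbers.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE incyte_numbers    (incyte_id TEXT, genbank_acc TEXT);
        CREATE TABLE unigene_numbers   (unigene_id TEXT, genbank_acc TEXT);
        CREATE TABLE swissprot_numbers (genbank_acc TEXT, swissprot_acc TEXT);
        """
    )
    # Placeholder rows: the array lists GB_A, but only GB_B (same UniGene
    # cluster) has a protein record.
    conn.executemany(
        "INSERT INTO incyte_numbers VALUES (?, ?)",
        [("INC_1", "GB_A"), ("INC_2", "GB_D")],
    )
    conn.executemany(
        "INSERT INTO unigene_numbers VALUES (?, ?)",
        [("UG_1", "GB_A"), ("UG_1", "GB_B"), ("UG_2", "GB_D")],
    )
    conn.executemany(
        "INSERT INTO swissprot_numbers VALUES (?, ?)",
        [("GB_B", "SP_X")],
    )
    return conn


# u1 maps the array's GenBank number to its UniGene cluster; u2 expands the
# cluster back into every GenBank number it contains, so that any member
# with a protein record can supply the annotation.
BRIDGE_QUERY = """
SELECT DISTINCT i.incyte_id, u1.unigene_id, s.swissprot_acc
FROM incyte_numbers AS i
JOIN unigene_numbers AS u1 ON u1.genbank_acc = i.genbank_acc
JOIN unigene_numbers AS u2 ON u2.unigene_id = u1.unigene_id
JOIN swissprot_numbers AS s ON s.genbank_acc = u2.genbank_acc
ORDER BY i.incyte_id
"""


def annotate(conn):
    """Return (incyte_id, unigene_id, swissprot_acc) rows found via the bridge."""
    return conn.execute(BRIDGE_QUERY).fetchall()


if __name__ == "__main__":
    for row in annotate(build_demo_db()):
        print(row)
```

In the placeholder data, a direct join between the array's GenBank number and the protein table would find nothing for INC_1; the annotation is recovered only because another GenBank number in the same UniGene cluster carries it. Array elements with no annotated cluster member (INC_2 here) drop out of an inner join; a LEFT JOIN would keep them with empty annotation fields.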
Figure 7.4.8 DRAGON uses accession numbers to define biological characteristics of genes and proteins. A microarray is a regular array of thousands of unique cDNAs or oligonucleotides spotted on a solid support. Each spot contains cDNA corresponding to a specific gene that encodes a protein. Accession numbers derived from publicly available databases provide information about the biological characteristics of both the gene and its corresponding protein. At the gene level, “Transfac Site” and “Transfac Factor” numbers indicate the presence of promoter regions on the gene and factors that bind to those promoter regions, respectively. The “GenBank no.” and “UniGene ID” refer to EST sequences corresponding to fragments of the gene and a cluster of those EST sequences, respectively. The “UniGene Cytoband” indicates the chromosomal location of the gene. The “UniGene Name” is the name of the gene. The “OMIM no.” indicates whether the gene is known to be involved in any human diseases. At the protein level, “Pfam no.” and “Interpro no.” indicate which functional domains the protein contains. The “SWISS‐PROT no.” is a unique identifier for the protein and can be derived from either the SWISS‐PROT or TrEMBL databases. “SWISS‐PROT Keywords” are derived from a controlled vocabulary of 827 words that are assigned to proteins in the SWISS‐PROT database according to their function(s). “SWISS‐PROT Sequence” is the amino acid sequence for the protein. “SWISS‐PROT Name” is the SWISS‐PROT database name for the protein.
Literature Cited
Bailey, S.N., Wu, R.Z., and Sabatini, D.M. 2002. Applications of transfected cell microarrays in high‐throughput drug discovery. Drug Discov. Today 7:S113‐S118.
Bouton, C.M. and Pevsner, J. 2001. DRAGON: Database Referencing of Array Genes Online. Bioinformatics 16:1038‐1039.
Bouton, C.M. and Pevsner, J. 2002. DRAGON View: Information visualization for annotated microarray data. Bioinformatics 18:323‐324.
Bowtell, D.D.L. 1999. Options available‐from start to finish‐for obtaining expression data by microarray. Nat. Genet. Suppl. 21:25‐32.
Cheung, V.G., Morley, M., Aguilar, F., Massimi, A., Kucherlapati, R., and Childs, G. 1999. Making and reading microarrays. Nat. Genet. Suppl. 21:15‐19.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P.O., and Herskowitz, I. 1998. The transcriptional program of sporulation in budding yeast. Science 282:699‐705.
Colantuoni, C., Henry, G., Zeger, S., and Pevsner, J. 2002. SNOMAD (Standardization and NOrmalization of MicroArray Data): Web‐accessible gene expression data analysis. Bioinformatics 18:1540‐1541.
Duggan, D.J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J.M. 1999. Expression profiling using cDNA microarrays. Nat. Genet. Suppl. 21:10‐14.
Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome‐wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95:14863‐14868.
Frishman, D., Heumann, K., Lesk, A., and Mewes, H‐W. 1998. Comprehensive, comprehensible, distributed and intelligent databases: Current status. Bioinformatics 14:551‐561.
Gawantka, V., Pollet, N., Delius, H., Vingron, M., Pfister, R., Nitsch, R., Blumenstock, C., and Niehrs, C. 1998. Gene expression screening in Xenopus identifies molecular pathways, predicts gene function and provides a global view of embryonic gene expression. Mech. Dev. 77:95‐141.
Gibbons, F.D. and Roth, F.P. 2002. Judging the quality of gene expression‐based clustering methods using gene annotation. Genome Res. 12:1574‐1581.
Heyer, L.J., Kruglyak, S., and Yooseph, S. 1999. Exploring expression data: Identification and analysis of coexpressed genes. Genome Res. 9:1106‐1115.
Iyer, V.R., Eisen, M.B., Ross, D.T., Schuler, G., Moore, T., Lee, J.C., Trent, J.M., Staudt, L.M., Hudson, J. Jr., Boguski, M.S., Lashkari, D., Shalon, D., Botstein, D., and Brown, P.O. 1999. The transcriptional program in the response of human fibroblasts to serum. Science 283:83‐87.
Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28:27‐30.
Kanehisa, M. et al. 2002. The KEGG databases at GenomeNet. Nucleic Acids Res. 30:42‐46.
Lal, S.P., Christopherson, R.I., and dos Remedios, C.G. 2002. Antibody arrays: An embryonic but rapidly growing technology. Drug Discov. Today 7:S143‐S149.
Liang, S., Fuhrman, S., and Somogyi, R. 1998. Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac. Symp. Biocomput. 3:18‐29.
Lipshutz, R.J., Fodor, S.P.A., Gingeras, T.R., and Lockhart, D.J. 1999. High density synthetic oligonucleotide arrays. Nat. Genet. Suppl. 21:20‐24.
Macauley, J., Wang, H., and Goodman, N. 1998. A model system for studying the integration of molecular biology databases. Bioinformatics 14:575‐582.
Michaels, G.S., Carr, D.B., Askenazi, M., Fuhrman, S., Wen, X., and Somogyi, R. 1998. Cluster analysis and data visualization of large‐scale gene expression data. Pac. Symp. Biocomput. 3:42‐53.
Somogyi, R., Fuhrman, S., Askenazi, M., and Wuensche, A. 1997. The gene expression matrix: Towards the extraction of genetic network architectures. Proc. Second World Cong. Nonlinear Analysts 1996. 30:1815‐1824.
Spellman, P.T. and Rubin, G.M. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J. Biol. 1:5.1‐5.8.
Szallasi, Z. 1999. Genetic network analysis in light of massively parallel biological data acquisition. Pac. Symp. Biocomput. 4:5‐16.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E.S., and Golub, T.R. 1999. Interpreting patterns of gene expression with self‐organizing maps: Methods and applications to hematopoietic differentiation. Proc. Natl. Acad. Sci. U.S.A. 96:2907‐2912.
Toronen, P., Kolehmainen, M., Wong, G., and Castren, E. 1999. Analysis of gene expression data using self‐organizing maps. FEBS Lett. 451:142‐146.
Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial analysis of gene expression. Science 270:484‐487.
Wen, X., Fuhrman, S., Michaels, G.S., Carr, D.B., Smith, S., Barker, J.L., and Somogyi, R. 1998. Large‐scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. U.S.A. 95:334‐339.
Zhang, M.Q. 1999. Large‐scale gene expression data analysis: A new challenge to computational biologists. Genome Res. 9:681‐688.
Key References
Bouton and Pevsner, 2001. See above.
Original publication concerning the DRAGON database.
Bouton and Pevsner, 2002. See above.
Original publication concerning the DRAGON View visualization tools.
Bouton, C.M., Hossain, M.A., Frelin, L.P., Laterra, J., and Pevsner, J. 2001. Microarray analysis of differential gene expression in lead‐exposed astrocytes. Toxicol. Appl. Pharmacol. 176:34‐53.
Research publication that reports use of DRAGON and DRAGON View in the context of a toxicogenomic microarray study.
Iyer et al. 1999. See above.
Reports the microarray study from which the example data sets for this unit were derived.