Data Storage and Analysis in ArrayExpress and Expression Profiler


Abstract

 

ArrayExpress at the European Bioinformatics Institute is a public database for MIAME-compliant microarray and transcriptomics data. It consists of two parts: the ArrayExpress Repository, which is a public archive of microarray data, and the ArrayExpress Warehouse of Gene Expression Profiles, which contains additionally curated subsets of data from the Repository. Archived experiments can be queried by experimental attributes, such as keywords, species, array platform, publication details, or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and the resulting profiles can be visualized. The data can be exported and analyzed using the online data analysis tool Expression Profiler. Data analysis components, such as data preprocessing, filtering, identification of differentially expressed genes, clustering methods, and ordination-based techniques, as well as other statistical tools, are all available in Expression Profiler through integration with the statistical package R. Curr. Protoc. Bioinform. 23:7.13.1-7.13.27. © 2008 by John Wiley & Sons, Inc.

Keywords: gene expression; microarrays; transcriptomics; public repository; data analysis

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Querying Gene Expression Profiles
  • Basic Protocol 2: Query the AE Repository of Microarray and Transcriptomics Data
  • Basic Protocol 3: How to Upload, Normalize, Analyze, and Visualize Data in Expression Profiler
  • Basic Protocol 4: How to Perform Clustering Analysis in Expression Profiler
  • Basic Protocol 5: How to Calculate Gene Ontology Term Enrichment in Expression Profiler
  • Basic Protocol 6: How to Calculate Chromosome Co‐Localization Probability in Expression Profiler
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Figures

  • Figure 7.13.1 The ArrayExpress query windows (http://www.ebi.ac.uk/arrayexpress).
  • Figure 7.13.2 Output window after querying the AE Warehouse for the expression profiles of a particular gene (e.g., nfkbia).
  • Figure 7.13.3 Zoomed‐in view of a particular experiment. The main graph shows the expression profile of the selected gene (e.g., nfkbia) for all experimental samples, based on the selected experimental factor.
  • Figure 7.13.4 Similarity search output window. The expression profile of the selected gene (e.g., nfkbia) is plotted together with those of the three genes showing the closest similarity in expression pattern within the same experiment. The corresponding gene symbols are listed on the right (Ier2, Fos, and Jun).
  • Figure 7.13.5 Gene selection page. When more than one gene matches the query, this window allows the user to refine the search, query for multiple genes, or restrict the search to perfect matches only.
  • Figure 7.13.6 Output window after querying the AE Repository for a particular set of experiments, using a word or phrase (e.g., cell cycle) and selecting a species (e.g., Schizosaccharomyces pombe). The total number of experiments and corresponding samples retrieved appears at the bottom of the page.
  • Figure 7.13.7 Expanded view of a single experiment, with links to several experiment annotation files and the data retrieval page.
  • Figure 7.13.8 Top: Data retrieval page, Processed data group detail, Experimental conditions. This section of the page allows the user to select the experimental conditions to be included in the data matrix for further analysis. Bottom: Data retrieval page, Processed data group detail, Quantitation Types and Design Element Properties. This section of the page allows the user to select the format of normalized data and the type of annotation to be included in the data matrix for further analysis.
  • Figure 7.13.9 The Expression Profiler main page (http://www.ebi.ac.uk/expressionprofiler/).
  • Figure 7.13.10 Upload/Expression data windows in EP. The user can directly upload data in a variety of tabular formats (top) or in Affymetrix format (bottom).
  • Figure 7.13.11 Data selection view in EP. This window is divided into three main sections: current dataset (top), descriptive statistics (middle), and subselection menu (bottom).
  • Figure 7.13.12 Data normalization output graphs. The results of data normalization can be viewed as a box plot of the distribution of Perfect Match (PM) log intensities (top) or in the descriptive statistics view (bottom). Above the line graph, the post‐normalization mean and standard deviation values are displayed.
  • Figure 7.13.13 Data transformation output graph. The transformed data are shown in the descriptive statistics view. At the top of the graph, the post‐transformation mean and standard deviation values are displayed.
  • Figure 7.13.14 t‐test analysis output graphs. The t‐test analysis results are summarized in a table, where the genes are ranked according to p‐value, with the most significant genes at the top (left). The top 15 genes are also plotted in a graph (right).
  • Figure 7.13.15 Define new factor window. When running an ordination‐based technique, the user might need to create a new experimental factor in order to identify the genes differentially expressed between two conditions. In this example, the genotype is the discriminating factor (wild type versus knock‐out), and the new factor can be created by filling in the table as shown.
  • Figure 7.13.16 Between Group Analysis window in EP. The user can select which factor determines the groups for the analysis, the type of transformation to use, and the output options.
  • Figure 7.13.17 A comparison between hierarchical clustering (correlation‐based distance, average linkage) and k‐means clustering (correlation‐based distance, k = 5) in the S. pombe stress response dataset E‐MEXP‐29. The normalized data were retrieved from ArrayExpress as described in , step 2 and loaded into EP as a tab‐delimited file. Data were log transformed, and the 140 most varying genes (>0.9 SD in 60% of the hybridizations) were selected for the clustering comparison. For additional information, refer to Torrente et al. (2005). Line thickness is proportional to the number of elements common to both sets. Placing the mouse cursor over a line displays a Venn diagram showing the number of elements in the two clusters and their overlap.
  • Figure 7.13.18 Gene ontology annotation output. The results of GO term enrichment for a given gene list are summarized in this table.
  • Figure 7.13.19 Chromosome co‐localization output. Probabilities of co‐localization of regulated genes are plotted onto a human karyogram. Chromosomal regions are colored according to the probability of co‐localization occurring by chance, with red ≤ 0.01, orange ≤ 0.02, yellow ≤ 0.03, light blue ≤ 0.04, and green ≤ 0.05.
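
The color scale given in the caption of Figure 7.13.19 can be illustrated with a short Python sketch. This is not code from Expression Profiler or ChroCoLoc; the function name `colocalization_color` is hypothetical, and returning `None` for probabilities above 0.05 is an assumption, since the caption does not say how such regions are drawn.

```python
from bisect import bisect_left

# Upper bounds and colors exactly as listed in the Figure 7.13.19 caption.
THRESHOLDS = [0.01, 0.02, 0.03, 0.04, 0.05]
COLORS = ["red", "orange", "yellow", "light blue", "green"]


def colocalization_color(p):
    """Map a chance co-localization probability to the caption's color.

    Returns None when p exceeds 0.05 (assumed: region left uncolored).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_left finds the first threshold that is >= p, so a value
    # equal to a bound falls into that bound's bin (e.g. 0.01 -> red).
    index = bisect_left(THRESHOLDS, p)
    return COLORS[index] if index < len(COLORS) else None


if __name__ == "__main__":
    for p in (0.005, 0.01, 0.015, 0.05, 0.2):
        print(p, colocalization_color(p))
```

Each bin is inclusive of its upper bound, matching the "≤" thresholds in the caption.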

Literature Cited
   Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., Harris, M.A., Hill, D.P., Issel‐Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., and Sherlock, G. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 25‐29.
   Ball, C., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J.C., Icahn, C., Parkinson, H., Quackenbush, J., Ringwald, M., Sansone, S.A., Sherlock, G., Spellman, P., Stoeckert, C., Tateno, Y., Taylor, R., White, J., and Winegarden, N. 2004. An open letter on microarray data from the MGED Society. Microbiology 150: 3522‐3524.
   Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57: 289‐300.
   Blake, J., Schwager, C., Kapushesky, M., and Brazma, A. 2006. ChroCoLoc: An application for calculating the probability of co‐localization of microarray gene expression. Bioinformatics 22: 765‐767.
   Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze‐Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. 2001. Minimum information about a microarray experiment (MIAME)‐toward standards for microarray data. Nat. Genet. 29: 365‐371.
   Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Lara, G.G., Oezcimen, A., Rocca‐Serra, P., and Sansone, S.A. 2003. ArrayExpress‐a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31: 68‐71.
   Culhane, A.C., Perriere, G., Considine, E.C., Cotter, T.G., and Higgins, D.G. 2002. Between‐group analysis of microarray data. Bioinformatics 18: 1600‐1608.
   Culhane, A.C., Thioulouse, J., Perriere, G., and Higgins, D.G. 2005. MADE4: An R package for multivariate analysis of gene expression data. Bioinformatics 21: 2789‐2790.
   Edgar, R., Domrachev, M., and Lash, A.E. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30: 207‐210.
   Hochberg, Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75: 800‐803.
   Holm, S. 1979. A simple sequentially rejective Bonferroni test procedure. Scand. J. Stat. 6: 65‐70.
   Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. 2002. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18: S96‐S104.
   Ihaka, R. and Gentleman, R. 1996. R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5: 299‐314.
   Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y., and Barkai, N. 2002. Revealing modular organization in the yeast transcriptional network. Nat. Genet. 31: 370‐377.
   Ikeo, K., Ishi‐i, J., Tamura, T., Gojobori, T., and Tateno, Y. 2003. CIBEX: Center for information biology gene expression database. C. R. Biol. 326: 1079‐1082.
   Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15.
   Johansson, P. and Hakkinen, J. 2006. Improving missing value imputation of microarray data by using spot quality weights. BMC Bioinformatics 7: 306.
   Kapushesky, M., Kemmeren, P., Culhane, A.C., Durinck, S., Ihmels, J., Korner, C., Kull, M., Torrente, A., Sarkans, U., Vilo, J., and Brazma, A. 2004. Expression Profiler: Next generation‐an online platform for analysis of microarray data. Nucleic Acids Res. 32: W465‐W470.
   Li, C. and Wong, W.H. 2001. Model‐based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci. U.S.A. 98: 31‐36.
   Manly, K.F., Nettleton, D., and Hwang, J.T. 2004. Genomics, prior probability, and statistical tests of multiple hypotheses. Genome Res. 14: 997‐1001.
   Pounds, S. 2006. Estimation and control of multiple testing error rates for microarray studies. Brief. Bioinform. 7: 25‐36.
   Quackenbush, J. 2001. Computational analysis of microarray data. Nat. Rev. Genet. 2: 418‐427.
   Quackenbush, J. 2002. Microarray data normalization and transformation. Nat. Genet. 32: 496‐501.
   Rayner, T.F., Rocca‐Serra, P., Spellman, P.T., Causton, H.C., Farne, A., Holloway, E., Irizarry, R.A., Liu, J., Maier, D.S., Miller, M., Petersen, K., Quackenbush, J., Sherlock, G., Stoeckert, C.J., White, J., Whetzel, P.L., Wymore, F., Parkinson, H., Sarkans, U., Ball, C.A., and Brazma, A. 2006. A simple spreadsheet‐based, MIAME‐supportive format for microarray data: MAGE‐TAB. BMC Bioinformatics 7: 489.
   Smyth, G.K. 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3: Article3.
   Torrente, A., Kapushesky, M., and Brazma, A. 2005. A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings. Bioinformatics 21: 3993‐3999.
   Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R.B. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17: 520‐525.
   Wu, Z., Irizarry, R., Gentleman, R., Martinez‐Murillo, F., and Spencer, F. 2004. A model‐based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99: 909‐917.