Data Storage and Analysis in ArrayExpress and Expression Profiler

Source: Internet, 2013-12-31

Abstract

ArrayExpress at the European Bioinformatics Institute is a public database for MIAME‐compliant microarray and transcriptomics data. It consists of two parts: the ArrayExpress Repository, a public archive of microarray data, and the ArrayExpress Warehouse of Gene Expression Profiles, which contains additionally curated subsets of data from the Repository. Archived experiments can be queried by experimental attributes such as keywords, species, array platform, publication details, or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and the matching profiles can be visualized. The data can be exported and analyzed with the online data analysis tool Expression Profiler. Data analysis components, including data preprocessing, filtering, detection of differentially expressed genes, clustering methods, and ordination‐based techniques, as well as other statistical tools, are all available in Expression Profiler through integration with the statistical package R. Curr. Protoc. Bioinform. 23:7.13.1‐7.13.27. © 2008 by John Wiley & Sons, Inc.

Keywords: gene expression; microarrays; transcriptomics; public repository; data analysis

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Querying Gene Expression Profiles
Basic Protocol 2: Query the AE Repository of Microarray and Transcriptomics Data
Basic Protocol 3: How to Upload, Normalize, Analyze, and Visualize Data in Expression Profiler
Basic Protocol 4: How to Perform Clustering Analysis in Expression Profiler
Basic Protocol 5: How to Calculate Gene Ontology Term Enrichment in Expression Profiler
Basic Protocol 6: How to Calculate Chromosome Co‐Localization Probability in Expression Profiler
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 7.13.1 The ArrayExpress query windows (http://www.ebi.ac.uk/arrayexpress).

Figure 7.13.2 Output window after querying the AE Warehouse for the expression profiles of a particular gene (e.g., nfkbia).

Figure 7.13.3 Zoomed‐in view of a particular experiment. The main graph shows the expression profile of the selected gene (e.g., nfkbia), for all experimental samples, based on the selected experimental factor.

Figure 7.13.4 Similarity search output window. The expression profile of the selected gene (e.g., nfkbia) is plotted together with those of the three genes with the most similar expression patterns within the same experiment. The corresponding gene symbols are listed on the right (Ier2, Fos, and Jun).

Figure 7.13.5 Gene selection page. When more than one gene matches the query, this window allows refining the search, querying for multiple genes or restricting the search to perfect matches only.

Figure 7.13.6 Output window after querying the AE Repository for a particular set of experiments, using a word or phrase (e.g., cell cycle) and selecting a species (e.g., Schizosaccharomyces pombe). The total number of experiments and corresponding samples retrieved appears at the bottom of the page.

Figure 7.13.7 Expanded view of a single experiment with links to several experiment annotation files and data retrieval page.

Figure 7.13.8 Top: Data retrieval page, Processed data group detail—Experimental conditions. This section of the page allows the user to select the experimental conditions to be included in the data matrix for further analysis. Bottom: Data retrieval page, Processed data group detail—Quantitation Types and Design Element Properties. This section of the page allows the user to select the format of normalized data and the type of annotation to be included in the data matrix for further analysis.

Figure 7.13.9 The Expression Profiler main page (http://www.ebi.ac.uk/expressionprofiler/).

Figure 7.13.10 Upload/Expression data windows in EP. The user can directly upload data in a variety of tabular formats (top) or in Affymetrix format (bottom).

Figure 7.13.11 Data selection view in EP. This window is divided into three main sections: current dataset (top), descriptive statistics (middle), and subselection menu (bottom).

Figure 7.13.12 Data normalization output graphs. The results of data normalization can be viewed as a box plot of Perfect Match (PM) log intensities distribution (top) or in the descriptive statistic view (bottom). Above the line graph, the post‐normalization mean and standard deviation values are displayed.

Figure 7.13.13 Data transformation output graph. The transformed data is now shown in the descriptive statistic view. At the top of the graph, the post‐transformation mean and standard deviation values are displayed.

Figure 7.13.14 t‐test analysis output graphs. The t‐test analysis results are summarized in a table, where the genes are ranked according to the p‐value, with the most significant genes at the top (left). The top 15 genes are also plotted in a graph (right).
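
The ranking shown in this figure can be illustrated with a minimal Python sketch. It assumes a two‐sample Welch t‐test with a two‐sided p‐value, which may differ from the exact test variant that Expression Profiler runs through R. The function names, the toy expression values, and the numerical integration of the t density are illustrative assumptions, not part of the protocol.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, computed as 1 - 2 * (area of the density over [0, |t|]).

    The area is approximated with Simpson's rule (steps must be even), so
    very small p-values are only approximate.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def welch_t_test(x, y):
    """Welch's t statistic and p-value; assumes both groups have nonzero variance."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, two_sided_p(t, df)

def rank_genes(expr, group_a, group_b):
    """Return (gene, t, p) tuples sorted so the most significant gene comes first."""
    rows = []
    for gene, values in expr.items():
        t, p = welch_t_test([values[i] for i in group_a],
                            [values[i] for i in group_b])
        rows.append((gene, t, p))
    return sorted(rows, key=lambda row: row[2])

# Hypothetical log-scale expression values: columns 0-2 are one condition,
# columns 3-5 the other.
toy_expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 3.0, 3.3],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [1.0, 1.5, 0.5, 1.4, 2.0, 1.1],
}
ranked = rank_genes(toy_expr, group_a=[0, 1, 2], group_b=[3, 4, 5])
for gene, t, p in ranked:
    print(f"{gene}\tt={t:.2f}\tp={p:.4f}")
```

With many genes tested at once, the raw p‐values would normally be adjusted for multiple testing, for example with the Benjamini and Hochberg (1995) procedure listed in the Literature Cited.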

Figure 7.13.15 Define new factor window. When running an ordination‐based technique, the user might need to create a new experimental factor in order to identify the genes differentially expressed between two conditions. In this example, the genotype is the discriminating factor (wild type versus knock‐out), and the new factor can be created by filling in the table as shown.

Figure 7.13.16 Between Group Analysis window in EP. The user can select which factor determines the group for the analysis, the type of transformation to use, and the output options.

Figure 7.13.17 A comparison between hierarchical clustering (correlation‐based distance, average linkage) and k‐means clustering (correlation‐based distance, k = 5) in the S. pombe stress response dataset E‐MEXP‐29. The normalized data were retrieved from ArrayExpress as described in , step 2 and loaded into EP as a tab‐delimited file. Data were log transformed and the 140 most varying genes (>0.9 SD in 60% of the hybridizations) were selected for clustering comparison. For additional information refer to Torrente et al. (2005). Line thickness is proportional to the number of elements common to both sets. Placing the mouse cursor over a line displays a Venn diagram showing the number of elements in the two clusters and their overlap.
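
The preprocessing described in this caption can be sketched in Python as follows. This is one plausible reading of the filter, not the Expression Profiler implementation: a gene is kept when its log‐transformed value lies more than 0.9 standard deviations from the gene's own mean in at least 60% of the hybridizations. The function names, the use of base‐2 logarithms and the population standard deviation, and the toy intensities are all assumptions made for illustration. The correlation‐based distance is taken here as 1 minus the Pearson correlation.

```python
import math
from statistics import mean, pstdev

def log2_transform(row):
    """Log-transform one gene's intensities (all values must be positive)."""
    return [math.log2(v) for v in row]

def is_varying(row, n_sd=0.9, min_fraction=0.6):
    """True if the gene deviates from its own mean by more than n_sd standard
    deviations in at least min_fraction of the hybridizations."""
    m, sd = mean(row), pstdev(row)
    if sd == 0:
        return False  # a flat profile never passes the filter
    hits = sum(1 for v in row if abs(v - m) > n_sd * sd)
    return hits / len(row) >= min_fraction

def correlation_distance(x, y):
    """1 - Pearson correlation; 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Assumes neither profile is flat."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - cov / norm

# Hypothetical raw intensities for three genes across six hybridizations.
raw = {
    "gene1": [1, 8, 1, 8, 1, 8],     # alternates strongly between conditions
    "gene2": [4, 4, 4, 4, 4, 64],    # a single outlying hybridization
    "gene3": [2, 2, 2, 2, 2, 2],     # flat profile
}
logged = {gene: log2_transform(values) for gene, values in raw.items()}
selected = [gene for gene, values in logged.items() if is_varying(values)]
print(selected)
```

The resulting distance function could then be passed to any hierarchical or k‐means style clustering routine that accepts a custom distance.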

Figure 7.13.18 Gene ontology annotation output. The results of GO terms enrichment for a given gene list are summarized in this table.
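
A table like the one in this figure can be approximated with a short Python sketch. It assumes that enrichment is scored with a one‐sided hypergeometric test, which is a common choice but is not confirmed here as the statistic Expression Profiler uses. The function names, gene identifiers, and term labels are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background, K of them
    annotated with the term, n genes in the query list, k annotated hits."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def go_enrichment(gene_list, annotations, background):
    """Return (term, hits, term_size, p) rows, most enriched term first."""
    background = set(background)
    genes = set(gene_list) & background
    rows = []
    for term, members in annotations.items():
        members = set(members) & background
        k = len(genes & members)
        if k == 0:
            continue  # terms with no hits are left out of the table
        p = enrichment_p(k, len(genes), len(members), len(background))
        rows.append((term, k, len(members), p))
    return sorted(rows, key=lambda row: row[3])

# Hypothetical background of ten genes and two annotation terms.
background = [f"g{i}" for i in range(10)]
annotations = {
    "term_A": ["g0", "g1", "g2", "g3"],
    "term_B": ["g5", "g6", "g7", "g8", "g9"],
}
table = go_enrichment(["g0", "g1", "g9"], annotations, background)
for term, k, size, p in table:
    print(f"{term}\thits={k}\tterm size={size}\tp={p:.4f}")
```

As with the t‐test ranking, p‐values computed for many terms would usually be corrected for multiple testing before interpretation.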

Figure 7.13.19 Chromosome co‐localization output. Probabilities of co‐localization of regulated genes are plotted onto a human karyogram. Chromosomal regions are colored according to decreasing probability of co‐localization occurring by chance, with red ≤ 0.01, orange ≤ 0.02, yellow ≤ 0.03, light blue ≤ 0.04, and green ≤ 0.05.
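
The color scale in this caption amounts to a simple binning of probabilities, sketched below in Python. The thresholds and color names come from the caption. The function name is hypothetical, and returning no color for probabilities above 0.05 is an assumption about how uncolored regions are treated.

```python
# (upper bound, color) pairs in increasing order, taken from the caption.
COLOR_BINS = [
    (0.01, "red"),
    (0.02, "orange"),
    (0.03, "yellow"),
    (0.04, "light blue"),
    (0.05, "green"),
]

def region_color(p):
    """Map a co-localization probability to its karyogram color.

    Returns None for probabilities above 0.05 (assumed to stay uncolored).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for upper, color in COLOR_BINS:
        if p <= upper:
            return color
    return None

for p in (0.004, 0.015, 0.05, 0.2):
    print(p, region_color(p))
```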

Literature Cited
	Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., Harris, M.A., Hill, D.P., Issel‐Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., and Sherlock, G. 2000. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25: 25‐29.
	Ball, C., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J.C., Icahn, C., Parkinson, H., Quackenbush, J., Ringwald, M., Sansone, S.A., Sherlock, G., Spellman, P., Stoeckert, C., Tateno, Y., Taylor, R., White, J., and Winegarden, N. 2004. An open letter on microarray data from the MGED Society. Microbiology 150: 3522‐3524.
	Benjamini, Y. and Hochberg, Y. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57: 289‐300.
	Blake, J., Schwager, C., Kapushesky, M., and Brazma, A. 2006. ChroCoLoc: An application for calculating the probability of co‐localization of microarray gene expression. Bioinformatics 22: 765‐767.
	Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze‐Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. 2001. Minimum information about a microarray experiment (MIAME)‐toward standards for microarray data. Nat. Genet. 29: 365‐371.
	Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Lara, G.G., Oezcimen, A., Rocca‐Serra, P., and Sansone, S.A. 2003. ArrayExpress‐a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31: 68‐71.
	Culhane, A.C., Perriere, G., Considine, E.C., Cotter, T.G., and Higgins, D.G. 2002. Between‐group analysis of microarray data. Bioinformatics 18: 1600‐1608.
	Culhane, A.C., Thioulouse, J., Perriere, G., and Higgins, D.G. 2005. MADE4: An R package for multivariate analysis of gene expression data. Bioinformatics 21: 2789‐2790.
	Edgar, R., Domrachev, M., and Lash, A.E. 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30: 207‐210.
	Hochberg, Y. 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75: 800‐803.
	Holm, S. 1979. A simple sequentially rejective Bonferroni test procedure. Scand. J. Stat. 6: 65‐70.
	Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. 2002. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18: S96‐S104.
	Ihaka, R. and Gentleman, R. 1996. R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5: 299‐314.
	Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y., and Barkai, N. 2002. Revealing modular organization in the yeast transcriptional network. Nat. Genet. 31: 370‐377.
	Ikeo, K., Ishi‐i, J., Tamura, T., Gojobori, T., and Tateno, Y. 2003. CIBEX: Center for information biology gene expression database. C. R. Biol. 326: 1079‐1082.
	Irizarry, R.A., Bolstad, B.M., Collin, F., Cope, L.M., Hobbs, B., and Speed, T.P. 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31:e15.
	Johansson, P. and Hakkinen, J. 2006. Improving missing value imputation of microarray data by using spot quality weights. BMC Bioinformatics 7: 306.
	Kapushesky, M., Kemmeren, P., Culhane, A.C., Durinck, S., Ihmels, J., Korner, C., Kull, M., Torrente, A., Sarkans, U., Vilo, J., and Brazma, A. 2004. Expression Profiler: Next generation‐an online platform for analysis of microarray data. Nucleic Acids Res. 32: W465‐W470.
	Li, C. and Wong, W.H. 2001. Model‐based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci. U.S.A. 98: 31‐36.
	Manly, K.F., Nettleton, D., and Hwang, J.T. 2004. Genomics, prior probability, and statistical tests of multiple hypotheses. Genome Res. 14: 997‐1001.
	Pounds, S. 2006. Estimation and control of multiple testing error rates for microarray studies. Brief. Bioinform. 7: 25‐36.
	Quackenbush, J. 2001. Computational analysis of microarray data. Nat. Rev. Genet. 2: 418‐427.
	Quackenbush, J. 2002. Microarray data normalization and transformation. Nat. Genet. 32: 496‐501.
	Rayner, T.F., Rocca‐Serra, P., Spellman, P.T., Causton, H.C., Farne, A., Holloway, E., Irizarry, R.A., Liu, J., Maier, D.S., Miller, M., Petersen, K., Quackenbush, J., Sherlock, G., Stoeckert, C.J., White, J., Whetzel, P.L., Wymore, F., Parkinson, H., Sarkans, U., Ball, C.A., and Brazma, A. 2006. A simple spreadsheet‐based, MIAME‐supportive format for microarray data: MAGE‐TAB. BMC Bioinformatics 7: 489.
	Smyth, G.K. 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3: Article3.
	Torrente, A., Kapushesky, M., and Brazma, A. 2005. A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings. Bioinformatics 21: 3993‐3999.
	Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R.B. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17: 520‐525.
	Wu, Z., Irizarry, R., Gentleman, R., Martinez‐Murillo, F., and Spencer, F. 2004. A model‐based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99: 909‐917.