        Using OrthoMCL to Assign Proteins to OrthoMCL‐DB Groups or to Cluster Proteomes Into New Ortholog Groups


        Abstract

         

        OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity. OrthoMCL‐DB is a public database that allows users to browse and view ortholog groups that were pre‐computed using the OrthoMCL algorithm. Version 4 of this database contained 116,536 ortholog groups clustered from 1,270,853 proteins obtained from 88 eukaryotic genomes, 16 archaean genomes, and 34 bacterial genomes. Future versions of OrthoMCL‐DB will include more proteomes as more genomes are sequenced. Here, we describe how you can group your proteins of interest into ortholog clusters using two different means provided by the OrthoMCL system. The OrthoMCL‐DB Web site has a tool for uploading and grouping a set of protein sequences, typically representing a proteome. This method maps the uploaded proteins to existing groups in OrthoMCL‐DB. Alternatively, if you have proteins from a set of genomes that need to be grouped, you can download, install, and run the stand‐alone OrthoMCL software. Curr. Protoc. Bioinform. 35:6.12.1‐6.12.19. © 2011 by John Wiley & Sons, Inc.

        Keywords: OrthoMCL; ortholog groups; paralog; proteome; Markov clustering; reciprocal best hits; MCL

             
         
        GO TO THE FULL PROTOCOL:
        PDF or HTML at Wiley Online Library

        Table of Contents

        • Introduction
        • Strategic Planning
        • Basic Protocol 1: Assign a Proteome to OrthoMCL‐DB Groups
        • Basic Protocol 2: Create Ortholog Groups from Your Proteomes Using the OrthoMCL Software
        • Support Protocol 1: Downloading, Installing, and Configuring the OrthoMCL Programs
        • Guidelines for Understanding Results
        • Commentary
        • Literature Cited
        • Figures
        • Tables
             
         


        Figures

        •   Figure 6.12.1 Overview of the OrthoMCL algorithm. (1) Proteomes must each be in FASTA format where the file name and definition lines comply with simple requirements. (2) The proteome files are filtered to remove low‐quality sequences based on length and percent stop codons. (3) The proteomes are all compared to each other using BLASTP. They are masked with seg and an e‐value cutoff of 1e‐5 is applied. (4) For each pair of sequences that match, compute the “percent match length” score: count the number of amino acids in the shorter sequence that participate in any HSP, divide that by the length of the shorter sequence, and multiply by 100. Filter away matches with percent match < 50%. (5) For all pairs of proteomes, find all pairs of proteins across them that have hits as good as or better than any other hits between these proteins and other proteins in those species. (6) Find all pairs of proteins within a species that have mutual e‐values that are better than or equal to all of those proteins' hits to proteins in other species. (7) Find all pairs of proteins across two species that are connected through orthology and in‐paralogy. (8) Normalize in‐paralog e‐values by averaging all qualifying in‐paralog pairs in a genome and divide each pair by the average. Within a genome, in‐paralog pairs qualify if either of the proteins in the pair has an ortholog in any genome. If no in‐paralogs within a genome have any orthologs, all in‐paralogs in that genome qualify. Normalize ortholog and co‐ortholog pairs for any two species by averaging the e‐values across them, and normalize using that average. (9) Pass on all ortholog, in‐paralog, and co‐ortholog pairs, with their normalized e‐values, to the MCL program for clustering.
        •   Figure 6.12.2 OrthoMCL‐DB home page with the Tools link circled.
        •   Figure 6.12.3 A proteome mapped to OrthoMCL‐DB. The results are downloaded as a .zip file that contains five files. Shown here is the orthologGroups file obtained after submitting the Erwinia carotovora proteome (Bell et al., 2004).
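
        Steps 4 and 5 of the Figure 6.12.1 caption can be illustrated with a short Python sketch. This is not the OrthoMCL implementation. The function names, the 1-based inclusive HSP coordinates, and the dictionary layout of the hits are all assumptions made for this illustration, and the ortholog test is a simplified reading of step 5 in which tied best hits all qualify.

```python
from collections import defaultdict


def covered_length(hsps):
    """Count residues covered by at least one HSP.

    hsps: list of (start, end) tuples, 1-based and inclusive (assumed layout).
    Overlapping HSPs are merged so no residue is counted twice.
    """
    total = 0
    current_end = 0  # last residue position already counted
    for start, end in sorted(hsps):
        if end <= current_end:
            continue  # interval lies entirely inside what is already counted
        total += end - max(start, current_end + 1) + 1
        current_end = end
    return total


def percent_match_length(shorter_hsps, shorter_len):
    """Step 4: percent of the shorter sequence that participates in any HSP."""
    return 100.0 * covered_length(shorter_hsps) / shorter_len


def ortholog_pairs(hits, species_of, sp_a, sp_b):
    """Step 5 (simplified): reciprocal best hits between species sp_a and sp_b.

    hits: dict mapping (query, subject) -> e-value (hypothetical layout).
    species_of: dict mapping protein id -> species label.
    """
    # Best (lowest) e-value of each query against each subject species.
    best = defaultdict(lambda: float("inf"))
    for (query, subject), evalue in hits.items():
        key = (query, species_of[subject])
        best[key] = min(best[key], evalue)

    pairs = set()
    for (query, subject), evalue in hits.items():
        if species_of[query] != sp_a or species_of[subject] != sp_b:
            continue
        reverse = hits.get((subject, query))
        if reverse is None:
            continue  # no hit in the opposite direction, so not reciprocal
        if evalue <= best[(query, sp_b)] and reverse <= best[(subject, sp_a)]:
            pairs.add((query, subject))
    return pairs
```

        In this sketch, a match would be kept only if percent_match_length(...) is at least 50, mirroring the filter in step 4, and only the surviving hits would be passed to ortholog_pairs.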


        Literature Cited

           Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
           Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths‐Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C., and Eddy, S.R. 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138‐D141.
           Bell, K.S., Sebaihia, M., Pritchard, L., Holden, M.T., Hyman, L.J., Holeva, M.C., Thomson, N.R., Bentley, S.D., Churcher, L.J., Mungall, K., Atkin, R., Bason, N., Brooks, K., Chillingworth, T., Clark, K., Doggett, J., Fraser, A., Hance, Z., Hauser, H., Jagels, K., Moule, S., Norbertczak, H., Ormond, D., Price, C., Quail, M.A., Sanders, M., Walker, D., Whitehead, S., Salmond, G.P., Birch, P.R., Parkhill, J., and Toth, I.K. 2004. Genome sequence of the enterobacterial phytopathogen Erwinia carotovora subsp. atroseptica and characterization of virulence factors. Proc. Natl. Acad. Sci. U.S.A. 101:11105‐11110.
           Chen, F., Mackey, A.J., Stoeckert, C.J. Jr., and Roos, D.S. 2006. OrthoMCL‐DB: Querying a comprehensive multi‐species collection of ortholog groups. Nucleic Acids Res. 34:D363‐D368.
           Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. 2007. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One. 2:e383.
           Enright, A.J., Van Dongen, S., and Ouzounis, C.A. 2002. An efficient algorithm for large‐scale detection of protein families. Nucleic Acids Res. 30:1575‐1584.
           The Gene Ontology Consortium. 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25:25‐29.
           Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
           Webb, E., and International Union of Biochemistry and Molecular Biology. Enzyme nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes. 1984th ed. Academic Press, New York.
        Key References
           Li et al., 2003. See above.
           The original paper describing the OrthoMCL algorithm.
           Chen et al., 2006. See above.
           A paper describing the OrthoMCL‐DB.
           Chen et al., 2007. See above.
           A paper comparing OrthoMCL to other approaches.
        Internet Resources
           http://orthomcl.org
           The OrthoMCL‐DB site
           http://pfam.sanger.ac.uk/search#tabview=tab1
           Submit a set of proteins to find Pfam domains
           http://www.ebi.ac.uk/Tools/msa/clustalw2/
           Submit a set of proteins for multiple sequence alignment
           http://www.biolayout.org/
           Download the BioLayout software to visualize groups.