        Installing, Maintaining, and Using a Local Copy of BLAST for Intranet and Workstation Use


        Abstract

         

        The Basic Local Alignment Search Tool (BLAST) is one of the most widely used and most useful applications in sequence-based bioinformatics analysis. Frequently it is not practical or possible to use remote BLAST services through the Internet, either because of security or technical restrictions or because high-throughput analysis requires more processing power than remote services provide. This unit describes the steps involved in obtaining and installing a copy of the BLAST software for use on a local intranet or stand-alone workstation. Once installed, the BLAST package can be used to create BLAST-searchable nucleotide and protein sequence databanks. Various popular hardware (PPC, Intel) and operating system (MacOSX, FreeBSD, and Linux) options for running and maintaining the software are discussed. Finally, steps for indexing proprietary and third-party (publicly available) sequence databanks for use with BLAST, and for managing these resources, are discussed.

        Keywords: BLAST; sequence similarity searching; Unix-like operating systems

             
         
        GO TO THE FULL PROTOCOL:
        PDF or HTML at Wiley Online Library

        Table of Contents

        • Strategic Planning
        • Basic Protocol 1: Installing and Running BLAST Locally under Unix-Like Operating Systems such as Linux
        • Alternate Protocol 1: Installing and Running BLAST Locally under Microsoft Windows
        • Commentary
        • Literature Cited
        • Figures
        • Tables
             
         

        Materials

        Basic Protocol 1: Installing and Running BLAST Locally under Unix-Like Operating Systems such as Linux

          Necessary Resources
        • Hardware
          • The hardware requirements for running BLAST locally are modest: any Intel or equivalent (e.g., AMD) architecture is adequate for this protocol. However, a few hardware considerations need to be addressed:
            • BLAST can be computationally intensive. More CPU power helps when searching large databases, using CPU-intensive BLAST algorithms (e.g., TBLASTX), or searching with many query sequences.
            • BLAST can be memory‐hungry; for ideal performance, one should have enough memory to load the entire indexed database comfortably into RAM.
            • Databases can be large and require large amounts of disk space. Researchers who decide to take on ambitious projects such as downloading the entire GenBank database for local searching should keep this in mind.
          • The hardware can be scaled easily by adding disk space or RAM, or by moving to multiprocessor architectures. In addition, when querying a large number of sequences against a common database, the work is easily parallelized: copy the indexed database, the relevant query sequences, and the blastall application to any number of machines, and run the analyses on all of them at the same time.
        • Software
          • There are several programs in the stand-alone BLAST package. The main ones needed to run BLAST locally are formatdb, which creates BLAST-searchable databases, and blastall, which queries these databases using any of the BLAST programs (blastn, blastp, blastx, tblastn, and tblastx).
          • formatdb formats FASTA-formatted databases for searching with BLAST. Details on formatdb can be found in the file README.formatdb distributed with the BLAST package. The options for formatdb are listed in Table 3.11.1.
          • blastall is the main BLAST application. It is used for running queries against the indexed databases created with formatdb. Details on blastall can be found in the file README.bls distributed with the BLAST package. Some of the most commonly used blastall options are listed in Table 3.11.2.
            Table 3.11.1   Options for formatdb a

            Option Explanation
            -t Title for database file [String] (optional)
            -i Input file(s) for formatting [File In] (this parameter must be set)
            -l Logfile name [File Out] (optional)
              default = formatdb.log
            -p Type of file [T/F] (optional):
              T = protein
              F = nucleotide
              default = T
            -o Parse options [T/F] (optional):
              T (true) = parse SeqID and create indexes
              F (false) = do not parse SeqID; do not create indexes
              default = F
            -a Input file is database in ASN.1 format (otherwise FASTA is expected) [T/F] (optional):
              T = True
              F = False
              default = F
            -b ASN.1 database in binary mode [T/F] (optional):
              T = binary
              F = text mode
              default = F
            -e Input is a Seq entry [T/F] (optional)
              default = F
            -n Base name for BLAST files [String] (optional)
            -v Number of sequence bases to be created in the volume [Integer] (optional)
              default = 0
            -s Create indexes limited only to accessions: sparse [T/F] (optional)
              default = F
            -V Verbose: check for nonunique string IDs in the database [T/F] (optional)
              default = F
            -A Create ASN.1 structured deflines [T/F] (optional)
              default = F

             a For an example of using these options, see Basic Protocol 1.
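
            As a hedged illustration of how the formatdb options fit together, the Python sketch below assembles a formatdb argument list from a few of the documented flags (-i, -p, -o, -t, -n). The helper name formatdb_command and the file name mydb.fasta are hypothetical and are not part of the BLAST package; the sketch only builds and prints the command, it does not run it.

```python
import shlex


def formatdb_command(fasta_path, protein=True, parse_seqids=False,
                     title=None, base_name=None):
    """Build an argument list for formatdb (hypothetical helper).

    Only flags documented in the formatdb options table are used:
    -i input file, -p protein (T) or nucleotide (F),
    -o parse SeqIDs and create indexes, -t title, -n base name.
    """
    args = [
        "formatdb",
        "-i", fasta_path,
        "-p", "T" if protein else "F",
        "-o", "T" if parse_seqids else "F",
    ]
    if title is not None:
        args += ["-t", title]
    if base_name is not None:
        args += ["-n", base_name]
    return args


# Example: index a protein FASTA file (file name is an assumption).
cmd = formatdb_command("mydb.fasta", protein=True, parse_seqids=True,
                       title="My proteins")
print(shlex.join(cmd))
```

            On a machine where formatdb is installed and on the PATH, the list could be handed to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids quoting problems when the title contains spaces.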
            Table 3.11.2   Options for blastall b

            Option Explanation
            -p Program name [String]
              Input should be one of blastp, blastn, blastx, tblastn, or tblastx
            -d Database [String]
              default = nr
              The database specified must first be formatted with formatdb. An example would be -d "nr est", which will search both the nr and est databases, presenting the results as if one "virtual" database consisting of all the entries from both were searched. The statistics are based on the "virtual" database of nr and est.
            -i Query file [File In]
              default = stdin
              The query should be in FASTA format. If multiple FASTA entries are in the input file, all queries will be searched.
            -e Expectation value (E) [Real]
              default = 10.0
            -o BLAST report output file [File Out] (optional)
              default = stdout
            -F Filter query sequence (dust with BLASTN, seg with others) [String]
              default = T
              BLAST 2.0 and 2.1 use the dust low-complexity filter for BLASTN and seg for the other programs. Both dust and seg are integral parts of the NCBI Toolkit and are accessed automatically. With -F T, normal filtering by seg or dust (for BLASTN) occurs; with -F F, no filtering is performed. This option also takes a string as an argument, which may be used to change the specific parameters of seg or to invoke other filters.
            -S Query strands to search against database (for BLAST[NX] and TBLASTX). 3 is both, 1 is top, 2 is bottom [Integer]
              default = 3
            -T Produce HTML output [T/F]
              default = F
            -l Restrict search of database to list of GIs [String] (optional)
              This option specifies that only a subset of the database should be searched, determined by the list of GIs (i.e., NCBI identifiers) in a file. One can obtain a list of GIs for a given Entrez query from http://www.ncbi.nlm.nih.gov/Entrez/batch.html. This file should be in the same directory as the database, or in the directory from which BLAST is called.
            -U Use lowercase filtering of FASTA sequence [T/F] (optional)
              This option specifies that any lowercase letters in the input FASTA file should be masked.

             b For an example of using these options, see Basic Protocol 1.
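
            The blastall options can be combined in the same way. The sketch below is a minimal, hypothetical helper (blastall_command is not part of the BLAST package) that validates the program name against the five values accepted by -p and assembles a command from the documented -p, -d, -i, -e, -F, -T, and -o flags. The database name mydb is an assumption; fungus.fasta and fungus.blastp are the example file names used in the figures of this unit.

```python
import shlex

# The five program names accepted by blastall -p, per the options table.
VALID_PROGRAMS = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}


def blastall_command(program, database, query_path, output_path=None,
                     evalue=10.0, filter_query=True, html=False):
    """Build an argument list for blastall (hypothetical helper)."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program!r}")
    args = [
        "blastall",
        "-p", program,
        "-d", database,          # must already be formatted with formatdb
        "-i", query_path,        # FASTA query file
        "-e", str(evalue),       # expectation value cutoff
        "-F", "T" if filter_query else "F",
    ]
    if html:
        args += ["-T", "T"]
    if output_path is not None:
        args += ["-o", output_path]
    return args


# Example: protein query against a locally formatted database named
# "mydb" (the database name is an assumption for illustration).
cmd = blastall_command("blastp", "mydb", "fungus.fasta",
                       output_path="fungus.blastp", evalue=0.001)
print(shlex.join(cmd))
```

            Rejecting an unknown program name before the command is launched gives a clearer error than letting blastall fail on it later.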
        • Files
          • Input data files must be in FASTA format (see Appendix 1B).
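
        The Hardware notes above describe parallelizing a large search by distributing the query sequences across several machines, and the Files note requires FASTA input. The following Python sketch shows one way this could be prepared, under simplifying assumptions: it reads FASTA records (a header line starting with ">" followed by one or more sequence lines) and deals them round-robin into a chosen number of parts, one per machine. The function names and the example sequences are invented for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_round_robin(records, n_parts):
    """Distribute records across n_parts lists, one list per machine."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


# Invented example input; real queries would be read from a file.
example = """>seq1 first example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 second example protein
MADEEKLPPG
>seq3
MSTNPKPQRK
""".splitlines()

records = list(read_fasta(example))
parts = split_round_robin(records, 2)
for number, part in enumerate(parts, start=1):
    print(f"part {number}: {[header.split()[0] for header, _ in part]}")
```

        Each part could then be written to its own FASTA file and searched on a separate machine against a copy of the same indexed database. Round-robin assignment is only one choice; balancing parts by total sequence length may distribute the work more evenly when query lengths vary widely.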

        Figures

        •   Figure 3.11.1 Verbose listing of all the files in the BLAST distribution file for Linux as they are "un-tarred."
        •   Figure 3.11.2 The file fungus.fasta, used as an example query file.
        •   Figure 3.11.3 The file fungus.blastp, an example of output from BLAST.

