Installing, Maintaining, and Using a Local Copy of BLAST for Intranet and Workstation Use

Abstract

 

The Basic Local Alignment Search Tool (BLAST) is one of the most widely used and most useful applications in sequence‐based bioinformatics analysis. Frequently it is not practical or possible to use remote BLAST services through the Internet, either because of security or technical restrictions or because high‐throughput analysis requires more processing power than remote services provide. This unit describes the steps involved in obtaining and installing a copy of the BLAST software for use on a local intranet or stand‐alone workstation. Once installed, the BLAST package can be used to create BLAST‐searchable nucleotide and protein sequence databanks. Various popular hardware (PPC, Intel) and operating system (MacOSX, FreeBSD, and Linux) options for running and maintaining the software are discussed. Finally, the unit covers the steps for indexing proprietary and third‐party (publicly available) sequence databanks for use with BLAST, and for managing these resources.

Keywords: BLAST; sequence similarity searching; Unix‐like operating systems

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Strategic Planning
  • Basic Protocol 1: Installing and Running Blast Locally under Unix‐Like Operating Systems such as Linux
  • Alternate Protocol 1: Installing and Running Blast Locally under Microsoft Windows
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Materials

Basic Protocol 1: Installing and Running Blast Locally under Unix‐Like Operating Systems such as Linux

  Necessary Resources
  • Hardware
    • The hardware requirements for running BLAST locally are modest: any Intel or equivalent (e.g., AMD)–based architecture is adequate for this protocol. However, a few hardware considerations need to be addressed:
      • BLAST can be computationally intensive. For example, if searching large databases, using CPU‐intensive BLAST algorithms (e.g., TBLASTX) or searching with many sequences, more CPU power is better.
      • BLAST can be memory‐hungry; for ideal performance, one should have enough memory to load the entire indexed database comfortably into RAM.
      • Databases can be large and require large amounts of disk space. Researchers who decide to take on ambitious projects such as downloading the entire GenBank database for local searching should keep this in mind.
    • The nice thing about BLAST is that the hardware needs can be scaled easily by adding more disk space or RAM or moving to multiprocessor architectures. In addition, if using BLAST to query a large number of sequences against a common database, this process can very easily be parallelized by copying the indexed database, the relevant query sequences, and the blastall application to any number of machines on which the analyses are to be performed, and running them at the same time.
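
The parallelization described above depends on dividing the query sequences among machines. The following is a minimal Python sketch of one way to do that, assuming the queries arrive as a single multi-record FASTA text; the function names and the round-robin strategy are illustrative choices, not part of the BLAST package.

```python
def read_fasta_records(text):
    """Split FASTA text into a list of records, each beginning with a '>' header."""
    records = []
    current = None  # lines of the record being read; None until the first header
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            if current is not None:
                records.append("\n".join(current) + "\n")
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        records.append("\n".join(current) + "\n")
    return records


def split_round_robin(records, n_chunks):
    """Deal records into n_chunks FASTA strings so each chunk gets a similar count."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return ["".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    example = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nAAAA\n"
    for number, chunk in enumerate(split_round_robin(read_fasta_records(example), 2)):
        print(f"--- chunk {number} ---")
        print(chunk, end="")
```

Each chunk can then be written to its own file and copied, together with the indexed database and blastall, to one of the machines. Round-robin dealing balances the number of sequences per machine but not their total length, which may matter when query lengths vary widely.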
  • Software
    • The stand‐alone BLAST package contains several programs. The two needed to run BLAST locally are formatdb, which creates BLAST‐searchable databases, and blastall, which queries those databases using any of the BLAST programs (blastn, blastp, blastx, tblastn, and tblastx). formatdb converts FASTA‐formatted sequence files into databases that BLAST can search; details can be found in the file README.formatdb distributed with the BLAST package, and its options are listed in Table 3.11.1. blastall is the main BLAST application and runs queries against the indexed databases created with formatdb; details can be found in the file README.bls distributed with the BLAST package, and some of its most commonly used options are listed in Table 3.11.2.
      Table 3.11.1   Options for formatdb a

      Option Explanation
      -t Title for database file [String] (optional)
      -i Input file(s) for formatting [File In] (this parameter must be set)
      -l Logfile name [File Out] (optional)
        default = formatdb.log
      -p Type of file [T/F] (optional):
        T = protein
        F = nucleotide
        default = T
      -o Parse options [T/F] (optional):
        T (true) = parse SeqID and create indexes
        F (false) = do not parse SeqID; do not create indexes
        default = F
      -a Input file is database in ASN.1 format (otherwise FASTA is expected) [T/F] (optional):
        T = True
        F = False
        default = F
      -b ASN.1 database in binary mode [T/F] (optional):
        T = binary
        F = text mode
        default = F
      -e Input is a Seq entry [T/F] (optional)
        default = F
      -n Base name for BLAST files [String] (optional)
      -v Number of sequence bases to be created in the volume [Integer] (optional)
        default = 0
      -s Create indexes limited only to accessions: sparse [T/F] (optional)
        default = F
      -V Verbose: check for nonunique string IDs in the database [T/F] (optional)
        default = F
      -A Create ASN.1 structured deflines [T/F] (optional)
        default = F

       a For an example of using these options, see protocol 1, step .
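
To show how the formatdb options combine on a command line, here is a minimal Python sketch that assembles the argument list without running anything. The function name `formatdb_command`, its parameters, and the file name `mydb.fasta` are hypothetical; only the flags themselves (-i, -p, -o, -t, -n) come from the formatdb option list.

```python
import shlex


def formatdb_command(fasta_path, protein=True, parse_seqids=False,
                     title=None, base_name=None):
    """Build a formatdb argument list (hypothetical helper; does not run formatdb)."""
    cmd = [
        "formatdb",
        "-i", fasta_path,                     # input file (required)
        "-p", "T" if protein else "F",        # T = protein, F = nucleotide
        "-o", "T" if parse_seqids else "F",   # T = parse SeqID and create indexes
    ]
    if title is not None:
        cmd += ["-t", title]                  # optional database title
    if base_name is not None:
        cmd += ["-n", base_name]              # optional base name for BLAST files
    return cmd


if __name__ == "__main__":
    # "mydb.fasta" is a placeholder file name used only for illustration.
    cmd = formatdb_command("mydb.fasta", protein=False, parse_seqids=True,
                           title="My nucleotide db")
    print(shlex.join(cmd))
```

If formatdb is installed and on the PATH, the returned list could be passed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch runs without the BLAST package.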
      Table 3.11.2   Options for blastall b

      Option Explanation
      -p Program name [String]
        Input should be one of blastp, blastn, blastx, tblastn, or tblastx
      -d Database [String]
        default = nr
        The database specified must first be formatted with formatdb. An example would be -d "nr est", which will search both the nr and est databases, presenting the results as if one “virtual” database consisting of all the entries from both were searched. The statistics are based on the “virtual” database of nr and est.
      -i Query file [File In]
        default = stdin
        The query should be in FASTA format. If multiple FASTA entries are in the input file, all queries will be searched.
      -e Expectation value (E) [Real]
        default = 10.0
      -o BLAST report output file [File Out] (optional)
        default = stdout
      -F Filter query sequence (dust with BLASTN, seg with others) [String]
        default = T
        BLAST 2.0 and 2.1 use the dust low‐complexity filter for BLASTN and seg for the other programs. Both dust and seg are integral parts of the NCBI Toolkit and are accessed automatically. With -F T, normal filtering by seg (or by dust for BLASTN) occurs; with -F F, no filtering is done. This option also takes a string as an argument, which can be used to change the specific parameters of seg or to invoke other filters.
      -S Query strands to search against database (for BLAST[NX] and TBLASTX). 3 is both, 1 is top, 2 is bottom [Integer]
        default = 3
      -T Produce HTML output [T/F]
        default = F
      -l Restrict search of database to list of GIs [String] (optional)
        This option specifies that only a subset of the database should be searched, determined by the list of GIs (i.e., NCBI identifiers) in a file. One can obtain a list of GIs for a given Entrez query from http://www.ncbi.nlm.nih.gov/Entrez/batch.html. This file should be in the same directory as the database, or in the directory from which BLAST is called.
      -U Use lowercase filtering of FASTA sequence [T/F] (optional)
        This option specifies that any lowercase letters in the input FASTA file should be masked

       b For an example of using these options, see protocol 1, step .
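
The blastall options can be combined in the same way. The sketch below, again in Python and again only building the argument list, is an illustration under stated assumptions: the helper name `blastall_command` and its parameters are hypothetical, the file names fungus.fasta and fungus.blastp are borrowed from the figure captions of this unit, and several database names are joined into one space-separated argument on the assumption that this is how blastall expects a multi-database -d value.

```python
import shlex

# The five program names accepted by the -p option.
PROGRAMS = ("blastp", "blastn", "blastx", "tblastn", "tblastx")


def blastall_command(program, databases, query, evalue=10.0, out=None,
                     filter_query=True, html=False):
    """Build a blastall argument list (hypothetical helper; does not run blastall)."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program!r}")
    cmd = [
        "blastall",
        "-p", program,
        "-d", " ".join(databases),            # several databases go in one argument
        "-i", query,                          # FASTA query file
        "-e", str(evalue),                    # expectation value cutoff
        "-F", "T" if filter_query else "F",   # low-complexity filtering on or off
    ]
    if html:
        cmd += ["-T", "T"]                    # request HTML output
    if out is not None:
        cmd += ["-o", out]                    # report file; blastall writes to stdout otherwise
    return cmd


if __name__ == "__main__":
    cmd = blastall_command("blastp", ["nr"], "fungus.fasta",
                           evalue=0.001, out="fungus.blastp")
    print(shlex.join(cmd))
```

Validating the program name before building the command catches a common typo early, since a mistyped -p value would otherwise only surface when blastall itself is run.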
  • Files
    • Input data files must be in FASTA format (see appendix 1B).
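
Because both formatdb and blastall expect FASTA input, a quick sanity check before formatting can save a failed run. The following Python sketch tests only the basic shape of a FASTA file (a '>' header line followed by at least one sequence line); the function name `check_fasta` is hypothetical, and the check does not validate the residue alphabet or the identifier syntax.

```python
def check_fasta(text):
    """Return a list of problems; an empty list means the text has basic FASTA shape."""
    problems = []
    header = None          # header of the record currently being read
    has_sequence = False   # whether that record has any sequence lines yet
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if header is not None and not has_sequence:
                problems.append(f"record {header!r} has no sequence")
            header = line
            has_sequence = False
        elif header is None:
            problems.append(f"line {number}: sequence data before first '>' header")
        else:
            has_sequence = True
    if header is None:
        problems.append("no '>' header line found")
    elif not has_sequence:
        problems.append(f"record {header!r} has no sequence")
    return problems


if __name__ == "__main__":
    for issue in check_fasta(">seq1\nACGT\n>seq2\n"):
        print(issue)
```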

Figures

  •   Figure 3.11.1 Verbose listing of all the files in the BLAST distribution file for Linux as they are “un‐tarred” (see , step ).
  •   Figure 3.11.2 The file fungus.fasta, used as an example query file (see , step ).
  •   Figure 3.11.3 The file fungus.blastp, an example of output from BLAST (see , steps and ).
