Using BLAST for Performing Sequence Alignment
Abstract
BLAST is a widely used genetic sequence comparison program developed at the National Center for Biotechnology Information (NCBI). In this unit, three Basic Protocols and one Support Protocol are provided for general-purpose BLAST searches on the NCBI and ENSEMBL Web-accessible BLAST servers. Key parameters affecting how the search algorithm works are reviewed, with advice on modifying search parameters for specific situations. Many other public and private Web sites offer BLAST interfaces that may differ from those described in this unit, but the general principles will be similar. The Support Protocol describes how to obtain sequences in various formats from NCBI for use in BLAST searches. No algorithm can substitute for biological understanding: performing a BLAST search takes only a few minutes, but understanding the implications of the results takes much longer.
Keywords: Algorithms; Molecular Sequence Data; Sequence Alignment; Software
Table of Contents
- Basic Protocol 1: BLASTP: Searching the NCBI Protein Databases Using a Protein Query Sequence
- Basic Protocol 2: BLASTN: Searching the NCBI Nucleotide Databases Using a Nucleotide Query Sequence
- Basic Protocol 3: Searching the Ensembl Human Genomic Nucleotide Database Using a Nucleotide Query Sequence
- Support Protocol 1: Downloading Protein and Nucleotide Sequences from the NCBI Databases
- Commentary
- Literature Cited
- Figures
- Tables
Materials
Basic Protocol 1: BLASTP: Searching the NCBI Protein Databases Using a Protein Query Sequence
Materials
Basic Protocol 2: BLASTN: Searching the NCBI Nucleotide Databases Using a Nucleotide Query Sequence
Materials
Basic Protocol 3: Searching the Ensembl Human Genomic Nucleotide Database Using a Nucleotide Query Sequence
Materials
Support Protocol 1: Downloading Protein and Nucleotide Sequences from the NCBI Databases
Materials
Figures
- Figure 6.8.1 The BLAST homepage at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/).
- Figure 6.8.2 The basic options portion of the NCBI protein‐protein BLAST search page.
- Figure 6.8.3 The advanced options portion of the NCBI protein‐protein BLAST search page.
- Figure 6.8.4 The formatting options portion of the NCBI protein‐protein BLAST search page.
- Figure 6.8.5 Output from a Conserved Domain search at NCBI.
- Figure 6.8.6 Graphical overview of where BLAST hits align to the query sequence. For the color version of this figure go to http://www.currentprotocols.com.
- Figure 6.8.7 List of hits from a BLAST search at NCBI.
- Figure 6.8.8 Pairwise alignment of the query sequence gi|55956902|ref|NP_006715.2 and the hit sequence gi|60688294|gb|AAH91661.1 from zebrafish (Danio rerio) as generated by the BLAST Web site at NCBI.
- Figure 6.8.9 The basic options portion of the NCBI nucleotide‐nucleotide BLAST search page.
- Figure 6.8.10 The advanced options portion of the NCBI nucleotide‐nucleotide BLAST search page.
- Figure 6.8.11 The formatting options portion of the NCBI nucleotide‐nucleotide BLAST search page.
- Figure 6.8.12 Graphical overview of hits relative to the query sequence on the NCBI BLAST interface. For the color version of this figure go to http://www.currentprotocols.com.
- Figure 6.8.13 List of hits from BLAST search output at NCBI.
- Figure 6.8.14 Further down the list of hits, the raw scores begin getting smaller and the statistical “E” scores are not as small, but these hits are still very strong hits.
- Figure 6.8.15 Pairwise alignment between gi|5596901|ref|NM_006724.2 and gi|50741727|ref|XM_419617.1 generated by a BLAST search submitted to the NCBI Web site.
- Figure 6.8.16 Sequence entry section of the BLAST submission page at the ENSEMBL Web site (http://www.ensembl.org/Multi/blastview).
- Figure 6.8.17 Specifying the search options on the ENSEMBL BLAST search page.
- Figure 6.8.18 Retrieving BLAST output from the ENSEMBL BLAST search.
- Figure 6.8.19 Karyotype view of hits from ENSEMBL BLAST search. Clicking one of the red arrows will cause a pop‐up menu to appear.
- Figure 6.8.20 Click on the Chromosome 6 hit, then pick ContigView from this menu.
- Figure 6.8.21 Part of the ENSEMBL view of Chromosome 6 in a region surrounding this BLAST hit. For the color version of this figure go to http://www.currentprotocols.com.
- Figure 6.8.22 Sequence in FASTA format showing a fairly simple sequence file format, which can be used to store one or many sequences.
- Figure 6.8.23 A sequence in GenBank format.
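As a rough illustration of the FASTA layout mentioned in Figure 6.8.22 (a header line beginning with ">" followed by one or more lines of sequence, repeated for each record), the following Python sketch reads one or many sequences from FASTA text. It is not part of the unit's protocols; the function name `parse_fasta` and the two example records are hypothetical and were made up for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for lines of FASTA-formatted text."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A ">" line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line
    return records


# Hypothetical two-record input; identifiers and residues are invented.
example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another hypothetical protein
MADEEKLPPG
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

A sketch like this assumes well-formed input and keeps whole header lines as dictionary keys; an established parser is a safer choice for real data files.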
Literature Cited
Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. 1994. Issues in searching molecular sequence databases. Nat. Genet. 6:119‐129.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schäffer, A.A., and Yu, Y.K. 2005. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 272:5099‐5100.
Gilks, W.R., Audit, B., de Angelis, D., Tsoka, S., and Ouzounis, C.A. 2005. Percolation of annotation errors through hierarchically structured protein sequence databases. Math. Biosci. 193:223‐234.
Jackson, D.G., Healy, M.D., and Davison, D.B. 2003. Bioinformatics: Not just for sequences anymore. Biosilico 1:103‐111.
Jones, D.T. and Swindells, M.B. 2002. Getting the most from PSI‐BLAST. Trends Biochem. Sci. 27:161‐164.
McGinnis, S. and Madden, T.L. 2004. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32:W20‐W25.
Key References
Altschul et al., 1997. See above.
In the late 1990s NCBI made major improvements to the BLAST algorithms; this paper summarizes how those improvements work and why they matter.
Korf, I., Yandell, M., and Bedell, J. 2003. BLAST. O'Reilly Media, Sebastopol, CA.
This is an entire book dedicated to the BLAST program, from a leading publisher of technical books.
Woodford, N. 2004. Public databases: Retrieving and manipulating sequences for beginners. Methods Mol. Biol. 266:17‐28.
This general discussion of how to use the major sequence databases can supplement this unit.