        Finding Homologs to Nucleic Acid or Protein Sequences Using the Framesearch Program


        Abstract

         

        The Framesearch algorithm includes the possibility of a frameshift error in its alignment algorithm, and therefore can find alignments that span different reading frames. Protocols in this unit describe the use of Framesearch to search a protein sequence database for sequences that are similar to a query nucleotide sequence, and to search a nucleotide sequence database for sequences that are similar to a query protein sequence. Three alternate protocols describe ways to improve the speed of Framesearch and thus make it practical for routine use. Framesearch is especially appropriate for low-quality single-read nucleotide sequence data, such as ESTs (expressed sequence tags) or early drafts of genomic sequences; it does not offer any significant advantage over less CPU-intensive algorithms for relatively high-quality nucleotide sequences without many single-nucleotide insertion or deletion errors.
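To make the idea of an alignment that spans reading frames concrete, the following is a minimal Python sketch of a local DNA-to-protein alignment score that may skip one or two nucleotides at a penalty. It is not the GCG Framesearch implementation: the function name, the identity-based scoring (instead of a substitution matrix), the linear gap penalty, and all penalty values are assumptions chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def frameshift_local_score(dna, protein, match=5, mismatch=-2,
                           gap=10, frameshift=8, allow_frameshift=True):
    """Best local alignment score of an uppercase DNA string against a protein.

    H[i][j] is the best score of an alignment that ends after nucleotide i
    and amino acid j. All scoring values are illustrative assumptions.
    """
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            candidates = [0]  # local alignment: may restart anywhere
            if i >= 3 and j >= 1:
                # Translate one codon and compare it with one amino acid.
                aa = CODON_TABLE.get(dna[i - 3:i], "X")
                score = match if aa == protein[j - 1] else mismatch
                candidates.append(H[i - 3][j - 1] + score)
            if i >= 3:
                # Whole codon with no partner in the protein.
                candidates.append(H[i - 3][j] - gap)
            if j >= 1:
                # Amino acid with no partner in the DNA.
                candidates.append(H[i][j - 1] - gap)
            if allow_frameshift:
                # Skip one or two nucleotides, changing the reading frame.
                candidates.append(H[i - 1][j] - frameshift)
                if i >= 2:
                    candidates.append(H[i - 2][j] - frameshift)
            H[i][j] = max(candidates)
            best = max(best, H[i][j])
    return best


protein = "MKTAYIAKQR"
clean_dna = "ATGAAAACCGCGTATATTGCGAAACAGCGT"         # encodes the protein in one frame
shifted_dna = clean_dna[:15] + "G" + clean_dna[15:]  # one inserted nucleotide

print(frameshift_local_score(clean_dna, protein))
print(frameshift_local_score(shifted_dna, protein))
print(frameshift_local_score(shifted_dna, protein, allow_frameshift=False))
```

With the frameshift moves disabled, the alignment of the damaged sequence can only cover one side of the inserted base; with them enabled, it continues through the error at the cost of a single penalty. This is the behaviour the abstract describes for low-quality single-read data.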

             
         
        GO TO THE FULL PROTOCOL:
        PDF or HTML at Wiley Online Library

        Table of Contents

        • Basic Protocol 1: Framesearch Using a Nucleic Acid Query Sequence
        • Basic Protocol 2: Framesearch Using a Protein Query Sequence
        • Alternate Protocol 1: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Nucleic Acid Query Sequence
        • Alternate Protocol 2: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Protein Query Sequence
        • Alternate Protocol 3: Improving Speed of Framesearch by Using Specialized Hardware
        • Support Protocol 1: Downloading and Converting Sequence Files for the Examples Used in the Protocols
        • Guidelines for Understanding Results
        • Commentary
        • Figures
             
         

        Materials

        Basic Protocol 1: Framesearch Using a Nucleic Acid Query Sequence

          Necessary Resources
        • Hardware
        • Framesearch can be run on any Unix or VMS system that has the Wisconsin Package installed; because it is so CPU‐intensive, Framesearch should be run on the fastest computer available to the user
        • Software
        • GCG Wisconsin Package (v. 8.1 or higher)
        • Files
        • DNA sequence file of interest (this will be the query sequence; maximum length, 350 kb)
        • Protein database of sequences to which the DNA sequence will be compared
        For example, BA000007.faa contains the amino acid translations of all putative genes found in this bacterial genome by the lab where it was sequenced, as a single FASTA format text file (appendix 1B). Both the query sequence and the database files must be converted to the GCG format (Support Protocol 1). The files used in this example should be downloaded from NCBI or from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm) and converted to GCG format, as described in Support Protocol 1.
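Before converting files, it can help to confirm that a nucleotide query respects the 350 kb limit stated above. The following Python sketch reads FASTA-formatted text and reports record lengths. The function names are hypothetical, and the reading of 350 kb as 350,000 nucleotides is an assumption; this is a convenience check, not part of the Wisconsin Package.

```python
def fasta_lengths(text):
    """Return {record name: sequence length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else f"record_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def oversized_queries(text, max_len=350_000):
    """Names of records longer than max_len (assumed: 350 kb = 350,000 nt)."""
    return [name for name, n in fasta_lengths(text).items() if n > max_len]


example = ">q1 test record\nACGT\nAC\n>q2\n" + "A" * 10
print(fasta_lengths(example))
print(oversized_queries(example, max_len=8))
```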

        Basic Protocol 2: Framesearch Using a Protein Query Sequence

          Necessary Resources
        • Hardware
        • Framesearch can be run on any Unix or VMS system that has the Wisconsin Package installed; because it is so CPU‐intensive, Framesearch should be run on the fastest computer available to the user
        • Software
        • GCG Wisconsin Package (v. 8.1 or higher)
        • Files
        • Protein sequence file of interest (this will be the query sequence)
        • Nucleic acid database of sequences to which the protein sequence will be compared
        For example, BA000007.fna contains the nucleotide sequence of all putative genes found in this bacterial genome by the laboratory where it was sequenced, as a single FASTA format text file (appendix 1B). Both the query sequence and the database files must be converted to the GCG format (Support Protocol 1). The files used in this example should be downloaded from NCBI or from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm) and converted to GCG format, as described in Support Protocol 1.

        Alternate Protocol 1: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Nucleic Acid Query Sequence

          Necessary Resources
        • Hardware
        • Framesearch can be run on any Unix or VMS system that has the Wisconsin Package installed; because it is so CPU‐intensive, Framesearch should be run on the fastest computer available to the user
        • Software
        • GCG Wisconsin Package (v. 8.1 or higher)
        • BLAST program (unit 3.4). In the GCG environment assumed for these examples, both BLAST and Framesearch are included.
        • Files
        • DNA sequence file of interest (this will be the query sequence; maximum length, 350 kb)
        • Protein database of sequences to which the DNA sequence will be compared
        For example, the example database file contains the amino acid translations of all putative genes found in this bacterial genome by the lab where it was sequenced, as a single FASTA format text file (appendix 1B). Both the query sequence and the database files must be converted to the GCG format (Support Protocol 1). The files used in this example should be downloaded from NCBI or from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm) and converted to GCG format, as described in Support Protocol 1.

        Alternate Protocol 2: Prefiltering with a Search Algorithm to Improve the Speed of Framesearch with a Protein Query Sequence

          Necessary Resources
        • Hardware
        • Framesearch can be run on any Unix or VMS system that has the Wisconsin Package installed; because it is so CPU‐intensive, Framesearch should be run on the fastest computer available to the user
        • Software
        • GCG Wisconsin Package (v. 8.1 or higher)
        • BLAST program (unit 3.4). In the GCG environment assumed for these examples, both BLAST and Framesearch are included.
        • Files
        • Protein sequence file of interest (this will be the query sequence)
        • Nucleic acid database of sequences to which the protein sequence will be compared
        For example, BA000007.fna contains the nucleotide sequence of all putative genes found in this bacterial genome by the laboratory where it was sequenced, as a single FASTA format text file (appendix 1B). Both the query sequence and the database files must be converted to the GCG format (Support Protocol 1). The files used in this example should be downloaded from NCBI or from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm) and converted to GCG format, as described in Support Protocol 1.

        Alternate Protocol 3: Improving Speed of Framesearch by Using Specialized Hardware

          Necessary Resources
        • Hardware
        • Any Unix or VMS system that has the Wisconsin Package installed
        • Software
        • GCG Wisconsin Package (v. 8.1 or higher; includes FROMFASTA)
        • Files
        • The files used in this example can be downloaded from the NCBI FTP server as described below, or from the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm)

        Figures

        • Figure 3.2.1 Six‐frame‐translated search versus Framesearch.
        • Figure 3.2.2 Distribution of scores generated by using Framesearch to compare nucleotides 52500 through 55000 of gi‐15829254_55.seq with all peptide sequences from the example bacterial genome. Since the selected region comprises all of one gene and parts of two flanking genes, there are three very strong hits, highlighted by arrows above. There are also many lower‐quality hits with scores below 400. Most likely, hits with scores above 200 represent genes related to the three genes contained in this region, while hits with scores between 100 and 200 may represent borderline matches, but scores below 100 probably do not represent biologically significant matches.
        • Figure 3.2.3 The list of hits from a Framesearch run in which a nucleic acid sequence was used to search a number of peptide sequences. The name of the query sequence, the wildcard expression specifying the target sequences, and the name of the peptide sequence with the best match have been boldfaced in the sample output.
        • Figure 3.2.4 Alignment of a nucleotide query sequence against a peptide database sequence, generated by Framesearch. Note that the middle portion has been omitted here. The names of the query and database sequences, just above this alignment, have been boldfaced for emphasis.
        • Figure 3.2.5 The list of hits from a Framesearch run in which an amino acid sequence was used to search a number of nucleotide sequences. The name of the query sequence, the wildcard expression specifying the target sequences, and the name of the nucleotide sequence with the best match have been boldfaced in the sample output.
        • Figure 3.2.6 Alignment of an amino acid query sequence against a nucleotide database sequence, generated by Framesearch. Note that the middle portion has been omitted here. The names of the query and database sequences, just above this alignment, have been boldfaced for emphasis. Also note that following the name of the nucleotide sequence in this example is the string “/rev”, which means this alignment is to the reverse complement of this nucleotide sequence.
        • Figure 3.2.7 Illustration of how insertion and deletion errors affect alignments generated by the six‐frame‐translated Smith‐Waterman algorithm. Note that this example was generated on a DeCypher genomics accelerator, manufactured by TimeLogic. SSEARCH in the GCG environment would give very similar results, in a slightly different format. The nucleotides selected are the reverse complement of those nucleotides from the E. coli O157:H7 genome, NCBI REFSEQ number NC_002695, which correspond to amino acids 1 to 84 of the protein with NCBI gi number 13361126.
        • Figure 3.2.8 A Framesearch alignment between a nucleotide query sequence and a peptide target sequence, in the format generated by a TimeLogic DeCypher genomics accelerator system. Framesearch in the GCG environment would generate the same output, in a slightly different format. The nucleotides selected for this example are the reverse complement of those nucleotides from the E. coli O157:H7 genome, NCBI REFSEQ number NC_002695, which correspond to amino acids 1 to 84 of the protein with NCBI gi number 13361126. Compare this figure with Figure , which shows how Framesearch dynamically follows the correct reading frame despite the frameshift errors created when indel errors are deliberately introduced into the nucleotide sequence.
        • Figure 3.2.9 This is a continuation of Figure , and should be compared with it.
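The score ranges given in the caption of Figure 3.2.2 can be written as a small triage function. This Python sketch is only a restatement of that rule of thumb for the example search; the function name and labels are hypothetical, the handling of scores of exactly 100 or 200 is an assumption, and the cutoffs should not be read as universal significance thresholds.

```python
def classify_framesearch_score(score):
    """Bin a score using the ranges from the Figure 3.2.2 caption.

    Assumption: scores of exactly 100 or 200 are treated as borderline.
    """
    if score > 200:
        return "likely related"
    if score >= 100:
        return "borderline"
    return "probably not significant"


for s in (450, 150, 50):
    print(s, classify_framesearch_score(s))
```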

        Literature Cited
           Accelerys. 2001. Announcement of new features in SeqWeb version 2. http://www.accelerys.com/products/seqweb/whats_new2p0.html.
           Edelman, I., Faigler, S., Mintz, E., Natan, A., and Devereux, J. 1995. Framesearch: A rigorous alignment program for searching protein databases with nucleic acid queries. Poster, Genome Sequence and Analysis Conference, Hilton Head, South Carolina, 1995.
           NOTE: The text of this poster can be found at http://sulu.gcg.com/company/posters/framesearch.html.
           GCG. 1995. GCG Transcript 3:2. Genetics Computer Group, Madison, Wisconsin.
           NOTE: The GCG Transcript, subtitled “Bio‐Computing News for Users of the Wisconsin Package,” was published by the company for a number of years. The text of this issue, which features a discussion of the newly added Framesearch program, can be found at http://sulu.gcg.com/pub/newsletter/vol3_no2_nov95.html.
           Halperin, E., Faigler, S., and Gill‐More, R. 1999. FramePlus: Aligning DNA to protein sequences. Bioinformatics 15(11):867‐873.
           TimeLogic. 2001. Manuals supplied with a DeCypher bioinformatics accelerator. TimeLogic Corporation, Incline Village, Nevada.
           Zhang, Z., Pearson, W.R., and Miller, W. 1997. Aligning a DNA sequence with a protein sequence. Journal of Computational Biology 4(3):339‐349.
        Key References
           Edelman et al., 1995. See above.
           The key reference for the Framesearch algorithm is the poster by Edelman. The key reference for a particular implementation of Framesearch is the documentation supplied with that implementation.
        Internet Resources
           http://www.accelerys.com/
           Web site of Accelerys, the corporate parent of GCG.
           http://www.cgen.com/
           Web site of the Compugen company.
           http://www.paracel.com/
           Web site of the Paracel company.
           http://www.timelogic.com
           Web site of the TimeLogic company.