Using GlimmerM to Find Genes in Eukaryotic Genomes

Abstract

 

GlimmerM is a eukaryotic gene finder that has been used in the annotation of the genomes of Plasmodium falciparum (the malaria parasite), the model plant Arabidopsis thaliana, Oryza sativa (rice), the parasite Theileria parva, and the fungus Aspergillus fumigatus. A unique feature of the system compared to other eukaryotic gene finders is a module that allows users to provide their own data and train GlimmerM for any organism.

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Running GlimmerM Locally to Identify Genes
  • Support Protocol 1: Training GlimmerM for a Specific Organism
  • Alternate Protocol 1: Running GlimmerM via the Web
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Figures

  •   Figure 4.4.1 Training GlimmerM for a malaria data set, where the DNA sequences for training are in the file seqs.fasta and the exon coordinates are in the file exons.coord (shown on the first and second lines). By adding the text >& messages at the end of the trainGlimmerM command, the authors created a file called messages, which captured any messages from the training program that would normally be printed on the screen. The computer prompt shown is /predator/mpertea/Malaria/train/train11_01. The third line shows the execution of the ls command (APPENDIX), which lists the contents of the directory (fourth line). The directory now contains the subdirectory TrainGlimmM2001‐11‐09D14:34:57 and the two files TrainGlimmM2001‐11‐09D14:34:57.log and messages.
  •   Figure 4.4.2 Example of config_file. Refer to Table for parameters.
  •   Figure 4.4.3 An example log file generated by trainGlimmerM.
  •   Figure 4.4.4 Example of false.nofilter.acc file.
  •   Figure 4.4.5 Example of false.nofilter.don file.
  •   Figure 4.4.6 Example of using the GlimmerM Web server.
  •   Figure 4.4.7 Output of GlimmerM Web Server.
  •   Figure 4.4.8 Sample output from the malaria‐specific version of GlimmerM. The FASTA file used to generate this output is available on the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm).
  •   Figure 4.4.9 Sample output of the current version of GlimmerM created by the . The FASTA file used to generate this output is available on the Current Protocols Web site (http://www3.interscience.wiley.com/c_p/cpbi_sampledatafiles.htm).
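
The caption of Figure 4.4.1 describes redirecting everything the training program prints into a file called messages and then listing the directory. A minimal Python sketch of the same idea follows. The helper names `run_and_capture` and `list_directory` are hypothetical, and the argument order shown in the usage comment (program name followed by the two input files) is an assumption based on the caption, not a documented interface.

```python
import subprocess
from pathlib import Path


def run_and_capture(command, log_path):
    """Run a command, sending both stdout and stderr to log_path.

    This mirrors the effect of appending `>& messages` in csh-style
    shells: nothing is printed on the screen, everything goes to the file.
    Returns the exit status of the command.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def list_directory(path="."):
    """Return the sorted names in a directory, roughly what `ls` shows."""
    return sorted(entry.name for entry in Path(path).iterdir())


# Hypothetical usage (assumes trainGlimmerM is installed and on PATH, and
# that it takes the sequence file and the exon coordinate file as arguments):
#   status = run_and_capture(
#       ["trainGlimmerM", "seqs.fasta", "exons.coord"], "messages")
#   print(status, list_directory("."))
```

After a successful run, the directory listing would be expected to include the messages file alongside whatever output the training program itself creates.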

Literature Cited

   Altschul, S., Gish, W., Miller, W., Myers, E., and Lipman, D. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796‐815.
   Bowman, S., Lawson, D., Basham, D., Brown, D., Chillingworth, T., Churcher, C.M., Craig, A., Davies, R.M., Devlin, K., Feltwell, T., Gentles, S., Gwilliam, R., Hamlin, N., Harris, D., Holroyd, S., Hornsby, T., Horrocks, P., Jagels, K., Jassal, B., Kyes, S., McLean, J., Moule, S., Mungall, K., Murphy, L., Barrell, B.G., et al. 1999. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400:532‐538.
   Brendel, V. and Kleffe, J. 1998. Prediction of locally optimal splice sites in plant pre‐mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA. Nucleic Acids Res. 26:4748‐4757.
   Burge, C. 1997. Ph.D. thesis. Identification of Genes in Human Genomic DNA. Stanford University, Calif.
   Burge, C.B. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Dietrich, R.C., Incorvaia, R., and Padgett, R.A. 1997. Terminal intron dinucleotide sequences do not distinguish between U2‐ and U12‐dependent introns. Mol. Cell. 1:151‐160.
   Florea, L., Hartzell, G., Zhang, Z., Rubin, G.M., and Miller, W. 1998. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8:967‐974.
   Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R., Lathigra, R., White, O., Ketchum, K.A., Dodson, R., Hickey, E.K., Gwinn, M., Dougherty, B., Tomb, J.F., Fleischmann, R.D., Richardson, D., Peterson, J., Kerlavage, A.R., Quackenbush, J., Salzberg, S., Hanson, M., van Vugt, R., Palmer, N., Adams, M.D., Gocayne, J., Venter, J.C., et al. 1997. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390:580‐586.
   Fraser, C.M., Norris, S.J., Weinstock, G.M., White, O., Sutton, G.G., Dodson, R., Gwinn, M., Hickey, E.K., Clayton, R., Ketchum, K.A., Sodergren, E., Hardham, J.M., McLeod, M.P., Salzberg, S., Peterson, J., Khalak, H., Richardson, D., Howell, J.K., Chidambaram, M., Utterback, T., McDonald, L., Artiach, P., Bowman, C., Cotton, M.D., Venter, J.C., et al. 1998. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281:375‐388.
   Gardner, M.J., Tettelin, H., Carucci, D.J., Cummings, L.M., Aravind, L., Koonin, E.V., Shallom, S., Mason, T., Yu, K., Fujii, C., Pederson, J., Shen, K., Jing, J., Aston, C., Lai, Z., Schwartz, D.C., Pertea, M., Salzberg, S., Zhou, L., Sutton, G.G., Clayton, R., White, O., Smith, H.O., Fraser, C.M., and Hoffman, S.L. 1998. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282:1126‐1132.
   Heidelberg, J.F., Eisen, J.A., Nelson, W.C., Clayton, R.A., Gwinn, M.L., Dodson, R.J., Haft, D.H., Hickey, E.K., Peterson, J.D., Umayam, L., Gill, S.R., Nelson, K.E., Read, T.D., Tettelin, H., Richardson, D., Ermolaeva, M.D., Vamathevan, J., Bass, S., Qin, H., Dragoi, I., Sellers, P., McDonald, L., Utterback, T., Fleishmann, R.D., Nierman, W.C., and White, O. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477‐483.
   Jelinek, F. 1997. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA.
   Murthy, S.K., Kasif, S., Salzberg, S., and Beigel, R. 1993. OC1: Randomized induction of oblique decision trees. Proc. 11th Natl. Conf. on Artificial Intelligence 322‐327.
   Murthy, S.K., Kasif, S., and Salzberg, S. 1994. A system for induction of oblique decision trees. J. of Artificial Intelligence Res. 2:1‐32.
   Nelson, K.E., Eisen, J.A., and Fraser, C.M. 2001. Genome of Thermotoga maritima MSB8. Methods Enzymol. 330:169‐180.
   Pavy, N., Rombauts, S., Dehais, P., Mathe, C., Ramana, D.V., Leroy, P., and Rouze, P. 1999. Evaluation of gene prediction software using a genomic data set: Application to Arabidopsis thaliana sequences. Bioinformatics 15:887‐899.
   Pertea, M., Salzberg, S.L., and Gardner, M.J. 2000. Finding genes in Plasmodium falciparum. Nature 404:34‐35.
   Pertea, M. and Salzberg, S.L. 2002. Computational gene finding in plants. Plant Mol. Biol. 48:9‐48.
   Pertea, M., Lin, X., and Salzberg, S.L. 2001. GeneSplicer: A new computational method for splice site prediction. Nucleic Acids Res. 29:1185‐1190.
   Salzberg, S.L. 1997. A method for identifying splice sites and translational start sites in eukaryotic mRNA. Comput. Appl. Biosci. 13:365‐376.
   Salzberg, S.L., Delcher, A.L., Kasif, S., and White, O. 1998a. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26:544‐548.
   Salzberg, S., Delcher, A.L., Fasman, K.H., and Henderson, J. 1998b. A decision tree system for finding genes in DNA. J. Comput. Biol. 5:667‐680.
   Salzberg, S.L., Pertea, M., Delcher, A.L., Gardner, M.J., and Tettelin, H. 1999. Interpolated Markov models for eukaryotic gene finding. Genomics 59:24‐31.
   Stephens, R., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E.V., and Davis, R.W. 1998. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 282:754‐759.
   Wu, Q. and Krainer, A.R. 1996. U1‐mediated exon definition interactions between AT‐AC and GT‐AG introns. Science 274:1005‐1008.
   Yuan, Q., Quackenbush, J., Sultana, R., Pertea, M., Salzberg, S.L., and Buell, C.R. 2001. Rice bioinformatics: Analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol. 125:1166‐1174.

Key References

   Salzberg et al., 1999. See above.
   This paper introduces the GlimmerM method, which was first used to find genes in Plasmodium falciparum, and describes how GlimmerM was used in the annotation of chromosome 2 of P. falciparum.

Internet Resources

   http://www.tigr.org/software/glimmerm/
   GlimmerM Web site.
   http://www.tigr.org/tdb/edb2/pfa1/htmls/
   A preliminary annotation of chromosomes 10, 11, and 14 of P. falciparum. (This will change when the P. falciparum genome is completed.)