Gene Identification in Prokaryotic Genomes, Phages, Metagenomes, and EST Sequences with GeneMarkS Suite

Abstract

 

This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). Curr. Protoc. Bioinform. 35:4.5.1-4.5.17. © 2011 by John Wiley & Sons, Inc.

Keywords: gene finding; hidden Markov model; unsupervised parameter estimation

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Using GeneMarkS
  • Basic Protocol 2: Using GeneMark.hmm for Prokaryotic Gene Prediction
  • Basic Protocol 3: Using GeneMark for Prokaryotic Gene Prediction
  • Basic Protocol 4: Using the Heuristic Approach for Prokaryotic Model Building
  • Basic Protocol 5: Using MetaGeneMark for Finding Genes in Metagenomes
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
     
 

Figures

  • Figure 4.5.1 User interface for the GeneMarkS gene-finding program. Required input includes a DNA sequence in FASTA format. The algorithm (1) estimates the HMM model parameters from the input sequence via unsupervised iterative training, and (2) makes a final run of GeneMark.hmm to get gene predictions.
  • Figure 4.5.2 User interface for the GeneMark.hmm program. Required input includes a DNA sequence in FASTA format and the set of species-specific parameters for the model.
  • Figure 4.5.3 The text output of GeneMark.hmm using both the Atypical and Typical models. The predicted gene coordinates, the strand, and the model type are listed for each gene.
  • Figure 4.5.4 The graphical output from GeneMark combined with the output of GeneMark.hmm. The genes predicted by the Typical model are shown in black (solid line); the genes predicted by the Atypical model are shown in red (dashed line). The black horizontal bars indicate protein-coding regions predicted by GeneMark.hmm.
  • Figure 4.5.5 The user interface of GeneMark. Required input includes a DNA sequence in FASTA format and the set of species-specific parameters of the model.
  • Figure 4.5.6 The text output of the GeneMark program. Open reading frames predicted as genes are listed, along with the average coding potential of an ORF and the probability for alternative ATG triplets to be a translation start.
  • Figure 4.5.7 The graphical output of GeneMark. The six different panels represent the six possible reading frames, three each in the direct and reverse strands.
  • Figure 4.5.8 The user interface for the Heuristic approach. Required input includes a DNA sequence in FASTA format. This method finds model parameters for the gene-prediction algorithm from the given (short) sequence. GeneMark.hmm runs with these parameters to predict genes.
  • Figure 4.5.9 The user interface for the MetaGeneMark program. Required input includes a set of DNA sequences in multi-FASTA format.
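
The captions above repeatedly refer to FASTA (or multi-FASTA) input and to the six possible reading frames of a DNA sequence. The following is a minimal Python sketch of those two ideas only. It is not the GeneMark, GeneMark.hmm, or GeneMarkS algorithm: it uses no Markov model and no training, and simply lists ATG-to-stop open reading frames. The function names (`parse_fasta`, `reverse_complement`, `find_orfs`), the `min_len` threshold, and the restriction to ATG starts are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse (multi-)FASTA text into {record name: upper-case sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a generated name if the header line is empty.
            name = header[0] if header else f"seq{len(chunks) + 1}"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {key: "".join(parts) for key, parts in chunks.items()}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 90) -> List[Tuple[str, int, int, int]]:
    """List naive ORFs (ATG ... stop) in all six reading frames.

    Returns (strand, frame, left, right) tuples. Coordinates are 1-based,
    inclusive, and always given on the direct strand; the stop codon is
    included in the reported span.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == "+":
                            left, right = start + 1, end
                        else:
                            # Map reverse-strand positions back to the direct strand.
                            left, right = n - end + 1, n - start
                        orfs.append((strand, frame + 1, left, right))
                    start = None
    return orfs


if __name__ == "__main__":
    records = parse_fasta(">toy example\nCCATGAAATTTGGGTAACC\n")
    for rec_name, rec_seq in records.items():
        for orf in find_orfs(rec_seq, min_len=9):
            print(rec_name, *orf)
```

A scan like this only enumerates candidate ORFs; deciding which candidates are genes is the part that the statistical models in the GeneMark programs are designed to address.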


Supplementary Material

Example: bi0405.zip

 