PipMaker: A World Wide Web Server for Genomic Sequence Alignments


Abstract

 

PipMaker is a World Wide Web site used to compare two long genomic sequences and identify conserved segments between them. This unit describes the use of the PipMaker server and explains the resulting output files. PipMaker provides an efficient method of aligning genomic sequences and returns its results in a compact but easy‐to‐interpret form, the percent identity plot (pip). For each aligning segment between the two sequences, the pip shows both its position in the first sequence and its degree of similarity (percent identity). Optional annotations on the pip provide additional information to assist in interpreting the alignment. The default parameters of the underlying blastz alignment program are tuned for human‐mouse alignments.
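The pip described above can be illustrated with a small sketch. It assumes, following the Figure 10.2.2 caption, that each aligning segment is drawn as a series of horizontal lines, one per gap‐free block, placed at that block's percent identity. The function name `pip_segments` and the use of `-` for gaps are illustrative choices, not part of PipMaker or blastz.

```python
from typing import List, Tuple


def pip_segments(row1: str, row2: str, start1: int = 1) -> List[Tuple[int, int, float]]:
    """Split one gapped pairwise alignment into gap-free blocks.

    row1, row2: the two aligned rows (equal length, '-' marks a gap).
    start1: 1-based position in sequence 1 of the first aligned column.
    Returns (begin, end, percent_identity) for each block, with begin/end
    in sequence-1 coordinates: the x-extent and y-value of one pip line.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    segments: List[Tuple[int, int, float]] = []
    pos = start1        # sequence-1 coordinate of the current column
    begin = pos         # sequence-1 coordinate where the open block began
    matches = length = 0
    for a, b in zip(row1.upper(), row2.upper()):
        if a == "-" or b == "-":
            if length:  # a gap closes the open block
                segments.append((begin, pos - 1, 100.0 * matches / length))
                matches = length = 0
            if a != "-":
                pos += 1  # a gap in row2 still consumes a sequence-1 base
            continue
        if length == 0:
            begin = pos
        length += 1
        matches += int(a == b)
        pos += 1
    if length:
        segments.append((begin, pos - 1, 100.0 * matches / length))
    return segments
```

Blocks below 50% identity would fall outside the 50% to 100% vertical axis used in the pip, so a plotting step would typically drop them.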

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Strategic Planning
  • Basic Protocol 1: Submitting Sequences to PipMaker
  • Support Protocol 1: Generating a Repeats File for Use with PipMaker
  • Support Protocol 2: Generating an Exons File for Use with PipMaker
  • Support Protocol 3: Generating Color Underlays for Use with PipMaker
  • Support Protocol 4: Generating Annotation Files for Use with PipMaker
  • Support Protocol 5: Installing Stand‐Alone Blastz
  • Guidelines for Understanding Results
  • Commentary
  • Figures
     
 

Materials

Basic Protocol 1: Submitting Sequences to PipMaker

  Necessary Resources
  • Hardware
    • PipMaker can be accessed and used by any computer with a World Wide Web browser and E‐mail access.
  • Software
    • PipMaker is accessible via a Web interface at http://bio.cse.psu.edu/. All output files are returned to the user via E‐mail, so the E‐mail account and software must be able to handle large messages. Viewing the pip or dot plot requires a PDF viewer such as Aladdin GhostScript or Adobe Acrobat Reader, available for free download at http://www.cs.wisc.edu/~ghost/ and http://www.adobe.com/, respectively. At present, Acrobat Reader has better support for the optional hyperlinks that PipMaker can embed in PDF files. PipMaker can also generate a PostScript version of the output files, which is useful for importing the plot into a graphics program in preparation for publication.
  • Files
    • The following file types are used:
      • Sequences: The PipMaker server accepts two DNA sequences, in FASTA format only (appendix 1B). These sequence files must be plain text consisting of A, C, G, T, N, and X, typically uppercase, with lines no longer than ∼70 characters. The first sequence should be one contiguous piece, whereas the second sequence can consist of unordered, unoriented contigs.
      • Repeats file (see Support Protocol 1)
      • Exons file (optional; see Support Protocol 2)
      • Underlay file (optional; see Support Protocol 3)
      • Annotation file (optional; see Support Protocol 4)
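The sequence requirements above can be checked locally before submission. This is a minimal sketch; the helper names `parse_fasta` and `check_pair` are hypothetical, and the sketch treats the ∼70‐character line length as a hard limit, which is stricter than the guideline stated above.

```python
import re
from typing import List, Tuple

# Symbols the Materials section lists as acceptable in a sequence line.
ALLOWED = re.compile(r"[ACGTNX]+", re.IGNORECASE)


def parse_fasta(text: str, max_line: int = 70) -> List[Tuple[str, str]]:
    """Return (header, sequence) records, rejecting lines that break the
    stated input rules. Sequences are upper-cased."""
    records: List[Tuple[str, List[str]]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
            continue
        if not records:
            raise ValueError(f"line {lineno}: sequence before the first '>' header")
        if len(line) > max_line:
            raise ValueError(f"line {lineno}: {len(line)} characters (limit {max_line})")
        if not ALLOWED.fullmatch(line):
            raise ValueError(f"line {lineno}: symbols other than A, C, G, T, N, X")
        records[-1][1].append(line.upper())
    return [(header, "".join(parts)) for header, parts in records]


def check_pair(first: str, second: str) -> Tuple[str, List[Tuple[str, str]]]:
    """The first sequence must be one contiguous record; the second may be
    split into several unordered, unoriented contigs."""
    seq1 = parse_fasta(first)
    if len(seq1) != 1:
        raise ValueError(f"first sequence must be a single record, found {len(seq1)}")
    contigs = parse_fasta(second)
    if not contigs:
        raise ValueError("second sequence contains no records")
    return seq1[0][1], contigs
```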

Support Protocol 5: Installing Stand‐Alone Blastz

  Necessary Resources
  • Hardware
    • The authors test and use Blastz on Solaris/Sparc and Linux/x86 platforms, but it should be portable to virtually any ANSI/POSIX system, including Windows and Macintosh.
  • Software
    • The current development snapshot of Blastz is available from the authors' Web site (http://bio.cse.psu.edu/) as a tar.gz file. Unpacking it requires tar and gzip (or compatible programs), and compiling and installing it requires an ANSI‐compatible C compiler and the make utility.
  • Files
    • The stand‐alone version of the Blastz program uses the same sequence and repeats files as the PipMaker Web server (see Basic Protocol 1).
NOTE: For an introduction to Unix, see appendix 1C.
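The unpack‐and‐compile steps above can be scripted. This is a sketch using Python's standard `tarfile` and `subprocess` modules; the helper names, the archive and directory names, and the assumption that a plain `make` in the top‐level directory builds Blastz are all illustrative and should be checked against the instructions shipped with the snapshot.

```python
import os
import posixpath
import subprocess
import tarfile
from typing import List


def unpack_snapshot(archive: str, dest: str) -> List[str]:
    """Extract a .tar.gz source snapshot into dest and return the sorted
    top-level names it contains. Members that would escape dest are refused."""
    dest_abs = os.path.abspath(dest)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for m in members:
            target = os.path.abspath(os.path.join(dest_abs, m.name))
            if os.path.commonpath([dest_abs, target]) != dest_abs:
                raise ValueError(f"unsafe path in archive: {m.name}")
        # Use the 'data' extraction filter where this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_abs, members=members, **extra)
    return sorted({posixpath.normpath(m.name).split("/")[0] for m in members})


def build(source_dir: str) -> None:
    """Run `make` in the unpacked tree (hypothetical layout); needs an
    ANSI C compiler and the make utility."""
    subprocess.run(["make"], cwd=source_dir, check=True)
```

For example, `unpack_snapshot("blastz.tar.gz", "src")` (hypothetical file name) returns the top‐level directory name to join onto `src` and pass to `build`. On a Unix system the same two steps are usually just `tar xzf` followed by `make`.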

Figures

  •   Figure 10.2.1 Flowchart outlining the steps for using PipMaker.
  •   Figure 10.2.2 A pip showing human chromosome 22 in the region associated with velocardiofacial syndrome aligned with the orthologous region in mouse. A 60‐kb segment surrounding the CDC45L and CLDN5 genes illustrates the alignments and annotations on a pip. Each aligning segment is displayed as a series of horizontal lines whose positions correspond to the first sequence used in the alignment. The aligning segments are drawn according to their percent identity, which is shown on the vertical axis from 50% to 100%. A number of optional annotation files allow a user to augment the information content of the display. For instance, the names of genes and their direction of transcription are part of the exons file format. An underlay file specifies the colors in which genomic features are drawn, such as exons (e.g., light blue), introns (e.g., light yellow), UTRs (e.g., light orange), and conserved noncoding regions (e.g., shades of red). An annotations file provides colored horizontal lines above the alignment; in the PDF file these are hyperlinks to relevant Internet sites, e.g., the appropriate PubMed citation(s) for the gene (red), the LocusLink entry for the gene (blue), or a protein sequence from GenBank (green). The bookmarks along the left side provide links to compiled information about the various genes and other annotations describing the comparative analysis of the sequences. The bookmarks cover a much larger region than that shown in the image.
  •   Figure 10.2.3 A dot plot of the 1.5‐Mb region from human chromosome 22 associated with velocardiofacial syndrome, aligned to the orthologous sequences from mouse. Annotations used in the alignment are displayed along the horizontal axis as gene names with the direction of transcription. The mouse sequence is represented by two contigs that are labeled along the vertical axis of the plot (gi: 20346266 and 20346218). The Order and Orient option attempts to arrange the mouse sequences in the same relative order as the human sequence and indicates the presence of rearrangements in the mouse sequence relative to the human sequence. The dot plot uses the same underlay file as the pip to color the image. Note that the gene names used in this example are not all recognized by the HUGO nomenclature committee and serve as illustrations only.
  •   Figure 10.2.4 Illustration of the PipMaker options (A) Show All Matches, (B) Chaining, and (C) Single Coverage. The human β‐globin gene locus contains a family of duplicated genes that, when aligned to the orthologous region in mouse, show matches to multiple family members in the mouse globin locus. The Show All Matches option reveals extensive sequence similarity between the globin gene clusters in the human and mouse sequences (panel A, from left to right: dot plot of the extended genomic region and the pip of the human δ‐globin gene, HBD). In addition, a cluster of related olfactory receptor genes (ORGs) surrounds the globin locus in both species and creates the checkerboard pattern. The Chaining option reduces the amount of aligning sequence by showing only alignments that appear in the same relative order in the two species (panel B, leftmost box). For this reason, most of the ORG matches disappear. One aligning segment is identified for each human globin gene, and multiple hits are removed from the pip (panel B, rightmost box). The Single Coverage option identifies the highest‐scoring alignments and allows any position in the first sequence to align only once to the second sequence. Therefore, no two alignments occupy the same vertical space, although they may appear very close if the display has insufficient resolution. In the globin locus, several human genes are most similar to the same sequence in the mouse locus (see panel C, leftmost and middle boxes) and therefore appear on the same horizontal line. The pip (panel C, rightmost box) shows more alignments than with the Chaining option because the best match is not restricted to being in the same relative order along the two sequences.
  •   Figure 10.2.5 Example of documentation output from RepeatMasker.
  •   Figure 10.2.6 Alternate format for a repeats file.
  •   Figure 10.2.7 An exons file containing two genes from human chromosome 22 that are transcribed in opposite orientations.
  •   Figure 10.2.8 An example of an underlay file which refers to Figure .
  •   Figure 10.2.9 Alternate format for an underlay file which refers to Figure .
  •   Figure 10.2.10 An example of a line in the body of an underlay file used to paint just the upper or lower half of a region by using a + or ‐ sign.
  •   Figure 10.2.11 A sample type definition entry for the header of an annotation file for Advanced PipMaker.
  •   Figure 10.2.12 A sample annotation entry for the body of an annotation file for Advanced PipMaker.
  •   Figure 10.2.13 Compact overview of the alignment. The two‐panel image shows the locations of aligned regions (upper panel) and the position of colors specified by the underlay file (lower panel). Green bars represent all regions within an alignment, and red bars represent regions that align at a high level of similarity (at least 100 bp without a gap and with at least 70% nucleotide identity). The colors are specified by the underlay file, and the gene names and directionality come from the exons file.
  •   Figure 10.2.14 Icons used in a pip that represent features in a genomic sequence, such as exons, repeats, and CpG islands.
  •   Figure 10.2.15 An example of the concise output file.
  •   Figure 10.2.16 The traditional textual form of alignment output.
  •   Figure 10.2.17 Analysis of Exons output when the second sequence has only one contig.
  •   Figure 10.2.18 Putative coding sequence from optional Analysis of Exons output.
  •   Figure 10.2.19 A sample index for Analysis of Exons output when the second sequence file contains multiple contigs.
  •   Figure 10.2.20 A sample listing of the exon positions for Analysis of Exons output (analogous to Fig. ) when the second sequence file contains multiple contigs.
  •   Figure 10.2.21 Text file describing the predicted arrangement of ordered and oriented contigs.
  •   Figure 10.2.22 An example of a second sequence consisting of two segments separated by 100 Ns, which is then treated as shown in Figure .
  •   Figure 10.2.23 See Figure .
  •   Figure 10.2.24 Example messages from a PDF file with embedded contig names.
  •   Figure 10.2.25 The default scoring matrix.
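The difference between the Chaining and Single Coverage options in Figure 10.2.4 can be made concrete with a toy model. This sketch represents each local alignment as a pair of intervals plus a score (the `Hit` type is hypothetical). It implements chaining as the highest‐scoring set of hits that occur in the same relative order, without overlap, in both sequences, and single coverage as a greedy pass that discards any lower‐scoring hit overlapping an accepted one in the first sequence. PipMaker's own procedures may differ in detail, for example by trimming overlaps rather than discarding whole hits.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    b1: int      # begin in sequence 1
    e1: int      # end in sequence 1
    b2: int      # begin in sequence 2
    e2: int      # end in sequence 2
    score: float


def chain(hits: List[Hit]) -> List[Hit]:
    """Highest-scoring subset of hits that occur in the same relative order,
    without overlap, in both sequences (O(n^2) dynamic programming)."""
    hs = sorted(hits, key=lambda h: (h.b1, h.b2))
    best = [h.score for h in hs]  # best chain score ending at hs[i]
    prev = [-1] * len(hs)         # predecessor of hs[i] in that chain
    for i, hi in enumerate(hs):
        for j in range(i):
            hj = hs[j]
            if hj.e1 < hi.b1 and hj.e2 < hi.b2 and best[j] + hi.score > best[i]:
                best[i], prev[i] = best[j] + hi.score, j
    if not hs:
        return []
    i = max(range(len(hs)), key=best.__getitem__)
    picked = []
    while i != -1:
        picked.append(hs[i])
        i = prev[i]
    return picked[::-1]


def single_coverage(hits: List[Hit]) -> List[Hit]:
    """Greedy: take hits in decreasing score order, skipping any whose
    sequence-1 interval overlaps a hit already taken."""
    kept: List[Hit] = []
    for h in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(h.e1 < k.b1 or h.b1 > k.e1 for k in kept):
            kept.append(h)
    return sorted(kept, key=lambda h: h.b1)


if __name__ == "__main__":
    hits = [Hit(100, 200, 1000, 1100, 50), Hit(300, 400, 1000, 1100, 60),
            Hit(300, 400, 2000, 2100, 40), Hit(500, 600, 3000, 3100, 30)]
    print([h.b2 for h in chain(hits)])            # [1000, 2000, 3000]
    print([h.b2 for h in single_coverage(hits)])  # [1000, 1000, 3000]
```

In the example, two different first‐sequence segments both match second‐sequence positions 1000 to 1100. Chaining can keep only one of them, because they overlap in the second sequence, whereas single coverage keeps both because they occupy different positions in the first sequence. This mirrors the globin case, where several human genes line up with the same mouse sequence only under Single Coverage.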
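The Figure 10.2.13 caption gives explicit thresholds for the red bars: at least 100 bp without a gap and at least 70% identity. The sketch below applies those thresholds to gap‐free blocks and adds a helper that counts how many first‐sequence positions a set of blocks covers. The block representation and helper names are assumptions, and the coverage totals are illustrative rather than values PipMaker reports.

```python
from typing import Iterable, List, Tuple

Block = Tuple[int, int, float]  # (begin, end, percent identity); 1-based, inclusive


def strong_blocks(blocks: Iterable[Block], min_len: int = 100,
                  min_pid: float = 70.0) -> List[Block]:
    """Gap-free blocks meeting the red-bar thresholds of the compact overview."""
    return [b for b in blocks if b[1] - b[0] + 1 >= min_len and b[2] >= min_pid]


def covered_bp(intervals: Iterable[Tuple[int, int]]) -> int:
    """Number of sequence-1 positions covered by the union of the intervals."""
    total = 0
    cur_b = cur_e = None
    for b, e in sorted(intervals):
        if cur_e is not None and b <= cur_e + 1:
            cur_e = max(cur_e, e)        # overlapping or adjacent: extend the run
            continue
        if cur_e is not None:
            total += cur_e - cur_b + 1   # close the previous run
        cur_b, cur_e = b, e
    if cur_e is not None:
        total += cur_e - cur_b + 1
    return total
```

A block of 200 bp at 69.9% identity, or one of 30 bp at 95%, fails the thresholds; only blocks that are both long enough and similar enough are kept.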
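Figure 10.2.25 shows the default scoring matrix, whose values are not reproduced in this text. The sketch below assumes it is the HOXD70 matrix of Chiaromonte et al. (2002), which we take to be the blastz default, and uses it to score a gap‐free block; compare the numbers against Figure 10.2.25 before relying on them. Scoring pairs that involve N, X, or other non‐ACGT symbols as 0 is also an assumption of this sketch.

```python
from typing import Dict

BASES = "ACGT"
# Assumed HOXD70 substitution scores (Chiaromonte et al. 2002); rows and
# columns in A, C, G, T order. Check against Figure 10.2.25.
_ROWS = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]
MATRIX: Dict[str, Dict[str, int]] = {
    a: {b: _ROWS[i][j] for j, b in enumerate(BASES)} for i, a in enumerate(BASES)
}


def gap_free_score(s1: str, s2: str, other: int = 0) -> int:
    """Sum of substitution scores over a gap-free block. Pairs involving a
    non-ACGT symbol score `other` (an assumption, default 0)."""
    if len(s1) != len(s2):
        raise ValueError("a gap-free block has rows of equal length")
    total = 0
    for a, b in zip(s1.upper(), s2.upper()):
        total += MATRIX.get(a, {}).get(b, other)
    return total
```

With these values `gap_free_score("ACGT", "ACGT")` is 382, and a transition mismatch (A/G or C/T, -31) costs far less than a transversion (-114 to -125).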
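The figure list indicates that RepeatMasker output (Figure 10.2.5) can serve as the repeats file, with an alternate format shown in Figure 10.2.6. The sketch below reads the interval and repeat‐class columns from a RepeatMasker .out listing, assuming the standard whitespace‐separated column order. It does not write the alternate format, whose layout is not reproduced here, and the `Repeat` type is hypothetical.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    begin: int    # position in query: begin
    end: int      # position in query: end
    strand: str   # '+' or 'C' (complement), as RepeatMasker writes it
    name: str     # matching repeat
    family: str   # repeat class/family, e.g. 'SINE/Alu'


def parse_repeatmasker_out(text: str) -> List[Repeat]:
    """Pull interval and class columns from RepeatMasker .out text, assuming
    the column order: score, %div, %del, %ins, query, begin, end, (left),
    strand, repeat, class/family, then repeat coordinates and ID."""
    repeats: List[Repeat] = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score; the header
        # rows and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        repeats.append(Repeat(int(fields[5]), int(fields[6]), fields[8],
                              fields[9], fields[10]))
    return repeats
```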

Literature Cited
   Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
   Bulger, M., van Doorninck, J.H., Saitoh, N., Telling, A., Farrell, C., Bender, M.A., Felsenfeld, G., Axel, R., and Groudine, M. 1999. Conservation of sequence and structure flanking the mouse and human beta‐globin loci: The beta‐globin genes are embedded within an array of odorant receptor genes. Proc. Natl. Acad. Sci. U.S.A. 96:5129‐5134.
   Bulger, M., Bender, M.A., van Doorninck, J.H., Wertman, B., Farrell, C.M., Felsenfeld, G., Groudine, M., and Hardison, R. 2000. Comparative structural and functional analysis of the olfactory receptor genes flanking the human and mouse beta‐globin gene clusters. Proc. Natl. Acad. Sci. U.S.A. 97:14560‐14565.
   Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78‐94.
   Chiaromonte, F., Yap, V.B., and Miller, W. 2002. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 2002:115‐126.
   Elnitski, L., Riemer, C., Petrykowska, H., Florea, L., Schwartz, S., Hardison, R., and Miller, W. 2002. PipTools: A computational toolkit to prepare and evaluate annotated pairwise comparisons of genomic sequences. Genomics. In press.
   Endrizzi, M.G., Hadinoto, V., Growney, J.D., Miller, W., and Dietrich, W.F. 2000. Genomic sequence analysis of the mouse Naip gene array. Genome Res. 10:1095‐1102.
   Florea, L., Riemer, C., Schwartz, S., Zhang, Z., Stojanovic, N., Miller, W., and McClelland, M. 2000. Web‐based visualization tools for bacterial genome alignments. Nucleic Acids Res. 28:3486‐3496.
   Gumucio, D., Shelton, D., Zhu, W., Millinoff, D., Gray, T., Bock, J., Slightom, J., and Goodman, M. 1996. Evolutionary strategies for the elucidation of cis and trans factors that regulate the developmental switching programs of the beta‐like globin genes. Mol. Phylogenet. Evol. 5:18‐32.
   Hardison, R. and Miller, W. 1993. Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol. 10:73‐102.
   Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., and Miller, W. 1997. Locus control regions of mammalian globin gene clusters: Combining phylogenetic analyses and experimental results to gain functional insights. Gene 205:73‐94.
   Jang, W., Hua, A., Spilson, S.V., Miller, W., Roe, B.A., and Meisler, M.H. 1999. Comparative sequence of human and mouse BAC clones from the mnd2 region of chromosome 2p13. Genome Res. 9:53‐61.
   Kent, W.J. and Zahler, A.M. 2000. Conservation, regulation, synteny, and introns in a large‐scale C. briggsae–C. elegans genomic alignment. Genome Res. 10:1115‐1125.
   Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The Human Genome Browser at UCSC. Genome Res. 12:996‐1006.
   Kurihara, L.J., Semenova, E., Miller, W., Ingram, R.S., Guan, X.J., and Tilghman, S.M. 2002. Candidate genes required for embryonic development: A comparative analysis of distal mouse chromosome 14 and human chromosome 13q22. Genomics 79:154‐161.
   Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860‐921.
   Liang, Y., Wang, A., Belyantseva, I., Anderson, D., Probst, F.J., Barber, T.D., Miller, W., Touchman, J., Jin, L., and Sullivan, S. 1999. Structure and expression of the human and mouse novel unconventional myosin XV genes responsible for hereditary deafness, DFNB3 and shaker‐2. Genomics 61:243‐258.
   Loots, G.G., Locksley, R.M., Blankespoor, C.M., Wang, Z.E., Miller, W., Rubin, E.M., and Frazer, K.A. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross‐species sequence comparisons. Science 288:136‐140.
   McClelland, M., Florea, L., Sanderson, K., Clifton, S.W., Parkhill, J., Churcher, C., Dougan, G., Wilson, R.K., and Miller, W. 2000. Comparison of the Escherichia coli K‐12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res. 28:4974‐4986.
   Oeltjen, J.C., Malley, T.M., Muzny, D.M., Miller, W., Gibbs, R.A., and Belmont, J.W. 1997. Large‐scale comparative sequence analysis of the human and murine Bruton's tyrosine kinase loci reveals conserved regulatory domains. Genome Res. 7:315‐329.
   Schwartz, S., Zhang, Z., Frazer, K.A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker: A web server for aligning two genomic DNA sequences. Genome Res. 10:577‐586.
   Waterston, R., Lindblad‐Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., Antonarakis, S.E., Attwood, J., Baertsch, R., Bailey, J., Barlow, K., Beck, S., Berry, E., Birren, B., Bloom, T., Bork, P., Botcherby, M., Bray, N., Brent, M.R., Brown, D.B., Bult, C., Burton, J., Butler, J., Campbell, R.D., Carninci, P., Cawley, S., Chinwalla, A., Church, D., Clamp, M., Clee, C., Collins, F.S., Cook, L., Copley, R.R., Coulson, A., Couronne, O., Cuff, J., Curwen, V., Cutts, T., Daly, M., David, R., Davies, J., Delehaunty, K., Deri, J., Dermitzakis, E.T., Dewey, C., Dickens, N.J., Diekhans, M., Dodge, S., Dubchak, I., Dunn, D.M., Eddy, S.R., Elnitski, L., Emes, R.D., Eswara, P., Eyras, E., Felsenfeld, A., Fewell, G., Flicek, P., Foley, K., Frankel, W.N., Fulton, L., Fulton, R., Furey, T.S., Gage, D., Gibbs, R.A., Glusman, G., Gnerre, S., Goldman, N., Goodstadt, L., Graffham, D., Graves, T., Green, E.D., Gregory, S., Guigo, R., Guyer, M., Hardison, R.C., Haussler, D., Hayashizaki, Y., Hillier, L., Hinrichs, A., Hlavina, W., Holzer, T., Hsu, F., Hua, A., Hubbard, T., Hunt, A., Jackson, I., Jaffe, D.B., Johnson, L.S., Jones, M., Jones, T.A., Joy, A., Kamal, M., Karlsson, E.K., Karolchik, D., Kasprzyk, A., Kawai, A., Keibler, E., Kells, C., Kent, W.J., Kirby, A., Kolbe, D., Korf, I., Kucherlapati, R.S., Kulbokas, R.J. III, Kulp, D., Landers, T., Leger, J.P., Leonard, S., Letunic, I., Levine, R., Li, J., Li, M., Lloyd, C., Lucas, S., Ma, B., Maglott, D.R., Maier, J., Mardis, E.R., Matthews, L., Mauceli, E., Mayer, J.H., McCarthy, M., McCombie, R., McLaren, S., McLay, K., McPherson, J., Meldrim, J., Meredith, B., Mesirov, J.P., Miller, W., Miner, T., Mongin, E., Montgomery, K.T., Morgan, M., Mott, R., Mullikin, J.C., Muzny, D.M., Nash, W., Nelson, J., Nhan, M., Nicol, R., Ning, Z., Nusbaum, C., O'Connor, M.J., Okazaki, Y., Oliver, K., Overton‐Larty, E., Pachter, L., Parra, G., Pepin, K., Peterson, J., Pevzner, P., Plumb, R., Pohl, C., Poliakov, A., Ponce, T., Ponting, C., Potter, S., Quail, M., Reymond, A., Roe, B.A., Roskin, K.M., Rubin, E., Rust, A.G., Santos, R., Sapojnikov, V., Schultz, B., Schultz, J., Schwartz, M.S., Schwartz, S., Scott, C., Seaman, S., Searle, S., Sharpe, T., Sheridan, A., Shownkeen, R., Sims, S., Singer, J.B., Slater, G., Smit, A., Smith, D.R., Spencer, B., Stabenau, A., Stange‐Thomann, N., Sugnet, C., Suyama, N., Tesler, G., Thompson, J., Torrents, D., Trevaskis, E., Tromp, J., Ucla, C., Ureta‐Vidal, A., Vinson, J.P., von Niederhausern, A.C., Wade, C.M., Wall, M., Weber, R.J., Weiss, R.B., Wendl, M., West, T., Wetterstrand, C., Wheeler, R., Wierzbowski, J., Willey, T., Williams, S., Wilson, R., Winter, E., Worley, K.C., Wyman, D., Yang, S., Yang, S.‐P., Zdobnov, E., Zody, M.C., and Lander, E.S. 2002. Initial sequencing and comparative analysis of the mouse genome. In press.
   Wiehe, T., Gebauer‐Jung, S., Mitchell‐Olds, T., and Guigo, R. 2001. SGP‐1: Prediction and validation of homologous genes based on sequence alignments. Genome Res. 11:1574‐1583.
   Wilson, M.D., Riemer, C., Martindale, D.W., Schnupf, P., Boright, A.P., Cheung, T.L., Hardy, D.M., Schwartz, S., Scherer, S.W., Tsui, L.C., Miller, W., and Koop, B.F. 2001. Comparative analysis of the gene‐dense ACHE/TFR2 region on human chromosome 7q22 with the orthologous region on mouse chromosome 5. Nucleic Acids Res. 29:1352‐1365.
Key References
   Schwartz et al. 2000. See above.
   Detailed documentation for the PipMaker server.
Internet Resources
   http://bio.cse.psu.edu/
   Links to alignment programs and complementary programs, including the PipMaker server homepage, the list of PipMaker underlay and annotation colors, PipMaker examples, whole genome human/mouse homology, and Laj download and instruction site.
   http://www.cs.wisc.edu/~ghost/
   Aladdin homepage (for the GhostScript program).
   http://www.adobe.com/
   Adobe homepage (for the Acrobat Reader program).
   http://ftp.genome.washington.edu/cgi-bin/RepeatMasker/
   RepeatMasker Web site.
   http://www.ncbi.nlm.nih.gov/
   Contains a link to the GenBank Web site.
   http://genome.cse.ucsc.edu/
   Human Genome Browser.
   http://www.ensembl.org/
   Ensembl Genome Browser.
   http://soft.ice.mpg.de/sgp-1
   SGP‐1 (Syntenic Gene Prediction Program) server.