Generating a Genome Assembly with PCAP
Abstract
This unit describes how to use the Parallel Contig Assembly Program (PCAP) to assemble the data produced by a whole‐genome shotgun sequencing project. We present a basic protocol for using PCAP on a multiprocessor computer in a 300‐Mb genome assembly project. A support protocol to prepare input files for PCAP is also described. Another basic protocol for using PCAP on a distributed cluster of computers in a 3‐Gb genome assembly project is presented, in addition to suggestions for understanding results from PCAP.
Keywords: Whole‐Genome Shotgun Sequencing; Genome Assembly
Table of Contents
- Basic Protocol 1: Producing an Assembly with PCAP Using an Example Data Set
- Support Protocol 1: Downloading and Installing PCAP
- Support Protocol 2: Preparation of Input Files
- Support Protocol 3: Generating the fofn.con File
- Basic Protocol 2: Generating a Large‐Scale Assembly with PCAP Using Distributed Computing
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
Figures
- Figure 11.3.1 The top part of the contigs.bases file produced on the example data set.
- Figure 11.3.2 The entire content of the supercontigs file produced on the example data set.
- Figure 11.3.3 The top part of the reads.placed file produced on the example data set.
- Figure 11.3.4 The entire content of the reads.unplaced file produced on the example data set.
- Figure 11.3.5 The entire content of the readpairs.contigs file produced on the example data set.
- Figure 11.3.6 The top part of the readpairs.reads file produced on the example data set.
- Figure 11.3.7 The top part of the fofn.con.pcap.results file produced on the example data set.
- Figure 11.3.8 The entire content of the fofn.con.pcap.sort.stat file produced on the example data set.
- Figure 11.3.9 The middle part of the fofn.pcap.n50 file produced on the example data set.
- Figure 11.3.10 The entire content of the fofn.pcap.contigs1.snp file produced on the example data set.
- Figure 11.3.11 Specification of read pairs in the .con file when the same subclone is sequenced multiple times.
- Figure 11.3.12 The top part of the fofn.con file for the example data set.
Literature Cited
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A.F., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., and Brenner, S. 2002. Whole‐genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301‐1310.
Havlak, P., Chen, R., Durbin, K.J., Egan, A., Ren, Y., Song, X.‐Z., Weinstock, G.M., and Gibbs, R. 2004. The Atlas genome assembly system. Genome Res. 14:721‐732.
Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868‐877.
Huang, X., Wang, J., Aluru, S., Yang, S.‐P., and Hillier, L. 2003. PCAP: A whole‐genome assembly program. Genome Res. 13:2164‐2170.
Jaffe, D.B., Butler, J., Gnerre, S., Mauceli, E., Lindblad‐Toh, K., Mesirov, J.P., Zody, M.C., and Lander, E.S. 2003. Whole‐genome sequence assembly for mammalian genomes: ARACHNE 2. Genome Res. 13:91‐96.
Kent, W.J. 2002. BLAT: The BLAST‐like alignment tool. Genome Res. 12:656‐664.
Kruskal, J.B. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7:48‐50.
Mullikin, J.C. and Ning, Z. 2003. The Phusion assembler. Genome Res. 13:81‐90.
Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H., Remington, K.A., Anson, E.L., Bolanos, R.A., Chou, H.H., Jordan, C.M., Halpern, A.L., Lonardi, S., Beasley, E.M., Brandon, R.C., Chen, L., Dunn, P.J., Lai, Z., Liang, Y., Nusskern, D.R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G.M., Adams, M.D., and Venter, J.C. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:443‐453.
Pearson, W.R. and Lipman, D. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444‐2448.
Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
Key References
Huang et al., 2003. See above.
This article describes the methods used in PCAP in detail.
Internet Resources
http://seq.cs.iastate.edu
This site contains documentation on PCAP and example test data sets.