Generating a Genome Assembly with PCAP
Abstract
This unit describes how to use the Parallel Contig Assembly Program (PCAP) to assemble the data produced by a whole-genome shotgun sequencing project. We present a basic protocol for using PCAP on a multiprocessor computer in a 300-Mb genome assembly project. A support protocol to prepare input files for PCAP is also described. Another basic protocol for using PCAP on a distributed cluster of computers in a 3-Gb genome assembly project is presented, in addition to suggestions for understanding results from PCAP.
Keywords: Whole-Genome Shotgun Sequencing; Genome Assembly
Table of Contents
- Basic Protocol 1: Producing an Assembly with PCAP Using an Example Data Set
- Support Protocol 1: Downloading and Installing PCAP
- Support Protocol 2: Preparation of Input Files
- Support Protocol 3: Generating the fofn.con File
- Basic Protocol 2: Generating a Large‐Scale Assembly with PCAP Using Distributed Computing
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
Figures
- Figure 11.3.1 The top part of the contigs.bases file produced on the example data set.
- Figure 11.3.2 The entire content of the supercontigs file produced on the example data set.
- Figure 11.3.3 The top part of the reads.placed file produced on the example data set.
- Figure 11.3.4 The entire content of the reads.unplaced file produced on the example data set.
- Figure 11.3.5 The entire content of the readpairs.contigs file produced on the example data set.
- Figure 11.3.6 The top part of the readpairs.reads file produced on the example data set.
- Figure 11.3.7 The top part of the fofn.con.pcap.results file produced on the example data set.
- Figure 11.3.8 The entire content of the fofn.con.pcap.sort.stat file produced on the example data set.
- Figure 11.3.9 The middle part of the fofn.pcap.n50 file produced on the example data set.
- Figure 11.3.10 The entire content of the fofn.pcap.contigs1.snp file produced on the example data set.
- Figure 11.3.11 Specification of read pairs in the .con file when the same subclone is sequenced multiple times.
- Figure 11.3.12 The top part of the fofn.con file for the example data set.
Literature Cited
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
- Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A.F., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., and Brenner, S. 2002. Whole‐genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301‐1310.
- Havlak, P., Chen, R., Durbin, K.J., Egan, A., Ren, Y., Song, X.‐Z., Weinstock, G.M., and Gibbs, R. 2004. The Atlas genome assembly system. Genome Res. 14:721‐732.
- Huang, X. and Madan, A. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868‐877.
- Huang, X., Wang, J., Aluru, S., Yang, S.‐P., and Hillier, L. 2003. PCAP: A whole‐genome assembly program. Genome Res. 13:2164‐2170.
- Jaffe, D.B., Butler, J., Gnerre, S., Mauceli, E., Lindblad‐Toh, K., Mesirov, J.P., Zody, M.C., and Lander, E.S. 2003. Whole‐genome sequence assembly for mammalian genomes: ARACHNE 2. Genome Res. 13:91‐96.
- Kent, W.J. 2002. BLAT: The BLAST‐like alignment tool. Genome Res. 12:656‐664.
- Kruskal, J.B. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Amer. Math. Soc. 7:48‐50.
- Mullikin, J.C. and Ning, Z. 2003. The Phusion assembler. Genome Res. 13:81‐90.
- Myers, E.W., Sutton, G.G., Delcher, A.L., Dew, I.M., Fasulo, D.P., Flanigan, M.J., Kravitz, S.A., Mobarry, C.M., Reinert, K.H., Remington, K.A., Anson, E.L., Bolanos, R.A., Chou, H.H., Jordan, C.M., Halpern, A.L., Lonardi, S., Beasley, E.M., Brandon, R.C., Chen, L., Dunn, P.J., Lai, Z., Liang, Y., Nusskern, D.R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G.M., Adams, M.D., and Venter, J.C. 2000. A whole‐genome assembly of Drosophila. Science 287:2196‐2204.
- Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48:443‐453.
- Pearson, W.R. and Lipman, D. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444‐2448.
- Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195‐197.
Key References
- Huang et al., 2003. See above. This article describes the methods used in PCAP in detail.
Internet Resources
- http://seq.cs.iastate.edu This site contains documentation on PCAP and example test data sets.









