From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline
Abstract
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high‐quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data‐processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. Curr. Protoc. Bioinform. 43:11.10.1‐11.10.33. © 2013 by John Wiley & Sons, Inc.
Keywords: NGS; WGS; exome; variant detection; genotyping
Table of Contents
- Introduction
- Strategic Planning
- Basic Protocol 1: From FASTQ to Analysis‐Ready BAM: Preparing the Sequence Data
- Basic Protocol 2: From Analysis‐Ready BAM to Raw Variants: Calling Variants in Diploid Organisms with HaplotypeCaller
- Basic Protocol 3: From Raw to Analysis‐Ready Variants: Variant Quality Score Recalibration
- Alternate Protocol 1: From Analysis‐Ready BAM to Raw Variants: Calling Variants in Non‐Diploid Organisms with UnifiedGenotyper
- Alternate Protocol 2: From Raw to Analysis‐Ready Variants: Hard Filtering Small Datasets
- Support Protocol 1: Obtaining and Installing the Software Used in This Unit
- Support Protocol 2: From BAM Back to FASTQ: Reprocessing Old Data
- Support Protocol 3: Fixing Improperly Formatted BAM Files
- Support Protocol 4: Adding Variant Annotations with VariantAnnotator
- Acknowledgments
- Literature Cited
- Figures
Figures
Figure 11.10.1 Strategic planning workflow for the protocols included in this unit.
Literature Cited
1000 Genomes Project Consortium. 2010. A map of human genome variation from population‐scale sequencing. Nature 467:1061‐1073.
DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., McKenna, A., Fennell, T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, D., and Daly, M.J. 2011. A framework for variation discovery and genotyping using next‐generation DNA sequencing data. Nat. Genet. 43:491‐498.
Fisher, R.A. 1922. On the interpretation of χ2 from contingency tables, and the calculation of P. J. R. Stat. Soc. 85:87‐94.
International HapMap 3 Consortium, Altshuler, D.M., Gibbs, R.A., Peltonen, L., Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, F., Chang, K., Hawes, A., Lewis, L.R., Ren, Y., Wheeler, D., Gibbs, R.A., Muzny, D.M., Barnes, C., Darvishi, K., Hurles, M., Korn, J.M., Kristiansson, K., Lee, C., McCarroll, S.A., Nemesh, J., Dermitzakis, E., Keinan, A., Montgomery, S.B., Pollack, S., Price, A.L., Soranzo, N., Bonnen, P.E., Gibbs, R.A., Gonzaga‐Jauregui, C., Keinan, A., Price, A.L., Yu, F., Anttila, V., Brodeur, W., Daly, M.J., Leslie, S., McVean, G., Moutsianas, L., Nguyen, H., Schaffner, S.F., Zhang, Q., Ghori, M.J., McGinnis, R., McLaren, W., Pollack, S., Price, A.L., Schaffner, S.F., Takeuchi, F., Grossman, S.R., Shlyakhter, I., Hostetter, E.B., Sabeti, P.C., Adebamowo, C.A., Foster, M.W., Gordon, D.R., Licinio, J., Manca, M.C., Marshall, P.A., Matsuda, I., Ngare, D., Wang, V.O., Reddy, D., Rotimi, C.N., Royal, C.D., Sharp, R.R., Zeng, C., Brooks, L.D., and McEwen, J.E. 2010. Integrating common and rare genetic variation in diverse human populations. Nature 467:52‐58.
Li, H. and Durbin, R. 2010. Fast and accurate long‐read alignment with Burrows‐Wheeler transform. Bioinformatics (Oxford) 26:589‐595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford) 25:2078‐2079.
Mann, H.B. and Whitney, D.R. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18:50‐60.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res. 20:1297‐1303.
Mills, R.E., Luttig, C.T., Larkins, C.E., and Beauchamp, A. 2006. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 16:1182‐1190.
Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., and Sirotkin, K. 2001. dbSNP: The NCBI database of genetic variation. Nucleic Acids Res. 29:308‐311.
Wickham, H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer, New York.