Analyzing Copy Number Variation Using SNP Array Data: Protocols for Calling CNV and Association Tests
Abstract
High-density SNP genotyping technology provides a low-cost, effective tool for conducting genome-wide association (GWA) studies. The wide adoption of GWA studies has led to the discovery of disease- or trait-associated SNPs, some of which were subsequently shown to be causal. However, a nearly universal shortcoming of GWA studies, missing heritability, has prompted great interest in searching for other types of genetic variation, such as copy number variation (CNV). Certain CNVs have been reported to alter disease susceptibility. Algorithms and tools have been developed to identify CNVs using SNP array hybridization intensity data. Such an approach provides an additional source of data at almost no extra cost. In this unit, we demonstrate the steps for calling CNVs from Illumina SNP array data using PennCNV and performing association analysis using R and PLINK. Curr. Protoc. Hum. Genet. 79:1.27.1-1.27.15. © 2013 by John Wiley & Sons, Inc.
Keywords: copy number variations (CNV); CNV calling; genome-wide association studies; SNP genotyping array; association study; burden analysis
Table of Contents
- Introduction
- Basic Protocol 1: Detect CNVs from Illumina Whole‐Genome Genotyping Array Data Using PennCNV
- Basic Protocol 2: Use of R to Perform Association Tests for Common CNVs
- Basic Protocol 3: Use of PLINK to Perform Burden Tests for Rare or Non‐Overlapping CNVs
- Support Protocol 1: Visually Inspect CNVs on the UCSC Genome Browser
- Commentary
- Literature Cited
- Figures
Figures
Figure 1.27.1 (A) Calling SNP genotypes by the ratio of probe intensities (allele frequencies) on hybridization arrays. (B) Examples where copy number variations alter total intensities and allele frequencies.
Figure 1.27.2 A section of a chromosome demonstrating how Copy Number Polymorphic Regions (CNPRs) are constructed. In this example, PennCNV has been run to call CNVs from the SNP array data of six individuals (indiv1 through indiv6). All called CNVs from all individuals were pooled together. All non-redundant end points of the CNVs become break points that are used to partition the chromosome. Each pair of adjacent break points forms a CNPR. Every CNV is then decomposed into multiple consecutive CNPRs. Red: Copy Number (CN) = 1; Blue: CN = 3; Black: CN = 4. A matrix of individuals by CNPRs can then be generated from the copy number each individual carries in each CNPR (CN = 2 if no CNV was called).
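The CNPR construction described in the caption of Figure 1.27.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the unit: the function names (`build_cnprs`, `cn_matrix`), the tuple layout, and the coordinates are hypothetical, and CNV calls are assumed to have already been parsed from the CNV caller's output for a single chromosome.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (individual, start, end, copy_number).
CNV = Tuple[str, int, int, int]
Region = Tuple[int, int]


def build_cnprs(cnvs: List[CNV]) -> List[Region]:
    """Pool every CNV end point, de-duplicate, and pair adjacent break points."""
    breakpoints = sorted({pos for _, start, end, _ in cnvs for pos in (start, end)})
    return list(zip(breakpoints[:-1], breakpoints[1:]))


def cn_matrix(cnvs: List[CNV], individuals: List[str],
              default_cn: int = 2) -> Tuple[List[Region], Dict[str, List[int]]]:
    """Return the CNPRs and, per individual, the copy number in each CNPR."""
    cnprs = build_cnprs(cnvs)
    # Every individual starts at the normal diploid state (CN = 2).
    matrix = {ind: [default_cn] * len(cnprs) for ind in individuals}
    for ind, start, end, cn in cnvs:
        for i, (lo, hi) in enumerate(cnprs):
            # A CNV decomposes into the consecutive CNPRs it fully covers.
            if start <= lo and hi <= end:
                matrix[ind][i] = cn
    return cnprs, matrix


# Toy input with made-up coordinates; indiv4 carries no CNV in this region.
calls = [
    ("indiv1", 100, 300, 1),
    ("indiv2", 200, 400, 3),
    ("indiv3", 250, 300, 1),
]
people = ["indiv1", "indiv2", "indiv3", "indiv4"]
cnprs, matrix = cn_matrix(calls, people)

for ind in people:
    print(ind, matrix[ind])
```

Because break points come only from CNV end points, a stretch between two non-overlapping CNVs would also appear as a CNPR in which every individual has CN = 2; such uninformative columns can be dropped before any association test.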
Internet Resources
http://www.openbioinformatics.org/penncnv/
PennCNV Web site. Users can download the PennCNV source code, compile, and install on their own computers. The Web site also contains a wealth of information including program manual, annotation files, tutorials for the PennCNV software, and other useful tips such as visualization and quality control recommendations.
http://www.r-project.org/
R Web site. R is a free program for statistical computing and visualization. Users can download the compiled R package for their specific computing platforms. The Web site also lists URLs to the Comprehensive R Archive Network (CRAN). CRAN hosts user-contributed packages that provide additional analysis capabilities.
http://www.illumina.com/software/genomestudio_software.ilmn
Illumina GenomeStudio Web site. The Web site contains instructions and FAQs for the GenomeStudio software, which is required to export SNP intensities from Illumina Chip projects for CNV calling. Illumina customers can obtain the software for free.
http://pngu.mgh.harvard.edu/~purcell/plink/
PLINK Web site. PLINK is developed by Shaun Purcell at Harvard University. The free, open-source program is widely used by the research community to process and analyze genome-wide association studies (GWAS). Users can download the source code or obtain pre-compiled binaries for installation from this Web site. This Web site also contains very detailed instructions on how to use the program.
http://genome.ucsc.edu/
UCSC Genome Browser. Users can go to UCSC Genome Browser to download genomic annotations, or visualize CNV calls on the reference genome as outlined in the Support Protocol.
http://www.humgen.nl/SNP_databases.html
List of Genetic variation databases. The Center for Human and Clinical Genetics at Leiden University Medical Center maintains a comprehensive list of genetic variation databases, including CNV databases.
http://hgsv.washington.edu/
The Human Genome Structural Variation Project. This Web site, maintained by the Eichler lab at the University of Washington, provides a detailed map of CNVs and large structural variants.
http://www.sanger.ac.uk/research/areas/humangenetics/cnv/
The Copy Number Variation (CNV) Project. The database is maintained by the Wellcome Trust Sanger Institute. It hosts CNVs identified through a variety of genotyping and hybridization approaches and provides extensive information on known CNV/phenotype associations.
http://projects.tcag.ca/variation/project.html
The Database of Genomic Variants. This database is maintained by the University of Toronto Centre for Applied Genomics. It is a comprehensive catalog of structural variants in the human genome, compiled from published reports on healthy controls. It can serve as a control data set in studies that correlate CNVs with diseases and traits.