Identification of Mutations in Zebrafish Using Next‐Generation Sequencing

Internet, 2013-12-31

Abstract

Whole‐genome sequencing (WGS) has been used in many invertebrate model organisms as an efficient tool for mapping and identification of mutations affecting particular morphological or physiological processes. However, the application of WGS in highly polymorphic, larger genomes of vertebrates has required new experimental and analytical approaches. As a consequence, a wealth of different analytical tools has been developed. As the generation and analysis of data stemming from WGS can be unwieldy and daunting to researchers not accustomed to many common bioinformatic analyses and Unix‐based computational tools, we focus on how to manage and analyze next‐generation sequencing datasets without an extensive computational infrastructure and in‐depth bioinformatic knowledge. Here we describe methods for the analysis of WGS for use in mapping and identification of mutations in the zebrafish. We stress key elements of the experimental design and the analytical approach that allow the use of this method across different sequencing platforms and in different model organisms with annotated genomes. Curr. Protoc. Mol. Biol. 104:7.13.1‐7.13.33. © 2013 by John Wiley & Sons, Inc.

Keywords: whole‐genome sequencing; WGS; mutation mapping; zebrafish; next‐generation sequencing

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Strategic Planning of Mapping Experiments
Basic Protocol 1: Preparation of the DNA Library for Next‐Generation Sequencing
Basic Protocol 2: Sequence Data Alignment and Variant Identification
Support Protocol 1: Software and Datasets Used for Data Analysis
Basic Protocol 3: Linkage Mapping Based on Homozygosity‐by‐Descent
Support Protocol 2: Verification of Linkage
Basic Protocol 4: Identification of Candidate Mutations
Support Protocol 3: Identifying Candidate Causative Mutations in Regions Covered by Only One Read
Basic Protocol 5: Identification of Small Insertions or Deletions within a Linked Interval as Candidate Mutations
Commentary
Literature Cited
Figures
Tables

Materials

Basic Protocol 1: Preparation of the DNA Library for Next‐Generation Sequencing

Materials

F2 generation of a genetic cross, sorted by phenotype into mutants and siblings (Fig. 7.13.1A)
Reagents for DNA extraction: e.g., DNeasy Blood & Tissue Kit (Qiagen, cat. no. 69504)
Optional: Kit for library preparation for next‐generation sequencing, e.g., TruSeq DNA Sample Preparation kit (Illumina, cat. no. CES FC‐121‐2001)
Spectrophotometer, e.g., NanoDrop (see APPENDICES & )

Additional reagents and equipment for DNA extraction (unit 2.1), quantitation of nucleic acids (APPENDICES & ), and library preparation for Illumina sequencing (Son and Taylor, 2011)

Figures

Figure 7.13.1 Crossing scheme and subsequent library preparation for mapping zebrafish mutations by next‐generation sequencing. (A) Crossing scheme to facilitate mapping of zebrafish mutations based on homozygosity‐by‐descent. Heterozygous carriers of the mutation are crossed to a polymorphic strain to enable tracking of recombination events in the F2 generation. In‐crosses between mutant carriers are performed in the F1 generation, and their progeny are sorted into homozygous mutants and wild‐type (wt) siblings. If an SNP allele (here A is the allele of the mutant strain and T the allele of the polymorphic strain) is linked to the mutation, the F2 mutant progeny will be homozygous for the allele found in the mutant strain (A). Depending on the distance between the SNP and the mutation, occasional recombination events can be detected, as they appear heterozygous in mutants (A/T). Importantly, for a linked marker, siblings will be either heterozygous or homozygous for the alternate allele (here T). An unlinked SNP will show a random distribution of alleles in mutants and siblings. (B) Basic steps of library preparation for next‐generation sequencing. Genomic DNA is first sheared into small fragments (∼200 bp). To permit next‐generation sequencing, short adapters consisting of annealed, and therefore double‐stranded, oligonucleotides are ligated onto the DNA fragments. These adapters can then be used as sequencing primers in next‐generation sequencing.
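
The allele distribution described in panel A can be sketched as a simple check on pooled read counts. This is a minimal illustration, not the scoring used in the protocol: the names PoolCounts and classify_snp and the 0.9 cutoff are hypothetical. For a fully linked SNP in a recessive cross, the mutant pool is expected to carry only the mutant-strain allele, while the sibling pool (one wild-type homozygote to two heterozygotes) is expected to carry it at a fraction of about one third. An unlinked SNP is expected near one half in both pools.

```python
from dataclasses import dataclass


@dataclass
class PoolCounts:
    """Read counts at one SNP in one DNA pool (hypothetical container)."""
    mutant_allele: int   # reads supporting the mutant-strain allele (e.g., A)
    mapping_allele: int  # reads supporting the polymorphic-strain allele (e.g., T)


def mutant_allele_fraction(pool: PoolCounts) -> float:
    """Fraction of reads that carry the mutant-strain allele."""
    total = pool.mutant_allele + pool.mapping_allele
    if total == 0:
        raise ValueError("SNP is not covered by any read in this pool")
    return pool.mutant_allele / total


def classify_snp(mutants: PoolCounts, siblings: PoolCounts,
                 linked_min: float = 0.9) -> str:
    """Call an SNP 'linked' when the mutant pool is (nearly) homozygous for
    the mutant-strain allele and the sibling pool is not.
    linked_min is an assumed cutoff, chosen only for illustration."""
    m = mutant_allele_fraction(mutants)
    s = mutant_allele_fraction(siblings)
    if m >= linked_min and s < m:
        return "linked"
    return "unlinked or uninformative"


# Illustrative counts only (not data from the protocol).
print(classify_snp(PoolCounts(20, 0), PoolCounts(7, 13)))
print(classify_snp(PoolCounts(10, 10), PoolCounts(11, 9)))
```

With low-coverage data a single SNP is rarely decisive, so a check like this would normally be aggregated over many neighboring SNPs.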

Figure 7.13.2 Novoalign output. Screen shot of a typical Novoalign output during read alignment. The first line indicates the file currently being processed. The next four lines list the Novoalign version used. The command that is being executed is given in line 6. The number of reads in the file and the number of reads which could be aligned to the genome are given in lines 13 and 14, respectively.

Figure 7.13.3 VCF file content. Represented are the first 100 lines of an example .vcf created in step 19 of . The first 21 lines provide information about the format of the file content. The header of the file is boxed in red; highlighted in blue are the strain names of the four strains used in this example, NC31, AB, TUG, and WKG. The lines below the header list the genotypes of SNPs in each strain. A detailed description of this file format can be found at http://samtools.sourceforge.net/mpileup.shtml.
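
The file layout described in this caption can be read with a few lines of code. The sketch below assumes the standard VCF layout: meta-information lines begin with "##", the column header line begins with "#CHROM", and per-strain genotype columns follow the nine fixed columns. The function names and the single example record are hypothetical; only the strain names are taken from the caption.

```python
import io

FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def read_vcf_samples(handle):
    """Return the sample (strain) names from the '#CHROM' header line."""
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information describing the file format
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
        break  # reached a data line before finding the header
    raise ValueError("no #CHROM header line found")


def genotypes_by_sample(record_line, samples):
    """Map each sample name to its GT value for one data line."""
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    return {name: call.split(":")[gt_index]
            for name, call in zip(samples, fields[FIXED_COLUMNS:])}


# Made-up miniature file for illustration; not taken from the protocol.
example = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNC31\tAB\tTUG\tWKG\n"
    "chr16\t100\t.\tA\tT\t50\t.\t.\tGT:DP\t1/1:4\t0/0:3\t0/1:5\t0/0:2\n"
)
handle = io.StringIO(example)
samples = read_vcf_samples(handle)
record = next(handle)
print(samples)
print(genotypes_by_sample(record, samples))
```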

Figure 7.13.4 Visualizing data used for mapping in IGV. Shown in the figure is a screen shot of an example file, chr16.vcf_noINDEL_filtered_NC31_TU_WK.cn, created, in step 1 of , in IGV. A description of the individual tracks seen in the figure can be found in table . This file can be used to cross‐check the results of the mapping score and to define a homogeneous region for the identification of potential candidate mutations. A linked interval (high mapping score) is characterized by a low level of heterogeneity (track 1) and a high level of homogeneity (track 2). In this example, two regions showing these characteristics can be found on the chromosome: one between 19 and 25 Mb and one between 40 and 51 Mb. A linked region is also defined as having a low level of SNPs with identity to the mapping strain (e.g., region between 40 and 51 Mb; track 6). The high level of mapping strain alleles thus excludes the region between 19 and 25 Mb; this is reflected in a low mapping score for this region. Within the linked interval defined by the mapping score, a narrower region of reduced or absent heterogeneity can be detected between 42 and 50 Mb. This region is the interval expected to harbor the candidate phenotype‐causing mutation.
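
The idea of locating a region of low heterogeneity can be sketched with a fixed-window scan. This is a simplified illustration and not the mapping score computed in the protocol: the function names, the window size, and the 0.1 cutoff are assumptions, and the input is a hypothetical list of SNP positions flagged as heterozygous or not in the mutant pool.

```python
def window_heterogeneity(snps, window_bp):
    """Group SNPs into fixed windows and return {window_start: fraction_het}.

    snps: iterable of (position_bp, is_heterozygous) for the mutant pool.
    """
    counts = {}
    for pos, is_het in snps:
        start = (pos // window_bp) * window_bp
        total, het = counts.get(start, (0, 0))
        counts[start] = (total + 1, het + int(is_het))
    return {start: het / total
            for start, (total, het) in sorted(counts.items())}


def low_heterogeneity_windows(snps, window_bp, max_fraction):
    """Window starts whose heterozygous fraction is at or below max_fraction."""
    fractions = window_heterogeneity(snps, window_bp)
    return [start for start, frac in fractions.items() if frac <= max_fraction]


# Illustrative positions only (not data from the figure).
snps = [
    (1_000_000, True), (1_500_000, False),
    (42_000_000, False), (42_500_000, False),
]
print(window_heterogeneity(snps, 1_000_000))
print(low_heterogeneity_windows(snps, 1_000_000, 0.1))
```

As the caption notes, low heterogeneity alone is not sufficient: a candidate window should also show few alleles matching the mapping strain.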

Figure 7.13.5 SSLP and SNP marker analysis. (A) SSLP marker analysis on pooled DNA samples from siblings (S) and mutants (M). Examples of a non‐polymorphic marker (left), a polymorphic unlinked marker (middle), and a polymorphic linked marker (right). For a linked marker, one of the marker alleles co‐segregates with the mutation. (B) Verification of linkage on single‐embryo DNA. Siblings show roughly the expected ratio of homozygous to heterozygous individuals. The band pattern indicates that the upper band is specific for the mapping strain, as all mutants are homozygous for the lower band. One exception is the mutant showing both bands (red asterisk), which indicates a recombination event. (C) Allele distribution for an SNP (highlighted in red) analyzed by capillary sequencing. Typically, for an SNP linked to the mutation, siblings will be either heterozygous or homozygous for one allele, whereas mutants will be homozygous for the alternate allele.
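
Single‐embryo genotypes such as those in panel B can be turned into a rough estimate of the distance between marker and mutation. The sketch below uses the usual convention that each mutant embryo represents two meioses, so a heterozygous mutant counts as one recombinant chromosome. The function names and the embryo numbers are hypothetical, and treating 1% recombination as about 1 cM is an approximation that holds only for short distances.

```python
def recombination_fraction(n_mutants, n_heterozygous, n_homozygous_mapping=0):
    """Recombinant chromosomes divided by all chromosomes scored.

    n_heterozygous: mutants showing both marker alleles (1 recombinant each).
    n_homozygous_mapping: mutants homozygous for the mapping-strain allele
        (2 recombinant chromosomes each; expected to be rare near the mutation).
    """
    if n_mutants <= 0:
        raise ValueError("at least one mutant embryo is required")
    if n_heterozygous + n_homozygous_mapping > n_mutants:
        raise ValueError("more recombinant embryos than embryos scored")
    recombinant = n_heterozygous + 2 * n_homozygous_mapping
    return recombinant / (2 * n_mutants)


def approx_distance_cM(fraction):
    """Approximate genetic distance; reasonable only for small fractions."""
    return 100.0 * fraction


# Hypothetical example: 24 mutant embryos, one of them heterozygous.
fraction = recombination_fraction(24, 1)
print(round(approx_distance_cM(fraction), 2))  # 2.08
```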

Literature Cited

	Afgan, E., Chapman, B., Jadan, M., Franke, V. and Taylor, J. 2012. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy. Curr. Protoc. Bioinform. 38:11.9.1‐11.9.20.
	Arnold, C.N., Xia, Y., Lin, P., Ross, C., Schwander, M., Smart, N.G., Müller, U. and Beutler, B. 2011. Rapid identification of a disease allele in mouse through whole genome sequencing and bulk segregation analysis. Genetics 187:633‐641.
	Austin, R.S., Vidaurre, D., Stamatiou, G., Breit, R., Provart, N.J., Bonetta, D., Zhang, J., Fung, P., Gong, Y., Wang, P.W., McCourt, P., and Guttman, D.S. 2011. Next‐generation mapping of Arabidopsis genes. Plant J. 67:715‐725.
	Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A web‐based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21.
	Bowen, M.E., Henke, K., Siegfried, K.R., Warman, M.L., and Harris, M.P. 2012. Efficient mapping and cloning of mutations in zebrafish by low‐coverage whole‐genome sequencing. Genetics 190:1017‐1024.
	Bradley, K.M., Elmore, J.B., Breyer, J.P., Yaspan, B.L., Jessen, J.R., Knapik, E.W., and Smith, J.R. 2007. A major zebrafish polymorphism resource for genetic mapping. Genome Biol. 8:R55.
	Coe, T.S., Hamilton, P.B., Griffiths, A.M., Hodgson, D.J., Wahab, M.A., and Tyler, C.R. 2009. Genetic variation in strains of zebrafish (Danio rerio) and the implications for ecotoxicology studies. Ecotoxicology 18:144‐150.
	Cuperus, J.T., Montgomery, T.A., Fahlgren, N., Burke, R.T., Townsend, T., Sullivan, C.M., and Carrington, J.C. 2010. Identification of MIR390a precursor processing–defective mutants in Arabidopsis by direct genome sequencing. PNAS 107:466‐471.
	Doitsidou, M., Poole, R.J., Sarin, S., Bigelow, H., and Hobert, O. 2010. C. elegans mutant identification with a one‐step whole‐genome‐sequencing and SNP mapping strategy. PLoS ONE 5:e15435.
	Flicek, P., Amode, M.R., Barrell, D., Beal, K., Brent, S., Carvalho‐Silva, D., Clapham, P., Coates, G., Fairley, S., Fitzgerald, S., Gil, L., Gordon, L., Hendrix, M., Hourlier, T., Johnson, N., Kähäri, A.K., Keefe, D., Keenan, S., Kinsella, R., Komorowska, M., Koscielny, G., Kulesha, E., Larsson, P., Longden, I., McLaren, W., Muffato, M., Overduin, B., Pignatelli, M., Pritchard, B., Riat, H.S., Ritchie, G.R., Ruffier, M., Schuster, M., Sobral, D., Tang, Y.A., Taylor, K., Trevanion, S., Vandrovcova, J., White, S., Wilson, M., Wilder, S.P., Aken, B.L., Birney, E., Cunningham, F., Dunham, I., Durbin, R., Fernández‐Suarez, X.M., Harrow, J., Herrero, J., Hubbard, T.J., Parker, A., Proctor, G., Spudich, G., Vogel, J., Yates, A., Zadissa, A., and Searle, S.M. 2012. Ensembl 2012. Nucleic Acids Res. 40:D84‐D90.
	Geisler, R., Rauch, G.J., Geiger‐Rudolph, S., Albrecht, A., van Bebber, F., Berger, A., Busch‐Nentwich, E., Dahm, R., Dekens, M.P., Dooley, C., Elli, A.F., Gehring, I., Geiger, H., Geisler, M., Glaser, S., Holley, S., Huber, M., Kerr, A., Kirn, A., Knirsch, M., Konantz, M., Küchler, A.M., Maderspacher, F., Neuhauss, S.C., Nicolson, T., Ober, E.A., Praeg, E., Ray, R., Rentzsch, B., Rick, J.M., Rief, E., Schauerte, H.E., Schepp, C.P., Schönberger, U., Schonthaler, H.B., Seiler, C., Sidi, S., Söllner, C., Wehner, A., Weiler, C., and Nüsslein‐Volhard, C. 2007. Large‐scale mapping of mutations affecting zebrafish development. BMC Genomics 8:11.
	Goecks, J., Nekrutenko, A., and Taylor, J. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11:R86.
	Guryev, V., Koudijs, M.J., Berezikov, E., Johnson, S.L., Plasterk, R.H.A., van Eeden, F.J.M., and Cuppen, E. 2006. Genetic variation in the zebrafish. Genome Res. 16:491‐497.
	Hill, J.T., Demarest, B.L., Bisgrove, B.W., Gorsi, B., Su, Y‐C., and Yost, H.J. 2013. MMAPPR: Mutation Mapping Analysis Pipeline for Pooled RNA‐seq. Genome Res. 23:687‐697.
	Knapik, E.W., Goodman, A., Ekker, M., Chevrette, M., Delgado, J., Neuhauss, S., Shimoda, N., Driever, W., Fishman, M.C., and Jacob, H.J. 1998. A microsatellite genetic linkage map for zebrafish (Danio rerio). Nat. Genet. 18:338‐343.
	Leshchiner, I., Alexa, K., Kelsey, P., Adzhubei, I., Austin, C., Cooney, J., Anderson, H., King, M.J., Stottmann, R.W., Garnaas, M.K., Ha, S., Drummond, I.A., Paw, B.H., North, T.E., Beier, D.R., Goessling, W., and Sunyaev, S.R. 2012. Mutation mapping and identification by whole genome sequencing. Genome Res. 22:1541‐1548.
	Li, H. and Homer, N. 2010. A survey of sequence alignment algorithms for next‐generation sequencing. Brief. Bioinform. 11:473‐483.
	Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map Format and SAMtools. Bioinformatics 25:2078‐2079.
	Liu, S., Yeh, C‐T., Tang, H.M., Nettleton, D., and Schnable, P.S. 2012. Gene mapping via bulked segregant RNA‐Seq (BSR‐Seq). PLoS ONE 7:e36406.
	McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res. 20:1297‐1303.
	Miller, A.C., Obholzer, N.D., Shah, A.N., Megason, S.G., and Moens, C.B. 2013. RNA‐seq based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23:679‐686.
	Noveroske, J.K., Weber, J.S., and Justice, M.J. 2000. The mutagenic action of N‐ethyl‐N‐nitrosourea in the mouse. Mamm. Genome 11:478‐483.
	Nüsslein‐Volhard, C. and Dahm, R. 2002. Zebrafish: A Practical Approach. 1st ed. Oxford University Press, New York.
	Obholzer, N., Swinburne, I.A., Schwab, E., Nechiporuk, A.V., Nicolson, T., and Megason, S.G. 2012. Rapid positional cloning of zebrafish mutations by linkage and homozygosity mapping using whole‐genome sequencing. Development 139:4280‐4290.
	Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24‐26.
	Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A.H., Nielsen, K.L., Jørgensen, J., Weigel, D., and Andersen, S.O. 2009. SHOREmap: Simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6:550‐551.
	Sobreira, N.L.M., Cirulli, E.T., Avramopoulos, D., Wohler, E., Oswald, G.L., Stevens, E.L., Ge, D., Shianna, K.V., Smith, J.P., Maia, J.M., Gumbs, C.E., Pevsner, J., Thomas, G., Valle, D., Hoover‐Fong, J.E., and Goldstein, D.B. 2010. Whole‐genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genet. 6:e1000991.
	Son, M.S. and Taylor, R.K. 2011. Preparing DNA libraries for multiplexed paired‐end deep sequencing for Illumina GA sequencers. Curr. Protoc. Microbiol. 20:1E.4.1‐1E.4.13.
	Stickney, H.L., Schmutz, J., Woods, I.G., Holtzer, C.C., Dickson, M.C., Kelly, P.D., Myers, R.M., and Talbot, W.S. 2002. Rapid mapping of zebrafish mutations with SNPs and oligonucleotide microarrays. Genome Res. 12:1929‐1934.
	Uchida, N., Sakamoto, T., Kurata, T., and Tasaka, M. 2011. Identification of EMS‐induced causal mutations in a non‐reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol. 52:716‐722.
	Voz, M.L., Coppieters, W., Manfroid, I., Baudhuin, A., Von Berg, V., Charlier, C., Meyer, D., Driever, W., Martial, J.A., and Peers, B. 2012. Fast homozygosity mapping and identification of a zebrafish ENU‐induced mutation by whole‐genome sequencing. PLoS ONE 7:e34671.
	Wang, K., Li, M., and Hakonarson, H. 2010. ANNOVAR: Functional annotation of genetic variants from high‐throughput sequencing data. Nucleic Acids Res. 38:e164.
	Zuryn, S., Le Gras, S., Jamet, K., and Jarriault, S. 2010. A strategy for direct mapping and identification of mutations by whole‐genome sequencing. Genetics 186:427‐430.
Key Reference
	Bowen et al., 2012. See above.
	The protocol described here is based on the technique developed by the authors of this paper. More detailed information about the limitations of the technique, for example minimal number of reads needed for mapping, as well as how many potential candidate mutations can be expected to be identified, can be found in this paper.
Internet Resource
	http://seqanswers.com/wiki/Software/list
	See for useful Internet links and resources. An extensive list of algorithms used in next‐generation sequence analysis software can be found at the URL above.