In high-throughput sequencing experiments, the number of reads mapping to a genomic region, also known as the “coverage” or “coverage depth,” is often used as a proxy for the abundance of the underlying genomic region in the sample. The abundance, in turn, can be used for many purposes including calli ...
The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. In a single experiment, millions of reads are produced. To gain knowledge from these experiments, the first step is to find the genomic origin of the reads, i ...
Genome sequencing centers are flooding the scientific community with data. A single sequencing machine can nowadays generate more data in one day than any existing machine could have produced throughout the entire year of 2005. Therefore, the pressure for efficient sequencing data com ...
The dramatic fall in the cost of DNA sequencing has transformed the range of experiments within reach in the life sciences. Here we introduce the domains of analysis made possible by high-throughput sequencing, distinguishing between “counting” and “reading” applicati ...
Detection of reverse transcriptase termination sites is important in many different applications, such as structural probing of RNAs, rapid amplification of cDNA 5′ ends (5′ RACE), cap analysis of gene expression, and detection of RNA modifications and protein–RNA cross-links. The thr ...
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is invaluable for identifying genome-wide binding of transcription factors and mapping of epigenomic profiles. We present a statistical protocol for analyzing ChIP-seq data. We describe g ...
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) offers a powerful means to study transcription factor binding on a genome-wide scale. While a number of advanced software packages have already become available for identifying ChIP-seq-binding sites, it has ...
RNA sequencing (RNA-Seq) has emerged as a powerful and increasingly cost-effective technology for analysis of transcriptomes. RNA-Seq has several significant advantages over gene expression microarrays, including its high sensitivity and accuracy, broad dynamic range, nuc ...
Estimating genetic variance is traditionally performed using pedigree analysis. Using high-throughput DNA marker data measured across the entire genome it is now possible to estimate and partition genetic variation from population samples. In this chapter, we introduce methods ...
Within this chapter we introduce the basic PLINK functions for reading in data, applying quality control, and running association analyses. Three worked examples are provided to illustrate: data management and assessment of population substructure, association analysis of a qua ...
In this chapter we describe methods for statistical analysis of GWAS data with the goal of quantifying evidence for genomic effects associated with trait variation, while avoiding spurious associations due to evidence not being well quantified or due to population structure. Single ma ...
This chapter gives an overview of the quality control (QC) issues for SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating the quality of the genotypes are discussed, followed by a worked example of a QC pipeline starting with raw data and finishing w ...
Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export d ...
In this chapter we describe a novel Bayesian approach to designing GWAS with the goal of ensuring robust detection of effects of genomic loci associated with trait variation. The goal of GWAS is to detect loci associated with variation in traits of interest. Finding which of 500,000–1,000 ...
A good understanding of the design of an experiment and the observational data that have been collected as part of the experiment is a key prerequisite for correct and meaningful preparation of field data for further analysis. In this chapter, I provide a guideline on how an understanding of the fie ...
Genomic selection can have a major impact on animal breeding programs, especially where traits that are important in the breeding objective are hard to select for otherwise. Genomic selection provides more accurate estimates for breeding value earlier in the life of breeding animals, gi ...
Typical methods of analyzing genome-wide single nucleotide variant (SNV) data in cases and controls involve testing each variant’s genotypes separately for phenotype association, and then using a substantial multiple-testing penalty to minimize the rate of false positives. This ...
Higher order interactions are known to affect many different phenotypic traits. The advent of large-scale genotyping has, however, shown that finding interactions is not a trivial task. Classical genome-wide association studies (GWAS) are a useful starting point for unraveling the g ...
This chapter describes how to use the R package ‘MDR’ to search for and identify gene–gene interactions in high-dimensional data and illustrates applications for exploratory analysis of multi-locus models by providing specific examples.
Genome-wide association studies (GWASs) and other high-throughput initiatives have led to an information explosion in human genetics and genetic epidemiology. Conversion of this wealth of new information about genomic variation to knowledge about public health and human biolo ...