Because haplotypes specify the linkage patterns between individual nucleotide variants, they provide critical information for understanding the genetics of human diseases. However, haplotype information is not directly obtainable from high-throughput genotyping platforms. In ...
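The reason phase cannot be read directly from genotyping data can be shown with a small enumeration. The sketch below is a hypothetical helper, `compatible_haplotype_pairs`, and it assumes biallelic SNPs coded as 0/1/2 counts of the '1' allele. It lists every unordered haplotype pair that is consistent with one unphased multilocus genotype. When a genotype has h heterozygous sites (h ≥ 1), there are 2^(h−1) such pairs.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    Hypothetical helper for illustration. `genotype` holds, for each biallelic
    SNP, the count (0, 1, or 2) of the '1' allele. Haplotypes are '0'/'1' strings.
    """
    het_sites = sum(1 for g in genotype if g == 1)
    pairs = set()
    # Try every assignment of the heterozygous alleles to the first haplotype.
    for phase in product("01", repeat=het_sites):
        hap1, hap2 = [], []
        phase_iter = iter(phase)
        for g in genotype:
            if g == 0:
                a1, a2 = "0", "0"
            elif g == 2:
                a1, a2 = "1", "1"
            elif g == 1:
                a1 = next(phase_iter)
                a2 = "1" if a1 == "0" else "0"
            else:
                raise ValueError(f"genotype codes must be 0, 1, or 2, got {g!r}")
            hap1.append(a1)
            hap2.append(a2)
        # Sort so that (h1, h2) and (h2, h1) count as the same haplotype pair.
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return pairs


# Two heterozygous SNPs: the genotype alone cannot tell these two phases apart.
print(sorted(compatible_haplotype_pairs([1, 1])))
# → [('00', '11'), ('01', '10')]
```

Homozygous sites contribute no ambiguity. Each additional heterozygous site doubles the number of candidate pairs. For this reason, haplotype-based analyses usually have to estimate phase, or account for its uncertainty, rather than observe it.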
This chapter reviews the rationale for the use of haplotypes in association-based testing, discusses statistical issues related to haplotype uncertainty that complicate the analysis, then gives practical guidance for testing haplotype-based associations with phenotype or ...
The limitations of genome-wide association (GWA) studies based on the common disease/common variant (CDCV) hypothesis have motivated geneticists to test the hypothesis that rare variants contribute to the variation of common diseases, i.e., the common disease/rare variant (C ...
In this chapter, we introduce interaction networks by describing how they are generated, where they are stored, and how they are shared. We focus on publicly available interaction networks and describe a simple way of utilizing these resources. As a case study, we use Cytoscape, an open-source and e ...
Populations of mixed ancestry can be useful in genetic studies. Admixture mapping, or mapping by admixture linkage disequilibrium (MALD), was developed specifically for admixed populations and can supplement traditional genome-wide association analyses in the search for genetic ...
Structural equation modeling (SEM) is a multivariate statistical framework that is used to model complex relationships between directly and indirectly observed (latent) variables. SEM is a general framework that involves simultaneously solving systems of linear equations a ...
The analysis of high-throughput genotyping data in genome-wide association (GWA) studies has become a standard approach in genetic epidemiology. Data of high quality are crucial for the success of these studies. The first step in the statistical analysis is the generation of genotypes fr ...
Genome-wide association studies have been made possible by advances in genotyping technologies that assay a million or more single nucleotide polymorphisms (SNPs) simultaneously. This has resulted in the introduction of automated and unsupervised stat ...
Most large genotype datasets are known to contain some errors, and an error rate of 1–2% is sufficient to distort map distances and to produce false conclusions of linkage (Abecasis et al. Eur J Hum Genet 9(2):130–134, 2001). Therefore, one needs to ensure that the data are as clean as p ...
Pedigree relationship errors often occur in family data collected for genetic studies, and unidentified errors can lead to either increased false positives or decreased power in both linkage and association analyses. Here we review several allele-sharing as well as likelihood-based ...
The aim of this chapter is to introduce the reader to commonly used software packages and illustrate their input requirements, analysis options, strengths, and limitations. We focus on packages that perform more than one function and include a program for quality control, linkage, and assoc ...
Cryptic relationships, such as first-degree relatives, often appear in studies that collect population samples, such as case–control genome-wide association studies (GWAS). Cryptic relatedness not only inflates the type I error rate but also affects other aspects of GWAS, su ...
The Hardy–Weinberg principle, one of the most important principles in population genetics, was originally developed for the study of allele frequency changes in a population over generations. It is now, however, widely used in studies of human diseases to detect inbreeding, populations ...
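Deviation from Hardy–Weinberg proportions is commonly checked at a single biallelic locus with a Pearson chi-square test. The sketch below shows that test. It is an illustration only: the helper name `hwe_chi_square` is hypothetical, and the excerpt does not specify which test the chapter uses. Exact tests are generally preferred when counts are small or an allele is rare. The statistic has 1 degree of freedom because one allele frequency is estimated from the data.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    Takes genotype counts (AA, AB, BB); returns (statistic, p_value) with 1 df.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A by gene counting
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic locus: test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = hwe_chi_square(30, 40, 30)
print(round(stat, 3), round(p_value, 4))  # → 4.0 0.0455
```

In this example the allele frequency is 0.5, so 25/50/25 genotypes are expected. The observed heterozygote deficit (40 instead of 50) gives a statistic of 4.0.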
Methods of estimating allele frequencies from data on unrelated and related individuals are described in this chapter. For samples of unrelated individuals with simple codominant markers, the natural estimator of allele frequencies can be used. For genetic data on related individu ...
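For unrelated individuals typed at a codominant marker, the natural (gene-counting) estimator counts each allele and divides by the 2n allele copies in the sample. The sketch below implements that estimator with a hypothetical helper, `allele_frequencies`, and assumes genotypes are given as pairs of allele labels. It does not apply to samples of related individuals, whose allele copies are not independent.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Gene-counting estimate of allele frequencies from unrelated individuals.

    `genotypes` is a list of (allele1, allele2) tuples for a codominant marker.
    Each allele's frequency is its count divided by 2n.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    counts = Counter()
    for a1, a2 in genotypes:
        counts[a1] += 1
        counts[a2] += 1
    total = 2 * len(genotypes)
    return {allele: count / total for allele, count in counts.items()}


freqs = allele_frequencies([("A", "A"), ("A", "B"), ("B", "C")])
print(freqs)  # → {'A': 0.5, 'B': 0.3333333333333333, 'C': 0.16666666666666666}
```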
Gametic phase disequilibrium (GPD) is the nonrandom association of alleles within gametes. Linkage disequilibrium (LD) describes the special case of deviation from independence between alleles at two linked genetic loci. Estimation of allelic LD requires knowledge of haplotyp ...
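Given haplotype frequencies for two biallelic loci, the usual pairwise LD measures follow directly. These are D = p_AB − p_A·p_B, its normalized form D′ = D/D_max, and r² = D²/(p_A·q_A·p_B·q_B). The sketch below computes them with a hypothetical helper, `ld_measures`. It assumes the four haplotype frequencies are already known. In practice, these frequencies usually have to be estimated from unphased genotypes, for example with an EM algorithm.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Compute D, D' and r^2 for two biallelic loci from haplotype frequencies.

    Alleles A/a at locus 1 and B/b at locus 2; the four frequencies must sum to 1.
    """
    if abs(p_AB + p_Ab + p_aB + p_ab - 1) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # marginal frequency of allele A
    p_B = p_AB + p_aB  # marginal frequency of allele B
    q_A, q_B = 1 - p_A, 1 - p_B
    if min(p_A, q_A, p_B, q_B) <= 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # D' rescales D by its largest attainable magnitude given the allele
    # frequencies; it keeps the sign of D here (many tools report |D'|).
    if d > 0:
        d_max = min(p_A * q_B, q_A * p_B)
    else:
        d_max = min(p_A * p_B, q_A * q_B)
    d_prime = d / d_max
    r2 = d * d / (p_A * q_A * p_B * q_B)
    return d, d_prime, r2


# Only AB and ab haplotypes present: complete LD.
print(ld_measures(0.5, 0.0, 0.0, 0.5))  # → (0.25, 1.0, 1.0)
```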
Beyond calculating parameter estimates to characterize the distribution of genetic features of populations (frequencies of mutations in various regions of the genome, allele frequencies, measures of Hardy–Weinberg disequilibrium), genetic epidemiology aims to identi ...
This chapter describes how the heritability of a trait can be estimated using data collected from pairs of twins. The principles of the classical twin design are described, followed by the assumptions and possible extensions of the design. In the second part of this chapter, two example scripts are ...
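The simplest way to turn twin correlations into heritability estimates is Falconer's formula. This is a moment-based stand-in, not necessarily what the chapter's example scripts do; model-fitting approaches such as maximum-likelihood structural equation models are generally preferred. The formula assumes the classical ACE design: MZ twins share all their additive genetic effects, DZ twins share on average half, and both twin types share the common environment equally. The helper name `falconer_estimates` is hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer-style variance-component estimates from twin correlations.

    Assumes the classical ACE twin model: MZ twins share all and DZ twins on
    average half of their additive genetic effects, both share the common
    environment equally, and there is no dominance or assortative mating.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic share (narrow-sense heritability)
    c2 = 2 * r_dz - r_mz    # shared (common) environment
    e2 = 1 - r_mz           # unique environment plus measurement error
    return {"A": a2, "C": c2, "E": e2}


est = falconer_estimates(r_mz=0.8, r_dz=0.5)
print({k: round(v, 2) for k, v in est.items()})  # → {'A': 0.6, 'C': 0.2, 'E': 0.2}
```

The three components sum to 1 by construction. Because these are plain moment estimates, they can fall outside [0, 1] when the observed correlations violate the model's assumptions.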
The array CGH (array comparative genomic hybridization) technique has been developed to detect chromosomal copy number changes on a genome-wide and/or high-resolution scale. Here, we present validated protocols using in-house spotted oligonucleotide libraries for array CGH. Th ...
Recently developed microarray-based copy number measurement assays have drastically improved the accuracy and resolution with which DNA copy number alterations can be detected. As with any microarray assay, those designed to measure genome copy number produce large data sets for each ...
Genome sequencing has revealed the remarkable amount of genetic diversity that can be encountered in bacterial genomes. In particular, the comparison of genome sequences from closely related strains has uncovered significant differences in gene content, hinting at the dynamic na ...