The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly rel ...
The analysis of genome-wide association studies (GWAS) poses statistical hurdles that have to be handled efficiently in order for the study to be successful. The two largest impediments in the analysis phase of the study are the multiple comparisons problem and maintaining robustness ag ...
Gene expression profiling now enables scientists to obtain a genome-wide picture of cellular function in various human disease mechanisms, and it has also proven extremely valuable in forecasting patients’ prognosis and therapeutic responses. A wide range of multi ...
High-throughput technologies can routinely assay biological or clinical samples and produce wide data sets where each sample is associated with tens of thousands of measurements. Such data sets can be mined to discover biomarkers and develop statistical models capable of predicting ...
With the advance of modern technologies, high-dimensional data have become prevalent in computational biology. The number of variables p is very large, and in many applications, p is larger than the number of observational units n. Such high dimensionality and the unconventional small-n-large ...
Hidden Markov models (HMMs) have wide applications in pattern recognition. In genome sequence analysis, HMMs have been applied to the identification of regions of the genome that contain regulatory information, i.e., binding sites. In higher eukaryotes, the regulato ...
In molecular biology, we are often interested in determining the group structure in, e.g., a population of cells or microarray gene expression data. Clustering methods identify groups of similar observations, but the results can depend on the chosen method’s assumptions and starting para ...
The support vector machine is a supervised learning technique for classification that is increasingly used in many applications of data mining, engineering, and bioinformatics. This chapter aims to provide an introduction to the method, covering topics from the basic concept of the optimal separa ...
Microarray experiments have become routine in the past few years in many fields of biology. Analysis of array hybridizations is often performed with the help of commercial software programs, which produce gene lists and graphs and sometimes provide values for the statistical significan ...
This chapter describes methods for learning gene interaction networks from high-throughput gene expression data sets. Many genes have unknown or poorly understood functions and interactions, especially in diseases such as cancer where the genome is frequently mutated. The gene in ...
The rapid advances in biotechnology have given rise to a variety of high-dimensional data. Many of these data, including DNA microarray data, mass spectrometry protein data, and high-throughput screening (HTS) assay data, are generated by complex experimental procedures that involve ...
Epigenetics is the study of heritable changes other than those encoded in the DNA sequence. Cytosine methylation of DNA at CpG dinucleotides is the best-studied epigenetic phenomenon, although epigenetic changes also encompass non-DNA methylation mechanisms, such as covalent hi ...
Genetic linkage analysis has been a traditional means for identifying regions of the genome with large genetic effects that contribute to a disease. Following linkage analysis, association studies are widely pursued to fine-tune regions with significant linkage signals. For compl ...
Sample size calculation is a critical procedure when designing a new biological study. In this chapter, we consider molecular biology studies that generate high-dimensional data. Microarray studies are typical examples, so we frame this chapter in terms of gene microarray data, but the d ...
Many biomedical applications are concerned with the problem of selecting important predictors from a high-dimensional set of candidates, with gene expression data as one example. Because the sample size in any single study is usually small, it is important to combine inform ...
Large-scale sequencing, copy number, mRNA, and protein data hold great promise for biomedical research, while posing great challenges to data management and data analysis. Integrating different types of high-throughput data from diverse sources can increase the statisti ...
Cardiovascular disease, metabolic syndrome, schizophrenia, diabetes, bipolar disorder, and autism are a few of the numerous complex diseases for which researchers are trying to decipher the genetic composition. One interest of geneticists is to determine the quantitative trait l ...
A majority of original articles published in biomedical journals include some form of statistical analysis. Unfortunately, many of the articles contain errors in statistical design and/or analysis. These errors are worrisome, as the misuse of statistics jeopardizes the process of s ...
This chapter is an introductory reference guide highlighting some of the most common statistical topics, broken down into both command-line syntax and graphical interface point-and-click commands. This chapter serves to supplement more formal statistics lessons and expedite u ...
In this chapter, we cover fundamental principles and methods in statistics – from “What are Data and Statistics?” to “ANOVA and linear regression,” which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory stati ...