Clustering techniques are used to arrange genes in some natural way, that is, to organize genes into groups or clusters with similar behavior across relevant tissue samples (or cell lines). These techniques can also be applied to tissues rather than genes. Methods such as hierarchical agglom ...
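As a rough illustration of grouping genes by similar behavior across samples, the sketch below runs a small agglomerative (bottom-up) hierarchical clustering. The single-linkage rule, the Euclidean distance, the function names, and the toy expression values are all assumptions made for this example; the abstract does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict mapping a gene name to its expression values across samples.
    Cluster distance is the smallest pairwise gene distance (single linkage,
    chosen here only for brevity).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical expression values for four genes across three samples.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [5.0, 1.0, 0.5],
    "geneD": [5.2, 0.9, 0.4],
}
result = single_linkage(toy_profiles, n_clusters=2)
```

Clustering tissues rather than genes would, under the same assumptions, only require passing profiles keyed by sample instead of by gene.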
Fixed-parameter algorithms can efficiently find optimal solutions to some computationally hard (NP-hard) problems. This chapter surveys five main practical techniques to develop such algorithms. Each technique is illustrated by case studies of applications to biolo ...
This chapter illustrates the use of the combinatorial optimization models presented in Chapter 19 for the Feature Set selection and Gene Ordering problems to find genetic signatures for diseases using microarray data. We demonstrate the quality of this approach by using a microarray da ...
The aim of this chapter is to present combinatorial optimization models and techniques for the analysis of microarray datasets. The chapter illustrates the application of a novel objective function that guides the search for high-quality solutions for sequential ordering of expres ...
The approach termed Determination and Mapping of Activity-Specific Descriptor Value Ranges (MAD) is a conceptually novel molecular similarity method for the identification of active compounds. MAD is based on mapping of compounds to different (multiple) activity class-selec ...
The introduction of molecular similarity analysis in the early 1990s has catalyzed the development of many small-molecule-based similarity methods to mine large compound databases for novel active molecules. These efforts have profoundly influenced the field of computer-aid ...
Diseases with complex inheritance are characterized by multiple genetic and environmental factors that often interact to produce clinical symptoms. In addition, etiological heterogeneity (different risk factors causing similar phenotypes) obscures the inheritance pa ...
Gene expression profiling using microarrays is a modern approach for molecular diagnostics. In clinical microarray studies, researchers aim to predict disease type, survival, or treatment response using gene expression profiles. In this process, they encounter a series of obsta ...
Phylogenetic profiles describe the presence or absence of a protein in a set of reference genomes. Similarity between profiles is an indicator of functional coupling between gene products: the greater the similarity, the greater the likelihood of proteins sharing membership in the same ...
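A minimal sketch of the idea: each protein is represented as a presence/absence vector over a fixed list of reference genomes, and pairs of proteins are scored by how similar their vectors are. The Jaccard index used here is only one possible similarity measure and is an assumption of this example, as are the protein names and profile values.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two binary presence/absence profiles.

    Each profile lists 1 (protein present) or 0 (absent) for the same
    ordered set of reference genomes.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six reference genomes.
profiles = {
    "protA": [1, 1, 0, 1, 0, 1],
    "protB": [1, 1, 0, 1, 0, 0],
    "protC": [0, 0, 1, 0, 1, 0],
}

score_ab = jaccard_similarity(profiles["protA"], profiles["protB"])
score_ac = jaccard_similarity(profiles["protA"], profiles["protC"])
```

Under this toy scoring, protA and protB co-occur in most genomes and would be the stronger candidate pair for functional coupling, while protA and protC never co-occur.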
The idea behind the gene neighbor method is that conservation of gene order in evolutionarily distant prokaryotes indicates functional association. The procedure presented here starts with the organization of all the genomes into pairs of adjacent genes. Then, pairs of genes in a genome of ...
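The first step described above, organizing a genome into pairs of adjacent genes, can be sketched as follows. The comparison step shown afterwards (a plain set intersection between two genomes) is an assumption made for illustration, not the chapter's procedure; genes are labeled with hypothetical ortholog-group identifiers, and strand and chromosome circularity are ignored.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered pairs of neighboring genes in a genome.

    gene_order: list of gene identifiers in chromosomal order.
    """
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


# Hypothetical gene orders, using shared ortholog-group labels.
genome_1 = ["og1", "og2", "og3", "og4"]
genome_2 = ["og7", "og3", "og2", "og9", "og4"]

# Pairs that are neighbors in both genomes (illustrative comparison only).
conserved = adjacent_pairs(genome_1) & adjacent_pairs(genome_2)
```

Using unordered pairs means that a pair counts as conserved even when its orientation is reversed in the second genome, which is a design choice of this sketch.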
Analysis of amino acid sequences from different organisms often reveals cases in which two or more proteins encoded separately in a genome also appear as fusions, either in the same genome or that of some other organism. Such fusion proteins, termed Rosetta stone sequences, help link dispara ...
Modern molecular biology approaches often result in the accumulation of abundant biological sequence data. Ideally, the function of individual proteins predicted using such data would be determined experimentally. However, if a gene of interest has no predictable function or if the a ...
The revolution in high throughput biology experiments producing genome-scale data has heightened the challenge of integrating functional genomics data. Data integration is essential for making reliable inferences from functional genomics data, as the datasets are neither e ...
Identifying and analyzing components of complexes is essential to understand the activities and organization of the cell. Moreover, it provides additional information on the possible function of proteins involved in these complexes. Two bioinformatics approaches are usually ...
High-throughput methodologies have increased the amount of experimental microarray data available by several orders of magnitude. Nevertheless, translating these data into useful biological knowledge remains a challenge. There is a risk of perceiving these methodologies as m ...
Finding the regulatory mechanisms responsible for gene expression remains one of the most important challenges for biomedical research. A major focus in cellular biology is to find functional transcription factor binding sites (TFBS) responsible for the regulation of a downstream ...
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function, and the extent to which the functional repertoire can ...
The systematic study of proteins and protein networks, that is, proteomics, calls for qualitative and quantitative analysis of proteins and peptides. Mass spectrometry (MS) is a key analytical technology in current proteomics, and modern mass spectrometers generate large amounts of ...
A fundamental problem in molecular biology is the prediction of the three-dimensional structure of a protein from its amino acid sequence. However, molecular modeling to find the structure is at present intractable and is likely to remain so for some time, hence intermediate steps such as pred ...
Protein structure prediction has matured over the past few years to the point that even fully automated methods can provide reasonably accurate three-dimensional models of protein structures. However, until now it has not been possible to develop programs able to perform as well as human ex ...