Computational Approaches for Gene Identification
Genetics is gaining significance as the discovery of new genes continues to have considerable impact on the medical sciences. The Human Genome Project, a multidisciplinary endeavor that aims to learn the identity of every single base stored in the human genome, has been ongoing for some time now. The genome stores the blueprints for the synthesis of a variety of proteins, the macromolecules that enable an organism to be structurally and functionally viable. The blueprint, or program, for the synthesis of a single protein is called a gene: a unit of the DNA sequence that is generally between 1 x 10^3 and 1 x 10^6 bp in length, depending on the complexity of the protein it codes for. A higher-level eukaryote contains as many as 30,000-40,000 genes. It has been estimated that gene-coding regions account for only 10-20% of the genome. The gene identification problem is to recognize these regions in an anonymous sequence of DNA.
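To make the problem statement concrete, the sketch below scans a DNA string for open reading frames (ORFs), that is, stretches that begin with a start codon and run in frame to a stop codon. It is only an illustration and not a method described in this article: the function name find_orfs, the min_codons parameter, and the toy sequence are all assumptions made for the example. It checks the forward strand only and ignores introns, so it is far too naive for eukaryotic genes, where coding regions are split across exons.

```python
# Illustrative sketch only: a naive ORF scan on the forward strand.
# find_orfs, min_codons and the sample sequence are hypothetical names/data.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons before the stop (start codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and look for the next start
    return sorted(orfs)


dna = "CCATGAAATTTGGGTAACC"
for s, e in find_orfs(dna):
    print(s, e, dna[s:e])
```

A candidate found this way is only a hint that a region might be coding. A stretch of 10^3 to 10^6 bp that contains non-coding segments cannot be recovered by a single in-frame scan, which is why gene identification needs more than this.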