Most biological databases consist of long strings of nucleotides (guanine, adenine, thymine, cytosine and uracil) and/or amino acids (threonine, serine, glycine, etc.). Each sequence of nucleotides or amino acids represents a particular gene or protein (or section thereof), respectively. Sequences are represented in shorthand, using single-letter designations. This reduces the space needed to store the information and speeds up analysis.
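The single-letter shorthand can be illustrated with a short Python sketch. The letters used are the standard one-letter amino acid codes, but the table below is only a small subset, and the function name `to_shorthand` and the example peptide are made up for illustration.

```python
# A few standard one-letter amino acid codes (subset, for illustration only).
ONE_LETTER = {
    "glycine": "G",
    "serine": "S",
    "threonine": "T",
    "alanine": "A",
    "lysine": "K",
}


def to_shorthand(residues):
    """Join full residue names into a one-letter sequence string."""
    return "".join(ONE_LETTER[name] for name in residues)


# Hypothetical four-residue peptide, written out in full and in shorthand.
peptide = ["threonine", "serine", "glycine", "lysine"]
shorthand = to_shorthand(peptide)
full_text = " ".join(peptide)

print(shorthand)
print(len(full_text), "characters spelled out,", len(shorthand), "in shorthand")
```

Storing one character per residue is what keeps sequence records compact and lets programs compare them position by position.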
While most biological databases contain nucleotide and protein sequence
information, there are also databases which include taxonomic
information such as the structural and biochemical characteristics of
organisms. The power and ease of using sequence information have,
however, made it the method of choice in modern analysis.
In the last three decades, contributions from the fields of biology and chemistry have facilitated an increase in the speed of sequencing genes and proteins. The advent of cloning technology allowed foreign DNA sequences to be easily introduced into bacteria. In this way, rapid mass production of particular DNA sequences, a necessary prelude to sequence determination, became possible. Oligonucleotide synthesis provided researchers with the ability to construct short fragments of DNA with sequences of their own choosing. These oligonucleotides could then be used to probe vast libraries of DNA and extract genes containing that sequence. Alternatively, these DNA fragments could be used in polymerase chain reactions to amplify or modify existing DNA sequences. With these techniques in place, progress in biological research accelerated rapidly.
For researchers to benefit from all this information, however, two additional things were required: 1) ready access to the collected pool of sequence information and 2) a way to extract from this pool only those sequences of interest to a given researcher. Simply collecting, by hand, all necessary sequence information of interest to a given project from published journal articles quickly became a formidable task. After collection, the organization and analysis of this data still remained. It could take weeks to months for a researcher to search sequences by hand in order to find related genes or proteins.
Computer technology has provided the obvious solution to this problem.
Not only can computers be used to store and organize sequence
information into databases, but they can also be used to analyze sequence
data rapidly. The evolution of computing power and storage capacity has,
so far, been able to outpace the increase in sequence information being
created. Theoretical scientists have derived new and sophisticated
algorithms which allow sequences to be readily compared using
probability theory. These comparisons become the basis for determining
gene function, inferring phylogenetic relationships and modeling
protein structures. The physical linking of a vast array of computers in the
1970s provided a few biologists with ready access to the expanding pool
of sequence information. This web of connections, now known as the
Internet, has evolved and expanded so that nearly
everyone has access to this information and the tools necessary to
analyze it.
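To give a flavor of how sequences can be compared by computer, here is a minimal Python sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch approach). It is not the method of any particular database or tool: the function name and the fixed match, mismatch and gap scores are assumptions chosen for illustration, whereas practical tools typically use statistically derived scoring tables and also estimate how likely a score is to arise by chance.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best end-to-end alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not standard parameters.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1], or open a gap in one of the sequences.
            pair = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap
            gap_in_a = curr[j - 1] + gap
            curr.append(max(pair, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


# Identical sequences score highest; a missing base costs one gap penalty.
print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))
```

A higher score suggests a closer relationship between the two sequences, which is the kind of evidence that comparisons of gene function and evolutionary relatedness build on.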