
Introduction to Nucleic Acid Sequence Databases
- There are three major sites for finding nucleic acid sequences
(DNA and/or RNA) on the Web, and all of them contain
essentially the same information. Which site and search method you
use will depend mainly on how much data you want and
in what form.
- GenBank is
your best bet for most sequence searches; it is updated daily, has
detailed online help, and lets you search by keyword, such as the name of an
organism or enzyme, to get sequence information. This service can be very slow
during peak hours, however.
- EMBL (the nucleotide sequence database of the
European Molecular Biology Laboratory) is a flat-file database that isn't
quite as easy to use as GenBank, and is usually slow for people in North
America since it's based in Europe. It can still be useful if you're looking
for a limited amount of data and are not trying to identify a
gene by sequence analysis.
- DDBJ (the DNA Data Bank of
Japan) is hard for beginners to use, but it is best for people who would
prefer a Japanese-language interface.
- Within GenBank and similar databases, use
BLAST (Basic
Local Alignment Search Tool) if you wish to find sequences that
are similar to a sequence that you already have. If you want to
locate Expressed Sequence Tags ("single-pass" cDNA sequences), use NCBI's
dbEST; if you
want to locate Sequence Tagged Sites, use dbSTS.
- Another option is Entrez, which
lets you do keyword searches to retrieve molecular biology citations and
records from the databases of the National Center for Biotechnology
Information (NCBI), as well as nucleotide sequences (in both text and
graphical format) from GenBank.
- For more information about online nucleic acid databases, have a look at
George Church's excellent
summary.
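
The keyword searches described above can also be scripted. The following is a minimal sketch, assuming NCBI's E-utilities "esearch" endpoint and its `db`, `term`, and `retmax` parameters, none of which are mentioned in the text above. The helper name `build_entrez_search_url` and the example query are hypothetical. The function only builds the search address; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI's E-utilities "esearch" service. The text above
# describes only the Entrez web interface, so this address is an assumption.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_search_url(term, db="nucleotide", retmax=20):
    """Build (but do not send) a keyword-search URL for an Entrez database.

    term   -- keyword query, e.g. an organism or enzyme name
    db     -- Entrez database to search (assumed default: "nucleotide")
    retmax -- maximum number of record identifiers to return
    """
    if not term.strip():
        raise ValueError("search term must not be empty")
    # urlencode escapes spaces and brackets so the query is safe in a URL.
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query: records for one organism that mention lacZ.
    print(build_entrez_search_url("Escherichia coli[Organism] AND lacZ"))
```

Pasting the printed address into a browser should return a list of matching record identifiers, provided the service still accepts these parameters. Keeping URL construction separate from the network request makes the query easy to inspect before it is sent during peak hours.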