A database (or data bank) is an archive of shareable information: a logical organization of structural and functional data sets, together with the tools for accessing them as and when necessary. A database without an effective mode of access is a data graveyard.

A database in molecular biology is concerned with the sequences of nucleic acids and proteins and with the 3-D structures and functions of proteins.

In the early 1980s, sequence information on nucleic acids and proteins began to accumulate rapidly in the scientific literature. Several laboratories recognized the need to store this growing body of data in archives.

Thus was born the concept of the database. Primary sequence database and structural database projects took shape in different parts of the world, resulting in the establishment of nucleic acid sequence databases, protein sequence databases and protein structure databases.


Primary sequence databases are archives of raw sequence data that can be accessed freely on the World Wide Web. Three primary nucleotide sequence databases, GenBank, EMBL and DDBJ, together make up the International Nucleotide Sequence Database Collaboration (INSDC). The table also shows some primary protein sequence databases.
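As a rough illustration of what free access over the web can look like in practice, the sketch below builds a request URL for a single nucleotide record. It assumes the NCBI E-utilities `efetch` endpoint and the `nuccore` database name, neither of which is described in this text, and the accession `EXAMPLE_ACCESSION` is a placeholder rather than a real record. The helper names `efetch_url` and `fetch_record` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record (hypothetical helper)."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


def fetch_record(db, record_id):
    """Download one record as text. Needs network access; not called here."""
    with urlopen(efetch_url(db, record_id), timeout=30) as response:
        return response.read().decode("utf-8")


# "EXAMPLE_ACCESSION" is a placeholder, not a real accession number.
print(efetch_url("nuccore", "EXAMPLE_ACCESSION"))
```

Only the URL is built when the block runs. Calling `fetch_record` with a real accession would perform the actual download.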

Secondary sequence databases are derived from primary sources and do not contain raw data. Secondary nucleic acid databases include dbSTS (database of sequence tagged sites) and dbEST (database of expressed sequence tags). PROSITE is a secondary protein sequence database that contains information about the sequence patterns (motifs) shared by the members of a protein family.

Structural databases store structural information on nucleic acids and proteins. The Protein Data Bank (PDB) is a single worldwide archive of structural data, maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers University.
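To make the idea of a PROSITE sequence pattern (motif) more concrete, the following minimal sketch translates a pattern written in PROSITE-style notation into a regular expression and scans a protein sequence with it. It assumes a simplified subset of the notation: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for anchoring at the sequence ends. The function names are hypothetical, the pattern `N-{P}-[ST]-{P}` is the commonly cited N-glycosylation site motif, and the sequence is made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    pattern = pattern.strip().rstrip(".")  # patterns usually end with a period
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")

    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, repeat = match.groups()

        if residues == "x":
            token = "."  # any residue
        elif residues.startswith("{") and residues.endswith("}"):
            token = "[^" + residues[1:-1] + "]"  # excluded residues
        else:
            token = residues  # single letter or [...] set, valid as-is

        if repeat:
            token += "{" + repeat + "}"
        parts.append(token)

    regex = "".join(parts)
    return ("^" if anchored_start else "") + regex + ("$" if anchored_end else "")


def find_motif(pattern, sequence):
    """Return (start index, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Made-up sequence, for illustration only.
print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(find_motif("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))  # [(2, 'NGSA')]
```

The second candidate site in the example sequence (starting at the later `N`) is rejected because a proline follows it, which is exactly what the `{P}` element excludes. Overlapping matches are not reported by this sketch.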

The Nucleic Acid Database (NDB) is also maintained there. The European equivalent of the PDB is the Macromolecular Structure Database (MSD).

In addition, there are some specialized databases. OMIM (Online Mendelian Inheritance in Man) is a comprehensive database of human genes involved in genetic disorders. It is maintained by NCBI.

A literature database contains abstracts and sometimes the full text and figures of published articles. MEDLINE is one such online bibliographic resource, incorporated into a larger resource, PubMed.