In molecular biology, new sequences are added to the databases on a daily basis. It is not enough to sequence a gene or a protein and simply add it to a database.
It must also be analyzed. Analysis means comparing the new sequence with the sequences already stored in the database and finding those that are similar to it.
The new sequence, for which similar sequences are sought in the database, is known as the query sequence, and the search is known as a sequence similarity search.
Information on similar sequences can be used to predict the structure and function of the query sequence. The comparison of a query sequence with the sequences in the database is known as alignment.
Algorithms and degree of similarity
The best alignment between two sequences is calculated through computational methods called dynamic programming algorithms.
An algorithm is a logical sequence of steps by which a task, here an alignment, is performed. Two such algorithms are used for sequence similarity search: (1) the Smith-Waterman algorithm and (2) the Needleman-Wunsch algorithm. The Smith-Waterman algorithm finds local similarity, i.e., the best-matching region, which may cover only a small part of each sequence, while the Needleman-Wunsch algorithm finds global similarity, i.e., it covers as much of the two sequences as possible.
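The difference between the two algorithms can be seen in a minimal sketch that computes only the best score, not the alignment itself. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    # prev[j] is the best score for the previous prefix of a against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of sub-regions of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so a new local alignment can start anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "TTTGATTACATTT", "CCCGATTACACCC"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

The two sequences in the usage example share only a central region, so the local score rewards that region alone, while the global score is pulled down by the mismatched ends.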
The degree of similarity is expressed by the alignment score. In its simplest form, this is the number of similar positions minus the number of gaps. Another simple, pictorial comparison of two sequences is the dot plot. It is a table or matrix in which the rows correspond to the residues of one sequence and the columns to the residues of the other.
A position is left blank if the two residues differ and filled if they are similar. Stretches of similarity appear as diagonals running from north-west to south-east.
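Both ideas are small enough to sketch directly. In the sketch below, the function names are hypothetical, "similar" is simplified to "identical", and the score assumes the two sequences have already been aligned, with "-" marking a gap.

```python
def simple_score(aligned_a, aligned_b, gap="-"):
    """Identical positions minus gap positions, for two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a, aligned_b))
    matches = sum(1 for x, y in pairs if x == y and x != gap)
    gaps = sum(1 for x, y in pairs if x == gap or y == gap)
    return matches - gaps


def dot_plot(row_seq, col_seq):
    """One text line per residue of row_seq: '*' where residues match, ' ' otherwise."""
    return ["".join("*" if r == c else " " for c in col_seq) for r in row_seq]


if __name__ == "__main__":
    print(simple_score("GATT-ACA", "GATTTACA"))
    for line in dot_plot("GATTACA", "GATTACA"):
        print(line)
```

Printing the plot for two identical sequences shows the main diagonal filled from the top-left to the bottom-right corner, plus scattered marks wherever the same residue recurs.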
However, with advances in computer programming and software technology, faster and more accurate methods for database searches have become available. Two types of sequence alignment are in practice: (1) pair-wise alignment and (2) multiple alignment.
Pair-wise sequence alignment
In this sequence alignment method, two sequences are compared for their degree of similarity. Two such search tools are: (1) BLAST (Basic Local Alignment Search Tool) and (2) FASTA.
BLAST is developed and maintained at NCBI (the National Center for Biotechnology Information), while FASTA searches are offered by the European Bioinformatics Institute. Both software packages work for nucleic acid and protein sequences and report an E (expect) value as a measure of how significant a similarity is; the lower the E value, the less likely it is that the match arose by chance.
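To give a feel for what the E value measures, the sketch below uses the commonly quoted relation E = m × n × 2^(-S'), where S' is the bit score of a hit, m the query length and n the total length of the database. The function name is hypothetical, and the formula is a simplification: actual programs apply further corrections, such as effective sequence lengths.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate number of chance hits scoring at least `bit_score`."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # The same bit score is less convincing in a larger database.
    for db_len in (1_000_000, 1_000_000_000):
        print(db_len, expect_value(50, 300, db_len))
```

Each additional bit of score halves the E value, and a larger database raises it, which is why the same hit can be significant in a small database and unremarkable in a large one.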
The BLAST and FASTA alignment programmes listed in the table can detect even a small degree of similarity between two closely related sequences.
They fail to do so when the sequences are more divergent. PSI-BLAST (Position-Specific Iterated BLAST) is an alternative in this situation. It is an iterative (repetitive) BLAST search. In the first step, the query sequence is searched by performing BLAST.
In the second step, the hits (similar sequences) whose E value is better (lower) than a cut-off are combined into a position-specific profile, which is then used for the next search. This process is repeated until no more significant sequence similarities are detected.
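The repeat-until-nothing-new control flow can be sketched as follows. This is not PSI-BLAST itself: `search` is a hypothetical callable standing in for one search round, the way it uses the previously accepted hits is left entirely to that callable, and the default cut-off and round limit are arbitrary illustrative values.

```python
def iterative_search(query, search, e_cutoff=0.01, max_rounds=5):
    """Repeat a search until a round yields no new significant hits.

    `search(query, accepted)` is supplied by the caller (hypothetical here)
    and returns an iterable of (sequence_id, e_value) pairs.
    """
    accepted = set()
    for _ in range(max_rounds):
        hits = search(query, frozenset(accepted))
        # Keep only hits at or below the cut-off that were not seen before.
        new = {seq_id for seq_id, e_value in hits if e_value <= e_cutoff} - accepted
        if not new:
            break  # converged: nothing significant was added this round
        accepted |= new
    return accepted


if __name__ == "__main__":
    def toy_search(query, accepted):
        # Toy stand-in: "C" only becomes detectable once "A" has been accepted.
        hits = [("A", 1e-10), ("B", 1e-6), ("D", 5.0)]
        if "A" in accepted:
            hits.append(("C", 1e-4))
        return hits

    print(sorted(iterative_search("QUERY", toy_search)))
```

The toy search illustrates why iteration helps: a distant relative that is invisible to the first round can surface once closer relatives have been taken into account.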
Multiple sequence alignment
Multiple sequence alignment is performed among three or more sequences, which are often divergent. In this case, a significant number of residues are dissimilar.
Therefore, the conserved regions (stretches that have not changed through evolution) are considered for the degree of similarity. Multiple alignments provide clues about protein structure and function and about family relationships among divergent groups of animals and plants.
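Once a multiple alignment is available, finding fully conserved positions is straightforward. The sketch below assumes the sequences are already aligned to equal length with "-" marking gaps; the function name and the example sequences are made up for illustration, and "conserved" is taken in the strictest sense of an identical residue in every sequence.

```python
def conserved_columns(alignment, gap="-"):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # One distinct symbol that is not a gap means the column is conserved.
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


if __name__ == "__main__":
    aligned = ["MKT-LLA", "MRTALLA", "MKTGLIA"]  # invented example sequences
    print(conserved_columns(aligned))
```

Practical tools usually relax this criterion and score columns by how chemically similar the residues are, rather than requiring exact identity.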