Detecting ancestral relationships among protein and DNA sequences no longer seems out of reach, thanks to a new Web-based tool built on the same artificial intelligence (AI) techniques that are used in speech recognition systems. Research teams at the University of California, San Diego (UCSD) and the San Diego Supercomputer Center (SDSC) have developed Meta-MEME. This computational tool helps biologists search for evolutionary connections in the enormous stream of protein and DNA sequence data arising from the Human Genome Project and related sequencing efforts. Meta-MEME reads the hidden language of shared motifs, the evolutionary fingerprints in protein or DNA sequences, to determine which sequences belong to the same sequence family.
The research team has installed the Meta-MEME software on a Sun Microsystems Enterprise Server 10000 at SDSC. The programme was designed by William Grundy and Charles Elkan of the Department of Computer Science and Engineering in the Irwin and Joan Jacobs School of Engineering at UCSD, and by Timothy Bailey of SDSC. Their work is funded by the National Biomedical Computation Resource of SDSC, UCSD, and the Scripps Research Institute. Molecular biologists worldwide can access Meta-MEME via the Web and use the computing power at SDSC to compare families of evolutionarily related DNA or protein sequences.
A researcher first submits a family of related DNA or protein sequences, and Meta-MEME then starts the analysis automatically. When the analysis is complete, the user receives four sets of results by e-mail. These are the statistical model, an alignment that displays the features common to all the sequences, a second type of alignment that shows how the individual sequences are related to one another, and the results of searching a large sequence database with the model. Using the statistical model and these analyses, the scientist can trace evolutionary family trees and discover previously unknown protein functions, as well as unsuspected relationships between species.
To uncover even the most subtle evolutionary relationships between genetic sequences, Meta-MEME draws on machine learning techniques from artificial intelligence. The same technology is used in commercial speech recognition systems. These tools are relatively inexpensive and popular, and their strong performance depends on a class of statistical models known as hidden Markov models (HMMs). Biological sequences, like speech, can be analysed with HMMs. The strings of nucleotides or amino acids form a biological speech code that the Meta-MEME models must decipher.
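To make the idea more concrete, here is a minimal Python sketch of how an HMM assigns a probability to a DNA sequence using the standard forward algorithm. The two-state model (a "background" state and a "motif" state) and all of its probabilities are hypothetical toy values chosen for illustration. They are not Meta-MEME's actual models.

```python
import math

# Hypothetical toy HMM for illustration only; these are NOT Meta-MEME's parameters.
STATES = ("background", "motif")
START = {"background": 0.9, "motif": 0.1}
TRANS = {
    "background": {"background": 0.9, "motif": 0.1},
    "motif": {"background": 0.2, "motif": 0.8},
}
EMIT = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "motif": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward_log_likelihood(seq: str) -> float:
    """Return log P(seq | model), summing over all hidden state paths."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    # alpha[s]: probability of the symbols seen so far, ending in state s.
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    total = sum(alpha.values())
    log_prob = math.log(total)
    alpha = {s: a / total for s, a in alpha.items()}  # rescale to avoid underflow
    for symbol in seq[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
        total = sum(alpha.values())
        log_prob += math.log(total)  # accumulate the scaling factors
        alpha = {s: a / total for s, a in alpha.items()}
    return log_prob


def log_odds_vs_background(seq: str) -> float:
    """Compare the HMM against a uniform model that emits each base with p = 0.25."""
    return forward_log_likelihood(seq) - len(seq) * math.log(0.25)
```

A positive score from `log_odds_vs_background` means the toy model explains the sequence better than a uniform background does. For example, a GC-rich string such as "GGCCGGCCGG" scores above zero, while "AAAAAAAAAA" scores below zero.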
The strength of Meta-MEME's hidden Markov models lies in their use of probabilistic reasoning to recognize common ancestry among biological sequences by focusing on evolutionary fingerprints, or motifs. A motif can be thought of as a short "word" in the protein or DNA code that appears in similar form in almost all members of a given sequence family. When the motif turns up in a distantly related sequence, this is evidence that the corresponding region of the ancestral sequence has hardly changed over millions of years of evolution. That region is therefore likely to be essential for the protein's proper function.
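A motif of this kind is commonly represented as a position-specific matrix that gives the probability of each residue at each motif position. The sketch below is a simplified illustration and not Meta-MEME's own code. It slides a hypothetical four-position DNA motif along a sequence and reports the window with the highest log-odds score against an assumed uniform background.

```python
import math

BACKGROUND = 0.25  # assumption: uniform nucleotide background frequencies

# Hypothetical 4-column motif: each dict gives P(base) at that motif position.
MOTIF = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def window_score(window: str) -> float:
    """Log-odds of one window: motif model versus background."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(MOTIF, window))


def best_motif_hit(seq: str) -> tuple[int, float]:
    """Slide the motif along seq and return (start offset, score) of the best window."""
    width = len(MOTIF)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    scores = [(i, window_score(seq[i:i + width])) for i in range(len(seq) - width + 1)]
    return max(scores, key=lambda item: item[1])
```

On "GGGTATAGGG", for example, the best hit starts at offset 3, the "TATA" window. A practical tool would also need to estimate the matrix from the aligned family members and attach a measure of statistical significance to each hit.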
Meta-MEME is particularly useful for accurately predicting this kind of functional relationship between biological sequences. Before any expensive wet-lab experiment is run to verify an ancestral relationship, the Meta-MEME models can first search all publicly available databases. They look for unannotated genetic material that might contain distant evolutionary relatives of the protein or DNA sequence. Users can submit sequences to Meta-MEME for analysis via the Web. For more information on hidden relationships between proteins, read the related article "Supercomputer database reveals hidden relationship with entirely different functions" in the October 1998 issue of VMW.