Petascale computational tools could revolutionize understanding of, and provide deeper insight into, genomic evolution

Atlanta, 17 November 2009 - Technological advances in high-throughput DNA sequencing make it possible to determine how living things are related by analyzing the ways in which their genes have been re-arranged on chromosomes. However, inferring these evolutionary relationships from re-arrangement events requires computing power beyond even the most advanced systems available today. A four-year, $1 million project, funded by the National Science Foundation's PetaApps programme, aims to develop computational tools that will use next-generation petascale computers to understand genomic evolution. The grant went to a team from the Georgia Institute of Technology, the University of South Carolina and the Pennsylvania State University. The funding is part of the American Recovery and Reinvestment Act.

"Genome sequences are now available for many organisms, but making biological sense of the genomic data requires high-performance computing methods and an evolutionary perspective, whether you are trying to understand how genes of new functions arise, why genes are organized as they are in chromosomes, or why these arrangements are subject to change", stated lead investigator David A. Bader, professor, Computational Science and Engineering Division, Georgia Tech's College of Computing.

Even on today's fastest parallel computers, it could take centuries to analyze genome re-arrangements for large, complex organisms. The research team - which also includes Jijun Tang, associate professor, department of computer science and engineering, University of South Carolina, and Stephen Schaeffer, associate professor of biology, Penn State - is therefore focusing on future generations of petascale machines, which will be able to process more than a thousand trillion calculations per second. By comparison, most of today's personal computers can process only a few billion calculations per second.

The researchers plan to develop new algorithms in an open-source software framework that will use the capabilities of parallel, petascale computing platforms to infer ancestral re-arrangement events. The starting point for developing these new algorithms will be GRAPPA, an open-source code co-developed by David A. Bader and initially released in 2000, which reconstructs the evolutionary relatedness among species.

"GRAPPA is currently the most accurate method for determining genome re-arrangement, but it has only been applied to small genomes with simple events because of the limitations of the algorithms and the lack of computational power", explained David A. Bader, who is also executive director of high-performance computing at Georgia Tech. On a dataset of a dozen bellflower genomes, the latest version of GRAPPA determined the flowers' evolutionary relatedness one billion times faster than the original implementation, which did not use parallel processing or optimization.

The researchers will test the performance of their new algorithms by analyzing a collection of fruit fly genomes. "Fruit flies - formally known as Drosophila - are an excellent model system for studying genome rearrangement because the genome sizes are relatively small for animals, the mechanism that alters gene order is reasonably well understood and the evolutionary relationships among the 12 sequenced genomes are known", stated Stephen Schaeffer.

The analysis of genome re-arrangements in Drosophila will provide a relatively simple system for understanding the mechanisms that underlie gene order diversity, and the approach can later be extended to more complex mammalian genomes, such as those of primates. The researchers believe these new algorithms will make genome re-arrangement analysis more reliable and efficient, while potentially revealing new evolutionary patterns. In addition, the algorithms will enable a better understanding of the mechanisms and rate of gene re-arrangements in genomes, and of the importance of these re-arrangements in shaping the organisation of genes within the genome.

"Ultimately this information can be used to identify micro-organisms, develop better vaccines, and help researchers better understand the dynamics of microbial communities and biochemical pathways", added David A. Bader.


Source: Georgia Institute of Technology
