"Cloud computing approaches are likely to change the nature of our national research computing infrastructure in the coming years", stated Principal Investigator Geoffrey Fox, director of the Digital Science Center and associate dean of research and graduate studies in the Indiana University School of Informatics and Computing. "These technologies hold significant promise in the life sciences and medical sciences as they offer the potential for greater computational power and faster speeds at a lower cost, and in a way that is easier for scientists to use than traditional Grid computing approaches."
Technological advances have made medical and biological research increasingly data-rich in recent years - a trend that scientists believe will continue to accelerate. Processing extremely large sets of digital data that result from gene sequencing and other medical research technologies is a significant challenge that generally cannot be met by a single facility or supercomputer.
The project team is developing a software infrastructure that makes use of the substantial hardware and networking investment made by Indiana University and the National Science Foundation in FutureGrid, a national experimental testbed, and TeraGrid, a national network of high performance computing resources. The project will also harness commercial Cloud computing infrastructure such as Amazon Web Services and Microsoft Azure, as well as open source software.
"This research is potentially path-breaking", stated Peter Cherbas, professor of biology and director of the Indiana University Center for Genomics and Bioinformatics. Peter Cherbas and other researchers from the Center will be significant contributing partners in the Cloud computing research effort. "Contemporary DNA sequencing machines are churning out data at rates that would have been unimaginable to biologists just a few years ago. To use these data - to turn data into some kind of understanding - will demand good tools for using the Cloud and those tools will impact genomics projects worldwide. We're very excited to be part of this effort."
Cloud computing provides a way to outsource computing infrastructure in order to create virtual supercomputers with greater computational power than can be provided by any one facility. Clouds also support new data parallel technologies used to process massive data sets, such as Google's MapReduce, a software framework to support distributed computing on clusters.
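As a rough illustration of the MapReduce pattern mentioned above, the following minimal Python sketch counts short DNA subsequences (k-mers) across a handful of reads. It runs in a single process and only mimics the map, shuffle, and reduce phases; a real framework would distribute each phase across a cluster. The function names, the sample reads, and the choice of k-mer counting as the task are assumptions made for illustration and are not taken from the project described here.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def kmer_counts(reads, k=3):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample reads, chosen only to keep the example small.
reads = ["ACGTAC", "GTACGT"]
counts = kmer_counts(reads)
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is what makes the pattern attractive for very large sequencing data sets.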
Users of Clouds can access nearly unlimited computational power, created by pooling distributed computational resources, through simple web interfaces. This eliminates the need for users or their institutions to own and maintain large and expensive computational equipment, and for users to have a detailed technical understanding of the computational resources supporting their research. The research team will explore the use of Cloud techniques to overcome current medical computing obstacles such as long computation times and large memory requirements.
In addition to developing new Cloud computing approaches, the research team will partner with several Indiana University life science research teams to apply and test these techniques in their specific areas of life science research. These include projects related to population genomics, an area of science that improves our understanding of evolution and genetic disorders, as well as projects involved in assembling and sequencing gene fragments.
Cloud technologies will also be applied to gene family clustering and to the visualization of gene family structure in three dimensions. The overall goal is to provide a suite of services that will allow the simultaneous processing of many millions of gene samples in the Cloud. Thanks to new sequencing technology, the size of the gene samples processed is expected to be one to two orders of magnitude larger than allowed by current computational capabilities.
Pervasive Technology Institute (PTI) at Indiana University is a world-class organisation dedicated to the development and delivery of innovative information technology to advance research, education, industry, and society. Supported in part by a $15 million grant from the Lilly Endowment Inc., PTI is built upon a spirit of collaboration and brings together researchers and technologists from a range of disciplines and organisations, including the Indiana University School of Informatics and Computing at Bloomington, the Indiana University Maurer School of Law, and University Information Technology Services at Indiana University.
The Center for Genomics and Bioinformatics is a multi-disciplinary research centre serving the Indiana University Bloomington campus. It has three principal missions: to develop and carry out extramurally funded research projects in genomics and bioinformatics; to assist the research programmes of campus faculty by providing access to equipment and expert advice and by participating in long-term collaborations; and to promote interdisciplinary work in genomics and bioinformatics through seminars and workshops.
The Center was created in 2000 by the Office of the Vice President for Research and the College of Arts and Sciences. It has since received regular support from the School of Informatics and the Department of Biology and has also received crucial financial support from two Lilly Endowment Inc. awards to Indiana University: the Indiana Genomics Initiative and the Indiana METACyt Initiative.