Primeur/EnterTheGrid - Virtual Medical Worlds Magazine (PV): The human genome project has been concluded. The computational genomics initiative seems to be the next step to discover the secrets behind protein structures and functions. It has already been stated that even the next generation of teraFLOP computers will not be sufficient to model all levels of cellular interaction. Exactly which level of simulation can be achieved with the current computational power at the National Energy Research Scientific Computing Center (NERSC)?
Manfred Zorn (MZ): While the Human Genome Project is officially concluded, there are a number of things that still lie ahead of us. The current annotation of the genome, i.e., the identification of genes in the genome, is still largely unfinished. About 40 percent of the predicted genes do not have any significant matches with other sequences in the database. Without such matches we cannot assign even putative functions to these genes and their proteins. Even the 60 percent where we are reasonably confident about at least a categorical assignment can be expected to be modified and reassigned in the coming months and years.
Alternative gene structures allow for the production of more than a single protein from the same gene, thus expanding the modest number of 30,000 to 40,000 genes in the human genome.
The level of simulation possible with e.g. the NERSC computer depends in large measure on your assumptions. Are you trying to thread a sequence into an existing structure? Are you modelling the folding and unfolding of a protein with coarse approximations, or are you trying to calculate the optimal energy configuration from first principles? Just adding a thin layer of water surrounding your simulated protein molecule increases the number of components and their interactions dramatically. The number of water molecules in a sixteenth of a litre (about 62 ml) is close to 2x10^24 (a 2 followed by twenty-four zeros). Even a thin water film contains thousands of molecules. Thorough calculations could easily consume petaFLOPS of computing power to simulate a protein over a portion of its lifetime.
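The scale of that water-molecule figure can be checked with a quick back-of-envelope calculation. This sketch uses standard physical constants (Avogadro's number, the molar mass and density of water), which are not part of the interview itself:

```python
# Back-of-envelope count of water molecules in a sixteenth of a litre (~62.5 ml).
AVOGADRO = 6.022e23       # molecules per mole
MOLAR_MASS_H2O = 18.02    # grams per mole
DENSITY_H2O = 1.0         # grams per millilitre (approx., at room temperature)

volume_ml = 1000 / 16              # 62.5 ml
mass_g = volume_ml * DENSITY_H2O   # ~62.5 g of water
moles = mass_g / MOLAR_MASS_H2O    # ~3.47 mol
molecules = moles * AVOGADRO       # ~2.1e24 molecules

print(f"{molecules:.2e} water molecules")
```

Simulating even a fraction of that many interacting particles, each coupled to its neighbours, is what pushes first-principles protein simulation toward petaFLOPS-scale computing.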
PV: Dr. Richard M. Satava, in his introductory keynote to the "Medicine Meets Virtual Reality 2001" Proceedings, describes interdisciplinary research at the intersection of two different sciences - e.g. between biology and information; biology and physics; or information and physics - as the new hallmark of the Bio-Intelligence Age. The author expects large contributions to this pioneering era, which will succeed the Information Age, from the biotechnology, bioinformatics, and biocomputation areas. Can you give us a realistic picture of the degree of interdisciplinarity in your department today?
MZ: Leroy Hood coined the phrase "Biology is an Information Science". Increasingly, that fact is reflected in the labs and companies. At Celera Genomics the genome annotation effort is headed jointly by a biologist, a bioinformaticist, and a computer scientist. The day-to-day sequencing operation is quite evenly split between biology and computer technicians. A major driving force in other genome centres is the availability of and access to interdisciplinary staff. In CBCG, the Center for Bioinformatics and Computational Genomics at LBNL, the staff consists of computer scientists, biochemists, and biophysicists, and we're physically located in a biology building to foster and drive collaborations.
PV: In the United Kingdom, Professor Warwick, a cybernetics specialist from the University of Reading, this summer will have a silicon chip implanted into his left arm, where it will be connected to the nervous system to communicate with his brain and analyse brain signals associated with motion and pain. Dr. Satava also presents a visionary view of the cellular cyborg as the result of a profound collaboration between biologists, physicists and informatics scientists. What role are bioinformatics and biocomputation actually playing in the artificial perfection of the human body and in its psychological and physical functioning?
MZ: It's still a long journey to simulate entire organs from individual molecular components. In many cases we don't even know all the components that interact in complex ways to achieve a functioning cell, let alone organs. However, bioinformatics and biocomputation are already playing major roles in simulating, for example, the functions of the human heart or complex diseases like diabetes. These examples are used to study signal flow and outcomes in clinical settings, not necessarily to improve the human body. Humans are the result of a very, very long evolutionary process and quite well adapted to thrive in the strangest places on earth. For many of our shortcomings we developed tools and gadgets. Given the widely differing life expectancies of humans and computers, I'd rather buy a new PDA or cell phone than live with an implanted chip. But that's just my personal preference.
PV: The European Commission has just begun negotiations with the European Bioinformatics Institute (EBI) and a number of other European research centres over a 19.4 million euro contract for genomics research. Last year, Dr. Michael Ashburner, Joint Head of the EBI, denounced the fact that the EBI was judged ineligible for funding under the current European Framework Programme V, whereas in the USA there is a strong commitment to funding both national institutions and academic groups for genomics research. How good is the funding relationship between your institution and the Department of Energy, and do you have additional funding sources?
MZ: Genomics research encompasses two different meanings. One, it is research about the genome of an organism. Since most genomes are encoded in fairly long sequences, it takes a lot of computing to study them. But genomics research also means research on a genome-wide scale. Instead of studying one gene at a time, in genomics we study thousands of genes and their reactions at once. That has profound consequences on how data are generated, manipulated, and analysed. Both components drive the development of large genomics centres. This is a development similar to the course of high-energy physics over the past 30 years where scientists gather around large, expensive instruments to do experiments and analyse the results on equally large computing centres.
Only very few biology labs currently have the compute power, let alone the networking and data storage capabilities, to support sequencing and annotating a genome. The raw output of a typical large-scale sequencing lab is in the tens of gigabytes per day, without any data analysis or annotation. Annotation can increase that amount tenfold or more, depending on how thorough your annotation process is.
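Those throughput figures translate into substantial yearly storage requirements. A rough sketch, taking 50 GB/day as an illustrative midpoint of "tens of gigabytes" (the exact daily rate is not given in the interview):

```python
# Rough yearly storage estimate for a large-scale sequencing lab.
raw_gb_per_day = 50       # illustrative midpoint of "tens of gigabytes per day"
annotation_factor = 10    # annotation can increase the volume tenfold or more
days_per_year = 365

raw_tb_per_year = raw_gb_per_day * days_per_year / 1000       # ~18 TB/year raw
annotated_tb_per_year = raw_tb_per_year * annotation_factor   # ~180 TB/year annotated

print(f"raw output: {raw_tb_per_year:.1f} TB/year")
print(f"with annotation: {annotated_tb_per_year:.1f} TB/year")
```

At 2001-era disk and network capacities, volumes of this order are exactly why sequencing and annotation concentrate in large centres rather than individual biology labs.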
In the United States there is a strong commitment to genome research throughout all agencies. The National Science Foundation has a $192 million ITR (Information Technology Research) programme in which biocomputing is a major component. In the coming years an increasing fraction is expected to be devoted to health- and biology-related information technology research. The Department of Energy, which got the ball rolling in the Human Genome Project, is developing a Genome to Life initiative that will attempt to bridge the gap between the genome sequence and the functioning of cells and organisms. The National Institutes of Health enjoy healthy budget increases; even DARPA, the Defense Advanced Research Projects Agency, which funded the initial development of the Internet (or rather its smaller precursor), recognises biocomputing as an opportunity with potentially large payoff.
Read further on the following Web sites: