Primeur/EnterTheGrid - Virtual Medical Worlds Magazine
(PV) The Human Genome Project has been concluded. The
computational genomics initiative seems to be the next step to discover the secrets behind protein structures and functions. It has already been stated that even the next generation of teraFLOP computers will not be sufficient to model all levels of cellular interaction. Which level of simulation can be achieved exactly with the current computational power at the National Energy Research Scientific Computing Center (NERSC)?
Manfred Zorn (MZ) While the Human Genome Project is officially
concluded, there are a number of things that still lie ahead of us.
The current annotation of the genome, i.e., identification of genes
in the genome, is still largely unfinished. About 40% of the
predicted genes do not have any significant matches with other
sequences in the database. Without such matches we cannot assign even
putative functions for these genes and their proteins. Even for the 60%
where we are reasonably confident about at least a categorical
assignment, we can expect assignments to be modified and revised in the
coming months and years.
Alternative gene structures allow for the production of more than a
single protein from the same gene, thus expanding the modest number
of 30,000 to 40,000 genes in the human genome.
The level of simulation possible with, for example, the NERSC computer
depends in large measure on your assumptions. Are you trying to thread a
sequence into an existing structure? Are you modelling the folding and
unfolding of a protein with coarse approximations, or are you trying
to calculate the optimal energy configuration from first principles?
Just adding a thin layer of water surrounding your simulated protein
molecule increases the number of components and their interactions
dramatically. The number of water molecules in a sixteenth of a litre
(about 62 ml) is close to 2 x 10^24 (a 2 followed by 24 zeros). Even a
thin water film has thousands of molecules. Thorough calculations
could easily use up petaFLOPs to simulate a protein over a portion of
its lifetime.
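As a rough check on the order of magnitude for the water example, the count can be estimated from the volume. This is only a sketch: it assumes a density of 1 g/ml and a molar mass of 18.015 g/mol for water, and the function name is purely illustrative.

```python
# Rough estimate of the number of water molecules in a given volume.
AVOGADRO = 6.02214076e23    # molecules per mole (SI defined value)
MOLAR_MASS_WATER = 18.015   # g/mol
DENSITY_WATER = 1.0         # g/ml, assumed (approximately true at room temperature)


def water_molecules(volume_ml: float) -> float:
    """Return the approximate number of H2O molecules in volume_ml of water."""
    grams = volume_ml * DENSITY_WATER
    moles = grams / MOLAR_MASS_WATER
    return moles * AVOGADRO


sixteenth_litre_ml = 1000 / 16  # 62.5 ml
n = water_molecules(sixteenth_litre_ml)
print(f"{n:.2e} molecules in {sixteenth_litre_ml} ml")
```

Under these assumptions the result is on the order of 2 x 10^24 molecules, which illustrates why even a thin explicit water layer inflates the number of interactions so quickly.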
(PV) Dr. Richard M. Satava in his introductory keynote to the "Medicine Meets Virtual Reality 2001" Proceedings describes interdisciplinary research at the intersection of two different sciences - e.g. between biology and information; biology and physics; or information and physics - as the new hallmark of the Bio-Intelligence Age. To this pioneering era which will succeed the Information Age, the author expects a large contribution from the biotechnology, bioinformatics, and biocomputation areas. Can you give us a realistic picture of the degree of interdisciplinarity in your department today?
(MZ)
Leroy Hood coined the phrase "Biology is an Information
Science". Increasingly, that fact is reflected in labs and companies.
At Celera Genomics the genome annotation effort is headed jointly by
a biologist, a bioinformaticist, and a computer scientist. The day-to-day
sequencing operation is quite evenly split between biology and
computer technicians. A major driving force in other genome centres
is the availability of, and access to, interdisciplinary staff. In CBCG,
the Center for Bioinformatics and Computational Genomics at LBNL, the
staff consists of computer scientists, biochemists, and
biophysicists, and we're physically located in a biology building to
foster and drive collaborations.
(PV)
In the United Kingdom, Professor Warwick, a cybernetics specialist from the
University of Reading, this summer will have a silicon chip implanted into
his left arm where it will be connected to the nervous system to
communicate with his brain and analyse brain signals that are associated
with motion and pain. Dr. Satava also presents a visionary view of the
cellular cyborg as the result of a profound collaboration between
biologists, physicists and informatics scientists. What role are
bioinformatics and biocomputation actually playing in the artificial
perfection of the human body and in its psychological and physical
functioning?
(MZ)
It's still a long journey to simulate entire organs from individual
molecular components. In many cases we don't even know all the
components that interact in complex ways to achieve a functioning
cell, let alone organs. However, bioinformatics and biocomputation are
already playing major roles in simulating, for example, the functions
of the human heart or complex diseases like diabetes. These examples
are used to study signal flow and outcomes in clinical settings, not
necessarily to improve the human body. Humans are the result of a
very, very long evolutionary process and quite well adapted to thrive
in the strangest places on earth. For many of our shortcomings we
have developed tools and gadgets. Given the widely differing life
expectancies of humans and computers, I'd rather buy a new PDA or
cell phone than live with an implanted chip. But that's just my
personal preference.
(PV)
The European Commission has just begun negotiations with the European
Bioinformatics Institute (EBI) and a number of other European research
centres over a 19.4 million euro contract for genomics research. Last
year, Dr. Michael Ashburner, Joint Head of the EBI, denounced the fact that
the EBI was judged ineligible for funding under the current European
Framework Programme V, whereas in the USA there is a strong commitment to funding both national institutions and academic groups for genomics
research. How good are the funding relationships between your institution and
the Department of Energy, and do you have additional funding resources?
(MZ)
The term genomics research has two different meanings. First, it is
research about the genome of an organism. Since most genomes are
encoded in fairly long sequences, it takes a lot of computing to
study them. But genomics research also means research on a
genome-wide scale. Instead of studying one gene at a time, in
genomics we study thousands of genes and their reactions at once.
That has profound consequences on how data are generated,
manipulated, and analysed. Both components drive the development of
large genomics centres. This is a development similar to the course
of high-energy physics over the past 30 years, where scientists gather
around large, expensive instruments to do experiments and analyse the
results at equally large computing centres.
Only very few biology labs currently have the compute power, let alone the networking and data storage capabilities, to support sequencing and annotating a genome. The raw output of a typical large-scale sequencing lab is in the tens of gigabytes per day without any data analysis or
annotation. Annotation can increase that amount tenfold or more,
depending on how thorough your annotation process is.
In the United States there is a strong commitment to genome research throughout all agencies. The National Science Foundation has a $192 million ITR (Information Technology Research) programme in which biocomputing is a major component. In the coming years it is expected that an increasing fraction will be devoted to health- and biology-related information technology
research. The Department of Energy, which got the ball rolling in the
Human Genome Project, is developing a Genome to Life initiative that
will attempt to bridge the gap between the genome sequence and the
functioning of cells and organisms. The National Institutes of Health
enjoy healthy budget increases. Even DARPA, the Defense Advanced
Research Projects Agency, which funded the initial development of the
Internet (or rather its smaller precursor), recognises biocomputing
as an opportunity with a potentially large payoff.
Read further on the following web sites: