Dr. Dennis Newns of IBM's Computational Biology Group at the T.J. Watson Laboratory in New York visited the University of Cambridge, UK, on October 19th. He was seeking scientific collaboration, especially with scientists at the Sanger Centre, which is responsible for the Human Genome Project work in England and is named after the British researcher Frederick Sanger, who won the 1980 Nobel Prize for his 1977 sequencing technique. Dr. Newns gave a seminar entitled "The Blue Gene Petaflop Supercomputer Project, early milestones and Science Challenges".
The lecture took as its starting point the Human Genome Project and the new research avenues it opens. One of these is the study of protein structures. This can take two forms: data mining of the DNA mapping combined with experiments or, more daringly, a frontal attack by direct computer simulation. The latter includes the simulation of both ion channels and protein functions such as membrane transmission. This is an exciting new development in biotechnology with enormous business potential. One should note that protein misfolding is highly toxic to life. One example is mad cow disease, which devastated the beef industry in the United Kingdom.
When one looks at the computational aspects of protein folding, even the use of a free energy funnel, which allows one to deal with a small section of configuration space rather than all configurations, still leaves a requirement of ten to the power of fifteen floating-point operations per second, that is, a Petaflop/s computer. IBM is not known for super-fast processors. It has only the Power 3 and, next year, the Power 4, which are an order of magnitude slower than the proprietary processors produced by Japanese vendors. A good example is NEC, whose SX5 processor technology has been adapted for use in the 40 Teraflop/s Japanese Earth Simulator. So how does IBM hope to deliver a Petaflop/s machine with this type of technology in the next four years?
According to Dennis Newns, the design of the processor has been more or less completed and is likely to be frozen in the next two to three months. The current design envisages a special processor with a restricted instruction set (57 instructions compared with the usual 256 or more) and a limited 4 Mbit of DRAM on the chip. Each chip will house 32 processors and, in addition to the DRAM, will have a small amount of fast SRAM for data staging, allowing two instructions per clock cycle. Each processor thread will have a 2 nanosecond latency, but since there are 8 threads running in parallel, the latency will be amortised so that each chip will have a peak performance of 32 Gflop/s. Even with this performance on a chip, one needs 32 thousand chips to reach a Petaflop/s rate. This will require a very large computer room full of racks and an electrical power supply of at least 2 MW.
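The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in the talk; the function and constant names are illustrative, and the per-chip power figure is simply the quoted minimum supply divided evenly across chips, which is an assumption and not something stated in the seminar.

```python
import math

# Figures as reported from the talk.
PROCESSORS_PER_CHIP = 32
CHIP_PEAK_GFLOPS = 32.0        # peak per chip, in Gflop/s
TARGET_PFLOPS = 1.0            # machine target, in Pflop/s
QUOTED_CHIPS = 32_000          # "32 thousand chips"
MIN_POWER_MW = 2.0             # "at least 2 MW"

GFLOPS_PER_PFLOPS = 1_000_000  # 1 Pflop/s = 10^6 Gflop/s


def chips_needed(target_pflops: float, chip_peak_gflops: float) -> int:
    """Smallest whole number of chips whose combined peak reaches the target."""
    return math.ceil(target_pflops * GFLOPS_PER_PFLOPS / chip_peak_gflops)


def machine_peak_pflops(n_chips: int, chip_peak_gflops: float) -> float:
    """Aggregate peak of n_chips identical chips, in Pflop/s."""
    return n_chips * chip_peak_gflops / GFLOPS_PER_PFLOPS


def power_per_chip_watts(total_mw: float, n_chips: int) -> float:
    """Even share of the total supply per chip (assumption: no overheads)."""
    return total_mw * 1_000_000 / n_chips


if __name__ == "__main__":
    print("peak per processor (Gflop/s):", CHIP_PEAK_GFLOPS / PROCESSORS_PER_CHIP)
    print("chips needed for 1 Pflop/s:", chips_needed(TARGET_PFLOPS, CHIP_PEAK_GFLOPS))
    print("peak of 32,000 chips (Pflop/s):", machine_peak_pflops(QUOTED_CHIPS, CHIP_PEAK_GFLOPS))
    print("total processors:", QUOTED_CHIPS * PROCESSORS_PER_CHIP)
    print("power share per chip (W):", power_per_chip_watts(MIN_POWER_MW, QUOTED_CHIPS))
```

At 32 Gflop/s per chip, a minimum of 31,250 chips is needed for one Petaflop/s, which is consistent with the rounded figure of 32 thousand. That machine would contain just over a million processors, each with a peak of 1 Gflop/s.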
The architecture chosen uses a cube with a 1 Gbit link, reminiscent of the INMOS transputer. It has 1 GByte bandwidth, which according to preliminary simulations should be sufficient for this particular protein folding application. There is no reason to doubt that Blue Gene will be built. The question is how to maintain system integrity with 32 thousand chips. This is no mean feat, since any chip failure will require connection re-routing and re-balancing of atoms. One proposal is to mirror the calculations and also perform frequent checkpoints, comparing the results at every time step.
Assuming that system failures are infrequent enough for the machine to remain stable and deliver results, how much of the peak will translate into sustained performance is currently anyone's guess. At present, IBM is of course running simulations which should tell whether the chip will work or not. It is the size of the machine which is the biggest unknown. Note that for N chips, theory requires communication speeds to increase as N log N to keep pace, so the communication bottleneck will reduce performance by at least an order of magnitude unless some way is found to amortise this.
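To get a feel for the N log N rule, the sketch below compares the communication demand of a large machine with that of a small one. It is an illustration under stated assumptions, not a model of Blue Gene: the logarithm is taken to base 2, the demand is read as an aggregate quantity proportional to N log N, and the machine sizes in the example (256 and 32,768 chips) are chosen as convenient powers of two and are not figures from the talk.

```python
import math


def comm_demand(n_chips: int) -> float:
    """Relative aggregate communication demand, N * log2(N) (assumed form)."""
    if n_chips < 2:
        raise ValueError("need at least 2 chips for a meaningful comparison")
    return n_chips * math.log2(n_chips)


def aggregate_growth(n_small: int, n_large: int) -> float:
    """Factor by which total demand grows going from n_small to n_large chips."""
    return comm_demand(n_large) / comm_demand(n_small)


def per_chip_growth(n_small: int, n_large: int) -> float:
    """Factor by which demand per chip grows; this is the log N part alone."""
    return math.log2(n_large) / math.log2(n_small)


if __name__ == "__main__":
    small, large = 256, 32_768  # hypothetical sizes, both powers of two
    print("chip count grows by:", large // small)
    print("aggregate demand grows by:", aggregate_growth(small, large))
    print("demand per chip grows by:", per_chip_growth(small, large))
```

Under these assumptions the chip count grows by a factor of 128 while the aggregate demand grows by a factor of 240, so each chip's links would have to carry nearly twice as much traffic merely to keep pace.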
IBM and some of its collaborators in various universities have been working on smaller problems to establish whether protein structure stability is sensitive to the force field and whether the folding rate depends on the topological complexity of the fold. The results from the few simulations of the folding dynamics of small peptides to date are very positive. At present, to check the stability of a fold, they use umbrella sampling calculations for force fields, and even this restricted method, for a 36 residue protein with ten to the power of eight time steps, required 3 months of dedicated computing on a 256-node Cray T3E. The Blue Gene project is expected to improve on this, folding an 80 residue protein with ten to the power of eleven time steps in 3 months. The insights gained in understanding the mechanisms controlling bio-systems have great potential for the design of a plethora of new products, spanning the agriculture industry to the field of medicine.
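The two runs quoted above can be compared by the rate at which time steps must be completed. The sketch below is a rough illustration only: it assumes that "3 months" means 90 days of uninterrupted computing, and it ignores the difference in protein size and in simulation method between the two runs.

```python
SECONDS_PER_DAY = 24 * 60 * 60
RUN_DAYS = 90  # assumption: "3 months" taken as 90 days

T3E_STEPS = 10**8         # 36 residue protein on the 256-node Cray T3E
BLUE_GENE_STEPS = 10**11  # 80 residue protein, Blue Gene target


def steps_per_second(total_steps: int, days: int = RUN_DAYS) -> float:
    """Average time steps completed per second over the whole run."""
    return total_steps / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print("T3E run, steps/s:", steps_per_second(T3E_STEPS))
    print("Blue Gene target, steps/s:", steps_per_second(BLUE_GENE_STEPS))
    print("ratio of time steps:", BLUE_GENE_STEPS // T3E_STEPS)
```

On these assumptions the existing run averages roughly 13 time steps per second, while the Blue Gene target calls for about 13 thousand, a thousandfold increase in the same elapsed time and for a protein more than twice the size.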
Finally, Dr. Newns stated that the Genome project has opened an enormous new field in bio-informatics, and that 50 years from now those alive will view current research in the same light as we view a UNIVAC computer of the late 1950s when comparing it to present Teraflop/s machines. It is also the fastest growing business around, with lucrative opportunities for computer vendors to deliver the essential modelling systems for designing the new biotechnology-based products. A number of companies are already actively involved, such as Celera Genomics, founded by Dr. Venter, raising fears and fanning an ethical debate about the "ownership" of humanity's genetic heritage.
In Europe, in addition to the Human Genome Project's participants, many genomic research projects have received support from the European Union. For example, the Quality of Life programme funds genomic research into human disorders such as cancer, infectious diseases, inherited deafness, autism and muscular dystrophy. Other projects focus on genomic tools for developing diagnosis and treatment methods.
Copyright: Christopher Lazou, Managing Director, HiPerCom Consultants Ltd.