The THIRD-BRAIN: The Next Generation of Supercomputer Design Beyond PetaFlop/s - an interview with Steve Chen

Dresden, 07 August 2006. One of the prominent speakers in Dresden at the ISC2006 Supercomputer Conference was Dr. Steve Chen. Steve Chen designed some of the very successful early Cray Supercomputers in the United States. In recent years, he designed the SuperBlade systems for the Chinese company Galactic Computing. But it does not stop there. While everybody else is looking ahead to the first Petaflop/s systems, perhaps next year, Steve Chen is already one step further. His new design, called the THIRD-BRAIN, is an architecture far beyond the Petaflop/s system. Emulating and augmenting the real Human Brain, the design draws on a number of disciplines and combines them in an innovative approach. In parallel to designing Supercomputers, Steve Chen is also very active in developing leading-edge applications in China. He is working on a project to place many HPC systems in the different Chinese states and link them together as a large and highly efficient Integral Utility Supercomputing Grid (IUGS) or Integral Grid. The Grid Supercomputers will be made available as a commercial service on the Internet, allowing highly productive, real-time, interactive and collaborative enterprise or personal applications in science, engineering, commerce, telecommunication, health care, education, media, financial services and logistics. The pervasive use of Grid Supercomputers could bring China to the forefront of Supercomputer applications within five years. Currently, Steve Chen is also leading two Integral Grid trial projects delivering services in "digital hospital and health care" and "Third-Brain driven learning engine" to underdeveloped rural and farming areas in China.


Primeur: The basic design of the Superblade Supercomputer that you have architected consists of a number of processors, a number of memory and storage units, and a fast interconnect?

Steve Chen: Yes, we used commodity, high-volume production components such as processors, memory and storage devices. An InfiniBand switch is used as the fast end-to-end interconnect. We were the first in the HPC industry to use InfiniBand. We were also the first to deliver a faster than 1 Tflop/s HPC system using Intel Xeon processors (not Itanium) to achieve superior efficiency.

Primeur: Can SuperBlade be an architecture for, say, the next ten or twenty years?

Steve Chen: Yes, it can be for the next ten years. With the current systems we can already reach 4 Tflop/s using Intel Xeon processors with the highest efficiency in the industry.

Primeur: What is the efficiency of the SuperBlade system?

Steve Chen: We can reach close to 85 percent efficiency using Intel Xeon. Most others can reach 70 percent, 60 percent, or even worse. I do not call those real HPC systems. That is just putting a lot of components together.
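The interview does not define "efficiency". In HPC the term usually means sustained performance (for example, a Linpack result) divided by theoretical peak performance. The sketch below assumes that definition; the function names and the 4 Tflop/s peak figure are illustrative and are not measurements reported in the interview.

```python
def efficiency(sustained_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak that a system actually sustains."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tflops / peak_tflops


def sustained_from_efficiency(peak_tflops: float, eff: float) -> float:
    """Sustained Tflop/s implied by a given peak and efficiency."""
    if not 0.0 <= eff <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_tflops * eff


if __name__ == "__main__":
    peak = 4.0  # hypothetical peak in Tflop/s, chosen only for illustration
    for eff in (0.85, 0.70, 0.60):
        sustained = sustained_from_efficiency(peak, eff)
        print(f"{eff:.0%} of {peak} Tflop/s peak = {sustained:.2f} Tflop/s sustained")
```

Under this reading, the gap between 85 percent and 60 percent efficiency on the same hardware is a quarter of the peak performance left unused.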

Primeur: To what level does it scale? Do you have an idea about that?

Steve Chen: Right now we have built 4 Tflop/s systems with high efficiency. Probably we can scale up to 100 Tflop/s with high efficiency using the current set of components. Next year we expect to achieve 250 Tflop/s with high efficiency.

Primeur: So you can reach Petaflop/s too?

Steve Chen: Yes. What I am doing right now is building systems with one Tflop/s in a rack. In four years' time we will reach the performance level of one Tflop/s per blade.

Primeur: When I am looking at your software stack, called Integral Utility Grid Supercomputing (IUGS), it looks like a kind of virtualization software, with which you can divide the machine in different parts for different applications.

Steve Chen: This is what I call the Engine-OS layer, which manages all hardware resources: storage, blades and switches. Most people call it "virtualization" when it happens at this layer. But in fact it is only the first level of virtualization. This is not enough, and we did this already years ago. You need to go further up to the application layer before it becomes useful.

Primeur: Did you design that yourself?

Steve Chen: Yes, we did the core design of this layer ourselves. For some other components, such as the Network-OS layer, we have partners. We just focus on some key parts. The other parts we can get from partners.

Primeur: But the key pieces can be found around the whole chain?

Steve Chen: Yes, you have to control all the layers, and tune and optimize them at run time. Otherwise it will not work efficiently and you do not get the best performance.

Primeur: When a number of applications are running on the machine, does the system then decide on the fly what they need?

Steve Chen: When you run, for example, a health care application and an education application side-by-side, it can predict in real time which application needs what resource and propagate this information from top to bottom to mobilize the necessary resources, making them instantly available. In other words, it can provide Compute-On-Demand, Bandwidth-On-Demand and Knowledge-On-Demand instantaneously. When I have an Integral Grid, I can run Video-on-Demand and IPTV applications on one side of the Grid, and health care, education or financial services on the other side. After midnight, the whole Grid can be running Weather Forecasting.

Primeur: Today the general opinion is that if you have Weather Forecasting you need a capability machine and if you do Image Analysis you need a capacity machine. Do you not need separate machines for different application classes?

Steve Chen: No, we can detect which application is running and what it needs. One can need a capacity and another one a capability type of resources in the upper level of my software stack.

Primeur: I can understand that technically: if you have a capability machine you can run capacity jobs, too. But it is more expensive than using a capacity machine. Is that true with your machine, too?

Steve Chen: Not necessarily. That is why we have a single cost-effective but very balanced architecture that can suit intensive I/O, or intensive data, or intensive computation: we can decompose the system any way we want. So when the continuous self-learning intelligent analysis detects, for instance, that we need more capacity or more capability, we just bring those pieces together when we need them. While your application is running on the system, we are learning. It is just like a Brain. But we need a lot of middleware to do that, linked to built-in hardware monitoring.

Primeur: Is the machine that you have out already doing that?

Steve Chen: To some extent it does, but not to the higher levels. In the lower level it does so statically, not yet dynamically. So, if you say, "Today I want to partition the machine in three different ways with so many processors, so many disks, etc.", we can do it. Yet, to reach the higher levels, we need more intelligence to be integrated into the system. We need machine learning, human-machine interaction, artificial intelligence, intelligent search and pattern recognition.
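The static, lower-level partitioning described in this answer can be pictured with a small sketch. This is not the Engine-OS interface, which the interview does not describe; the `Resources` class, the function names and all numbers are hypothetical and only illustrate the idea of fixing "so many processors, so many disks" per partition up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A bundle of hardware: processor and disk counts (hypothetical model)."""
    processors: int
    disks: int


def partition_statically(total: Resources, requests: dict) -> dict:
    """Grant fixed partitions up front; refuse plans that oversubscribe the machine."""
    need_cpu = sum(r.processors for r in requests.values())
    need_disk = sum(r.disks for r in requests.values())
    if need_cpu > total.processors or need_disk > total.disks:
        raise ValueError("requested partitions exceed the machine")
    return dict(requests)


def unallocated(total: Resources, granted: dict) -> Resources:
    """Resources left over after the static plan has been applied."""
    return Resources(
        total.processors - sum(r.processors for r in granted.values()),
        total.disks - sum(r.disks for r in granted.values()),
    )


if __name__ == "__main__":
    machine = Resources(processors=512, disks=96)  # illustrative sizes only
    plan = partition_statically(machine, {
        "health care": Resources(256, 32),
        "education": Resources(128, 32),
        "media": Resources(64, 16),
    })
    print(unallocated(machine, plan))
```

A static plan like this stays fixed until an operator changes it. The dynamic behaviour Chen describes as future work would instead adjust the partitions while applications are running.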

Primeur: Who is working on these new parts? Is that done in a company, or in a project?

Steve Chen: In a company based in the USA and China, called HCOM, which has established the "THIRD-BRAIN Research Institute" where the core development is done. It is a long-term, international effort to design a new generation of Supercomputers beyond Petaflop/s. The overall sponsoring company is called AHA! Ventures, based in the USA. They are the ones providing the funding as well as collaborating on the development and deployment of large-scale Integral Grid projects worldwide.

Primeur: How is your involvement today in China? Are you participating in projects there?

Steve Chen: I have stepped out of Galactic completely and am focusing on the "THIRD-BRAIN Research Institute". In the meantime, I am also working with the government, industries and universities on the actual development and deployment of large-scale healthcare and education applications on the Integral Grid, based on the THIRD-BRAIN technologies. The applications will extend to other areas, e.g. digital media, financial services, modern logistics, e-government, collaborative commerce, and advanced design and R&D in semiconductors, equipment, aerospace, automotive, chemical, biomedical, pharmaceutical, petroleum, weather, materials and renewable energy.

Primeur: Does public research work the same way as in Europe with for instance the European Union that provides funding?

Steve Chen: It depends. For example, in the USA we use AHA! Ventures as the foundation to raise funding for given projects. So these projects will be funded by private ventures. In China you need to work with the Government. The Government will say, "OK, you have the ability. You have proven in the past you can achieve. So now please develop this system with this application". When we work with the Government we only do the commercial side. Projects are all centered around delivering services on the Integral Grid, e.g. digital health care, learning and media. The Integral Grid can grow bigger and bigger. So each state could have a 10-100 Tflop/s or even Petaflop/s Supercomputer and we can connect all of them. Each state may have its own applications in Science, Engineering, Commerce, Government, Logistics, Financial Service, Weather Forecasting, Pharmaceutical Research etc. - all delivered as a Service, via our Integral Utility Supercomputing Grid. So, you only pay for what is needed, making supercomputing pervasive and affordable for personal and enterprise use.

Primeur: So you only pay for what you really use?

Steve Chen: Exactly. I think that is the big Market. Today, one still tries to sell one software package or one piece of hardware at a time. In the long run, that business model may not work as effectively as the Integral Grid.

Primeur: Another interesting thing you were saying in your presentation is that "only rolling out 20 Tflop/s machines is not enough: you need to learn how to use them too". How is that acceptance curve in China?

Steve Chen: I would say that from the application and software point of view they are still behind. For instance, oil companies today buy hardware and exploration software. But during the next five years they will learn to write some big total application software packages to meet their own special requirements, especially once we have deployed a large Integral Grid covering major cities in many states, each node with a 10-100 Tflop/s Supercomputer. They will do research on that, train graduates and researchers, and eventually deliver application results. They will become very good at writing Supercomputing application software as well.

Primeur: And in ten years' time people will not even know there was a time when there were no Tflop/s systems at all.

Steve Chen: That is right. By gradually delivering this capacity and capability as a Utility Service, people can get acquainted with it. Some people use it for two days; others for only one day. That is OK. The machine will be used by many users. The Chinese government supports university research and gives them money to pay for the use of the service. That is better than spending money on thousands of separate smaller systems, none of which can do significant work.

Primeur: How are these big systems managed?

Steve Chen: Current HPC systems are managed by separate computer centres. In China there are now two big HPC centres. One is the Chinese National Academy of Science. They have a 5 Tflop/s machine. They want to upgrade to a 100 Tflop/s system. The second one is in Shanghai, with 11 Tflop/s, which they would like to upgrade to 100 Tflop/s. These are the two National Centres right now, but that is too little and too few. There are thousands of universities. The best way for China is to provide a 10-100 Tflop/s system per node, about 100 nodes, each node supported by hundreds or thousands of 1 Tflop/s group level or smaller personal supercomputers, and link all these together as an Integral Grid with shared capability and capacity up to 1-10 Petaflop/s.
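The 1-10 Petaflop/s figure follows from simple arithmetic if one counts only the roughly 100 major nodes at 10-100 Tflop/s each. The sketch below makes that assumption explicit. It leaves out the smaller group-level and personal systems, whose contribution the interview does not quantify, and the function name is illustrative.

```python
def aggregate_pflops(nodes: int, tflops_per_node: float) -> float:
    """Combined peak of identical nodes, converted from Tflop/s to Pflop/s."""
    if nodes < 0 or tflops_per_node < 0:
        raise ValueError("node count and per-node performance must be non-negative")
    return nodes * tflops_per_node / 1000.0


if __name__ == "__main__":
    nodes = 100  # "about 100 nodes" in the interview
    low = aggregate_pflops(nodes, 10)    # every node at 10 Tflop/s
    high = aggregate_pflops(nodes, 100)  # every node at 100 Tflop/s
    print(f"Aggregate across major nodes: {low:g} to {high:g} Pflop/s")
```

This is an upper bound on what a single application could use. How much of the aggregate is available as shared capability depends on the interconnect and software between nodes.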

Primeur: Is that related to what you call the THIRD-BRAIN?

Steve Chen: Not directly. The THIRD-BRAIN is a future Bio-Supercomputer. We can use the current Supercomputing Grid to simulate and emulate the THIRD-BRAIN. The THIRD-BRAIN can be thought of as an addition to the Human Brain - the First and the Second Brain (the Cerebrum and Cerebellum) inside our head.

Primeur: But the THIRD-BRAIN will be a system still detached from our real Brain?

Steve Chen: Yes, the THIRD-BRAIN is outside and complements our own natural ones. It will do many jobs as well as or better than our Human Brain, but it will not decay, and it will rarely be sick or can be self-healing. It can store a lot of information in addition to our Brain and will not forget easily. The Human Brain is better at higher-level intelligence. The THIRD-BRAIN cannot catch up with that in the near future. Maybe some day. However, when we get older, we tend to forget things. If you have both Brains functioning together, you will forget less and think faster. The THIRD-BRAIN will be close enough to emulate a real human brain for these functions.

Primeur: Are you also looking at the interfaces?

Steve Chen: Yes. For example, initially there may be a new type of human-machine interface that may be using light in three dimensions, a bit like our eyes. The problem is, for instance, "How do you see, process, store and retrieve images?" I think we have to research just how Humans manage to do that.

Primeur: But people have been studying Human Vision and Artificial Intelligence for a long time, and they have not succeeded in building a practical system that mimics that behavior.

Steve Chen: That is because they always looked at it part by part and not as a Bio system. We will combine the disciplines of Neuroscience, Bioinformatics and Supercomputing to study, for example, the eye and the ear, and at the same time their connections to the Brain and the ways in which the Brain processes visual and audio signals. What we want to do is start with the big picture: take a relatively simple general model that emulates how Humans transmit information from the body to the Brain and gradually add more levels of detail.

Primeur: Will that be a good model? Look, for instance, how especially young people are very good with SMS. When someone would have told you just a few years ago that typing SMS messages was a good interface, everybody would have laughed; but it works. So a successful interface does not always have to mimic what we Humans do.

Steve Chen: Our knowledge of the Human and Nature in general and its inherent capabilities is very limited. As Humans we always adapt to what we have available to us today, but that does not mean it is the best or the most productive. The Human used shouting (natural sound waves through the air) to communicate for thousands of years, until he discovered the telephone and the mobile phone through electric current and electromagnetic waves - two other natural wonders which were there already but were not discovered by Humans until thousands of years later. Perhaps in 10-30 years from now, you and I may be able to communicate remotely through another natural wonder, the "Human Brainwave". But today we just do not know how to decode it yet. Once we figure out how to do it, we would not need to listen to or be surrounded by a sea of mobile phones and be "zapped" by harmful radio waves everywhere. So the fact that we adapt to our current limitations, which allow us to do things in a certain way, does not mean that way is better than the natural one. We can find similar arguments in natural vs. traditional medicine.

Primeur: I am not sure. People are very good at adapting and specializing. That is why Humans are so successful as a species. We can specialize in almost any function, while animals can specialize in only one, perhaps two functions; and if they do, the specialization is mostly fixed.

Steve Chen: That is where the THIRD-BRAIN comes in. It will provide new high-level functions. For example, the new THIRD-BRAIN driven learning engine will bring you to a much higher level of learning experience and effectiveness. If the new way allows you to learn in a simpler and more intuitive way, you gradually move over to using this new way. When there was no telephone, people were using Telegraph and Morse code, and before that, Smoke to communicate - and they were very good with these. When they got new technology, they adapted. I think Humans are very good at it. If I see a new way driven by the THIRD-BRAIN that is easier and more effective, I move over to it. That is the good thing about Human Beings - they continue to adapt, create and evolve.

Primeur: Seems like your research is rather broad.

Steve Chen: Our research will cross between Neuroscience, Bioinformatics and Supercomputing. We want to build a system model and human-machine interface that allows us to understand, to use, to protect the Human Brain, and ultimately, to create a THIRD-BRAIN that behaves very closely to our Human Brain and can function as its extension. We can use it to study how the real Brain works and to detect and repair when it does not work. We can do experiments on the model that we cannot do on the real Human Brain, by making the THIRD-BRAIN very close to the real one, close enough to detect a disease origin and find a cure. For example, we can detect early symptoms of Alzheimer's disease at a much earlier stage and treat the disease accordingly. This is what we wish to do with our THIRD-BRAIN.

To build the THIRD-BRAIN systems (or the future Bio-Supercomputers), we can draw on insights into the Human Brain. This machine can consist of many major nodes that each represent a different part of the Brain and communicate with each other. Our Brain can think fast even though each neuron may be slow, because we have lots of them running in parallel. Neurons are not identical and are not connected in identical ways in each part of the Brain. I think that gives us an idea of how to construct such a system. To build it, one can start from the gene level, then go to the molecular level, and move up. This bottom-up approach may take too much time, and the lack of an overall system view may lead research down the wrong path. We will start from the system level, a top-down approach in which we cannot go too far away from what it should be. We will continue to refine our model and verify it against our experiments, similar to the development of electronics. In electronics we have electricity. Inventors of electrical devices had no idea of Quantum Physics, but their model was sufficient and worked. They started with Field Theory and delivering electricity, then went to Wave Theory, and then to Particle Theory. In the Human Bio-system we need to go through that same process. Right now we are far away. We have not even reached the first model.

Primeur: Do you have an idea what your first model will look like?

Steve Chen: I think we will very soon understand what the first level model will be.

Primeur: But how do you start there? Two centuries ago people also had models of the Human Brain. They thought - "Well, if you look on the outside skull, there were knobs for mathematics, languages, etc., so looking at the knobs was looking at the Brain, and you could predict how smart someone is".

Steve Chen: That was a very crude approach. They did not have a model that could be verified experimentally. Today we can make assumptions and we can also prove these assumptions, step by step, in the brain. With, for instance, Magnetic Resonance Imaging (MRI), we have an advanced experimental tool available. Brain Imaging as well as supercomputer simulation will be used to do this modeling. They did not have that before. Today I can have people in the lab, check their Brainwaves while they are doing an English language exercise, and see which part of the brain reacts.

Primeur: And then you have the THIRD-BRAIN and you train it until it lights up at the right places.

Steve Chen: Correct.

Primeur: OK, that is an interesting model.

Steve Chen: It is not so different from our traditional approach. You have a theoretical model, and a real model on which you experiment. If they are very close, the theoretical model is OK. We can construct the model with a very fast Supercomputer. But to construct the model you need a lot of disciplines. That is why we have a Psychologist, Neuroscientist, Integral Medicine and Bioenergetics Specialist, Biomedical and Life Science Researcher, Software Architect, and System Architect on the team. We get together frequently to think about and vigorously discuss the model assumptions. You cannot be too wrong on your assumptions, as you always check them against experiments. Even if the assumptions are very crude in the beginning, we can continue to refine the model until it is close enough and becomes useful.

Primeur: Yes, most people think exact science is "exact" but most of the time it is about approximation and doing that the correct way.

Steve Chen: I agree. Right now there is no "exact" bioscience model of humans as a system. So you cannot tell anything exactly. Especially in the medical field, for instance, nobody can tell you whether this drug is good for you without any side effect. So what they do is approximate. Unfortunately, when you take a second drug, a third drug, a fourth drug, each of them has been tested on only a few thousand human cases, and if you take these drugs together for ten years, they may ultimately kill you instead of helping you because of the complex interaction among all four drugs. We just do not have enough data to predict it.

Primeur: It is interesting to see your model at work. If you have the high model of the brain, will that fit directly on your blade supercomputer?

Steve Chen: It does when we decompose it in big chunks. When we look at the brain as having N functional units, some of them will be simple, some of them very complex. We can implement each functional unit on a system partition of a single or a group of blades. Each partition itself can be decomposed further down to represent the next level structural details of each functional unit. The partitions and the links between them will gradually grow and stabilize. We may end up using a single chip in the blade to emulate a group of neurons that perform the same brain function. Remember the brain has many billions of neurons.
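The mapping described here, with functional units placed on blade partitions that can each be decomposed further, resembles a tree. The sketch below is one possible way to picture it and is not the THIRD-BRAIN design. The `Partition` class, its methods, the unit names and the blade counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A group of blades emulating one functional unit (hypothetical model)."""
    name: str
    blades: int = 0  # blades assigned directly at this level
    children: list = field(default_factory=list)

    def total_blades(self) -> int:
        """Blades used by this unit and all of its sub-units."""
        return self.blades + sum(c.total_blades() for c in self.children)

    def refine(self, sub_units: dict) -> None:
        """Split this leaf partition's blades among named sub-units."""
        if self.children:
            raise ValueError("partition is already refined")
        if sum(sub_units.values()) != self.blades:
            raise ValueError("sub-units must use exactly this partition's blades")
        self.children = [Partition(n, b) for n, b in sub_units.items()]
        self.blades = 0


if __name__ == "__main__":
    # Two functional units of different complexity; sizes are made up.
    model = Partition("brain model", children=[
        Partition("complex unit", blades=8),
        Partition("simple unit", blades=2),
    ])
    # Add the next level of structural detail to the complex unit only.
    model.children[0].refine({"sub-unit 1": 5, "sub-unit 2": 3})
    print(model.total_blades())
```

Refining a unit redistributes its blades without changing the machine's total, which matches the idea of adding detail gradually while the overall model keeps running.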

Primeur: That is a lot. But chips also already have a lot of connections.

Steve Chen: A single chip in the blade can contain millions of bits representing millions of neurons. By connecting all blades inside each partition and between partitions like a human brain, we can establish a model to simulate or emulate a portion of our brain initially and ultimately the whole brain.

Primeur: Could I compare this for instance with the development of airplane design? Twenty years ago, one was happy, if one could analyze one wing with a supercomputer. Then there came guys who could do two wings at the same time, and then they added the hull and then added the engines and finally they could do the modeling of the whole airplane in a supercomputer.

Steve Chen: Yes. That is what we have to go through. Take one wing or one piece of the brain at a time and do a very good job. When it is close enough, then do the second piece, and then maybe do the simulation of two pieces at the same time by linking them together. Then gradually add more pieces to it.

Primeur: When can I use my Third Brain?

Steve Chen: That is an interesting question. With proper funding support, our goal is to establish an initial model within three years, and to refine the model to show some meaningful results within five years.

Primeur: So before I get old, I can use my Third Brain.

Steve Chen: Hope so. In fact, we have already achieved some good progress. Gradually, based on our research results, we will develop and deliver products and services on our Integral Grid to benefit many people in China and worldwide.

Primeur: This brings me to another question. You said in your presentation that in remote areas of China there are no doctors, only one nurse for a very large number of people. Do nurses in these remote regions have Internet access?

Steve Chen: Yes, most community clinic offices or hospitals in remote areas can connect to the Internet through DSL or broadband wireless services. China has one of the most extensive fiber-optic networks in the world. With emerging WiMAX technology as well, access to the Internet through broadband communications will not be a problem in China.

Primeur: So then a nurse would have a PC with Internet access?

Steve Chen: In some poor areas, a nurse may not have a PC. But during the next five years, the Chinese government is providing large-scale funding to help remote schools and hospitals install PCs, servers, software and broadband Internet access. However, hardware and software alone are of no use until you have good content. We are now developing the content for digital health care and digital learning to deliver as services on our Integral Grid.

Primeur: The content can be a kind of "virtual" doctor?

Steve Chen: Yes, that can be useful, especially for a very small and remote farming village. We can use some mature software packages as an expert diagnostics system. When you are a patient coming here and I am a nurse and not a doctor, I can still ask you a set of pre-defined questions. The questions are well defined, and there may be as many as forty questions in a specific clinical field. These questions were developed by three or more expert specialists in the same field, in which they practiced their whole life. These questions are just like the ones most doctors will ask you when you go to hospital. The nurse will guide the patient through the questions and answers on the computer, and the system will do the intensive data mining for diagnosis and give out a proposal for treatment. The US military has already adopted this approach to replace real doctors on the battlefield. The accuracy is pretty high, about 80 percent. Real doctors sometimes perform worse than this due to a lack of good-quality patient care time and human fatigue.

Primeur: And they also follow the same protocol: the nurses with the "Virtual doctor" and the real doctors.

Steve Chen: Yes. The system will be complemented by real doctors' remote consultation and review of the diagnosis and treatment plan, cross-hospital patient transfer and outpatient care, early physician appointments through the web, digital and standard-driven personal health cards, electronic medical records and medical imaging data, as well as a national health care information system for personalized health care information gathering, updating, monitoring, disease control and prevention. The deployment of this system will be on our Integral Grid. This project is very significant and will alleviate the severe shortage of well-trained community family doctors and, most important of all, solve the huge health care problems (expense, hardship and tedium) facing 1.3 billion people in China. After the system works in China, we will introduce it to all other parts of the world, as health care and education are the two most important issues facing the whole world, including advanced developed countries like the United States.

Primeur: Thanks for the interview.

Editor's note:

This interview has been published by courtesy of Primeur/EntertheGrid Magazine.

Ad Emmen
