Limits and expectations of HPC systems
London, 20 Apr 00. Tera's MTA designer, Burton Smith, said at the HPC conference in Oxford that he foresees two distinct lines of supercomputers that will survive indefinitely: scalar MPPs, whose communication capabilities are limited compared with their processing power, and vector PVPs, which are better characterised by memory bandwidth than by theoretically infinitely scalable peak performance. Will this really be the future? Let us look at the last decade to see whether we can foretell what is likely to happen.
Anticipating the Future
Any forecasting model of the future must contain both problems and opportunities. How well we manage future events depends mainly on our anticipatory prowess. It is much better to anticipate a problem and avoid it than to try to correct it afterwards. Similarly, anticipation plays a big part in the creation of opportunities. This view is succinctly stated in Alvin Toffler's dictum that "we can manage the future only to the extent that we can anticipate it". So, back in 1988, when I was revising my book "Supercomputers and their Use", I wrote the following closing remarks: "Two strands are currently dominant in the pursuit of greater computer performance; faster switching circuits and the harness of parallelism in new novel architectures. In the field of technology there is still plenty of mileage in silicon: both conventional CMOS and new massively replicated fabrications on wafers. Optic-fibre switching networks are likely to be with us in the next five years and could provide the switching speed necessary to reduce the connectivity overheads. As for supercomputer parallelism, it is important to note that all supercomputers possess a high degree of parallelism, the most successful architectures to-date hide most of their parallelism from the user. New computers would have to be upward compatible to use the wealth of mature software products users are currently running on present systems; the use of new cutting edge technology and the expansion of the number of CPUs, to say, 64, would provide systems with 100 Gflop/s peak performance by 1993-5. From 1995 onwards the supercomputer industry will require both new technology and new architectures. A typical scenario for the new architecture is as follows: A cluster of 256 CPUs in a 16 way closely coupled computers; each CPU communicates by reading and writing to a fast very large shared memory.
The communication channels would have to be very fast, with Gigaword rates, and may be constructed using fibre optic devices. Massively Parallel Systems (MPPs), tend to use slower off-the-shelf technology. New generation systems will increasingly group their processors into modules to reduce inter-connect overheads. Each module will have an ever-increasing amount of local memory and special chips for floating point arithmetic. It is becoming apparent that for scientific applications the supercomputers of the 1990s will tend to converge somewhere in the middle; each strand, conventional and massively parallel, will arrive from its own end of the spectrum. As for biological computer devices they must be considered as the technology for the twenty-first century".
HPC landscape today
So what is the supercomputer landscape today? Even with the ASCI programme, which somewhat distorted the market in favour of Scalar Parallel Processors (SPPs), the current state of play is not much different from the picture sketched above: today all HPC systems on the market are parallel. According to Dr. Jeffrey Mohr, Chief Technology Officer of Computer Sciences Corporation, writing in an RCI management white paper, the commonest size for scalable parallel systems built from commodity off-the-shelf chips is 32 to 64 processors, with some systems between 128 and 512 processors and a sprinkling of systems larger than 1024 processors, delivered mostly to the Federal Labs as part of the ASCI programme. Among Parallel Vector Processors (PVPs), which use custom-designed logic chips, the commonly delivered systems have 32 to 64 processors; some have 128, and configurations of up to 512 have been announced but are not usually delivered except for special projects such as the Japanese Earth Simulator. Thus, convergence in processor count is a reality.
The rationale behind practical limits
Why is this happening?
After all, scalable systems promise potentially unlimited expansion, and yet their market profile is not much different from that of the PVPs. The reason is a fundamental one: to get a balanced system, one has to invest a great deal more silicon in memory bandwidth and inter-processor off-chip communication than the SPPs have managed to date. For P processors, the required communication speed grows as P·ln P, so communication capacity must grow faster than the number of processors (the bandwidth needed per processor itself rises as ln P) if the balance of the machine is to be maintained. Further analysis shows that the impact of communication delays between processors is non-linear, even in idealised models. Even completely parallelised problems reach a relatively low level of overall utilisation as a consequence of inter-processor communication. A smaller number of fast processors with high memory bandwidth would always achieve higher relative performance. Thus, theory sets practical limits. Hardware engineers constantly have to weigh the trade-off between the communication transport overheads inherent in systems built from sparsely integrated off-the-shelf scalar processors and the cost of designing highly integrated, balanced, but low-volume custom chips.
What the experts say
As Mr. Burton Smith, designer of the Denelcor HEP and the Tera MTA and now owner of Cray Inc., pointed out at the High Performance Computing conference in Oxford, UK, on 3-4 April: "In spite of the rhetoric, the high computing ecosystem has niches in which out-of-fashion but otherwise good architectural ideas still thrive. These niches exist because there is a diversity among high performance applications, particularly in their ratio of communication to computation". He went on to say: "As HPC continues to evolve, I foresee specialisation of the supercomputer genus along two distinct lines. Both subgenera that result will survive indefinitely.
One will be sparsely connected internally and will be found doing computations that need little communication; the other will be densely connected internally and will be for computations that are dominated by communication. Ironically, dense linear algebra will be natural prey for the first subgenus and sparse linear algebra for the second". The second subgenus is, of course, represented by PVPs such as the NEC SX-5, together with the multithreaded Tera MTA. As Mr. Watanabe, the designer of the SX-5, said at Supercomputing 98: "The difference between a supercomputer and other systems can now arguably be characterised by memory bandwidth more so than peak performance. Our entire SX-2 (1983) main memory provided only 11 Gbyte/s, which is still an order of magnitude above even the newest commodity systems when matched to processor performance. On the new SX-5 Series, advanced controllers have taken this to 32 Gbyte/s per memory board, which is enough to provide 1 TeraByte per second bandwidth". He went on to say that "putting a complete computer on a single chip could never satisfy high-end demands because the proportion of on-chip memory capacity to CPU performance would always force reliance on the performance of external memory systems".
Two approaches remain for exploiting advances in chip integration. The first places several scalar CPUs on a single chip. This, however, creates even more severe balance problems, as the cost of communication between the on-chip CPUs and external memory would become prohibitive. The second combines vector processing with high memory bandwidth to bridge the CPU-memory performance gap. Mr. Watanabe pointed out that a parallel vector computer system with 200 Teraflop/s performance is possible by 2009. In a nutshell, IBM, Compaq and other US vendors offering clusters of SPPs are following the first approach, while NEC, Tera, Cray Inc. and other PVP vendors have opted for the second.
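The two quantitative arguments in this section, the P·ln P communication requirement and Mr. Watanabe's memory-board figures, can be checked with a little arithmetic. The Python sketch below is illustrative only: the unit constants are arbitrary, the utilisation model (a fixed compute time per processor plus a communication time growing as ln P, with no overlap) is a hypothetical assumption rather than the analysis referred to in the text, and the 1 Tbyte = 1024 Gbyte convention is assumed. The helper names are made up for this example.

```python
import math

# Part 1: the P*ln(P) balance argument.
def required_bandwidth(p: int, unit: float = 1.0) -> float:
    """Aggregate communication speed needed to keep P processors balanced,
    assumed to scale as P*ln(P). `unit` is a hypothetical constant."""
    return unit * p * math.log(p)

def utilisation(p: int, compute_time: float = 1.0, comm_unit: float = 0.1) -> float:
    """Hypothetical toy model: each processor computes for `compute_time` and then
    spends `comm_unit * ln(P)` communicating, with no overlap, so
    utilisation = compute / (compute + communication)."""
    comm_time = comm_unit * math.log(p)
    return compute_time / (compute_time + comm_time)

# Part 2: the SX-5 memory-board figures quoted by Mr. Watanabe.
GB_PER_TB = 1024  # binary convention assumed; with 1000 the board count still rounds up to 32

def boards_needed(target_gb_s: float, per_board_gb_s: float) -> int:
    """Smallest number of memory boards whose combined bandwidth reaches the target."""
    return math.ceil(target_gb_s / per_board_gb_s)

if __name__ == "__main__":
    # Per-CPU column equals ln(P): the bandwidth each processor needs keeps rising.
    print(f"{'P':>6} {'P*lnP':>10} {'per CPU':>8} {'util':>6}")
    for p in (16, 64, 256, 1024, 4096):
        total = required_bandwidth(p)
        print(f"{p:>6} {total:>10.1f} {total / p:>8.2f} {utilisation(p):>6.2f}")

    sx5_per_board = 32.0         # Gbyte/s per memory board (quoted in the text)
    sx5_target = 1 * GB_PER_TB   # 1 Tbyte/s expressed in Gbyte/s
    sx2_total = 11.0             # Gbyte/s for the whole SX-2 main memory (quoted)
    print("SX-5 boards for 1 Tbyte/s:", boards_needed(sx5_target, sx5_per_board))  # 32
    print("SX-5 vs SX-2 bandwidth ratio:", round(sx5_target / sx2_total))  # 93
```

Under these assumptions, going from 16 to 4096 processors multiplies the per-processor communication requirement by ln 4096 / ln 16 = 3, while the toy utilisation falls from about 0.78 to about 0.55. The board count is simply inferred from the two quoted figures; the article does not state how many boards an SX-5 actually uses.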
HPC market fragmenting and diverging
As a side-effect of the ASCI programme, the HPC market, which in the USA has always depended on military priorities, has been distorted. Once the U.S. Federal Labs opted for the SPP paradigm using commodity off-the-shelf chips, all the research effort went that way, and potentially extensible designs such as the Cray T90 were neglected, atrophied and hit the commercial rocks. So what was once a homogeneous global HPC market, almost exclusively the preserve of U.S. companies, is now fragmenting, with European users shifting their allegiance to Japanese vendors and parallel vector systems. This happened because, in the view of users, high-bandwidth vector computing is required for many classes of HPC applications, and the computers on offer from most U.S. vendors, being based on commodity off-the-shelf chips, are totally inadequate for them. Thus, it was left to Japanese vendors, especially NEC with the SX-5 design, to carry the flag for PVPs. NEC soon translated this into a business opportunity in markets outside the USA, where its PVPs gained dominance in weather forecasting and climate research, aerospace and the automotive industry.
Users can influence the market
In my closing remarks back in 1988, I also said that "as far as the main supercomputer vendors are concerned, their future seems to be fairly secure. The biggest uncertainty stems from the intentions of IBM. If IBM decides to enter the fray in earnest, the financial landscape will change dramatically overnight". The takeover of Digital, Convex and Cray Research by PC and workstation vendors in the 1990s is a testament to this and has marginalised the PVP approach to supercomputing in the USA. In selecting a new supercomputer, users take many factors into consideration: performance, ease of use, availability, software, the system's potential for future development and the viability of the company marketing it are but a few.
This is why the odds are often stacked in favour of established vendors delivering the next successful product. Yet the dictum that "history proceeds by changing the subject" can provide the necessary optimism for aspiring newcomers with radical architectures. The World Wide Web has already caused a paradigm shift affecting the HPC market. E-commerce and quantum nanotechnology are likely to be the new kids on the block.
Chris Lazou