UK number crunching scientists want real supercomputers
London 20 December 2002 The annual machine evaluation workshop at EPSRC Daresbury Laboratories,
UK, on 11-12 December 2002, illustrated the year's main general HPC trend:
cluster computers have gained a firm foothold in the upper regions of the
TOP500 list of supercomputers. At the Daresbury workshop, new vendors of
clusters, from the UK and abroad, presented their solutions, whilst the
established vendors displayed their own cluster machines. Some regrets
were expressed about this development: number-crunching UK
scientists in practice need a capability system, but because of the
funding mechanism currently in place they have ended up with a capacity
system. (Chris Lazou)
This established workshop provided a plethora of distributed-memory benchmark results on the vendors' latest products, built on their latest chips. The Daresbury benchmark suite used to obtain these results consists of many computational chemistry kernel codes, molecular dynamics, Quantum Monte Carlo, a Jacobi solver, STREAM (which measures sustainable memory bandwidth in HPC, here with TRIAD), ab initio molecular electronic structure codes, and DL_POLY, the parallel molecular dynamics benchmark. Results from SPECfp2000 and other well-known benchmarks were also presented. Martyn Guest and other Daresbury staff described their benchmark findings, comparing the performance of many of the available PC-based systems. The SPECfp_Rate figures do not simply mirror the SPECfp values: SPECfp_Rate is much lower on the IBM P4 than on the HP Alpha chip. Performance for a particular chip tends to vary between benchmarks, but a pattern can be seen emerging.
The benchmark results indicate that the Intel Itanium2 systems are the bright stars of today, faring well on performance against other superscalar chips. On the computational chemistry Rate Benchmark, taking the geometric mean, a four-processor HP Rx5670 with the Itanium2 is comparable with the IBM P4 (1.3GHz), the HP ES45/1250, the AMD 2000+/1667, the HP Alpha EV7 and so on. The same holds for the DL_POLY component of the Rate Benchmark.
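The geometric mean used to summarise the rate benchmarks above can be sketched as follows. This is a minimal illustration in Python, not the Daresbury suite's own tooling: the function name and the sample ratios are hypothetical, and the numbers are made up rather than taken from the workshop.

```python
import math


def geometric_mean(ratios):
    """Return the geometric mean of a list of positive performance ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # Average the logarithms, then exponentiate. This is equivalent to
    # taking the n-th root of the product but avoids overflow.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-kernel speed-ups relative to a reference machine
# (illustrative values only, NOT measurements from the workshop).
example_ratios = [0.5, 2.0, 1.0]
example_mean = geometric_mean(example_ratios)
```

The geometric mean is the usual choice for this kind of summary because a factor-of-two loss on one kernel exactly offsets a factor-of-two gain on another, and the ranking of machines does not depend on which machine is chosen as the reference.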
This is reinforced by the Linpack benchmark. For example, the NEC TX7, the only 32-way Itanium2-based product at present, which employs an NEC-developed chipset and crossbar switch, broke the record, achieving 101.77 Gigaflop/s out of a 128 Gigaflop/s peak. In contrast, a comparable 32-way IBM P4 delivers only 96.6 Gigaflop/s out of a 166.4 Gigaflop/s peak on the same benchmark.
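The Linpack figures quoted above are easier to compare as a fraction of peak. The short Python sketch below does that arithmetic; the function and variable names are illustrative choices, and the only inputs are the sustained and peak numbers given in the text.

```python
def linpack_efficiency(sustained_gflops, peak_gflops):
    """Return sustained Linpack performance as a fraction of peak."""
    if peak_gflops <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_gflops / peak_gflops


# (sustained, peak) in Gigaflop/s, as quoted in the article.
systems = {
    "NEC TX7 (32-way Itanium2)": (101.77, 128.0),
    "IBM P4 (32-way)": (96.6, 166.4),
}

efficiencies = {
    name: linpack_efficiency(sustained, peak)
    for name, (sustained, peak) in systems.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of peak")
```

On these numbers the NEC TX7 sustains about 79.5% of its peak, against about 58.1% for the 32-way IBM P4.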
Vendor presentations included the new IBM Power 4+ p690 chip roadmap and a brief description of HPCx, the 5 Teraflop/s peak performance IBM P4 system recently installed at Daresbury for use by UK academic researchers. Others covered the SGI roadmap, showing developments until 2005 for their new SN-IPF system based on the Itanium2 chip; the new AMD Opteron chip and its expected appearance on the market in the second quarter of 2003; and the HP Alpha EV7 and its phasing out in two years' time, when current HP product lines are to be merged into the Itanium product line. I shall not bore you here with details, as they have been reported recently in articles from SC2002 in Baltimore.
A number of presentations, including one from Sun Microsystems and one from IBM, set out the Grid vision: its current state and its potential, both as a "more efficient" utilization of current user computer resources that reduces total cost of ownership, and as something that could develop into a new e-business revenue stream. To deploy the new capabilities of Grid services, autonomic functions and management middleware, backed by OGSA infrastructure and services, are being put in place.
Both Cray and NEC gave presentations on parallel vector processor (PVP) supercomputers. Cray concentrated on the newly announced Cray X1 and the roadmap to deliver 1 Petaflop/s sustained performance by 2010. The Cray X1 uses the "successful" network components of the Cray T3E, the high-bandwidth vector architecture of the Cray T90, a new instruction set and other architectural innovations to make the system scalable enough to deliver Petaflop/s performance.
Joerg Stadler gave a brief history of the NEC parallel vector SX series systems including their current SX-6 and promised follow-ups with even higher performance. He also briefly mentioned the Earth Simulator made from NEC SX technologies, with its 40 Teraflop/s peak performance, the fastest system in the current TOP500 list.
Stadler went on to say that until now the European arm of NEC had concentrated on selling the SX series capability computers, whilst in Japan NEC was also strong in selling capacity computers in the commercial server market. The introduction of the NEC SX-6i, a deskside departmental vector system, and the NEC TX7 series, a 32-way scalar server based on the Intel Itanium2 processor, changed that sales strategy, with both these systems also marketed in Europe.
What has transpired is that almost all vendors, though continuing their own product lines, are also actively planning to incorporate the Intel Itanium2 chip and its successors, Madison and Montecito, in their near-future server products. HPC vendors are also busy open sourcing their compilers and other software tools, and adopting Red Hat, SuSE and Turbolinux among the operating systems to be used on their Intel Itanium2 products. Grid technology is also being vigorously pursued.
Tailored systems built for High Performance Computing
The rest of the workshop consisted of users' experience in building their own "tailored systems" and presentations from a number of companies specialising in providing tailored system solutions from commodity components on demand. Instead of buying pre-packaged products from traditional vendors, a cluster can be cobbled together from favoured chips and an interconnect network, such as Gigabit Ethernet, QsNet from Quadrics or the Dolphin interconnect, to fit one's pocket and presumably satisfy one's computational needs.
In the last few years, Beowulf systems have been built with some success, aiming to replace ready-made large-scale supercomputers with clusters of "cheap" off-the-shelf microchips. These include very large systems in production at Cornell University, at the Pittsburgh Supercomputing Center with its 5 Teraflop/s system, and the cluster at the Commissariat à l'Energie Atomique (CEA) at Bruyères-le-Châtel, the largest in Europe.
This "tailored system" paradigm has also been used to build departmental systems from off-the-shelf components of choice. These typically consist of several hundred, or in a few cases thousands, of processors using AMD, Intel Pentium, IBM Power 3/4 or HP Alpha chips, cobbled together with a network interconnect such as a Gigabit switch, the Dolphin interconnect, or QsNet from Quadrics. But are these really cheap supercomputing alternatives? The costs quoted at the educational sites include only the semiconductor components and exclude personnel costs.
The tailored system paradigm is nevertheless spawning a number of small companies providing build and maintenance services for made-to-order systems. For example, ClusterVision is a start-up company which claims it can deliver a fully functioning system with all hardware and software integrated and configured for immediate deployment. (See Primeur issue, 23 December.)
In summary, although a lot of effort was expended in proving the DIY tailored system paradigm, there are still a lot of issues to be resolved. Crucially, these include delivering a low-latency, high-bandwidth crossbar switch, high-speed memory access and software. These issues add up and often result in poor performance. Vendor products, developed under a strict engineering regime, could still be better value for money if reliability and integration costs are taken into account.
Finally, talking to friends over drinks during the reception, I discovered that UK academic researchers wanted a capability system but, sadly, because of the funding mechanism in place, ended up with an IBM p690 P4 capacity system instead. It was claimed that in some cases the performance from the IBM P4 hardly matches what they had six years ago on the Cray T3E.
Wishing all my readers Season's Greetings and a Peaceful New Year.
Chris Lazou