Waiting for ASCI-Q to take Los Alamos to the limit
Heidelberg, 21 June 2001. At Los Alamos National Laboratory, the staff will soon face a new challenge: the integration of the new ASCI-Q 30 Teraflop/s computer system, built by Compaq. The huge potential of this revolutionary system, which represents the next step in the Accelerated Strategic Computing Initiative (ASCI), was introduced at SC 2001 by Richard Kaufman from Compaq and by Ken Koch and John Morrison from Los Alamos. The speakers offered a realistic picture of the complex simulation tasks the ASCI-Q will have to deal with.
The world's first 30 teraOPS supercomputer, Q to its friends, reaches a speed of 96,000 megabytes per second. Mr. Kaufman compared the Q to the current Pittsburgh Supercomputing Center system, which offers 6 Teraflop/s and 750 processors. The Q will include 374 servers, 2 file system domains, 12 terabytes of memory, 600 terabytes of usable storage, and 8 rails of Quadrics. The file servers are arranged in groups of 8 servers. The strategy for Q was to build a larger switch using the fat tree concept. There are 64 SMPs provided for the bottom switches.
Los Alamos has been working with SGI and IBM systems in simulations for nuclear physics, as Dr. Koch explained. The ASCI codes represent the major activity, next to modelling and algorithm design, computer science, and verification and validation. The most important technique is the Message Passing Interface (MPI) on 3D meshes for multi-physics and aerodynamics. The programming languages used are FORTRAN, C, and C++. Different grids are being developed for different physics. This multi-physics research is performed in self-controlled time steps, using structured and unstructured codes and partial differential equations. This way of working requires synchronised types of communication.
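To make the link between 3D meshes and synchronised communication concrete, here is a minimal sketch in plain Python of how a mesh might be split over a grid of processes, each of which then exchanges boundary data with its face neighbours at every time step. The article does not describe the decomposition the ASCI codes actually use; the process grid, the row-major rank ordering, and the function names are assumptions made for illustration only, and no MPI library is called.

```python
def rank_to_coords(rank, dims):
    """Map a process rank to (x, y, z) in a process grid, row-major order (assumed)."""
    _, py, pz = dims
    return (rank // (py * pz), (rank // pz) % py, rank % pz)


def coords_to_rank(coords, dims):
    """Inverse of rank_to_coords."""
    x, y, z = coords
    _, py, pz = dims
    return (x * py + y) * pz + z


def face_neighbors(rank, dims):
    """Return the ranks of the up-to-six face neighbours (non-periodic boundaries)."""
    coords = rank_to_coords(rank, dims)
    neighbors = {}
    for axis, name in enumerate("xyz"):
        for step, side in ((-1, "minus"), (1, "plus")):
            shifted = list(coords)
            shifted[axis] += step
            # Skip neighbours that would fall outside the process grid.
            if 0 <= shifted[axis] < dims[axis]:
                neighbors[f"{name}_{side}"] = coords_to_rank(shifted, dims)
    return neighbors


if __name__ == "__main__":
    dims = (3, 3, 3)  # hypothetical 27-process grid
    print(face_neighbors(13, dims))  # the centre process has six neighbours
```

Because every process needs its neighbours' boundary values before it can advance, all processes have to reach the exchange at the same point in the time step, which is one plausible reading of the synchronisation requirement mentioned above.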
At present, communication speeds at Los Alamos are reaching their limits. The Q will have to cope with the architectural challenge of moving large amounts of data over Wide Area Networks (WANs). The Q capacity will double every six months, from 155 Mb/s over OC-3c to 622 Mb/s over OC-12c to 2.5 Gb/s over OC-48c, as Mr. John Morrison stated. With regard to storage capacity, the present high-performance storage system, implemented by IBM and the National Laboratories, does not meet the needs of the users. Q's RAIT system will therefore provide parallel tape drives for improved storage.
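As a rough illustration of what these WAN rates mean in practice, the following sketch estimates the ideal time needed to move a dataset over each link. The line rates are the ones quoted above; the one-terabyte dataset size is a hypothetical figure chosen for the example, and protocol overhead and contention are ignored, so real transfers would take longer.

```python
# Nominal line rates quoted in the article, in bits per second.
LINK_RATES_BPS = {
    "OC-3c": 155e6,
    "OC-12c": 622e6,
    "OC-48c": 2.5e9,
}

ONE_TERABYTE = 1e12  # hypothetical dataset size in bytes (decimal terabyte)


def transfer_hours(num_bytes, rate_bps):
    """Ideal time in hours to move num_bytes at rate_bps, ignoring all overhead."""
    return num_bytes * 8 / rate_bps / 3600


if __name__ == "__main__":
    for name, rate in LINK_RATES_BPS.items():
        print(f"{name}: {transfer_hours(ONE_TERABYTE, rate):.1f} h per terabyte")
```

Even under these idealised assumptions, a single terabyte occupies the slowest link for more than half a day, which gives a sense of scale against the 600 terabytes of usable storage planned for Q.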
The visualisation challenge at Los Alamos demands huge investments. There is a special contract for a server of servers to move data in parallel streams. The current SGI system uses InfiniteReality pipes. Visualisation at a distance should be possible at the same speeds as local visualisation. This can be achieved by moving the data to a commodity cluster. The software environment consists of Unix with Load Sharing Facility (LSF) for resource management. With 12,000 to 15,000 microprocessors, Mr. Morrison fears that reliability will drop dramatically, so there will be a need to think differently by developing a software infrastructure in system carriers.
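The reliability concern can be illustrated with a simple back-of-the-envelope model. The sketch below assumes that processors fail independently at a constant rate and that a single failure interrupts the whole machine, in which case the system-level mean time between failures (MTBF) is the per-processor MTBF divided by the processor count. The per-processor MTBF of 500,000 hours is a purely hypothetical figure, not one given by the speakers, and the model ignores every other component.

```python
import math

ASSUMED_CPU_MTBF_HOURS = 500_000  # hypothetical figure, not from the article


def system_mtbf_hours(component_mtbf_hours, component_count):
    """MTBF of a system that stops when any one of its identical components fails."""
    return component_mtbf_hours / component_count


def job_survival_probability(job_hours, component_mtbf_hours, component_count):
    """Chance a job runs for job_hours with no component failure (exponential model)."""
    mtbf = system_mtbf_hours(component_mtbf_hours, component_count)
    return math.exp(-job_hours / mtbf)


if __name__ == "__main__":
    for cpus in (12_000, 15_000):
        mtbf = system_mtbf_hours(ASSUMED_CPU_MTBF_HOURS, cpus)
        p = job_survival_probability(24, ASSUMED_CPU_MTBF_HOURS, cpus)
        print(f"{cpus} processors: system MTBF {mtbf:.1f} h, "
              f"24 h job completes uninterrupted with p = {p:.2f}")
```

With these assumed numbers the system-level MTBF comes out at under two days, which suggests why software that tolerates or recovers from failures becomes a necessity at this scale.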
The limits of scalability will be pushed to new dimensions. The existing resource management package will be integrated into Q to manage different jobs. System administration and full system diagnostics will also be provided. In February 2000, construction started on a building of 303,000 gross square feet to host the Q in a 43,500 square foot unobstructed computer room, of which Q will occupy half. The facility will be completed in September 2001 and will provide 1 powerwall theatre, 4 collaboration and 2 immersive rooms, 300 design laboratories and a 200-seat auditorium.
The risks to be met lie in processor speeds, disks, and interconnectivity; in scheduling issues; in interdependency capabilities of the software; and in system integration. In addition, installation and scaling will pose equal challenges, not to mention the performance, administration, and reliability problems. It almost sounded as if Los Alamos was not happy to change to the new system, but that was only an impression. The National Laboratory is all too keen on having user requirements drive the system architecture. This concern is stimulating the Los Alamos staff to welcome the Q by pushing it right to the limit.
Leslie Versweyveld