A joint project of Carnegie Mellon University, the University of Pittsburgh and
Westinghouse Electric Company, the Pittsburgh Supercomputing Center (PSC) has
deployed the TCS to fill a gap in U.S. basic research capability highlighted in
a 1999 presidential report. With a peak capability of six
teraflops, the new system is now by far the most powerful computer available as
an open resource for researchers attacking a wide range of problems.
The TCS combines "off-the-shelf" components with an advanced interconnect from
Quadrics Supercomputers World.
It comprises 3,000 Compaq Alpha EV68 microprocessors, housed in 750
four-processor AlphaServer systems running Tru64 UNIX. The latest evolution of
the widely used Alpha microchip technology, the EV68 has a peak floating-point
rate of two gigaflops (two billion floating-point operations per second).
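The six-teraflop peak figure follows directly from the processor count and the per-chip rate. A minimal Python sketch of that arithmetic, assuming peak rates simply add across processors (the constant and function names are illustrative, not from any PSC software):

```python
# Theoretical peak of the TCS, assuming per-processor peak rates add linearly.
NODES = 750                  # four-processor AlphaServer systems
PROCS_PER_NODE = 4           # Alpha EV68 processors per AlphaServer
PEAK_GFLOPS_PER_PROC = 2.0   # EV68 peak floating-point rate

def aggregate_peak_tflops(nodes: int, procs_per_node: int, gflops_per_proc: float) -> float:
    """Return the system's theoretical peak in teraflops."""
    total_procs = nodes * procs_per_node
    return total_procs * gflops_per_proc / 1000.0  # 1 teraflop = 1,000 gigaflops

processors = NODES * PROCS_PER_NODE
peak = aggregate_peak_tflops(NODES, PROCS_PER_NODE, PEAK_GFLOPS_PER_PROC)
print(f"{processors} processors, {peak:.1f} teraflops peak")
```

Peak is an upper bound: real applications sustain only a fraction of it.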
Along with six teraflops of processing power, the TCS features 3.0 terabytes of
memory, high-bandwidth, low-latency interconnections and remarkable capabilities
for large-scale data handling, including the ability to write the entire memory
to disk in under 40 seconds. This short system-write time, achieved through PSC
systems and software engineering, is critical to efficient checkpointing, the
periodic saving of a running job's state so that research data are preserved in
the event of component failure.
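The 40-second figure implies a minimum aggregate write bandwidth, and it also bounds how much time checkpointing costs. A rough Python sketch, assuming decimal units (1 TB = 1,000 GB), that the write is spread evenly across the 750 AlphaServer systems, and a purely hypothetical one-hour checkpoint interval chosen only for illustration:

```python
# Implied disk-write bandwidth for a full-memory checkpoint on the TCS.
# Assumes decimal units (1 TB = 1,000 GB) and treats "under 40 seconds"
# as an upper bound on the write time.
MEMORY_TB = 3.0
WRITE_SECONDS = 40.0
NODES = 750  # assumption: write load shared evenly by all AlphaServers

def min_aggregate_gb_per_s(memory_tb: float, seconds: float) -> float:
    """Smallest aggregate write rate (GB/s) consistent with the quoted time."""
    return memory_tb * 1000.0 / seconds

def checkpoint_overhead(write_seconds: float, interval_seconds: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints when one is taken
    after every `interval_seconds` of computation (restart costs ignored)."""
    return write_seconds / (interval_seconds + write_seconds)

aggregate = min_aggregate_gb_per_s(MEMORY_TB, WRITE_SECONDS)
per_node_mb = aggregate * 1000.0 / NODES
# HYPOTHETICAL interval: one checkpoint per hour of computation.
overhead = checkpoint_overhead(WRITE_SECONDS, 3600.0)

print(f"aggregate >= {aggregate:.0f} GB/s, about {per_node_mb:.0f} MB/s per node")
print(f"overhead with hourly checkpoints: {overhead:.1%}")
```

Under these assumptions a full checkpoint costs only about one percent of run time; a slower write would raise that cost proportionally.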
Preparation for the TCS began in October 2000 with installation of a
256-processor prototype system. In August 2001, the first of the new AlphaServer
systems arrived at the PSC computer room at Westinghouse Energy Center in
Monroeville, Pennsylvania. System components came in multiple deliveries from
Compaq facilities in Texas and Scotland. An on-site team of Compaq, PSC and
Westinghouse engineers and technicians worked aggressively to meet the Oct. 1
installation date, supported by expert teams at Compaq locations in the United
States; Bristol, England; and Galway, Ireland.
The TCS installation is the first time AlphaServer SC, the system software that
ties AlphaServer systems together, has operated on this scale. It is also the
first large-scale, multi-level Quadrics switch structure to support thousands of
processors in sustained operation across the whole system. Standard benchmark
software has measured system performance of more than three teraflops. The TCS
will next go through a period of "friendly user" testing, and by early 2002 it
will become available to researchers nationwide through the peer-review process
of the NSF PACI program.
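The benchmark result is easiest to read as a fraction of the six-teraflop peak. A short Python sketch, treating the reported "over three teraflops" as a lower bound; the specific benchmark is not named in the text, so only the ratio is computed:

```python
# Ratio of measured benchmark performance to theoretical peak.
PEAK_TFLOPS = 6.0       # 3,000 processors x 2 gigaflops each
MEASURED_TFLOPS = 3.0   # lower bound: the text reports "over three teraflops"

def fraction_of_peak(measured_tflops: float, peak_tflops: float) -> float:
    """Return sustained performance as a fraction of theoretical peak."""
    return measured_tflops / peak_tflops

print(f"more than {fraction_of_peak(MEASURED_TFLOPS, PEAK_TFLOPS):.0%} of peak")
```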
PSC and Compaq collaborated on numerous machine enhancements to improve the
performance of the TCS, with changes ranging from the disk controller and file
system to wiring optimizations. Through careful site planning and redesign of the
AlphaServer configurations, PSC engineers reduced the distance between
processors, which in turn reduced cabling and minimized network latency.
Total TCS floor space is roughly that of a basketball court. It uses 14 miles of
high-bandwidth interconnect cable to maintain communication among its 3,000
processors. Another seven miles of serial copper cable and a mile of
fiber-optic cable support data handling.
The TCS requires 664 kilowatts of power, enough to supply 500 homes. It produces
heat equivalent to burning 169 pounds of coal an hour, and much of that heat is
used to warm the Westinghouse Energy Center. To cool the computer room, more than
600 feet of eight-inch cooling pipe, weighing 12 tons, carries up to 900 gallons
of water per minute, and twelve 30-ton air-handling units provide cooling
capacity equivalent to that of 375 room air conditioners.
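The power and cooling figures can be cross-checked with standard unit conversions. A back-of-envelope Python sketch, assuming that "30-ton" refers to tons of refrigeration (1 ton = 12,000 BTU/h) and that essentially all electrical input ends up as heat; the coal energy content is not stated in the text, so the sketch derives the value the 169-pound figure implies:

```python
# Back-of-envelope checks on the TCS power and cooling figures.
# Assumptions: "30-ton" means tons of refrigeration, and all 664 kW of
# electrical input is ultimately dissipated as heat.
BTU_PER_KWH = 3412.14            # 1 kWh expressed in BTU
BTU_PER_HOUR_PER_TON = 12_000    # definition of a ton of refrigeration

POWER_KW = 664
HOMES = 500
COAL_LB_PER_HOUR = 169
AIR_HANDLERS = 12
TONS_PER_HANDLER = 30
ROOM_ACS = 375

heat_btu_per_hour = POWER_KW * BTU_PER_KWH
kw_per_home = POWER_KW / HOMES
# Energy content per pound of coal implied by the "169 pounds an hour" claim.
implied_coal_btu_per_lb = heat_btu_per_hour / COAL_LB_PER_HOUR
heat_load_tons = heat_btu_per_hour / BTU_PER_HOUR_PER_TON
installed_tons = AIR_HANDLERS * TONS_PER_HANDLER
btu_per_room_ac = installed_tons * BTU_PER_HOUR_PER_TON / ROOM_ACS

print(f"{kw_per_home:.2f} kW per home")
print(f"implied coal energy: {implied_coal_btu_per_lb:,.0f} BTU/lb")
print(f"heat load: {heat_load_tons:.0f} tons vs {installed_tons} tons installed")
print(f"{btu_per_room_ac:,.0f} BTU/h per room air conditioner")
```

The implied figure of roughly 13,400 BTU per pound is in the range commonly quoted for bituminous coal, and the installed cooling capacity is nearly twice the computed heat load, leaving headroom for the rest of the room.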