Carl Anderson, a distinguished engineer at IBM, said one Hot Chips paper from
Brad McCredie and Roger Bailey, the processor design team leaders, will
describe the Power4 test chip. A second paper will detail a synchronous
interface approach, spearheaded by Frank Ferraiolo.
The CPU design is the first by the Austin design team to be able to run either
the AS400 OS or the AIX version of Unix used in IBM's RS6000 workstations and
servers, including an RS6000 SP system sold into the supercomputer market.
Earlier PowerPC designs from IBM's Rochester, Minn., team were able to run
either OS.
High bandwidth
"With a 0.18-micron process and copper interconnects, there are a much
larger number of transistors available to the designers. One way to put them to
use is to put two processor cores on the same die and take advantage of the
very high bandwidth possible between them," Anderson said.
Though IBM's Rochester design team has created an AS400 multithread processor
before, the Power4 design under way here is a standard out-of-order processor.
Like the Power3, the Power4 has two floating-point units per processor core, or
four FPUs per die. It has multiple load and store units, and many of the other
architectural features of the Power3, to support a high-bandwidth interface to
main memory and back. It operates with a 1.5-V power supply. IBM will discuss
the microarchitecture in detail at the Microprocessor Forum, which begins Oct.
4 in San Jose, Calif.
The Power4 can be used to create 32-way systems. (With two processor cores on
each die, a 32-way system would use 16 Power4s.) Though Anderson declined to
elaborate on the bandwidth or bus specifications, he said the bus would run at
greater than 500 MHz.
"The goal is to run the bus at half the processor speed, which is targeted
at greater than a gigahertz for the processor. The data and clock are all sent
on the data bus, and that approach is best for very high-bandwidth systems. We
devoted a fair amount of circuits to extract the clock and deskew the data,"
Anderson told Electronic Engineering Times. The bus on the current Power3 CPU
runs at 250 MHz.
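The figures quoted so far reduce to simple arithmetic, sketched below in Python. The two-cores-per-die, two-FPUs-per-core and half-speed-bus ratios come from the article; the function names are made up for illustration, and the 1,000-MHz input is only a floor, since IBM gave "greater than" targets rather than exact clock rates.

```python
CORES_PER_DIE = 2   # stated in the article
FPUS_PER_CORE = 2   # stated in the article


def chips_for(n_way: int) -> int:
    """Number of dual-core dies needed for an n-way system."""
    if n_way <= 0 or n_way % CORES_PER_DIE:
        raise ValueError("n_way must be a positive multiple of the cores per die")
    return n_way // CORES_PER_DIE


def fpus_for(n_way: int) -> int:
    """Total floating-point units in an n-way system."""
    return n_way * FPUS_PER_CORE


def bus_mhz(cpu_mhz: float) -> float:
    """Bus clock under the stated goal of half the processor clock."""
    return cpu_mhz / 2


if __name__ == "__main__":
    print(chips_for(32), "dies,", fpus_for(32), "FPUs")
    print("bus at", bus_mhz(1000.0), "MHz for a 1,000-MHz core (illustrative floor)")
```

At the half-speed ratio, any processor clock above 1 GHz puts the bus above 500 MHz, at least twice the 250-MHz bus of the Power3.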
The I/O design relies on synchronous transfer of the clock and data, an
approach that will take IBM from greater than 500-MHz I/Os in the current
design to beyond the gigahertz level in the next generation.
Minimizing latencies is particularly important in multiprocessor systems, where
the latency of the interface can vary over a wide range, in some cases multiple
bus cycles. A synchronous approach has the advantage of reducing the timing
variations, without using a more costly process or strict design constraints.
For the Power4, the I/O design supports point-to-point, unidirectional and
bidirectional bus types. It is an all-digital design, with low-power,
source-terminated drivers and active clamps on the receiving circuits, where a
FIFO is placed.
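The general idea of deskewing data with receive-side FIFOs can be shown with a toy model in Python. This is not IBM's circuit, which the article does not describe. The sketch assumes each wire's skew is a known whole number of bit times, and the names `transmit` and `deskew` are hypothetical. Each wire gets a small FIFO that delays it just enough to line up with the slowest wire.

```python
from collections import deque


def transmit(words, skews, width):
    """Model the wires: bit i of every word arrives skews[i] bit times late.

    Returns one sample list per wire; None means no valid data yet.
    """
    total = len(words) + max(skews)
    wires = []
    for i in range(width):
        bits = [(w >> i) & 1 for w in words]
        tail = total - skews[i] - len(bits)
        wires.append([None] * skews[i] + bits + [None] * tail)
    return wires


def deskew(wires, skews):
    """Realign the wires by delaying each one to match the slowest wire."""
    max_skew = max(skews)
    # Pre-fill each FIFO so wire i is delayed by (max_skew - skews[i]) bit times.
    fifos = [deque([None] * (max_skew - s)) for s in skews]
    words = []
    for t in range(len(wires[0])):
        aligned = []
        for fifo, wire in zip(fifos, wires):
            fifo.append(wire[t])
            aligned.append(fifo.popleft())
        # A word is valid only once every wire has delivered its bit.
        if all(b is not None for b in aligned):
            words.append(sum(b << i for i, b in enumerate(aligned)))
    return words


if __name__ == "__main__":
    sent = [0b1011, 0b0100, 0b1111, 0b0000]
    skews = [0, 2, 1, 3]  # assumed per-wire skew, in bit times
    received = deskew(transmit(sent, skews, width=4), skews)
    print(received == sent)
```

In this model the cost of alignment is added latency equal to the largest skew, which is one reason the depth of such a FIFO matters in a latency-sensitive multiprocessor interface.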