Challenging innovative technologies at the NEC User Group Meeting in Frascati

Frascati 16 June 2001 The NEC User Group held its 13th meeting in an elegant 18th century villa on a hill outside Frascati, some 20 kilometres from Rome, Italy. The attendees were mostly NEC customers using shared-memory parallel vector supercomputers. The meeting included many interesting presentations, among them one by Dr. M. Fukuma, general manager of the silicon systems research laboratories at NEC. He described the latest research, and the problems still to be overcome, on technologies that NEC expects to bring to market in the next five to fifteen years. Some of the most interesting elements of his presentation were proprietary; nevertheless, there is enough public domain material of interest to share with you.

In pursuit of the plausible

The computer industry thrives on sound-bites and the pursuit of the plausible. A tenuous observation by one or other computer luminary, and the next thing you know it has become a "law of nature". Gordon Moore is wisely easing himself out of the front line at Intel, just before his "mortal law" runs out of steam. Intel invested some US$7.5 billion to develop the Itanium chip, but are they really hopeful that Moore's Law of doubling performance every 18 months, a projection which translates to putting 325 Gflop/s on a chip, is safe for the next 10 years? Incidentally, this chip uses a 0.18 micron technology. Keeping up is already a great challenge, and in the next few years it could become a nightmare for the engineers who are expected to meet the relentless requirements implicit in the prescription of Moore's Law.
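The arithmetic behind such a projection can be sketched in a few lines. This is only an illustration of compound doubling; the function names are hypothetical, and the implied starting figure is simply what the numbers quoted above work out to, not a quoted chip specification.

```python
def growth_factor(years, doubling_months=18.0):
    """Performance multiplier after `years` if performance doubles
    every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


def implied_baseline_gflops(target_gflops, years, doubling_months=18.0):
    """Starting performance implied by a target reached after `years`
    of steady doubling."""
    return target_gflops / growth_factor(years, doubling_months)


if __name__ == "__main__":
    factor = growth_factor(10)
    baseline = implied_baseline_gflops(325.0, 10)
    print(f"Growth over 10 years: about {factor:.0f}x")
    print(f"Implied starting point: about {baseline:.1f} Gflop/s per chip")
```

Ten years of 18-month doublings is a factor of roughly one hundred, so 325 Gflop/s at the end of the decade corresponds to a few Gflop/s per chip at the start.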

It is well known that a 0.15 micron technology is now available from supercomputer vendors, including NEC. Scaling has been the driving force for VLSI. According to the CMOS road map, technology at, say, 50 nanometers is possible in 10 years' time. However, when the "design rule" (technology node) goes below 100 nanometers it hits many problems, not least current leakage. Those of you who still remember your university physics should recall the tunnelling effects described by Schrödinger's equation.
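For readers whose university physics is rusty, the exponential sensitivity of tunnelling to barrier thickness can be sketched with the textbook rectangular-barrier estimate T ~ exp(-2 * kappa * d), where kappa = sqrt(2 * m * phi) / hbar. This is a rough illustration only, not a device model: the 3.1 eV barrier height and the use of the free electron mass are assumptions made here, not figures from the presentation.

```python
import math

HBAR = 1.054571817e-34          # reduced Planck constant, J*s
ELECTRON_MASS = 9.1093837e-31   # free electron mass, kg (assumed; ignores effective mass)
EV = 1.602176634e-19            # one electronvolt in joules


def tunnelling_probability(thickness_nm, barrier_ev=3.1, mass_kg=ELECTRON_MASS):
    """Order-of-magnitude transmission through a rectangular barrier.

    barrier_ev=3.1 is an assumed barrier height for illustration only.
    """
    kappa = math.sqrt(2.0 * mass_kg * barrier_ev * EV) / HBAR  # decay constant, 1/m
    return math.exp(-2.0 * kappa * thickness_nm * 1e-9)


if __name__ == "__main__":
    for d in (3.0, 2.0, 1.5, 1.0):
        print(f"{d:.1f} nm barrier: T ~ {tunnelling_probability(d):.2e}")
```

Under these assumptions, each nanometer removed from the barrier raises the tunnelling probability by many orders of magnitude, which is why leakage becomes the dominant worry as layers get thinner.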

VLSI technology is driven by consumer goods and new equivalent scaling approaches

As is often the case, VLSI is currently driven by the larger market of consumer goods, such as mobile phones, rather than by supercomputers. In the mobile case one needs low-voltage devices, and this allows the use of a thin oxide, which in turn has low current leakage. In supercomputer devices one also needs low voltage, but the current leakage is high. The crux for higher density VLSI development is to avoid leakage so that a stable device can be built.

Recent work using an organic material such as BCB (benzocyclobutene) on copper has demonstrated devices made with 80 nanometer etching. The use of BCB on copper was found to reduce propagation delay by 25%. Copper is apparently a must for VLSI.

The industry-agreed International Technology Roadmap for Semiconductors (ITRS), which covers logic devices on CMOS, predicts that scaling barriers kick in from about 2008 to 2014. For example, it is generally agreed that an oxide one nanometer thick is not possible.

Up to now devices were driven by conventional scaling, but from now on they are to be driven by three approaches to equivalent scaling. The following are examples of these three approaches:

  • Equivalent System Scaling: Dynamically re-configurable logic; multiple processors on a chip.
  • Equivalent Device Scaling: High-k gate insulator, and low-k inter-layer dielectric such as BCB.
  • Synergetic device-circuit co-design: FP-CMOS, with features such as multiple adjustable parameters and software-controlled power management.

A new paradigm is therefore needed to develop devices once CMOS scaling has run out. One needs new processes using new materials, new architectures and new circuits. With the advent of larger 10 x 10 mm chips, a new approach is to use a multi-GHz clocking strategy and at the same time adopt dynamically reconfigurable logic. Reconfiguration can be done at very high speed: in 5 nanoseconds one can map 8 tasks onto the same hardware device. In addition, the trend is to put multiple processors on a chip. To overcome high stand-by power dissipation, a software-controlled power management circuit is needed. So the next generation of CMOS is likely to rely on equivalent scaling, with multiple parameters adjustable over a wide range to allow many tasks to be performed on the same hardware device. As an example of where research is going, NEC has already demonstrated an EJ-MOSFET device at 8 nanometers. This device requires less than 1 volt to power it.

High End Computing is more than a chip

Of course, high end supercomputing is more than a chip; it also involves memory bandwidth, heat extraction and tight communication system integration to deliver high optimisation efficiencies. So, although the device developers see a life in CMOS for possibly the next 20 years, HPC systems designers see extra barriers which bring this back to the ITRS time frame.

One of the severe constraints facing system designers is the number of pins required to service the increased functions included on a chip. Although a high speed interface could help reduce the problem, this would depend on the system architecture, especially in HPC machines.

When one looks at the technology changes needed to deliver Petaflop/s in the future, the biggest challenge is how these would fit into the computer environment of today. Current computer system research suggests that it is possible to get Petaflop/s for a particular application, as in, for example, the IBM Blue Gene project, but very hard to deliver it for general computing. In my view, it would be very surprising if the technology changes needed for Petaflop/s did not cause some user pain because of side effects.

Note that the Blue Gene processor proposed by IBM has a restricted instruction set and yet requires one million processors linked together to get one Petaflop/s peak performance. Even with 32 processors on a chip, one needs some 32 thousand off-chip communication connections. Add the fact that, for a balanced system, the silicon needed for off-chip communication grows as n log(n), and the horrendous problem facing system engineers becomes obvious.
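The numbers in this paragraph can be checked with a short sketch. It assumes that "one million" is read either as 1,000,000 or as 2^20 processors, and that the n log(n) growth uses a base-2 logarithm; both are assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def per_processor_gflops(peak_pflops, processors):
    """Peak Gflop/s each processor must deliver (1 Pflop/s = 1e6 Gflop/s)."""
    return peak_pflops * 1e6 / processors


def chips_needed(processors, processors_per_chip):
    """Number of chips required to hold all the processors."""
    return math.ceil(processors / processors_per_chip)


def nlogn_growth(n_small, n_large):
    """Ratio of n*log2(n) costs when growing from n_small to n_large nodes."""
    return (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))


if __name__ == "__main__":
    print(per_processor_gflops(1.0, 1_000_000), "Gflop/s peak per processor")
    print(chips_needed(1_000_000, 32), "chips for 1,000,000 processors")
    print(chips_needed(2**20, 32), "chips for 2^20 processors")
    print(f"{nlogn_growth(1024, 32768):.0f}x communication silicon, 1024 -> 32768 chips")
```

Either reading gives roughly 32 thousand chips, each of which needs its own off-chip connection, and the communication cost grows faster than the chip count itself.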

More importantly, when you factor communication delays into Amdahl's Law, the sustained performance peters out to a relatively low baseline very fast under the above system configuration. So, when vendors talk about peak performance, remember the old adage "caveat emptor" (let the buyer beware). One can get round this, of course, by running ensembles of programs, each using a subset of, say, 2048 processors on 64 chip-sets to reduce communication overheads, but then we are changing the semantics of what we mean by a Petaflop/s machine.
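A minimal sketch of this effect, assuming a simple model in which communication adds a cost proportional to log2(n) on top of the classic Amdahl terms. The parallel fraction (0.999) and the communication coefficient (1e-4) are illustrative assumptions, not measurements of any real machine.

```python
import math


def speedup(n, parallel_fraction, comm_coeff=0.0):
    """Amdahl's Law with an assumed communication penalty.

    Runtime relative to one processor is modelled as
        (1 - p) + p / n + comm_coeff * log2(n)
    where the last term is a hypothetical communication cost.
    """
    p = parallel_fraction
    relative_time = (1.0 - p) + p / n + comm_coeff * math.log2(n)
    return 1.0 / relative_time


def efficiency(n, parallel_fraction, comm_coeff=0.0):
    """Fraction of peak actually sustained on n processors."""
    return speedup(n, parallel_fraction, comm_coeff) / n


if __name__ == "__main__":
    for n in (32, 2048, 2**20):
        s = speedup(n, 0.999, 1e-4)
        e = efficiency(n, 0.999, 1e-4)
        print(f"n = {n:>7}: speedup ~ {s:6.0f}, efficiency ~ {e:.4%}")
```

With these assumed values a 2048-processor subset sustains a speedup of a few hundred, while the full million-processor configuration does no better and its efficiency collapses, which is the point about ensembles made above.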

The challenge after CMOS has run its course

In summary, beyond CMOS, after 2010-14, there will be new material challenges. Some of the candidate technologies are in the experimental phase, with many reliability, design, manufacturing process and operating environment issues still to be solved. These include Josephson Junction technology, Single Electron Transistor (SET), and Single Flux Quantum (SFQ). With Josephson Junction devices, heat dissipation is about 1000 times lower than that incurred with silicon. Many computer manufacturers, including IBM, Control Data, and our friends from Japan, had demonstrated components with logic gates switching at 10 ps as early as 1981. As superconductivity occurs at low temperatures, the devices have to be submerged in cryogenic liquids. This causes difficulties in interconnection and chip packaging.

SFQ can apply conventional Boolean algebra, where the presence or absence of a single flux quantum within a Josephson Junction (JJ) loop expresses Boolean '1' or '0'. Quantum computing, however, is completely different from the conventional computing mechanism, as can be surmised from the brief description below.

In quantum computing, the use of Qubits with entanglement allows the development of a super-parallel quantum computer. By using N Qubits, one can do 2 to the power N calculations simultaneously. One should note that, although quantum computation is fundamentally parallel, it requires special algorithms to exploit that parallelism, and is unlikely to become a general purpose machine. To date only two kinds of practical use have been discovered: Shor's factorisation algorithm and Grover's database search algorithm. One expects this situation to change as many more researchers throw their hats into the ring.
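To give a feel for the scale of 2 to the power N, the sketch below counts the basis states an N-Qubit register spans, the memory a classical simulation would need to store them (assuming 16 bytes per complex amplitude, an assumption made here), and the approximate number of iterations Grover's search needs, using the standard (pi/4) * sqrt(M) estimate for a search over M items. The function names are illustrative.

```python
import math


def basis_states(n_qubits):
    """Number of basis states spanned by a register of n_qubits."""
    return 2 ** n_qubits


def classical_simulation_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory to store every amplitude classically (assumed 16 bytes each)."""
    return basis_states(n_qubits) * bytes_per_amplitude


def grover_iterations(n_items):
    """Approximate Grover iteration count: floor((pi / 4) * sqrt(n_items))."""
    return math.floor((math.pi / 4.0) * math.sqrt(n_items))


if __name__ == "__main__":
    for n in (7, 20, 40):
        print(f"{n} Qubits: {basis_states(n):,} states, "
              f"{classical_simulation_bytes(n):,} bytes to simulate")
    print("Grover iterations for 2^20 items:", grover_iterations(2**20))
    print("Average classical lookups for 2^20 items:", 2**20 // 2)
```

The gain from Grover's algorithm is a square root rather than an exponential, which illustrates why the choice of algorithm matters as much as the number of Qubits.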

As for hardware developments, they lag even further behind. It is estimated that one needs around ten thousand Qubits for practical factorisation. The latest IBM experiment, using NMR, managed to utilise only 5 to 7 Qubits. NEC demonstrated a solid-state Qubit by making a superconducting single-electron box with Josephson Junctions, whose electronic superposition state was then controlled at will by electrically operating the gate. This won NEC the Nishina Memorial Prize, but there is a long way to go before Qubits can be used for realistic quantum computation, as this requires Qubits with a much longer coherence time.

In short, the die is already cast, with future systems inevitably having many processors on a chip. However, memory hierarchies, bandwidth, communication interfaces and system software compatibility are not only essential elements, but will also have an enormous influence on the fortunes of any future product that hopes to thrive in the market place.

(Brands and names are the property of their respective owners)

Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. Email: Chris@lazou.demon.co.uk


Chris Lazou
