Mr. Tremblay first expanded on the memory bottleneck we are facing today. CPU frequency is doubling every two years while DRAM speeds are only doubling every six years. This leaves a widening memory gap. In a typical complex high-frequency processor, up to 75 percent of the cycles are spent waiting for memory. The industry has already evolved from the single-threaded processor to chip multi-threading (CMT) and on to multiple multi-threaded cores for better performance.
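To make the widening gap concrete, the following is a minimal sketch of the arithmetic, assuming both trends are steady exponentials with the doubling periods quoted above. The function names are illustrative and do not come from the talk.

```python
def relative_speed(years: float, doubling_period_years: float) -> float:
    """Speed relative to year 0, assuming steady exponential doubling."""
    return 2.0 ** (years / doubling_period_years)


def memory_gap(years: float,
               cpu_doubling: float = 2.0,
               dram_doubling: float = 6.0) -> float:
    """Factor by which CPU speed has outgrown DRAM speed after `years`.

    Defaults are the doubling periods quoted in the talk (2 and 6 years).
    """
    return (relative_speed(years, cpu_doubling)
            / relative_speed(years, dram_doubling))


if __name__ == "__main__":
    for years in (0, 6, 12):
        print(f"after {years:2d} years the gap has grown {memory_gap(years):.0f}x")
```

Under these assumptions the gap itself doubles every three years, since 2^(t/2) / 2^(t/6) = 2^(t/3).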
By throughput computing, Mr. Tremblay means the aggregate amount of work done per unit of time by a system, counting all processors and using all cores and threads down to all functional units. Throughput computing has a broad scope, including multi-threaded and parallelisable applications, multi-threaded applications with medium scalability, and multiple copies of an application. Factors that enable throughput computing are the transistors offered by Moore's Law, SMP technology, MAJC experience, asynchronous design allowing for CMT design, and acquisition. The speaker also referred to the years of research performed at Sun.
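The idea of aggregate throughput can be sketched with a deliberately idealized model. It assumes that each thread is busy only a fixed fraction of the time (25 percent, if 75 percent of cycles are memory stalls), that the threads on a core overlap their stalls perfectly, and that there is no contention for bandwidth. These assumptions and all names below are illustrative. This is not Sun's performance model.

```python
def core_utilization(threads: int, stall_fraction: float = 0.75) -> float:
    """Fraction of cycles a core does useful work, in an idealized model.

    Each thread is busy (1 - stall_fraction) of the time; extra threads
    fill the stall cycles until the core is saturated at 1.0.
    """
    busy_fraction = 1.0 - stall_fraction
    return min(1.0, threads * busy_fraction)


def aggregate_throughput(processors: int,
                         cores_per_processor: int,
                         threads_per_core: int,
                         peak_per_core: float = 1.0,
                         stall_fraction: float = 0.75) -> float:
    """Work per unit time summed over all processors and cores."""
    utilization = core_utilization(threads_per_core, stall_fraction)
    return processors * cores_per_processor * peak_per_core * utilization


if __name__ == "__main__":
    # Hypothetical configurations, chosen only to show the trend.
    print(aggregate_throughput(1, 1, 1))  # one single-threaded core
    print(aggregate_throughput(1, 8, 4))  # eight cores, four threads each
```

In this model a single thread keeps a core only a quarter busy, and four threads per core are enough to hide the stalls. Real designs fall short of this because threads compete for caches and memory bandwidth.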
Throughput computing has an impact on the micro-architecture: there is a 20 percent area increase due to the physical size of the register file. Multiple outstanding load/store requests must be supported, which calls for deeper load and store buffers. Whenever the average power approaches the peak power, that is a good sign, according to the speaker.
The memory subsystem is likewise affected, since there are many more customers, or threads, to serve. Very high bandwidth is sustained even with spaghetti code. Data from a simple single-chip system show that 20 Gbytes/s of bcopy bandwidth is achievable, and the speaker advises making use of the available bandwidth.
With regard to HPC, Sun is building processors and systems that are capable of much higher sustained and peak bandwidth. This is because CMT processors and systems are very bandwidth hungry. Mr. Tremblay believed that it would be possible to address 64-bit processing, which brings CMT-based, commercially oriented systems much closer to what is required for HPC.
A few more features are needed for HPC. To exploit the peak bandwidth, it is necessary to provide enough functional units to digest the bandwidth, as well as very tightly coupled CPUs with low-latency sharing and synchronisation. One also has to avoid conflicts through a multi-banked, multi-associative memory subsystem and provide unattainable invalidates and sharing, according to the speaker.
The keys to success at Sun lie in a multi-threaded environment that focuses on network computing rather than on desktop applications, with a highly threaded operating system such as Solaris as well as a highly threaded software stack such as Java, an application server, and so on.
With Sparc Blades processors, you can reach a factor of 15 in relative performance, in contrast with only a factor of 3 for a single-threaded processor, as Mr. Tremblay noted. And with the Sparc systems processor you even achieve a factor of 30. Customers thus gain in performance and reliability. As far as costs are concerned, fewer servers are needed, along with less floor space, reduced power consumption, less air conditioning, and reduced administration and maintenance.
Mr. Tremblay mentioned three waves of Sparc innovation, namely RISC processors between the eighties and nineties, SMP between the nineties and 2000, and now the emerging CMT. Sun has developed the s- and i-series and will be building the h-series as servers-on-a-chip (SOCs).
The speaker announced the upcoming micro-processors at Sun: UltraSparc III in the s-series as the first 64-bit micro-processor with 35 percent more throughput; UltraSparc IIIi in the i-series in 2003; UltraSparc IV in the s-series in 2003/2004 and Gemini in the h-series in 2004, both with CMT design and 2 times more throughput than the current processor; UltraSparc V in the s-series in 2005 with CMT design and 5 times more throughput than the current processor; and Niagara in the h-series in 2005 with 15 times more throughput.
Mr. Tremblay concluded by announcing that in 2006/2007 Sun will introduce a supercomputer-on-a-chip with 30 times more throughput than 1.2 GHz US-III based systems.