LSF played crucial role in Athlon development

Toronto 03 May 00 The development of the 1 GHz AMD Athlon processor was made possible through the use of Platform Computing's resource manager software, LSF. That is according to members of the Austin, TX-based AMD team who made the world's fastest PC processor a reality.

"It is hard to say how Platform LSF didn't help this project," said Steve Baugh, systems administrator/LSF administrator, AMD. "There are literally thousands of components and billions of tests that were run to create the 1 GHz chip. This is one of the most complicated machines ever designed."

Platform LSF enabled the design teams to run the machines in AMD's compute clusters as a single machine. By spreading computing tasks that were virtually impossible on smaller systems across the distributed processors in its Linux 'supercluster' environment, the team gained access to the computing resources needed to accomplish the historic project.

Announced last month, the 1GHz Athlon broke the PC microprocessor speed record for the entire computing industry. According to AMD, one of the keys to their success was the use of LSF.

The 1GHz Athlon chip is a PC processor capable of executing one billion clock cycles per second. It makes use of AMD's 0.18-micron, six-layer metal process technology, and has approximately 22 million transistors. The sheer magnitude of the job of verifying a design of this complexity is, in the words of Baugh, "mind-boggling."

"The percentage of the processing power across the environment we have been able to harness with LSF has really been unprecedented," said Baugh. "We are now running at 90 per cent and above utilization - really pushing the outer limits of our design hardware and computing resources."

A number of years ago, AMD employed what was then the traditional design infrastructure of a workstation on every designer's desk – a set-up still common in some semiconductor design environments today. What the company discovered was that the processing power of this compute infrastructure was going idle the vast majority of the time.

"Platform is extremely pleased to have played a role in this extraordinary achievement. The development of the 1GHz Athlon chip is a watershed event for the industry," said Phil Weaver, President and COO, Platform Computing.

Today, the racks of Linux systems, powered by AMD's own high-powered processors, are able to run in tandem with a variety of other Unix-based systems on demanding compute jobs, thanks to the use of LSF. By marshalling the compute power needed to run leading design simulation and verification applications across the cluster, LSF was able to ensure, in the words of Baugh, that "the right person got access to the right machine at the right time."

LSF creates a virtual queue that dispatches jobs and matches them to the correct computing resources. A challenging job would be matched with the requisite amount of processing power and memory. The functionality extends to the management of software licenses and network resources. "Designers are able to submit their work to LSF for completion," said Baugh. "The result is the work gets the right resources to get done – fast, with no questions asked."
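The matching idea described above can be illustrated with a minimal Python sketch. It is not LSF's actual scheduler or API: the `Host`, `Job` and `dispatch` names are hypothetical, and the first-fit policy on CPUs and memory is an assumption chosen only to show how a queued job might be paired with a machine that has the requisite resources.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A machine in the cluster (hypothetical model, not an LSF type)."""
    name: str
    free_cpus: int
    free_mem_mb: int


@dataclass
class Job:
    """A queued job with its resource requirements (hypothetical model)."""
    name: str
    cpus: int
    mem_mb: int


def dispatch(queue, hosts):
    """Place each queued job on the first host with enough free CPUs and memory.

    Returns (placements, pending): a dict of job name -> host name, and the
    list of jobs that no host could accommodate in this pass.
    """
    placements = {}
    pending = []
    for job in queue:
        host = next(
            (h for h in hosts
             if h.free_cpus >= job.cpus and h.free_mem_mb >= job.mem_mb),
            None,
        )
        if host is None:
            pending.append(job)  # stays queued until resources free up
            continue
        # Reserve the resources so later jobs see the reduced capacity.
        host.free_cpus -= job.cpus
        host.free_mem_mb -= job.mem_mb
        placements[job.name] = host.name
    return placements, pending
```

In this sketch a large simulation job lands on the larger machine, a small job fills a small one, and a job that nothing can hold simply waits in the queue. A real resource manager would also weigh priorities, software licenses and network resources, which this example leaves out.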

Moreover, the LSF solution was able to ensure mission-critical performance of the Linux environment. AMD, an early pioneer with Linux systems, was able to ensure that the environment worked reliably and in close concert with existing Unix systems as part of the same cluster if required. Thanks to Platform's early and ongoing support for Linux as well as extensive multi-platform support, this was not an issue for the organization.

"I can say without exaggeration that we performed literally billions of tests. I don't know how you could have tested this more." Baugh adds that without LSF, that kind of extensive quality testing would have been impossible. "Not only are we able to complete the tests more quickly but we were able to increase the number of tests by a factor of ten. …So not only were we running better and faster but we were able to run more. That's unprecedented."

With the variety and sheer scale of the work that needed to be accomplished, high availability was a daily concern. A no-compromise, no-downtime environment was key to the project's success, according to Clive Dawson, manager of CAD Systems Engineering, AMD. "With designers submitting so many jobs to the clusters, that represents a lot of productivity that needed to be guaranteed and preserved."

Most importantly, the sheer scale of work was something that couldn't have been imagined in AMD's previous environment, according to Dawson. "It has really changed the entire way in which we work. We have been able to harness the full power of not only the background compute servers, but of all our desktop systems as well. LSF can detect whenever a workstation is idle (if the engineer is at lunch or in a meeting, for example) and immediately press it into service. Upon the engineer's return, LSF will detect the need for interactive response time and divert the background job to another system. This is where the competitive advantage really kicks in for us."
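The idle-workstation behaviour Dawson describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how LSF implements it: the `Workstation` type, the `rebalance` function and the 300-second `IDLE_THRESHOLD_S` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: a desktop counts as idle after 5 minutes without input.
IDLE_THRESHOLD_S = 300


@dataclass
class Workstation:
    """A designer's desktop machine (hypothetical model, not an LSF type)."""
    name: str
    idle_seconds: int
    background_job: Optional[str] = None


def rebalance(stations, pending):
    """Move background jobs off busy desktops and onto idle ones.

    Returns the jobs still waiting after this pass.
    """
    pending = list(pending)
    # Pass 1: the engineer is back, so take the background job off the
    # desktop and put it at the head of the queue.
    for ws in stations:
        if ws.background_job and ws.idle_seconds < IDLE_THRESHOLD_S:
            pending.insert(0, ws.background_job)
            ws.background_job = None
    # Pass 2: press idle desktops without a job into service.
    for ws in stations:
        if not pending:
            break
        if ws.background_job is None and ws.idle_seconds >= IDLE_THRESHOLD_S:
            ws.background_job = pending.pop(0)
    return pending
```

Run periodically, a loop like this would move a displaced job to another idle system on the same pass, which is the effect described in the quote above.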

 


Ad Emmen
