Currently, there are several smaller Cray T3E systems in northern Germany
for program development and one large T3E at ZIB
in Berlin. The large supercomputer is oversubscribed and thus
urgently in need of replacement.
According to Prof. Dr. Alexander Reinefeld, director of
ZIB Computer Science, the
German Science Council, which approved the project, was very pleased to
see
several states coming together to meet the computing needs of
their researchers.
Reinefeld explains that the machine will be distributed over two
sites: one part will be in Berlin and the other in Hannover.
The two parts will be connected by a dedicated connection with a
bandwidth of 2.4 Gbit/s or more.
For access to remote disks, this is acceptable performance.
Only the expected latency of 6-8 ms needs special treatment for the
efficient execution of multi-site applications that use both system
components at the same time.
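To illustrate why latency rather than bandwidth is the concern, here is a rough sketch using only the two figures quoted above (2.4 Gbit/s and 6-8 ms). It assumes a very simple cost model, a fixed latency plus message size divided by bandwidth, and ignores protocol overhead and congestion; the function names are made up for this illustration and do not come from the project.

```python
# Figures quoted in the article; everything else here is an assumption.
BANDWIDTH_BITS_PER_S = 2.4e9      # dedicated link, 2.4 Gbit/s
LATENCY_RANGE_S = (0.006, 0.008)  # expected latency, 6-8 ms


def transfer_time(message_bytes, latency_s, bandwidth=BANDWIDTH_BITS_PER_S):
    """Estimated time for one message: fixed latency plus time on the wire."""
    return latency_s + (message_bytes * 8) / bandwidth


def latency_share(message_bytes, latency_s, bandwidth=BANDWIDTH_BITS_PER_S):
    """Fraction of the estimated transfer time that is pure latency."""
    return latency_s / transfer_time(message_bytes, latency_s, bandwidth)


def break_even_bytes(latency_s, bandwidth=BANDWIDTH_BITS_PER_S):
    """Message size at which time on the wire equals the latency."""
    return latency_s * bandwidth / 8


if __name__ == "__main__":
    for latency in LATENCY_RANGE_S:
        print(f"latency {latency * 1e3:.0f} ms:")
        for size in (1024, 1024**2, 1024**3):
            share = latency_share(size, latency)
            print(f"  {size:>12} bytes -> {share:6.1%} of time is latency")
        print(f"  break-even size: {break_even_bytes(latency) / 1e6:.1f} MB")
```

Under this model, messages well below a couple of megabytes are dominated by latency, which is typical of tightly coupled parallel codes, while bulk disk transfers are limited by bandwidth.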
In Germany, however, there is already much
experience with this type of "metacomputing" application
that runs over several supercomputers in different cities, even
on different continents. One example is the
Cactus code, which started as a code for supporting astrophysical
applications, but is now also used for
other types of applications. Large commercial chemistry codes such as
GAMESS-UK can also run over the distributed constellation.
The operation of the machine will be transparent to the user. From
each side, it will look like one big machine
to which users can submit jobs. The queues will differ in
memory size, computing time and priority, but
not by installation in Hannover or Berlin. According to Alexander Reinefeld, this
poses a challenge to the vendors: "They
just cannot get away with clustered resource management systems, but
should offer us a system that
really provides a single resource to the users."
In a way, the machine is like Janus, the ancient Roman god of
gates and doors. On coins he is depicted with two heads, looking
the same from each side.
The two computer centres are responsible for the day-to-day operation
and maintenance of their parts of the
machine. Which researchers will be allowed on the machine will be
decided by a scientific committee that
will assess the projects on their scientific merit and computational
requirements. Representatives
from all six states will sit on the committee.
It is expected that several universities and research centres in the
north will install "baby-supercomputers"
with an architecture that matches the new large machine. These
baby-supers will be used to experiment and develop code
for the big super.
The exact architecture of the machine has not yet been decided. It
depends on the offers of the vendors. But in
any case, it will consist of closely coupled SMPs of high-performance processors.
Reinefeld said that the consortium plans to issue a
Request for Proposals early next year as the first step in the formal
tender process required in the European Union. The
process is expected to conclude with
the installation of the new machine in late autumn 2001.
Computer prices, unfortunately, depend heavily on the US dollar.
Hence, it is a pity that the euro
is weak against the dollar. Basically, the final
size of the new machine depends linearly
on the euro-dollar exchange rate.
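The linear relationship can be made concrete with a small sketch. It assumes a fixed budget in euros and hardware priced in dollars per unit of capacity; the function names and all numbers in the usage lines are placeholders chosen for illustration, not figures from the project.

```python
def affordable_capacity(budget_eur, usd_per_eur, usd_per_unit):
    """Capacity units a euro budget buys when hardware is priced in dollars.

    All three inputs are hypothetical; the point is only that the result
    scales linearly with the exchange rate usd_per_eur.
    """
    if usd_per_unit <= 0:
        raise ValueError("usd_per_unit must be positive")
    return budget_eur * usd_per_eur / usd_per_unit


def relative_change(old_rate, new_rate):
    """Relative change in affordable capacity when the rate moves."""
    return new_rate / old_rate - 1.0


if __name__ == "__main__":
    # Placeholder values: a budget of 100 units and a price of 2 dollars per unit.
    for rate in (1.0, 0.9, 0.8):
        units = affordable_capacity(100.0, rate, 2.0)
        print(f"rate {rate:.2f} USD/EUR -> {units:.1f} capacity units")
```

With budget and dollar price held fixed, a 10% drop in the euro translates directly into a 10% smaller machine.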
As recommended by the German Science Council,
the consortium plans to submit a proposal for an even
larger 10 Tflop/s machine in
the 2004 timeframe and to become a Federal supercomputer centre, just
like LRZ in Munich or HLRS in Stuttgart.
But first they want to demonstrate that they can manage such a
complex constellation of six states
working together and a machine distributed over two sites.
For more information, see, for instance, the HLRN web site.