Metacomputing environments are in strong demand in the pharmaceutical industry, where they can extend both the scope and the speed of drug target discovery. The Paderborn Center for Parallel Computing co-ordinates a team of scientific researchers and industrial end-users in PHASE, a two-year Esprit-funded project that started in February 1997. Its goal is to develop a distributed high-performance application server for executing compute-intensive bio-informatics codes over Internet and/or intranet connections. Identifying new proteins takes much less time when the work is integrated into a job load balancing scheme that assigns the available computational resources to the various partitions of the task. The server architecture is also well suited to applications outside the pharmaceutical sector, and it supports built-in multi-level security to protect the user's privacy.
The PHASE environment includes four key applications: GeneQuiz, MaxHom, DRAGON and MSAP. These well-known bio-informatics applications cover automatic functional sequence annotation, sequence analysis, and 3D protein structure prediction and comparison. In the PHASE project, they are being adapted to parallel platforms to improve their execution speed and to allow faster, nearly interactive use.
The user accesses the PHASE home page with an ordinary Web browser to submit a query sequence and to evaluate and reuse the results. A possible usage scenario is the following. When the user activates the start link on the PHASE home page, the Web browser loads the Java applet that implements the PHASE graphical user interface. The user submits a query sequence to GeneQuiz to search for a series of homologous structures. These structures can then be used as input to MSAP to find more distant homologues. If GeneQuiz does not return an appropriate result, MaxHom can be activated to take over the query and pass the multiple alignment of protein homologues to DRAGON, which generates 3D models. Once the 3D structure is known, the user can run MSAP again to compare it with related structures.
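The branching in this scenario can be summarised in a few lines of Python. This is only an illustrative sketch: the four callables stand in for the real applications, and their names, argument types and return values are assumptions made for the example, not the actual PHASE interfaces.

```python
from typing import Callable, List


def run_scenario(
    query: str,
    gene_quiz: Callable[[str], List[str]],   # hypothetical: query -> homologous structures
    max_hom: Callable[[str], str],           # hypothetical: query -> multiple alignment
    dragon: Callable[[str], str],            # hypothetical: alignment -> 3D model
    msap: Callable[[List[str]], List[str]],  # hypothetical: structures -> related structures
) -> List[str]:
    """Follow the usage scenario described in the text."""
    structures = gene_quiz(query)
    if structures:
        # GeneQuiz found homologues: feed them to MSAP to look for
        # more distant homologues.
        return msap(structures)

    # GeneQuiz gave no appropriate result: MaxHom takes over the query,
    # DRAGON builds a 3D model from the alignment, and MSAP compares
    # that model with related structures.
    alignment = max_hom(query)
    model = dragon(alignment)
    return msap([model])
```

Passing the applications in as parameters keeps the sketch independent of how each tool is actually invoked on the server.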
Instead of relying on a global management server, the PHASE architecture consists of a series of local center management servers that co-ordinate the available services for the various identification tasks. A user who has in-house high-performance computing resources can execute highly confidential jobs locally and send less sensitive tasks over the Internet. Within the network of co-operating servers, any server can receive a user request and become the request server for that job. The request server broadcasts the request to the other sites and collects the offers from the bidding servers. It then selects the best offer according to the user requirements and transmits the request to the chosen center for execution.
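The broadcast, bid and select cycle can be illustrated with a small Python sketch. The `Offer` fields, the weighted scoring rule and the function names are assumptions chosen for the example; the article does not specify how PHASE encodes offers or ranks them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Offer:
    """Hypothetical bid from one center management server."""
    center: str
    runtime_hours: float  # assumed: estimated time to complete the job
    cost: float           # assumed: price quoted for the job


# A bidder receives the request and returns an offer, or None to decline.
Bidder = Callable[[str], Optional[Offer]]


def collect_offers(request: str, bidders: List[Bidder]) -> List[Offer]:
    """Broadcast the request to every bidding server and gather the offers."""
    offers = []
    for bid in bidders:
        offer = bid(request)
        if offer is not None:
            offers.append(offer)
    return offers


def select_offer(
    offers: List[Offer],
    weight_runtime: float = 1.0,
    weight_cost: float = 1.0,
) -> Optional[Offer]:
    """Pick the offer with the lowest weighted score (assumed ranking rule).

    The weights stand in for the user requirements mentioned in the text.
    """
    if not offers:
        return None
    return min(
        offers,
        key=lambda o: weight_runtime * o.runtime_hours + weight_cost * o.cost,
    )
```

A user who cares mostly about turnaround would raise `weight_runtime`, while a cost-conscious user would raise `weight_cost`; the request server then forwards the job to the `center` named in the selected offer.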
For multi-site applications, the bidding servers do not know how many resources they have to provide, since the job's partitions have not yet been allocated. This results in "fuzzy offers", which are multiple bids of both regular and convex selections of resources. The request server has to choose the most suitable subset of resources to perform the job, taking into account criteria such as the speed of both network connections and compute nodes, the availability of software applications and service providers, and any user preferences for particular HPC centers. To access the HPC resources, each center management server uses a generic interface consisting of a load interface and a job interface. The former measures the application-specific load as a percentage of the machine's maximum performance, while the latter signals the start and the end of a job.
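One way to picture the generic interface is as two small abstract classes, one for load and one for job events. The sketch below is an assumption-laden illustration: the class and method names are invented for the example, and the toy machine derives its load from the number of running jobs, which is only a stand-in for the application-specific measurement described in the text.

```python
from abc import ABC, abstractmethod
from typing import Set


class LoadInterface(ABC):
    """Hypothetical load interface of a center management server."""

    @abstractmethod
    def load_percent(self) -> float:
        """Return the load as a percentage of the machine's maximum performance."""


class JobInterface(ABC):
    """Hypothetical job interface: signals the start and end of a job."""

    @abstractmethod
    def job_started(self, job_id: str) -> None: ...

    @abstractmethod
    def job_finished(self, job_id: str) -> None: ...


class SlotBasedMachine(LoadInterface, JobInterface):
    """Toy machine whose load is the fraction of occupied job slots."""

    def __init__(self, max_jobs: int) -> None:
        if max_jobs <= 0:
            raise ValueError("max_jobs must be positive")
        self.max_jobs = max_jobs
        self._running: Set[str] = set()

    def job_started(self, job_id: str) -> None:
        self._running.add(job_id)

    def job_finished(self, job_id: str) -> None:
        # discard() ignores job ids that were never registered.
        self._running.discard(job_id)

    def load_percent(self) -> float:
        # Cap at 100 in case more jobs were started than there are slots.
        return min(100.0, 100.0 * len(self._running) / self.max_jobs)
```

Keeping the two interfaces separate means a center could expose load figures for bidding without exposing job control, or the other way round.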
A careful security mechanism is essential to guarantee privacy in the heterogeneous PHASE environment. The minimum requirements are authorisation and encryption; more advanced levels are advisable but costly. PHASE data can be divided into three classes. Top secret information should be handled only by in-house resources. No-risk data needs no protection and can be transmitted as is over the Internet. The remaining category consists of data with a limited degree of confidentiality. For this type of data, the PHASE project partners are developing a hierarchy of security measures in several steps. A unique user login and password provide authentication and access restriction. Data encryption covers the result data as well as the client/server protocol. The graphical user interface (GUI) has been designed as a signed Java applet to work with the security features of Netscape, the preferred browser for PHASE.
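The three data classes map naturally onto a small policy table. The following Python sketch shows one possible encoding; the enum names, the `Handling` fields and the helper functions are assumptions for illustration and do not describe the actual PHASE implementation.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    TOP_SECRET = "top secret"
    LIMITED = "limited confidentiality"
    NO_RISK = "no risk"


@dataclass(frozen=True)
class Handling:
    may_leave_site: bool  # may the data be sent over the Internet at all?
    encrypt: bool         # must it be encrypted when it is transmitted?


# Assumed policy, following the three classes described in the text.
_POLICY = {
    # Handled only by in-house resources, so it is never transmitted.
    DataClass.TOP_SECRET: Handling(may_leave_site=False, encrypt=False),
    # May travel, but only with the protective measures applied.
    DataClass.LIMITED: Handling(may_leave_site=True, encrypt=True),
    # Needs no protection and can be sent as is.
    DataClass.NO_RISK: Handling(may_leave_site=True, encrypt=False),
}


def handling_for(data_class: DataClass) -> Handling:
    """Look up how data of the given class should be treated."""
    return _POLICY[data_class]


def can_send_over_internet(data_class: DataClass) -> bool:
    """Return True if jobs with this data may be sent to a remote center."""
    return handling_for(data_class).may_leave_site
```

A request server could consult such a table before broadcasting a job, keeping top secret work on the local center management server.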
Looking ahead, commercial software is expected to reduce the very high cost of security in the near future. The open PHASE server architecture is prepared to incorporate more advanced security features, which could be based on Secure Sockets Layer (SSL), Pretty Good Privacy (PGP) or even more complex tools similar to Kerberos or DCE. Until then, the intranet server installation allows companies to analyse their confidential data efficiently in-house while making optimal use of their available network resources. Please consult the PHASE Web site for more details on the project.