The vision of John Taylor, who secured the funding for the British e-Science Programme, inspired Tony Hey to address the benefits of Grid computing for e-Science. John Taylor, Director General of the Research Councils at the UK Office of Science and Technology, defines e-Science as being "about global collaboration in key areas of science, and the next generation of infrastructure that will enable it". Funding for the first phase amounted to 200 million euro, and a further 200 million euro will be set aside for the second phase of the e-Science Programme.
A great number of projects have been launched in different scientific areas. The infrastructure that enables researchers to do this work is what Tony Hey calls the Grid, a very broad definition that goes beyond high performance computing and initiatives such as the popular SETI@home. In the first phase of the programme, most of the funding was spent on application projects in all areas of science and engineering.
Another 35 million euro is dedicated to the Core Programme, which supports the middleware requirements generated by the R&D projects and, with the help of industry, makes this middleware more robust. Of this sum, 15 million euro comes from the Research Councils, while the other 20 million euro is granted by the Department of Trade and Industry for work with industry. According to the speaker, sixty companies are engaged in the programme and contribute an additional 30 million euro. A Grid Outreach programme has also been launched.
One programme that Tony Hey is personally helping to fund is the supporting infrastructure for a UK e-Science Grid, built with ten universities and two laboratories that agreed to run the same software so that their resources can be pooled and used together. The programme is due to move to level two by Easter, on the way to a full production Grid. As Tony Hey explained, building Grids is difficult, since the security and firewall policies not only of the research groups but also of the computing centre directors and the network operators have to be taken into account.
The first project described by Tony Hey is the Comb-e-Chem Web lab, where large numbers of molecules are produced at once by high-throughput methods, screened, and sent to an X-ray crystallography lab, where simulations are run remotely to determine their structure and databases are accessed for analysis. This e-Lab is supported by Globus. The myGrid project goes beyond accessing relational databases: it addresses complex, interrelated data such as arrays and micro-arrays, gene function, protein structure, and the matching of these against an individual's own gene profile. As a result, myGrid has to handle a very large volume of activity.
The Discovery Net project is not specifically biomedical but deals with high-throughput devices and sequencing. The project also uses sensors all over London to measure air quality. Discovery Net starts from scientific information in literature, relational and operational databases, instrument data and images, and moves towards scientific discovery through real-time integration, workflow construction, dynamic application integration and interactive visual analysis, using distributed resources. The objective is to create a Grid-enabled knowledge discovery platform based on open service standards that exploits Grid technology for high performance and distributed computing. The service-based implementation allows new composite knowledge discovery services to be built and deployed easily.
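To make the idea of service composition more concrete, here is a minimal Python sketch, assuming each service can be modelled as a function that takes and returns a dictionary of data. The `Workflow` class, its `then` and `run` methods and the three example services are hypothetical illustrations, not Discovery Net's actual interface; the sketch only shows how a composed process can be stored, reused, refined into a new version and audited after a run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Assumption: a "service" is a plain function from a data dict to a data dict.
Service = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Workflow:
    """Hypothetical stored discovery process: an ordered list of named services."""
    name: str
    steps: List[Tuple[str, Service]] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def then(self, step_name: str, service: Service) -> "Workflow":
        # Return a new Workflow, leaving this one unchanged so it can be reused
        # or refined in a different direction later.
        return Workflow(self.name, self.steps + [(step_name, service)])

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Execute each service in order and record which steps ran (audit trail).
        self.audit_log.clear()
        for step_name, service in self.steps:
            data = service(data)
            self.audit_log.append(step_name)
        return data


# Hypothetical example services standing in for real data sources and tools.
def load_records(data):
    return {**data, "records": ["geneA kinase", "geneB transporter", "geneC kinase"]}

def keep_kinases(data):
    return {**data, "records": [r for r in data["records"] if "kinase" in r]}

def count_records(data):
    return {**data, "count": len(data["records"])}


base = Workflow("kinase-screen")
pipeline = base.then("load", load_records).then("filter", keep_kinases).then("count", count_records)
result = pipeline.run({})
print(result["count"], pipeline.audit_log)
```

Because `then` never mutates the workflow it is called on, `base` stays empty and can be extended into a different pipeline, which is one simple way to support reuse and refinement.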
The Discovery Net team focuses on Discovery Process Management through discovery pathways and service composition. The composed discovery processes are in effect intellectual property, since they can be stored, reused, audited, refined and deployed in various forms. By means of knowledge and execution servers, the team dynamically integrates cluster classification tools for text analysis and gene function prediction. At Supercomputing 2002 in Baltimore, the team won the High Performance Computing Challenge Award for their case study on high-throughput, global knowledge discovery services. They ran 21 applications against 15 databases, so that what would have been three weeks of work was performed in a single workflow executing in a few seconds.
Tony Hey also presented a number of e-Health projects. e-Diamond, supported by IBM, deals with breast cancer detection, covering topics ranging from training and differential diagnosis, standard mammogram formats, teleradiology and advanced CAD to epidemiology. The goal is to develop a prototype service to digitise mammograms in geographically distributed hospitals. Telemedicine via AccessGrid is a small project at the Cambridge e-Science Centre with Siemens Health Care, MacMillan Cancer Relief and a number of hospitals. The West Anglia Cancer Network has various sites across the country with different specialisms. Remote access to patient images can save specialists the time they would otherwise waste travelling.
The Clinical e-Science Framework (CLEF) project deals with clinical information integration, and its team includes IT companies, health service, pharmaceutical and media players, and universities. CLEF covers clinical research, evidence-based health care and the clinical application of genetic and genomic research. The project captures, integrates and presents descriptive information about clinical histories, radiology and pathology reports, annotations on genomic and image databases, technical literature, and Web-based resources.
One of the demonstration projects used to convince the minister of health and other politicians of the relevance and usefulness of the technology was the Dynamic Brain Atlas. Derek Hill of King's College and Guy's Hospital has developed the IXI project on image guidance, surgical intervention and breast cancer, all aimed at making better use of medical images. The operating surgeon can view images of the simulated tumour superimposed on the operative field to guide the intervention. Another application is hip or knee replacement, including corrective surgery when a prosthesis wears out.
The accuracy of surgical placement is compared against the original plan as follows. The surgeon plans the operation on X-ray or CT images, using a database of prostheses. The operation is then carried out using the plan as guidance. Afterwards, the post-operative X-ray is evaluated for accuracy of placement. The data is stored and used for short-term assessment and long-term evaluation studies.
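The comparison of plan and outcome described above can be sketched in Python as follows. Everything here is an assumption for illustration only: implant placement is reduced to a hypothetical 2D position in millimetres plus an orientation angle measured on the X-ray, and "accuracy" is taken to mean the positional and angular deviation from the plan. Real clinical assessment is likely to involve richer measurements.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical simplified implant placement as measured on an X-ray."""
    x_mm: float       # horizontal position, in millimetres
    y_mm: float       # vertical position, in millimetres
    angle_deg: float  # orientation, in degrees


def placement_error(planned: Placement, achieved: Placement) -> Tuple[float, float]:
    """Return (positional error in mm, smallest angular difference in degrees)."""
    distance = math.hypot(achieved.x_mm - planned.x_mm, achieved.y_mm - planned.y_mm)
    angle_diff = abs(achieved.angle_deg - planned.angle_deg) % 360
    angle_error = min(angle_diff, 360 - angle_diff)  # e.g. 350 vs 10 degrees -> 20
    return distance, angle_error


# Stored outcomes, kept for short-term assessment and long-term evaluation studies.
outcomes: List[Dict[str, object]] = []


def record_outcome(case_id: str, planned: Placement, achieved: Placement) -> Dict[str, object]:
    distance, angle = placement_error(planned, achieved)
    entry = {"case": case_id, "distance_mm": distance, "angle_deg": angle}
    outcomes.append(entry)
    return entry


plan = Placement(100.0, 50.0, 45.0)      # from pre-operative planning
post_op = Placement(103.0, 54.0, 48.0)   # measured on the post-operative X-ray
result = record_outcome("case-001", plan, post_op)
print(result)
```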
The MIAKTS project was started six months ago and, as Tony Hey showed, deals with knowledge technology and ontologies and offers support for multi-disciplinary collaborative environments. It involves the triple assessment of breast cancer patients by surgeons, radiologists, pathologists, oncologists and nurses through the use of images.
MARIBS also deals with breast cancer, using magnetic resonance imaging (MRI) for breast screening. Breast cancer in women under forty is difficult to detect with conventional mammography because their breast tissue is too dense. MARIBS aims to find out whether MRI is an effective way of screening young women at high risk of breast cancer. Seventeen centres in the United Kingdom, associated with other large trials in Europe and Canada, are part of the team led by the Institute of Cancer Research. The study is funded by the Medical Research Council and the National Health Service. The techniques applied are image enhancement through complex processing and modelling. Hospitals need a lot of computing power but do not wish to run supercomputers of their own. Instead, they prefer to outsource these services, and this is where the Grid comes in.
Tony Hey stressed that many end-users are involved in the e-Health projects, such as consulting physicians, planning and executing surgeons, general practitioners, radiographers and technologists, the pharmaceutical and medical device industries, universities, the National Health Service, government medical researchers, regulatory bodies and monitoring agencies.
Ian Foster, Carl Kesselman and Steve Tuecke have defined the Grid as "a software infrastructure that enables flexible, secure, co-ordinated resource sharing among dynamic collections of individuals, institutions and resources and this infrastructure includes computer systems and data storage resources and specialised facilities." In effect, the Grid serves as a middleware layer that enables transient "Virtual Organisations" through the interoperability of IT systems. Tony Hey described the technology on which Grids are built as Web services, "self-contained, self-describing, modular applications that can be published, located and invoked across the Web". Following earlier distributed-computing technologies such as CORBA, Web services are the IT industry's current "magic bullet" for Internet-scale distributed computing.
In practice, service providers will register their offers, and service requestors such as hospitals can browse the "yellow pages", select the services that correspond to their individual needs, and negotiate prices. These are Web services in action. The Global Grid Forum has launched the Open Grid Services Architecture (OGSA), which builds on Web services; in this view, Grid services are nothing more than virtualised resources, and how they are implemented is left to whoever provides them. On top of this, it is possible to build higher-level services such as workflow, transactions, data mining, knowledge discovery and so on.
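A toy version of the "yellow pages" interaction might look like the Python sketch below. The `ServiceOffer` and `Registry` names are hypothetical; this is not the OGSA interface or that of any registry standard, and price negotiation is deliberately simplified to choosing the cheapest matching offer within the requestor's budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceOffer:
    provider: str    # hypothetical provider name
    capability: str  # what the service does, e.g. "image-enhancement"
    price: float     # price per job, in arbitrary units


class Registry:
    """Toy 'yellow pages': providers register offers, requestors look them up."""

    def __init__(self) -> None:
        self._offers: List[ServiceOffer] = []

    def register(self, offer: ServiceOffer) -> None:
        self._offers.append(offer)

    def find(self, capability: str, max_price: float) -> List[ServiceOffer]:
        # All offers with the requested capability within budget, cheapest first.
        matches = [o for o in self._offers
                   if o.capability == capability and o.price <= max_price]
        return sorted(matches, key=lambda o: o.price)

    def select(self, capability: str, max_price: float) -> Optional[ServiceOffer]:
        # Simplified "negotiation": take the cheapest acceptable offer, if any.
        matches = self.find(capability, max_price)
        return matches[0] if matches else None


registry = Registry()
registry.register(ServiceOffer("centre-a", "image-enhancement", 40.0))
registry.register(ServiceOffer("centre-b", "image-enhancement", 25.0))
registry.register(ServiceOffer("centre-c", "sequence-search", 10.0))

choice = registry.select("image-enhancement", max_price=30.0)
print(choice)
```

Here `choice` is the centre-b offer, the only image-enhancement service within a budget of 30. A real registry would also need authentication, richer service descriptions and genuine negotiation.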
The ultimate vision is to create synergy between the commercial Internet (WWW) and academic Grid services by developing open source tools and standards, so as to deploy a lightweight, open source university version of the Grid alongside the heavyweight industrial version. In this regard, the evolution of W3C standards is very important. What we need is software that works and is reliable, according to Tony Hey. The IBM vision of the Grid (time against dollars) consists of a spectrum of Grid types, ranging from SETI@home-style applications through on-line gaming to virtual Grid organisations with dynamic access to unlimited resources. IBM's symbol is the lizard, after the eLiza concept of autonomic, intelligent, self-configuring, self-protecting, self-healing computing: if a lizard loses its tail, it grows back.
John Manley from HP Labs considers the Grid fabric for e-Utilities to be soft and malleable, with a multi-purpose structure; dynamic, with resources that will be constantly changing; federated, with a global structure not owned by any single authority; and heterogeneous, ranging from supercomputer clusters to PCs. Sun states that "Grid computing is one of the three next big things for Sun and its customers", whereas Microsoft stated that "the alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible".
In September 2002, Tony Hey held a debate with IBM and Microsoft and asked them when their first commercial Grid applications would become a reality. Both answered: within twelve months. In the meantime, early adopters of Grid technology are coming from the pharmaceutical, engineering and petrochemical sectors. The UK programme confirms this picture, with players such as AstraZeneca, GSK, Merck, Pfizer, Rolls-Royce, BAE Systems and Schlumberger. IBM expects Grid middleware to be adopted by more mainstream commerce and industry in the 2003-2004 timeframe.
Tony Hey concluded that the Grid is not so much about computation, which after all is already there, but about data federation and integration. He thinks that metadata and ontologies will be key to higher-level Grid services, and that within the next five years the e-Science projects will produce a deluge of scientific data that will need to be annotated and curated in scientific data "digital libraries". Not only the data matter: the software environment also has to remain available in the years to come, so digital information must be handled with care.
In 2002, Tony Blair defined the Grid as a resource "intending to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information". This is exactly what Tony Hey would like to accomplish in the next phase of the UK e-Science Programme: a Grid middleware stack that can be used by biologists, medics, chemists and others, not only by computer experts. More information can be found on the Grid Outreach Web page.