The recent HPCN'99 event also featured a conference track on the use of high performance computing and networking in medical applications. Dr. Brian Tierney of the Lawrence Berkeley National Laboratory in California gave an account of the numerous advantages of a Distributed Parallel Storage Server (DPSS) used as a high speed data cache. Its highly scalable architecture plays a key role in caching large amounts of data that may be accessed by many different users. This kind of distributed storage allows hospitals to offer their health care professionals remote, location independent access to high volume patient image data for diagnostic purposes, made available by tertiary imaging facilities.
In the San Francisco Bay Area, some two or three hospitals have a catheterization facility. By means of BAGNet, a shared, metropolitan area IP-over-ATM network operating at 155 Mbit/s, together with a high speed distributed data handling system, video sequences are collected from a video-angiography imaging system and sent to the Berkeley cache. There, the data is processed, catalogued, stored, and made available to remote clinics in near real time. As Dr. Tierney explained to the audience, cardio-angiography data was collected within the Kaiser project directly from a Philips scanner by a computer system in the Kaiser Hospital Cardiac Catheterization Laboratory.
The system, attached to the ATM network, collects data every 20 to 40 minutes and sends 500 to 1000 megabytes of digital imaging data across BAGNet to the Berkeley Laboratory, where the sequences are stored on the DPSS distributed cache. The Kaiser project team also built WALDO, a Wide-Area Large-Data-Object system, which serves as a digital data archive optimized to handle real time data. WALDO manages the data on mass storage systems (MSS), defines large data objects (LDOs), and provides the means to access them. Via BAGNet, the WALDO object definitions can be generated and made available to physicians in other Kaiser hospitals. The auxiliary processing and archiving to one or more storage systems runs fully independently for 8 to 10 hours a day.
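As a quick sanity check on these figures, moving 500 to 1000 megabytes over a 155 Mbit/s link takes on the order of half a minute to a minute at best, so the 20 to 40 minute collection interval leaves ample headroom. A minimal back-of-the-envelope sketch, assuming the full link rate is available (real throughput would be lower due to ATM and IP protocol overhead):

```python
# Ideal transfer times for the Kaiser-to-Berkeley data movements,
# assuming the entire 155 Mbit/s BAGNet link rate with no overhead.

LINK_MBITS_PER_SEC = 155  # BAGNet OC-3 class link rate

def transfer_seconds(megabytes, link_mbits=LINK_MBITS_PER_SEC):
    """Lower bound on the time to move `megabytes` over the link."""
    return megabytes * 8 / link_mbits

for mb in (500, 1000):
    print(f"{mb} MB -> at least {transfer_seconds(mb):.0f} s")
# 500 MB -> at least 26 s; 1000 MB -> at least 52 s
```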
Depositing medical data in a cache is a major advantage for hospitals that lack the ideal environment to maintain a large scale digital storage system. Data intensive distributed computing thus provides a pre-eminent tool for the flexible management of on-line storage resources, supporting the initial caching of data, the processes applied to them, and interfaces to tertiary storage, and so offers substantial economies. Fault tolerance and load balancing form an integral part of the DPSS approach. In fact, the DPSS architecture provides the functionality of a single, very large, random access, data block oriented "virtual disk" of very high capacity, able to isolate the application from tertiary storage systems and instrument data sources.
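The block oriented virtual disk idea can be illustrated with a toy sketch: fixed-size data blocks are striped round-robin across several independent servers and reassembled in parallel on read. The names, block size, and in-memory "servers" below are purely illustrative assumptions, not the actual DPSS interfaces:

```python
# Toy illustration of a DPSS-style striped "virtual disk":
# blocks spread round-robin over independent servers, fetched
# in parallel and reassembled for the reader.
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # assumed block size, 64 KB

class VirtualDisk:
    def __init__(self, n_servers):
        # each "server" is just an in-memory block store here
        self.servers = [dict() for _ in range(n_servers)]

    def write(self, data):
        """Split data into blocks and stripe them across servers."""
        blocks = [data[i:i + BLOCK_SIZE]
                  for i in range(0, len(data), BLOCK_SIZE)]
        for idx, block in enumerate(blocks):
            # round-robin striping: block idx lives on server idx % N
            self.servers[idx % len(self.servers)][idx] = block
        return len(blocks)  # number of blocks stored

    def read(self, n_blocks):
        """Fetch all blocks from the servers in parallel and rejoin."""
        def fetch(idx):
            return self.servers[idx % len(self.servers)][idx]
        with ThreadPoolExecutor() as pool:
            return b"".join(pool.map(fetch, range(n_blocks)))

disk = VirtualDisk(n_servers=4)
payload = bytes(200 * 1024)       # 200 KB -> 4 blocks
n = disk.write(payload)
assert disk.read(n) == payload    # round-trips intact
```

Because each block fetch goes to an independent server, aggregate read bandwidth scales with the number of servers, which is the essence of the parallel operation Dr. Tierney described.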
The Berkeley DPSS consists of four Unix workstations, each with four to six Ultra SCSI disks on two SCSI host adaptors. Its high performance is obtained through the parallel operation of several independent, network-based components. The processing can be performed either at the computing centre site or at the user's site. Performance and operation monitoring tools have been built into the storage system in order to tackle unpredictable network-generated problems. Naturally, DPSS is equally applicable to other scientific applications, including detector data from particle accelerators. For more information on the DPSS architecture, we kindly refer you to the Web site of the Lawrence Berkeley National Laboratory.