Sistina's GFS file system is a Grid enabler

Minneapolis 11 November 2002 In Primeur/EnterTheGrid magazine, we report a lot about fast machines and applications in the Tflop/s range. What gets less attention is storage and storage software. Nevertheless, it is the storage that holds the data to be processed, and it is the storage that takes back the results. Access to Tbytes and even Pbytes of data is not uncommon today.

Managing storage is, however, not easy. The cost does not end with the acquisition of a storage array: one can easily spend as much as six times the acquisition amount on the people who manage the storage over a three-year period.

Real global seamless storage management on the Grid is not feasible yet. There are projects, for instance the EU-funded DataGrid project, that try to tackle these big problems. For companies that are doing Grid computing on an enterprise scale, mainly cluster computing, a cluster file system that is manageable from a single point already simplifies "Enterprise Grid" management. One of the companies that provide a cluster file system is Sistina, which recently released GFS 5.1, the latest version of its Global File System.

Sistina originated from a project at the University of Minnesota in the US, where the first version of GFS was developed. Sistina has been marketing GFS since 1997. As the version number indicates, this is already the fifth major release of the cluster file system.

GFS is a cluster file system that is currently available on Linux. For the major Linux distributions, like SuSE and Red Hat, out-of-the-box packaged versions exist. These simplify patch management.

Unlike what one might expect, GFS is not built on top of the Linux file system. It is in fact a native file system that talks to the kernel directly. This leads to fast access and makes the system more robust. GFS is a peer-to-peer technology. This makes it scalable and avoids a single point of failure. GFS works with LVM, the Logical Volume Manager, which is Sistina's contribution to open source Linux and part of the Linux 2.4 distributions.

As long as you run the same Linux distribution, it does not matter what type of hardware you have installed. It can be for instance Intel machines, blades or mainframes. They can all have access to a single enterprise file storage system. This way data can be shared enterprise-wide.

GFS also works on all kinds of interconnects, like SCSI, Fibre Channel and Myrinet. Concerning storage systems, GFS can also work on top of SAN systems, consolidating several SAN systems into a single view of the data storage.

HPC is, of course, an important market for GFS. Linux clusters do a lot of the number crunching, and data sets are often very large. An example of a large GFS user is Fermi National Accelerator Lab in the US, for the Sloan Sky Survey project. This project needs high-speed multi-CPU access to terabytes of shared data for analysis. At Fermi Lab, GFS has been running continuously for two years without any data loss.

Another large GFS user is SUNY. They run a genetics supercomputer with 2000 dual-processor Intel nodes and access to 16 Tbyte of on-line storage.

Apart from HPC and database support, a global file system is useful for web applications and the emerging embedded blade NAS computing market.

In the new GFS 5.1 release, Sistina added several features that speed up data access or ease management. Direct Connect is a new feature that creates a direct channel between an application and the data. This is useful for database applications, because database systems use their own caching.

Context-dependent path names make it possible to create shared directories. This way one can share a common boot directory or a home directory. This considerably simplifies installing software upgrades, for instance.
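The idea behind context-dependent path names can be pictured with a small sketch: one shared path is written once, and each node resolves it to its own location. This is only an illustration of the concept, not Sistina's implementation; the `@hostname` marker, the `resolve_cdpn` function and the example paths are all assumptions made for this sketch.

```python
def resolve_cdpn(path, context):
    """Replace path components of the form '@name' with the node's own value.

    `context` is a hypothetical per-node mapping, e.g. {"hostname": "node-a"}.
    Components whose name is not in the mapping are left unchanged.
    """
    resolved = []
    for component in path.split("/"):
        name = component[1:] if component.startswith("@") else None
        if name is not None and name in context:
            resolved.append(context[name])
        else:
            resolved.append(component)
    return "/".join(resolved)


# The same shared path, seen from two different (hypothetical) nodes.
shared_path = "/gfs/boot/@hostname/config"
print(resolve_cdpn(shared_path, {"hostname": "node-a"}))
print(resolve_cdpn(shared_path, {"hostname": "node-b"}))
```

In this sketch every node refers to the same path, while node-specific files stay separated underneath it, which is what makes a shared boot or home directory practical.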

Quotas, well known from standard Unix file systems, can now also be used cluster-wide with GFS.

An emerging trend of importance today is blade computing. Cluster companies can package inexpensive computing blades with SAN storage and GFS in a rack, designed specifically for a customer's needs. Sistina sees this as an important market.

Sistina sees GFS as a Grid enabler. It is a way for thousands of servers to share data. Data replication mechanisms do not work on that scale anymore. But just like companies providing Computational Grid software, like SUN GridEngine and Platform LSF, Sistina is scaling up from department scale to, currently, enterprise scale, leaving the global Grid for the near future.


Ad Emmen
