Sistina originated from a project at the University of Minnesota in the US, where the first version of GFS was developed. Sistina has been marketing GFS since 1997. As the version number indicates, this is already the fifth major release of the cluster file system.
GFS is a cluster file system that is currently available on Linux. For the major Linux distributions, such as SuSE and Red Hat, packaged out-of-the-box versions exist. These simplify patch management.
Contrary to what one might expect, GFS is not built on top of the Linux file system. It is in fact a native file system that talks to the kernel directly. This leads to fast access and makes the system more robust. GFS is a peer-to-peer technology, which makes it scalable and avoids a single point of failure. GFS works with LVM, the Logical Volume Manager, which is Sistina's contribution to open source Linux and part of the Linux 2.4 distributions.
As long as you run the same Linux distribution, it does not matter what type of hardware you have installed. The machines can be, for instance, Intel systems, blades or mainframes. They can all have access to a single enterprise file storage system, so data can be shared enterprise-wide.
GFS also works on all kinds of interconnects, such as SCSI, Fibre Channel and Myrinet. As for storage systems, GFS can also work on top of SAN systems, consolidating several of them into a single view of the data storage.
HPC is, of course, an important market for GFS. Linux clusters do a lot of the number crunching, and data sets are often very large. An example of a large GFS user is Fermi National Accelerator Lab in the US, which uses it for the Sloan Digital Sky Survey project. This project needs high-speed multi-CPU access to terabytes of shared data for analysis. At Fermilab, GFS has been running continuously for two years without any data loss.
Another large GFS user is SUNY. They run a genetics supercomputer of 2000 dual-processor Intel nodes with access to 16 Tbyte of online storage.
Apart from HPC and database support, a global file system is useful for web applications and for the emerging embedded blade NAS computing market.
In the new GFS 5.1 release, Sistina added several features that speed up data access or ease management. Direct Connect is a new feature that creates a direct channel between an application and the data. This is useful for database applications, because database systems do their own caching.
Context dependent path names make it possible to create shared directories. This way one can share a common boot directory or a home directory, which considerably simplifies installing software upgrades, for instance.
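The idea behind context dependent path names can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sistina's implementation: the `@hostname` marker, the `resolve_cdpn` function and the node names are hypothetical and do not come from the GFS release notes. The sketch shows how one shared path can resolve to a different directory depending on which node looks it up.

```python
from pathlib import PurePosixPath


def resolve_cdpn(path, context):
    """Replace each '@key' path component with this node's value for 'key'.

    'context' is a hypothetical per-node dictionary, e.g. {"hostname": "node-a"}.
    Components whose key is not in the context are left unchanged.
    """
    resolved = []
    for part in PurePosixPath(path).parts:
        key = part[1:]
        if part.startswith("@") and key in context:
            resolved.append(context[key])
        else:
            resolved.append(part)
    return str(PurePosixPath(*resolved))


# Two hypothetical cluster nodes looking up the same shared path.
node_a = {"hostname": "node-a"}
node_b = {"hostname": "node-b"}

shared_path = "/shared/@hostname/etc"
path_on_a = resolve_cdpn(shared_path, node_a)
path_on_b = resolve_cdpn(shared_path, node_b)
```

Under these assumptions, every node refers to the same path on the shared file system, while node-specific files still end up in separate directories.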
Quotas, well known from standard Unix file systems, can now also be used cluster-wide with GFS.
An important emerging trend is blade computing. Cluster companies can package inexpensive computing blades with SAN storage and GFS in a rack, designed specifically for a customer's needs. Sistina sees this as an important market.
Sistina sees GFS as a Grid enabler. It offers a way for thousands of servers to share data. Data replication mechanisms no longer work at that scale. But just like the companies providing Computational Grid software, such as Sun Grid Engine and Platform LSF, Sistina is scaling up from department level to, currently, enterprise level, leaving the global Grid for the near future.