San Diego Supercomputer Center releases Molecular Biology Toolkit for protein visualization and analysis

San Diego 19 February 2004 - Researchers at the San Diego Supercomputer Center (SDSC) have released the Molecular Biology Toolkit (MBT), a set of Java-based software libraries for manipulating, analysing, and visualizing information about proteins, DNA, and RNA. This first major release of the MBT runs under the Linux, Windows, Mac OS X, and IRIX operating systems, an advantage because very few off-the-shelf packages enable applications to run seamlessly on several different computer platforms. The MBT includes source code, example applications, a Programmer's Guide, an Application Programming Interface (API) document, a Build Guide, and a Binary Installation Guide.


"We embarked on the MBT project because there were few if any well-documented and easy-to-use developer's toolkits to enable scientists to create custom molecular biology visualization and analysis applications", stated Professor Philip E. Bourne of the University of California, San Diego, Science Advisor to SDSC and the principal investigator on the toolkit development effort. "A number of very powerful, well-developed, and popular stand-alone applications exist for visualization and analysis of protein data, but the MBT is for researchers who want to roll their own applications using a variety of biological data."

The toolkit provides Java classes for efficiently loading, managing, and manipulating protein structure and sequence data. The MBT provides a rich set of graphical 3D and 2D visualization modules that can be plugged together to produce applications that have sophisticated graphical user interfaces. But the core data I/O and manipulation classes also can be used to write completely non-graphical applications, to implement pure analysis codes, for example, or to produce a non-graphical back end for Web-based applications.
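To give a flavour of what a "pure analysis code" with no graphics might look like, here is a minimal sketch in plain Java that averages atom coordinates from PDB-format text. It does not use the MBT classes; the class and method names (CentroidSketch, atomLine, centroid) are hypothetical and exist only for this illustration. The one thing it relies on is the fixed-column PDB layout, in which the x, y, and z coordinates of an ATOM record occupy columns 31-38, 39-46, and 47-54.

```java
import java.util.Locale;

public class CentroidSketch {
    /** Builds a simplified ATOM record in the fixed-column PDB layout (demo helper). */
    static String atomLine(int serial, String atom, String residue, char chain,
                           int residueNumber, double x, double y, double z) {
        return String.format(Locale.ROOT, "ATOM  %5d %-4s %3s %c%4d    %8.3f%8.3f%8.3f",
                serial, atom, residue, chain, residueNumber, x, y, z);
    }

    /** Averages the coordinates of every ATOM record found in the given PDB text. */
    static double[] centroid(String pdbText) {
        double sumX = 0, sumY = 0, sumZ = 0;
        int count = 0;
        for (String line : pdbText.split("\n")) {
            if (!line.startsWith("ATOM") || line.length() < 54) {
                continue; // skip other record types and truncated lines
            }
            // Columns 31-38, 39-46, 47-54 (1-based) hold x, y, z.
            sumX += Double.parseDouble(line.substring(30, 38).trim());
            sumY += Double.parseDouble(line.substring(38, 46).trim());
            sumZ += Double.parseDouble(line.substring(46, 54).trim());
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no ATOM records found");
        }
        return new double[] {sumX / count, sumY / count, sumZ / count};
    }

    public static void main(String[] args) {
        // Three made-up atoms; real input would come from a PDB file.
        String pdb = atomLine(1, "N", "ALA", 'A', 1, 0.0, 0.0, 0.0) + "\n"
                   + atomLine(2, "CA", "ALA", 'A', 1, 2.0, 0.0, 0.0) + "\n"
                   + atomLine(3, "C", "ALA", 'A', 1, 1.0, 3.0, 0.0) + "\n";
        double[] c = centroid(pdb);
        System.out.printf(Locale.ROOT, "centroid = (%.3f, %.3f, %.3f)%n", c[0], c[1], c[2]);
        // prints: centroid = (1.000, 1.000, 0.000)
    }
}
```

A programme of this kind could run on a server behind a Web page just as easily as on a desktop, since nothing in it touches a display.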

Many major biological research resources, including the Protein Data Bank, deliver their data via the Web. "Since this project was undertaken to initially support the structural genomics community, we had the design goal of creating a toolkit that could deliver applications for the Web resources operated by this community", Professor Bourne stated. "The MBT makes possible the transparent access of protein data from a Web site via the Internet and will provide interactive database query capability using visual cues."

The toolkit provides the capability to load molecular data from a number of sources, including files of types PDB, mmCIF, and FASTA. These file types can be read from local disk or from an HTTP or FTP server. This distribution of the MBT provides several StructureLoader implementations to read common data formats, although researchers also can write and register their own custom loaders for the toolkit as if they were built in. Most of the provided StructureLoader implementations also can read from files compressed in the "zip" or "gzip" formats.
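The idea of registering custom loaders alongside built-in ones, with compressed input handled transparently, can be sketched in a few lines of standard Java. This is an illustration under stated assumptions, not the MBT API: the SequenceLoader interface, the FastaLoader class, and the register and load methods below are all hypothetical names, and the real StructureLoader interface may look quite different. Only the java.io and java.util.zip classes are real library calls.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LoaderSketch {
    /** Hypothetical loader contract; NOT the MBT StructureLoader interface. */
    interface SequenceLoader {
        boolean canLoad(String fileName);
        Map<String, String> load(InputStream in) throws IOException;
    }

    /** Parses FASTA text: each '>' header line starts a new record. */
    static class FastaLoader implements SequenceLoader {
        @Override public boolean canLoad(String fileName) {
            return fileName.endsWith(".fasta") || fileName.endsWith(".fa");
        }
        @Override public Map<String, String> load(InputStream in) throws IOException {
            Map<String, String> records = new LinkedHashMap<>();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            String header = null;
            StringBuilder sequence = new StringBuilder();
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                line = line.trim();
                if (line.startsWith(">")) {
                    if (header != null) records.put(header, sequence.toString());
                    header = line.substring(1).trim();
                    sequence.setLength(0);
                } else if (header != null) {
                    sequence.append(line);
                }
            }
            if (header != null) records.put(header, sequence.toString());
            return records;
        }
    }

    private static final List<SequenceLoader> LOADERS = new ArrayList<>();

    /** Custom loaders are registered exactly like the built-in one. */
    static void register(SequenceLoader loader) { LOADERS.add(loader); }

    /** Uses the first registered loader that accepts the name, unwrapping gzip if needed. */
    static Map<String, String> load(String fileName, InputStream in) {
        try {
            boolean gzipped = fileName.endsWith(".gz");
            String baseName = gzipped ? fileName.substring(0, fileName.length() - 3) : fileName;
            InputStream data = gzipped ? new GZIPInputStream(in) : in;
            for (SequenceLoader loader : LOADERS) {
                if (loader.canLoad(baseName)) return loader.load(data);
            }
            throw new IllegalArgumentException("no loader registered for " + fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses text in memory so no files are needed. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        register(new FastaLoader());
        byte[] compressed = gzip(">demo1\nMKTAY\nIAKQR\n>demo2\nGATTACA\n");
        Map<String, String> records =
                load("demo.fasta.gz", new ByteArrayInputStream(compressed));
        records.forEach((id, seq) -> System.out.println(id + " -> " + seq));
    }
}
```

Because the dispatcher only sees the interface, a researcher's own loader would be picked up in the same way as the one shipped with the sketch. The same InputStream could equally come from a local file or from a URL connection.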

The MBT makes possible new methods of interactive visualization of complex scientific data. While most existing methods of representing scientific data are static and two-dimensional, the MBT's visualization capabilities provide interactive, three-dimensional environments within which multiple users can examine complex datasets in real time. The distribution supplies a 3D structure viewer, a 2D sequence viewer, and a hierarchical tree viewer; users also can write and plug in their own viewers.

"The structure viewer provides high quality, interactive visualization of molecular scenes", stated John Moreland of SDSC, the technical lead on the MBT project. "It's written in Java and Java3D, so it's portable and Web-deliverable."

Each active viewer will automatically receive synchronized events from the toolkit in such a way that state changes will be reflected across all viewers. This is important because the ability to interactively view correspondences between different visual representations of the data can enable researchers to see patterns and to make correlations in the data that otherwise might not be noticed. For example, if a user selects certain data in one viewer, other viewers also will respond to that selection. The highlighted regions in one view may give insight as to how the corresponding regions in another viewer relate to one another.
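The synchronization described here resembles the familiar listener (observer) pattern, and a minimal sketch may help readers who want to picture it. The code below is an assumption-laden illustration rather than the toolkit's implementation: SelectionListener, SharedDocument, SequenceView, and StructureView are hypothetical stand-ins, and the actual MBT event classes are not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncSketch {
    /** Hypothetical listener contract; not an MBT interface. */
    interface SelectionListener {
        void selectionChanged(int start, int end);
    }

    /** Hypothetical shared document that fans each selection out to every viewer. */
    static class SharedDocument {
        private final List<SelectionListener> viewers = new ArrayList<>();

        void addViewer(SelectionListener viewer) { viewers.add(viewer); }

        /** Selects residues with zero-based indices in [start, end). */
        void select(int start, int end) {
            for (SelectionListener viewer : viewers) {
                viewer.selectionChanged(start, end);
            }
        }
    }

    /** Stand-in for a 2D sequence viewer: remembers the highlighted letters. */
    static class SequenceView implements SelectionListener {
        private final String sequence;
        String highlighted = "";

        SequenceView(String sequence) { this.sequence = sequence; }

        @Override public void selectionChanged(int start, int end) {
            highlighted = sequence.substring(start, end);
        }
    }

    /** Stand-in for a 3D structure viewer: remembers one-based residue numbers. */
    static class StructureView implements SelectionListener {
        final List<Integer> highlightedResidues = new ArrayList<>();

        @Override public void selectionChanged(int start, int end) {
            highlightedResidues.clear();
            for (int i = start; i < end; i++) {
                highlightedResidues.add(i + 1);
            }
        }
    }

    public static void main(String[] args) {
        SharedDocument document = new SharedDocument();
        SequenceView sequenceView = new SequenceView("MKTAYIAKQR");
        StructureView structureView = new StructureView();
        document.addViewer(sequenceView);
        document.addViewer(structureView);

        document.select(2, 5); // one selection, both views updated
        System.out.println(sequenceView.highlighted);          // TAY
        System.out.println(structureView.highlightedResidues); // [3, 4, 5]
    }
}
```

Neither view knows about the other; each reacts only to the shared document, which is what lets a user-written viewer join in on equal terms.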

The Molecular Biology Toolkit currently includes two pre-written applications, with more to come. Some users may find these programmes useful as is; they also can serve as examples and starting points when writing custom applications. The SimpleViewer programme is a basic 3D structure viewing application that offers a quick and easy means of viewing a molecule. There is very little fuss involved in using it, but it has relatively few features. The MBT Explorer programme is a visualization application that offers a more complete set of molecule visualization capabilities than SimpleViewer.

The Molecular Biology Toolkit is a very flexible software base upon which extensions can be built. The toolkit's StructureDocument class enables toolkit-wide events - changes to raw data or application state - to be shared among any number of plug-in event viewer objects. In fact, since each viewer has complete and equal access to all active data sets, plug-in viewer objects have the same access to events and data as built-in toolkit components.

Development team members and beta testers of the MBT have had a number of ideas for extensions, some of which are in active development. The MBT development team consists of Philip E. Bourne, principal investigator; John L. Moreland, project technical lead and toolkit co-developer; and Apostol Gramada, toolkit co-developer, all at SDSC or UCSD. Collaborators and application developers include Sasha Buzko, Wayne Townsend-Merino, Douglas S. Greer, John Tate, and Cindy Zhang of SDSC and/or UCSD, and Paul Craig of the Rochester Institute of Technology.

The MBT project was funded by the National Institutes of Health, through its National Institute of General Medical Sciences (NIGMS) division, as part of PPG grant number 1-P01-GM63208. The project is administered and supported by the San Diego Supercomputer Center.

Leslie Versweyveld
