The Protein Data Bank (PDB) has come under new management. Three American institutions have formed a consortium, called the Research Collaboratory for Structural Bioinformatics (RCSB), and submitted an extensive proposal to operate and extend the PDB as a scientific tool for unravelling the workings of biological systems in pharmaceutical and medical research. The three partners, Rutgers, the State University of New Jersey; the University of California at San Diego (UCSD); and the National Institute of Standards and Technology (NIST), have obtained a five-year, $10 million grant to explore new mechanisms for fast investigation of sequences and molecular structures.
The origins of the Protein Data Bank date back to 1971, when it was maintained at Brookhaven National Laboratory as a large repository of data. Now the US National Science Foundation (NSF), the Department of Energy, and two units of the National Institutes of Health, namely the National Institute of General Medical Sciences and the National Library of Medicine, have decided to fund the RCSB's plans for the PDB, because they were impressed with the technical merit of the proposal and with its detailed scheme for monitoring the database across the three participating sites. Improved capabilities for queries and for the content of depositions will be deployed subsequently.
The RCSB partners have drawn up a list of priorities to achieve faster and more reliable data processing as well as greater scalability of the PDB. The measures will result in more efficient data throughput and a larger query capacity, and the PDB will be able to handle more complex and more accurate queries. Researchers will benefit from both a uniform archive and dynamic cross-links to other existing databases. Scientists will be offered advanced facilities for structure validation and sequence neighbouring, and will receive a report on request. The RCSB consortium has already showcased a few tools on its Web site which enable biologists to search several databases simultaneously with a single query.
Each participating institution takes on the responsibilities that match its particular expertise in data deposition and processing, and in database query, integration and uniformity. The PDB will therefore be stored at the three RCSB locations. The database will also be mirrored at major sites around the world, particularly in Europe and the Pacific Rim. The RCSB members have built up extensive experience in data validation, modelling and the development of query languages and visualization tools. At present, the group maintains 11 publicly accessible structural biology databases. The RCSB members also benefit from the advantages offered by the National Partnership for Advanced Computational Infrastructure.
The PDB upgrade is led by three principal investigators, one each from Rutgers, UCSD and NIST. In 1971, Helen Berman was a member of the team that created the Protein Data Bank at Brookhaven. At Rutgers, where she is a professor of chemistry, she manages the Nucleic Acid Database (NDB), which covers both the collection and the distribution of structural information, as well as the atlas, the archive and the advanced search engine for data access. At UCSD's Supercomputer Center, Phil Bourne is responsible for the Biological Data Representation and Query project. Recently, his team has produced a database of structure comparisons for over 8,000 structures in the PDB. In addition, the UCSD group has built a series of databases consisting of derived data on protein structures, and also maintains a mirror site of the NDB.
At NIST, Gary L. Gilliland, Chief of the Biotechnology Division in the Chemical Science and Technology Laboratory, supervises the establishment of data uniformity across the old and new PDB structures. He has led a research programme in protein crystallography for about 20 years and played an essential role in the foundation of the Center for Advanced Research in Biotechnology (CARB). Berman, Bourne and Gilliland aim to turn the PDB into a powerful instrument for discovering the 3D structures of proteins and their relationship to biological function. This kind of research will help the pharmaceutical industry to develop effective new drugs with few or no side effects and to reveal the origins of human disease.
If detailed information on the atomic structure of complex biological macromolecules can be found in the PDB, it becomes possible to identify the elements that might cause a disease as well as the means that can fight it. Via the World Wide Web, researchers will be able to submit the most complex queries to the PDB and obtain fast and reliable reports for their future work. The RCSB Web site hosts useful information on the Protein Data Bank. In its October issue, VMW devoted an article to the work of Phil Bourne's team at UCSD on the Biological Data Representation and Query initiative.