The Research Collaboratory for Structural Bioinformatics (RCSB) took over full responsibility for the Protein Data Bank (PDB) from the Brookhaven National Laboratory (BNL) on July 1st 1999, some three months ahead of the official schedule. This was possible because all aspects of the project involving the RCSB, from the deposition of structural data, through query and distribution, to long-term archival and clean-up of the original data, have proceeded smoothly, thanks to the excellent co-operation between the RCSB and BNL staffs. Mirror sites of the primary archive are currently available at Rutgers and at the National Institute of Standards and Technology (NIST). Sites in Europe and Asia are planned as well.
The PDB, the most important repository of protein structures in the world, uses powerful SGI Origin 2000 and SGI Origin 200 servers for primary data acquisition and processing. The PDB is managed by the RCSB, a non-profit consortium comprising Rutgers University in New Jersey, the San Diego Supercomputer Center and NIST. This invaluable collection is the major international repository for the processing and distribution of 3D macro-molecular structure data, which will expand the scientific knowledge of proteins and ultimately accelerate new drug discovery.
A single protein structure can take a researcher years to determine. The PDB validation software enables scientists to analyse their work at any stage through various reports and images. Once the structure is determined, the scientist deposits the experimental data and coordinates with the RCSB PDB, where they are processed and released into the PDB archives. The data are deposited and processed at Rutgers University using software running on the Origin 2000 and Origin 200 systems. It is of paramount importance that this vital information is not lost and that it is made fully accessible to the global scientific community.
Based on the revolutionary ccNUMA architecture, the Origin 2000 and Origin 200 systems provide the highest level of performance and reliability available to the scientific community. Both systems ensure a secure and stable storage environment. Their highly scalable structure is well suited to accommodate expansion as a growing number of scientists deposit 3D structures into the database. The RCSB PDB is supported by funds from the National Science Foundation, the Office of Biological and Environmental Research at the Department of Energy, and two units of the National Institutes of Health, namely the National Institute of General Medical Sciences and the National Library of Medicine.
For structural data deposition, 300 files were processed by the RCSB prior to the changeover of data processing responsibilities on January 27. Since that date all files, 283 in total, have been processed by means of the AutoDep Input Tool (ADIT) developed by the RCSB. Virtually every file was fully processed, reviewed by the depositors and put into the final format within two weeks of submission. As a result, it is no longer necessary to release files that are not fully processed, which favours higher quality data. The majority of the PDB files designated for immediate distribution could be released within only one week.
Of the data in the PDB prior to the changeover of responsibilities, some 2300 entries have been reprocessed and put into the latest PDB format. Since the RCSB's inception, its members have collaborated with the European Bioinformatics Institute (EBI). The EBI has been accepting depositions and forwarding the data to the United States for further processing. At some time in 1999, the EBI will begin complete processing of biological macro-molecular structure data. The RCSB welcomes this endeavour and expects to continue to collaborate technically with the EBI, as well as with every other international group, to help ensure full data exchange.
For more details on the RCSB Protein Data Bank, please read the article "Protein Data Bank Web site opened, RCSB distributor of new entries" in the VMW April 1999 issue. You can also visit the RCSB PDB Web site.