The Data Capacitor, so named by analogy with the electric circuit element used to store charge temporarily, is IU's most recent step in developing and implementing technology to help researchers deal with the data deluge, the massive amounts of data being generated by advanced digital instruments. Many types of data analyses and simulations involve the creation of a temporary data set several times larger than the data set being analysed. This means that the size of data sets that may be analysed is limited by available disk capacity. The new Data Capacitor will allow hundreds of terabytes of short-term storage to be dedicated to a single analysis.
The Data Capacitor will make a considerable impact on researchers in the life sciences, where data management challenges are particularly severe; but researchers in many disciplines will be better able to draw from their data the information and meaning it contains. New insights are sure to result from the ability of scientists to analyse larger data sets than can feasibly be manipulated today.
"The Data Capacitor Project is a major step in Indiana University's ongoing effort to be a leader in the arena of data-intensive computing", stated Michael A. McRobbie, Vice President for Research and Information Technology, "and it is a unique addition to national efforts to develop a 21st century cyberinfrastructure."
The astronomy department at Indiana University is already waiting to employ the Data Capacitor. Astronomy professor Catherine A. Pilachowski explained: "Indiana University is a partner on the WIYN Telescope in Kitt Peak, Arizona. The observatory there is building a digital camera that will capture more than one billion pixels per image. Formerly, processing and serving this kind of data wasn't plausible. The Data Capacitor is the perfect tool. It's what we needed. It puts us in the big league of institutions that are defining the future of science."
Besides astronomy, the researchers involved in the Data Capacitor project work in bioinformatics, x-ray crystallography, proteomics, high energy physics, library sciences and informatics, computer science, and UITS. The success of this project will depend critically on research, development, and deployment of software technologies by IU's computer scientists, including Beth Plale, Minaxi Gupta, and Randall Bramley.
The Data Capacitor is funded by the National Science Foundation's Major Research Instrumentation (MRI) Program, the same programme that provided $1.8 million in 2001 for Indiana University's Analysis and Visualization of Instrument-Driven Data (AVIDD) System. AVIDD was the first important step for Indiana University in the area of data-centric computing, enabling new insights in chemistry, physics and life sciences.
Rita Rodriguez, programme officer for Computing Research Infrastructure at the NSF, stated: "The National Science Foundation has great expectations for IU's success in a much needed area of research."
The Data Capacitor is also expected to play a role in the TeraGrid, the NSF-funded national project to build the world's largest, most comprehensive Grid computing cyberinfrastructure for open scientific research. The Data Capacitor will provide TeraGrid researchers - IU, national and international - with a unique facility for temporary data storage. Craig Stewart describes the Data Capacitor as "a new idea in cyberinfrastructure, a system that provides massive, fast short-term storage, and nothing but that".
The Data Capacitor will connect to IU's existing cyberinfrastructure via I-Light, the high-performance, optical-fiber network linking Indiana University Bloomington, Purdue University, and Indiana University-Purdue University Indianapolis (IUPUI). The Data Capacitor will likewise connect to the TeraGrid via I-Light. IU researchers hope that the Data Capacitor will prove to be an important and valuable component enabling new computer science developments and new discoveries in life sciences, physics, and astronomy.
The Data Capacitor will have more than a quarter of a petabyte of spinning disk, and will be capable of reading or writing data at a rate of more than 500 GB per minute. It will have multiple network connections: to the IU network, to the statewide I-Light network (owned jointly by Indiana and Purdue Universities), and to the TeraGrid.
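To put those two figures in perspective, the short calculation below estimates how long it would take to fill the disk at the quoted rate. It is only a rough sketch: both figures are stated as lower bounds ("more than"), the decimal conversion of 1 PB = 1,000,000 GB is an assumption, and the function names are illustrative.

```python
# Figures quoted in the text; both are lower bounds ("more than").
CAPACITY_PB = 0.25        # a quarter of a petabyte of disk
RATE_GB_PER_MIN = 500     # sustained read or write rate

# Assumption: decimal units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def minutes_to_fill(capacity_pb: float, rate_gb_per_min: float) -> float:
    """Minutes needed to write the full capacity at a constant rate."""
    return capacity_pb * GB_PER_PB / rate_gb_per_min


def rate_gb_per_second(rate_gb_per_min: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return rate_gb_per_min / 60


minutes = minutes_to_fill(CAPACITY_PB, RATE_GB_PER_MIN)
print(f"Time to fill: {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
print(f"Rate: {rate_gb_per_second(RATE_GB_PER_MIN):.2f} GB per second")
```

Under these assumptions the whole store could be filled, or emptied, in a little over eight hours, which is consistent with its intended role as short-term rather than archival storage.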
The Data Capacitor is a device of significant scale, and it addresses problems that emerge in large-scale cyberinfrastructure. The need for the Data Capacitor emerges from the interconnections of supercomputers, advanced instruments, massive data storage systems, and immersive visualization devices via high speed networks. It is when all of these components are linked that the need for such massive short-term storage of data arises.
One example of the use of the Data Capacitor is that of temporarily holding data created at very high rates by advanced digital instruments. One part of the project involves installing a dedicated link to move data from an experimental spectrometer in the laboratory of Dr. David Clemmer, IU Department of Chemistry, to the Data Capacitor at a rate that is ten times the bandwidth of most buildings on the IU Bloomington campus.
Other needs for the Data Capacitor come from application work flows in Grid computing, from mismatches in the I/O characteristics of two components in sequence that create a need for intermediate storage of data, and from applications such as Monte Carlo simulations, in which a very large amount of data is created temporarily and then summarized, and only the summary results are needed.
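The Monte Carlo pattern described above can be illustrated with a toy sketch: raw samples are written to temporary storage, read back to compute a summary, and then discarded. This is not the project's software; the function name, the local temporary file standing in for the Data Capacitor, and the choice of uniform random samples are all assumptions made for illustration.

```python
import os
import random
import struct
import tempfile


def simulate_and_summarize(n_samples, seed=0, scratch_dir=None):
    """Write raw samples to scratch storage, summarize them, then delete them.

    scratch_dir is a hypothetical stand-in for short-term storage;
    None means the system's default temporary directory.
    """
    rng = random.Random(seed)

    # Phase 1: generate the large temporary data set (8 bytes per sample).
    with tempfile.NamedTemporaryFile(dir=scratch_dir, suffix=".bin",
                                     delete=False) as scratch:
        path = scratch.name
        for _ in range(n_samples):
            scratch.write(struct.pack("<d", rng.random()))

    try:
        temp_bytes = os.path.getsize(path)

        # Phase 2: stream the data back in chunks and keep only a summary.
        total = 0.0
        count = 0
        with open(path, "rb") as scratch:
            while chunk := scratch.read(8 * 1024):
                for (value,) in struct.iter_unpack("<d", chunk):
                    total += value
                    count += 1
    finally:
        # Phase 3: the raw data is no longer needed once summarized.
        os.remove(path)

    return {"count": count, "mean": total / count, "temp_bytes": temp_bytes}


summary = simulate_and_summarize(100_000)
print(summary)
```

The summary returned is a few numbers, while the temporary file grows in proportion to the number of samples. At research scale that intermediate file is what would occupy short-term storage.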
Craig A. Stewart, Associate Vice President for Research and Academic Computing, is the principal investigator (PI) for the project. Co-investigators are Randall Bramley (Computer Science), Thomas J. Hacker (UITS), Catherine A. Pilachowski (Astronomy) and Beth Plale (Computer Science). The Senior Investigators associated with this grant include dozens of IU's most distinguished researchers, including Geoffrey C. Fox, Director of the Community Grid Lab within the Pervasive Technology Labs of Indiana University. The Data Capacitor grant will facilitate Dr. Fox's efforts to deliver video conferencing and distance education services to Native Americans. The total cost of the three-year project is $2,678,938, of which $1,720,000 will be provided by the NSF, and the balance by Indiana University.