Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, July, 2003.
This thesis addresses the above problems with an approach that differs from most previous research on parallel I/O. Instead of improving the actual data transfer rate, we hide the high I/O costs from the application's point of view by maximizing the overlap between I/O and other tasks. Our approaches exploit the periodicity of I/O operations, the specific I/O semantic requirements of scientific codes, and the existence of idle system resources. To improve the apparent performance of periodic I/O operations, we present several novel techniques for application-level buffering and prefetching. To serve the I/O needs of large-scale, complex applications, we show how to incorporate these optimizations into existing parallel I/O libraries for simulations and into a new general-purpose data management facility that we created for visualization applications.
We evaluated our proposed I/O optimizations, namely active buffering and the GODIVA framework, with both synthetic benchmarks and real-world applications, including simulations and visualization tools. The performance study shows that the proposed techniques can significantly reduce the application-visible cost of periodic I/O. Further, our experience deploying these techniques in state-of-the-art simulation and visualization codes demonstrates that, with careful design, techniques for hiding I/O costs can be organically combined with mechanisms for adaptive performance optimization, application self-configuration, and effective data management.
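As a rough illustration of the general idea of overlapping periodic output with computation, the following minimal Python sketch copies each snapshot into an in-memory buffer and lets a background thread perform the actual write. It is not the implementation evaluated in the thesis: the names BackgroundWriter and run_simulation, the bounded queue, and the single-threaded toy "simulation" are all assumptions made for this example.

```python
import os
import queue
import tempfile
import threading


class BackgroundWriter:
    """Hypothetical helper: buffers output in memory, writes it from a background thread."""

    _STOP = object()  # sentinel telling the writer thread to exit

    def __init__(self, path, max_buffers=8):
        self._file = open(path, "wb")
        # Bounded queue: when buffer space runs out, write() blocks,
        # which degrades to ordinary synchronous output.
        self._queue = queue.Queue(maxsize=max_buffers)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, data):
        # Copy the data so the caller may reuse its array for the next time step.
        self._queue.put(bytes(data))

    def _drain(self):
        # Runs in the background thread and performs the slow file writes.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._file.write(item)

    def close(self):
        # Flush everything still buffered, then release the file.
        self._queue.put(self._STOP)
        self._thread.join()
        self._file.close()


def run_simulation(path, steps=5, snapshot_every=2):
    """Toy compute loop (assumed for illustration) with periodic snapshots."""
    writer = BackgroundWriter(path)
    state = bytearray(4)
    for step in range(1, steps + 1):
        state[0] = step  # stand-in for the real computation
        if step % snapshot_every == 0:
            writer.write(state)  # returns once the copy is queued
    writer.close()
    with open(path, "rb") as f:
        return f.read()
```

In this sketch the compute loop pays only for an in-memory copy at each snapshot, and the file write proceeds while the next steps are computed. The bounded queue is a deliberate choice: if the writer falls behind, the application simply waits, so output is never dropped.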