Enhancing Data Migration Performance via Parallel Data Compression
J. Lee, M. Winslett, X. Ma, and S. Yu
In Proceedings of the International Parallel and Distributed Processing Symposium, 2002.
Available formats:
PostScript (short version),
PostScript (long version)
Abstract:
Scientific simulations often produce large volumes of output that are
moved to another platform for visualization or storage.
This long-distance migration is slow
because the data sets are large and the network is slow. Compression
can improve migration performance by
reducing the data size, but compression is
computation-intensive and so can raise costs. In this work, we show
how to reduce data migration cost by incorporating compression
into migration. We analyze eight scientific data sets, and
propose three approaches for parallel compression of scientific data.
Our results show that with reasonably fast processors
and typical parallel configurations, the compression cost for large
scientific data is outweighed by the performance gain
obtained by migrating less data. We found that a client-side
compression approach (CC) can improve I/O and migration performance by
an order of magnitude. In our experiments,
CC always matches or outperforms migration without
compression when we overlap migration with computation, even for
dense floating-point data that is not very compressible. We also present a variant
of CC that is well suited for use with implementations
of two-phase I/O.
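
The trade-off described in the abstract, compression time spent versus transfer time saved, can be illustrated with a back-of-the-envelope cost model. The sketch below is not the model used in the paper. It assumes that compression finishes before the transfer starts (no overlap), that compression throughput scales linearly with the number of processors, and that the link bandwidth is constant. All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
import math


def migration_time(size_mb, bandwidth_mb_s):
    """Seconds to send size_mb megabytes uncompressed over the link."""
    return size_mb / bandwidth_mb_s


def compressed_migration_time(size_mb, bandwidth_mb_s, ratio,
                              compress_mb_s_per_proc, n_procs):
    """Seconds to compress in parallel, then send the compressed data.

    ratio is original size / compressed size (2 means the data halves).
    Assumption: linear speedup across n_procs and no overlap between
    compression and transfer.
    """
    compress_s = size_mb / (compress_mb_s_per_proc * n_procs)
    transfer_s = (size_mb / ratio) / bandwidth_mb_s
    return compress_s + transfer_s


def min_procs_to_break_even(bandwidth_mb_s, ratio, compress_mb_s_per_proc):
    """Smallest processor count at which compressing is no slower than not.

    Derived from size/(c*p) + size/(r*B) <= size/B, which gives
    p >= B / (c * (1 - 1/r)). Returns None when the data does not shrink,
    because compression can then never pay for itself in this model.
    """
    if ratio <= 1:
        return None
    saved_fraction = 1 - 1 / ratio
    return math.ceil(bandwidth_mb_s / (compress_mb_s_per_proc * saved_fraction))


if __name__ == "__main__":
    # Hypothetical inputs: 1000 MB of output, a 10 MB/s link, 2:1 compression,
    # 5 MB/s compression throughput per processor, 8 processors.
    print(migration_time(1000, 10))                      # 100.0
    print(compressed_migration_time(1000, 10, 2, 5, 8))  # 75.0
    print(min_procs_to_break_even(10, 2, 5))             # 4
```

Under these assumed numbers, four processors are enough for compression to break even, and any additional processors make the compressed migration strictly faster. Overlapping compression or migration with computation, as the abstract describes, would favor compression further than this non-overlapped model suggests.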