Declustering Large Multidimensional Data Sets for Range Queries over Heterogeneous Disks
J. Lee and M. Winslett and X. Ma and S. Yu
Proceedings of the 15th International Conference on Scientific and Statistical Database Management, Cambridge, Massachusetts, July, 2003.
Available format:
postscript
Abstract:
Declustering is a technique to distribute data sets over multiple disks
so that future retrievals can be well balanced over the disks
and be performed in parallel.
Although clusters often have heterogeneous disks,
most declustering work has focused only on homogeneous environments.
In this work, we investigate the declustering problem
for a heterogeneous disk environment using virtual servers,
and propose novel approaches for deciding the number of virtual servers
and the mapping between virtual servers and physical disks.
Our experimental results show that
by combining our algorithm for choosing the number of virtual servers
with a greedy algorithm for mapping virtual servers to disks,
users can expect range query retrieval performance
within 4% of the optimum achievable in practice on average,
in all configurations studied.
Compared to an intuitively natural approach to the problem,
this represents an improvement of 8-31% in average fetch ratio,
as well as a 26-38% reduction in the standard deviation of performance for
small queries.
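
Illustration (not taken from the paper):
The abstract does not spell out the greedy mapping, so the following is only a minimal sketch of one plausible greedy rule, under assumptions that are ours rather than the authors'. It assumes every virtual server carries an equal share of the data, that each disk is described by a single relative speed, and that the goal is to keep the slowest disk's finish time low. The names greedy_map, disk_speeds, and finish_time are hypothetical.

```python
import heapq


def greedy_map(num_virtual, disk_speeds):
    """Assign equal-sized virtual servers to disks, one at a time.

    Assumption (not from the paper): each virtual server goes to the disk
    that would finish earliest after receiving it, where finish time is
    (virtual servers on the disk) / (relative disk speed).
    Returns the number of virtual servers placed on each disk.
    """
    counts = [0] * len(disk_speeds)
    # Heap entries: (finish time if this disk gets one more server, disk index).
    heap = [(1.0 / speed, i) for i, speed in enumerate(disk_speeds)]
    heapq.heapify(heap)
    for _ in range(num_virtual):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, ((counts[i] + 1) / disk_speeds[i], i))
    return counts


def finish_time(counts, disk_speeds):
    """Time for the most heavily loaded disk to serve its virtual servers."""
    return max(c / s for c, s in zip(counts, disk_speeds))


# Hypothetical configuration: one disk twice as fast as the other.
speeds = [2.0, 1.0]
mapping = greedy_map(6, speeds)
print(mapping, finish_time(mapping, speeds))
```

With these made-up speeds the faster disk receives twice as many virtual servers as the slower one, so both disks finish at the same time. How many virtual servers to create is a separate question that the paper addresses with its own algorithm, which this sketch does not attempt to reproduce.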