Friday, January 30, 2004
EGRC 313 -- NCSU Centennial Campus
(Driving directions and parking suggestions)
One of most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle in the applicability of frequent set mining, often mentioned by data miners, is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much to alleviate the problem.
We address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a set collection by k sets, is defined to be the size of the collection covered by the span of the k sets. Depending on the input representation, we examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide polynomial-time approximation algorithms. We also give empirical evidence showing that the approximation methods work well in practice.
This is joint work with Aris Gionis and Heikki Mannila (University of Helsinki).
About the speaker: Foto Afrati is Professor of Computer Science at the National Technical University of Athens, Greece. Her research interests include database theory, database query languages, constraint databases, finite model theory, analysis of algorithms, approximation algorithms, scheduling, information integration, data and web mining.