## Approximating a Collection of Frequent Sets

Foto Afrati

National Technical University of Athens, Greece
Friday, January 30, 2004

11am

EGRC 313 -- NCSU Centennial Campus

(Driving directions and parking suggestions)

One of most well-studied problems in data mining is computing the
collection of frequent item sets in large transactional
databases.
One obstacle in the applicability of frequent set mining, often
mentioned by data miners, is that the size of the output
collection can be far too large to be carefully examined and understood by
the users.
Even restricting the output to the border of the frequent
item-set collection does not help much to alleviate the problem.

We address the issue of overwhelmingly large output
size by introducing and studying the following problem:
What are the k sets that best approximate a collection of
frequent item sets?
Our measure of approximating a set collection by k sets, is
defined to be the size of the collection *covered* by the *span*
of the k sets.
Depending on the input representation, we examine different
problem variants for which we demonstrate the hardness of the
corresponding problems and we provide polynomial-time approximation
algorithms.
We also give empirical evidence showing that the approximation
methods work well in practice.

This is joint work with Aris Gionis and Heikki Mannila (University of Helsinki).

*About the speaker:*
Foto Afrati is Professor of Computer Science at the National Technical University of Athens, Greece. Her research interests include database theory, database query languages, constraint databases, finite model theory, analysis of algorithms, approximation algorithms, scheduling, information integration, data and web mining.

Please send your comments to Rada Chirkova