Friday, December 12, 2003
Nelson Hall, Room 2403 -- NCSU Main Campus
(Driving directions and parking suggestions)
Owing to its versatile use in OLAP applications, the data cube operator has received tremendous attention in the research community and industry alike. However, the prohibitive cost (both time and space) in computing a data cube makes it a challenge to compute and store it in its entirety. To mitigate this, two main approaches that have emerged are (i) choosing a subset of views in a data cube to materialize in available space and answering queries using the materialized views, and (ii) compressing the data cube, with or without loss of information.
In this talk, I will describe a lossless compression technique inspired by concepts from lattice theory. Specifically, I will discuss the Quotient Cube, which is the factor lattice obtained when the cube lattice is "divided" by a certain equivalence relation. The idea is to represent cube cells by choosing a representative from the equivalence class they are in. In a precise sense, we can show that the quotient cube is an optimal lossless compression of a data cube. I will also describe the QC-Tree, an efficient data structure for storing a quotient cube as well as our algorithm for computing a quotient cube from a base table. QC-Trees can be used for efficient answering of queries and for efficient incremental maintenance against updates to the base table. Detailed experimental evaluations show that quotient cubes (implemented via QC-Trees) enjoy a substantial reduction in the cube size while at the same time enabling fast and scalable construction, query answering, and incremental maintenance performance.
This is joint work with Jiawei Han, Jian Pei, and Yan Zhao.
About the speaker: Laks V.S. Lakshmanan obtained his Bachelor of Engineering (1981) in Electronics and Communications from the A.C. College of Engineering and Technology, Karaikudi, India, and his Master of Engineering (1983) and Ph.D. (1987) in Computer Science from the Indian Institute of Science, Bangalore, India. His research interests span a wide spectrum of topics in database systems and related areas, including relational and object-oriented databases, advanced data models for novel applications, OLAP and data warehousing, database mining, data integration, semi-structured data and XML, directory-enabled networks, and querying the WWW. A common theme underlying his research is to model problems not traditionally viewed as standard database problems and bring database technology to bear on them, thus pushing the frontiers of database technology. (For more details, please visit this page.)