Current Challenges For Data Integration

Alon Y. Halevy
University of Washington

Friday, October 17, 2003
EGRC 313 -- NCSU Centennial Campus
Integration of data from multiple sources is one of the longest-standing problems facing the Database and AI research communities. The problem arises in large enterprises, and research on it has also been fueled by the promise of integrating data on the WWW. In the past few years, we have made significant progress on data integration, from the conceptual and algorithmic aspects to the systems and product aspects. This talk will briefly review our successes in data integration and will describe some significant current challenges. In particular, I will describe peer-data management systems, a novel architecture that enables ad hoc, large-scale sharing of data, and discuss recent work on the problem of semi-automatically finding a semantic mapping between a pair of schemas. For the latter, I will describe an approach to schema matching that is based on analyzing a large corpus of database schemas and learning properties of how terms are used in database structures. The talk will discuss some work in progress, but will also highlight opportunities for future research.

About the speaker: Dr. Alon Halevy received his Bachelor's degree in Computer Science and Mathematics from the Hebrew University in Jerusalem in 1988, and his Ph.D. in Computer Science from Stanford University in 1993. From 1993 to 1997, Dr. Halevy was a principal member of technical staff at AT&T Bell Laboratories, and then at AT&T Laboratories. He joined the faculty of the Computer Science and Engineering Department at the University of Washington in 1998. Dr. Halevy's research interests are in data integration, management of XML data, web-site management, peer-data management systems, query optimization, database theory, knowledge representation, and more generally, the intersection between Database and AI technologies. His research has produced several systems, such as the Information Manifold data integration system, the Strudel web-site management system, and the Tukwila XML data integration system. He was also a co-developer of XML-QL, which later contributed to the development of the XQuery standard for querying XML data. In 1999, Dr. Halevy co-founded Nimble Technology, whose product is a data integration system based on XML. Dr. Halevy was a Sloan Fellow (1999-2000), and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. He serves on the editorial boards of the VLDB Journal, the Journal of Artificial Intelligence Research, and ACM Transactions on Internet Technology, and served as the program chair for the ACM SIGMOD 2003 Conference.

