Old Projects

 

Some of my old projects are listed here. Two of them, Strudel and Information Manifold, were done while I was at AT&T Labs, and the last one, Asset Allocation, was done for my Master's thesis at the Univ. of Colorado at Boulder.

 

NIAGARA – UW-Madison

Many projections envision a future in which the Internet is populated with a vast number of Web-accessible XML files, forming a "World-Wide Database". Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over these XML files. However, merely being an XML query-processing engine does not render a system suitable for querying the Internet. A truly useful system must provide mechanisms to (a) find the XML files that are relevant to a given query, and (b) deal with remote data sources that either provide unpredictable data access and transfer rates, or are infinite streams, or both. The Niagara Internet Query System was designed from the ground up to provide these mechanisms. It finds relevant XML documents by using a novel collaboration between the Niagara XML-QL query processor and the Niagara "text-in-context" XML search engine. To handle infinite streams and data sources with unpredictable rates, it supports a "get partial" operation on blocking operators in order to produce partial query results, and inserts synchronization packets at critical points in the operator tree to guarantee the consistency of (partial) results. The Niagara Internet Query System is public domain software that can be found at http://www-db.cs.wisc.edu/niagara/.
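
To make the "get partial" idea concrete, here is a minimal sketch of a blocking operator (a sum) that can report what it has computed so far instead of waiting for its input to end. The class name, its methods, and the shape of the result are assumptions made for illustration; this is not Niagara's actual operator interface, and it leaves out the synchronization packets entirely.

```python
class PartialSum:
    """A blocking sum operator that can also report a partial result on request.

    Hypothetical illustration only; not the Niagara operator API.
    """

    def __init__(self):
        self.total = 0      # running sum of the values seen so far
        self.seen = 0       # number of input tuples consumed
        self.done = False   # becomes True once the input stream has ended

    def push(self, value):
        """Consume one input value from the (possibly slow or infinite) source."""
        self.total += value
        self.seen += 1

    def close(self):
        """Signal that the input stream has ended."""
        self.done = True

    def get_partial(self):
        """Return the result computed so far, flagged as final or not."""
        return {"sum": self.total, "tuples_seen": self.seen, "final": self.done}
```

A caller that is tired of waiting on a slow source can call `get_partial()` at any time and check the `final` flag to see whether the answer is complete.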

 

IDB: Toward the Scalable Integration of Queryable Internet Data Sources – UW-Madison

As the number of databases accessible on the Web grows, the ability to execute queries spanning multiple heterogeneous queryable sources is becoming increasingly important. To date, research in this area has focused on providing semantic completeness, and has generated solutions that work well when querying over a relatively small number of databases that have static and well-defined schemas. Unfortunately, these solutions do not extend to the scale of the present Internet, let alone the Internet of the future. In this project, we present an approach that makes the opposite tradeoff: it provides a scalable, unified view over large numbers of queryable information sources by sacrificing some expressive power in the set of queries supported. We have developed a prototype system, IDB, that implements this approach. The IDB system provides scalability through three main techniques. First, it uses a collection of ontologies organized into hierarchical namespaces as a medium for expressing data semantics. Second, it employs a declarative query language to describe information sources so that source descriptions can be "executed" at run time instead of being pre-compiled into the system. Third, it utilizes inverted-index style operations to identify the subset of information sources that are relevant to a particular user query.
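
The third technique can be pictured with a small inverted index from ontology terms to the sources that cover them. This is only a sketch under stated assumptions: the source names, the dotted term names, and the rule that a source is relevant when it covers every term in the query are all hypothetical, not details taken from the IDB prototype.

```python
from collections import defaultdict


def build_index(source_terms):
    """Map each ontology term to the set of sources whose description mentions it."""
    index = defaultdict(set)
    for source, terms in source_terms.items():
        for term in terms:
            index[term].add(source)
    return index


def relevant_sources(index, query_terms):
    """Return the sources that cover every query term (intersection of posting sets)."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical source descriptions, keyed by terms from hierarchical namespaces.
SOURCES = {
    "bookstore_a": {"books.title", "books.author", "books.price"},
    "library_b": {"books.title", "books.author"},
    "car_dealer_c": {"cars.make", "cars.price"},
}
INDEX = build_index(SOURCES)
```

With this index, a query over `books.title` and `books.author` touches only the two book sources, so the cost of source selection depends on the posting sets for the query terms rather than on the total number of registered sources.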

 

STRUDEL: A Web Site Management System – AT&T Labs., Florham Park, NJ.

The key idea in the STRUDEL system is to separate three things: the logical view of the information available at a web site, the structure of that information in linked pages, and the graphical presentation of pages in HTML. Building a web site using STRUDEL involves two steps. First, the designer independently defines the data that will be available at the site. Second, the designer decides how to structure and present that data. Intuitively, the structure of the web site is defined as a view over the underlying data. STRUDEL allows users to manipulate the underlying data independently of where it is stored or how it is presented, and to customize the web site by creating different views of the underlying data.
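
The separation can be illustrated with a toy example in which the data, the site structure, and the HTML rendering are three independent pieces. Everything here (the records, the function names, the grouping by year) is made up for illustration; STRUDEL itself defines site structure with its own declarative query language, which this sketch does not attempt to reproduce.

```python
from html import escape

# Step 1: the data, defined independently of any page layout (hypothetical records).
PAPERS = [
    {"title": "Query Optimization", "year": 1998},
    {"title": "Web Site Management", "year": 1999},
    {"title": "Semistructured Data", "year": 1999},
]


def site_by_year(papers):
    """Step 2a, structure: a view over the data with one page per year."""
    pages = {}
    for paper in papers:
        pages.setdefault(paper["year"], []).append(paper["title"])
    return pages


def render_page(year, titles):
    """Step 2b, presentation: turn one structural page into HTML."""
    items = "".join(f"<li>{escape(title)}</li>" for title in titles)
    return f"<h1>Papers from {year}</h1><ul>{items}</ul>"
```

Offering a different site, for example one page per author, would mean writing another view function next to `site_by_year` while leaving `PAPERS` and `render_page` unchanged.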

 

Information Manifold – AT&T Labs., Florham Park, NJ.

The Information Manifold system provides uniform access to multiple structured information sources on the World Wide Web (e.g., databases, form-based sources). As such, the system can answer complex queries that require the combination of information from multiple sources. The Information Manifold frees the user from having to find the information sources that are relevant to a given query, access each source separately, and manually combine information from multiple sources. The system contains explicit descriptions of the contents of the information sources. Given a query, the system uses the descriptions to determine which sources are relevant, send the appropriate sub-queries to the relevant sources, and combine information from multiple sources to answer the user query.

 

Asset Allocation: a Novel Neural Net Approach – Univ. of Colorado at Boulder

In this project we present an artificial neural network approach to the problem of asset allocation, with an application to commodity futures trading. The key idea is that the task is split into two parts: a non-neural break-out strategy determines whether to be long, short, or neutral, and the network is trained to optimize the Sharpe Ratio (risk-adjusted return) on the state-multiplied returns of each commodity. The three-layer neural network takes the performance of the buying strategy as input and decides the optimal allocation for each commodity. This project presents the derivation of the neural network cost function (maximization of the Sharpe Ratio) and comparisons between a conventional approach, our neural network approach, and an optimal solution.
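
As a rough illustration of the quantity being optimized, the sketch below computes a Sharpe Ratio over state-multiplied returns for a fixed set of allocation weights. It rests on several assumptions that are not taken from the thesis: states are encoded as +1 (long), -1 (short), and 0 (neutral), the risk-free rate is taken to be zero, the population standard deviation is used, and the weights are given by hand rather than produced by a trained network.

```python
import math


def portfolio_returns(raw_returns, states, weights):
    """Per-period portfolio return: sum over commodities of weight * state * raw return."""
    return [
        sum(w * s * r for w, s, r in zip(weights, period_states, period_returns))
        for period_states, period_returns in zip(states, raw_returns)
    ]


def sharpe_ratio(returns):
    """Mean return divided by the standard deviation of returns (risk-free rate assumed 0)."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / n
    return mean / math.sqrt(variance)  # undefined when all returns are identical


# Hypothetical data: three periods, two commodities.
RAW = [[0.02, -0.01], [-0.01, 0.03], [0.03, 0.01]]
STATES = [[1, -1], [1, 1], [1, 0]]   # +1 long, -1 short, 0 neutral
WEIGHTS = [0.5, 0.5]                 # fixed allocation, chosen by hand

RETURNS = portfolio_returns(RAW, STATES, WEIGHTS)
SHARPE = sharpe_ratio(RETURNS)
```

In a training setting, the weights would be the network's outputs and the negative of `sharpe_ratio` would serve as the cost to minimize; the fixed `WEIGHTS` here only show how the objective is evaluated.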