Some of my old projects are
listed here. Two of them, Strudel and Information Manifold, were done
while I was at AT&T Labs, and the
last one, Asset Allocation, was done for my Master's thesis.
NIAGARA – UW-Madison
Many
projections envision a future in which the Internet is populated with a vast
number of Web-accessible XML files--a "World-Wide Database".
Recently, there has been a great deal of research into XML query languages to
enable the execution of database-style queries over these XML files. However,
merely being an XML query-processing engine does not render a system suitable
for querying the Internet. A truly useful system must provide mechanisms to (a)
find the XML files that are relevant to a given query, and (b) deal with remote
data sources that either provide unpredictable data access and transfer rates,
or are infinite streams, or both. The Niagara Internet Query System was
designed from the ground up to provide these mechanisms. It finds relevant XML
documents by using a novel collaboration between the Niagara XML-QL query
processor and the
IDB: Toward the Scalable Integration of Queryable Internet Data Sources – UW-Madison
As
the number of databases accessible on the Web grows, the ability to execute queries
spanning multiple heterogeneous queryable sources is
becoming increasingly important. To date, research in this area has focused on
providing semantic completeness, and has generated solutions that work well
when querying over a relatively small number of databases that have static and
well-defined schemas. Unfortunately, these solutions do not extend to the scale
of the present Internet, let alone the Internet of the future. In this project,
we present an approach that makes the opposite tradeoff: it provides a
scalable, unified view over large numbers of queryable
information sources by sacrificing some expressive power in the set of queries
supported. We have developed a prototype system, IDB, that
implements this approach. The IDB system provides scalability through three
main techniques. First, it uses a collection of ontologies
organized into hierarchical namespaces as a medium for expressing data
semantics. Second, it employs a declarative query language to describe
information sources so that source descriptions can be "executed" at
run time instead of being pre-compiled into the system. Third, it utilizes
inverted-index style operations to identify the subset of information sources
that are relevant to a particular user query.
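To make the third technique concrete, here is a minimal sketch of an inverted-index lookup over source descriptions. It is an illustration only, not the IDB implementation: the source names, the namespaced term strings, and the functions build_index and relevant_sources are all hypothetical, and the sketch assumes that a source is relevant when its description covers every term in the query.

```python
from collections import defaultdict


def build_index(source_terms):
    """Map each ontology term to the set of sources that mention it."""
    index = defaultdict(set)
    for source, terms in source_terms.items():
        for term in terms:
            index[term].add(source)
    return index


def relevant_sources(index, query_terms):
    """Return the sources whose descriptions cover every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    if not postings:
        return set()
    # Intersecting the posting sets avoids scanning every source description.
    return set.intersection(*postings)


# Hypothetical source descriptions, keyed by namespaced ontology terms.
SOURCES = {
    "bookstore_a": {"books.title", "books.author", "books.price"},
    "library_b": {"books.title", "books.author"},
    "car_dealer_c": {"autos.make", "autos.price"},
}
INDEX = build_index(SOURCES)
```

With this toy data, a query over books.title and books.price touches only two posting sets and never looks at the car dealer's description, which is the property that matters as the number of sources grows.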
STRUDEL: A Web Site Management System – AT&T Labs
The
key idea in the STRUDEL system is the separation of the logical view of
information available at a web site, the structure of that information in
linked pages, and the graphical presentation of pages in HTML. Building a web
site using STRUDEL involves two steps. First, the designer independently
defines the data that will be available at the site. Second, the designer
decides how to structure and present that data. Intuitively, the structure of the
web site is defined as a view over the underlying data. STRUDEL allows users to
manipulate the underlying data independently of where it is stored or how it is
presented and to customize the web site by creating different views of the
underlying data.
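The separation described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not STRUDEL itself: plain Python stands in for STRUDEL's own mechanisms for defining views, and the PAPERS data, by_year_view, and render are hypothetical names invented for the example.

```python
from html import escape

# Hypothetical site data, defined independently of any page layout.
PAPERS = [
    {"title": "Paper A", "year": 1997},
    {"title": "Paper B", "year": 1998},
    {"title": "Paper C", "year": 1998},
]


def by_year_view(papers):
    """Site structure as a view: one logical page per year, listing titles."""
    pages = {}
    for paper in papers:
        pages.setdefault(paper["year"], []).append(paper["title"])
    return pages


def render(year, titles):
    """Presentation step: turn one logical page into HTML."""
    items = "".join(f"<li>{escape(title)}</li>" for title in titles)
    return f"<h1>{year}</h1><ul>{items}</ul>"


# The site is the view over the data, passed through the presentation step.
site = {year: render(year, titles) for year, titles in by_year_view(PAPERS).items()}
```

A different site over the same data (say, one page per author) would need only a different view function; PAPERS and render would stay untouched.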
Information Manifold – AT&T Labs
The
Information Manifold system provides uniform access to multiple structured
information sources on the World Wide Web (e.g., databases, form-based
sources). As such, the system can answer complex queries that require the
combination of information from multiple sources. The Information Manifold
frees the user from having to find the information sources that are
relevant to a given query, access each source separately, and manually
combine information from multiple sources. The system contains explicit descriptions
of the contents of the information sources. Given a query, the system uses
the descriptions to determine which sources are relevant, send the appropriate
sub-queries to the relevant sources, and combine information from multiple
sources to answer the user query.
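The three steps in the last sentence (pick relevant sources, query each one, combine the results) can be sketched as follows. This is a simplified illustration, not the Information Manifold's actual algorithm: the sources, their "provides" descriptions, and the functions relevant and answer are hypothetical, and the sketch assumes that all sources can be joined on a single shared key attribute.

```python
# Hypothetical sources: a description of the attributes each one provides,
# plus the rows it would return when queried.
SOURCES = {
    "movie_db": {
        "provides": {"title", "director"},
        "rows": [
            {"title": "Alien", "director": "Ridley Scott"},
            {"title": "Heat", "director": "Michael Mann"},
        ],
    },
    "review_site": {
        "provides": {"title", "rating"},
        "rows": [{"title": "Alien", "rating": 9}],
    },
    "weather_feed": {"provides": {"city", "forecast"}, "rows": []},
}


def relevant(sources, wanted, key):
    """Names of sources that provide the key and at least one other wanted attribute."""
    others = wanted - {key}
    return sorted(
        name
        for name, source in sources.items()
        if key in source["provides"] and source["provides"] & others
    )


def answer(sources, wanted, key):
    """Query only the relevant sources and join their rows on the key."""
    merged = {}
    for name in relevant(sources, wanted, key):
        for row in sources[name]["rows"]:
            projected = {k: v for k, v in row.items() if k in wanted}
            merged.setdefault(row[key], {}).update(projected)
    # Keep only the joined rows that ended up with every requested attribute.
    return [row for row in merged.values() if wanted.issubset(row)]
```

Asking for title, director, and rating with title as the key consults the first two sources only, and returns just the rows for which both sources had data.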
Asset Allocation: a Novel Neural Net Approach
In
this project we present an artificial neural network approach to the problem of
asset allocation with an application to commodity futures trading. The key idea
is that the task is split into two: a non-neural break-out strategy determines
whether to be long, short, or neutral, and the network is trained to optimize
the Sharpe Ratio (risk-adjusted return) on the
state-multiplied returns of each commodity. The 3-layer neural network uses as
inputs the performance of the buying strategy and decides the optimal
allocation for each commodity. This project shows the derivations of the neural
network cost function (maximization of the Sharpe Ratio) and comparisons
between a conventional approach, our neural network approach, and an optimal
solution.
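For readers unfamiliar with the objective, the sketch below computes a Sharpe Ratio on state-multiplied returns for a fixed set of allocation weights. It shows only the quantity being optimized, not the network or its training, and it rests on assumptions that are not taken from the thesis: states are encoded as +1 (long), -1 (short), and 0 (neutral), the risk-free rate is taken as zero, and the population standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, pstdev


def state_multiplied(returns, states):
    """Multiply each period's return by the position state (+1, -1, or 0)."""
    return [r * s for r, s in zip(returns, states)]


def sharpe_ratio(period_returns):
    """Mean return divided by its standard deviation (risk-free rate assumed 0)."""
    spread = pstdev(period_returns)
    if spread == 0:
        raise ValueError("Sharpe Ratio is undefined for constant returns")
    return mean(period_returns) / spread


def portfolio_sharpe(weights, returns_by_commodity, states_by_commodity):
    """Sharpe Ratio of the weighted sum of state-multiplied commodity returns."""
    adjusted = [
        state_multiplied(returns, states)
        for returns, states in zip(returns_by_commodity, states_by_commodity)
    ]
    # zip(*adjusted) regroups the data from per-commodity to per-period.
    portfolio = [
        sum(w * x for w, x in zip(weights, period)) for period in zip(*adjusted)
    ]
    return sharpe_ratio(portfolio)
```

In this framing, the break-out strategy supplies the states and the network's job is to choose the weights that make portfolio_sharpe as large as possible.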