Meiyappan Nagappan's Research Page
His research interests are in the field of Software Engineering, focusing on Software
Fault Identification, Software System Log File Analysis, and Operational Profiling of
Software Systems. He is particularly interested in the analysis of log files to
identify abnormal behavior and help the developer locate the source of the problem.
He is also interested in applying algorithms that are traditionally used in
the pure sciences to day-to-day software engineering tasks. He also
works in the Scientific Data Management Center (SDM) - Scientific Process
Automation group as part of his RA work. You can find his Research Goals here.
Log file Abstraction
Log files contain valuable information about the execution of a system.
This information is often used for debugging, operational profiling, finding
anomalies, detecting security threats, measuring performance, etc. Log files
are usually too big for this information to be extracted manually, even
though manual perusal is still one of the more widely used techniques. Recently,
a variety of data mining and machine learning algorithms have been used to
analyze the information in log files. A major roadblock to the efficient
use of these algorithms is the inherent variability present in every log line of
a log file. Each log line is a combination of a static message type field and a
variable parameter field. Even though both these fields are required, the analysis
algorithms often require that they be separated in order to find correlations
in the repeating log event types. This disentangling of the message and parameter
fields to find the event types is called abstraction of log lines. Each log line is
abstracted to a unique ID or event type, and the dynamic parameter value is extracted
to give insight into the current state of the system. In this paper we present a
log file abstraction technique based on the clustering approach used in the Simple
Log file Clustering Tool. This solution is especially useful when we don't have
access to the source code of the application or when the lines in the log file do not
conform to a rigid structure.
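
The separation of message type and parameter fields can be illustrated with a minimal sketch. It is not the technique from the paper; it only assumes a simple frequency heuristic in which a word that recurs at the same position in at least `support` lines is treated as static, and any other word is treated as a parameter. The function name `abstract_log_lines`, the `support` threshold, and the sample log lines are all hypothetical.

```python
from collections import Counter


def abstract_log_lines(lines, support=2):
    """Map each log line to (event_template, parameters).

    Assumption: a (position, word) pair seen in at least `support`
    lines belongs to the static message type; anything else is a
    variable parameter and is replaced by "*" in the template.
    """
    tokenized = [line.split() for line in lines]

    # Pass 1: count how often each word occurs at each position.
    freq = Counter(
        (pos, word) for tokens in tokenized for pos, word in enumerate(tokens)
    )

    # Pass 2: keep frequent words, extract the rest as parameters.
    results = []
    for tokens in tokenized:
        template, params = [], []
        for pos, word in enumerate(tokens):
            if freq[(pos, word)] >= support:
                template.append(word)
            else:
                template.append("*")
                params.append(word)
        results.append((" ".join(template), params))
    return results


# Hypothetical log lines, invented for illustration only.
lines = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent",
    "Disk usage at 75 percent",
]
abstracted = abstract_log_lines(lines)

# Assign a unique integer ID to each distinct event template.
event_ids = {}
for template, _ in abstracted:
    event_ids.setdefault(template, len(event_ids))
```

Here the first two lines share the template `Connection from * closed` and the last two share `Disk usage at * percent`, so the four lines abstract to two event types. A fixed per-position threshold is a deliberate simplification and would need tuning on real logs.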
Efficiently Extracting Operational Profiles from Execution Logs using Suffix Arrays
Operational profiles are an important software reliability engineering
tool. In this paper we propose a cost-effective
automated approach for creating second-generation operational
profiles using execution logs of a software product.
Our algorithm parses the execution logs into sequences of
events and produces an ordered list of all possible subsequences
by constructing a suffix array of the events. The
difficulty in using execution logs is that the amount of data
that needs to be analyzed is often extremely large (more than
a million records per day in many applications). Our approach
is efficient: we show that it requires
O(N) space and time to discover all possible patterns in N
events. We discuss a practical implementation of the algorithm
in the context of the logs from a large cloud computing
system.
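
The idea of ordering all subsequences through a suffix array can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it builds the suffix array by naive sorting, which is far slower than the O(N) bound stated above, and the helper names `build_suffix_array` and `count_pattern`, along with the sample event sequence, are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(events):
    """Return the start indices of all suffixes in sorted order.

    Naive construction by sorting slices; linear-time constructions
    exist but are omitted here for brevity.
    """
    return sorted(range(len(events)), key=lambda i: events[i:])


def count_pattern(events, suffix_array, pattern):
    """Count occurrences of `pattern` as a contiguous subsequence.

    All suffixes that begin with `pattern` sit next to each other in
    the suffix array, so two binary searches delimit the block.
    """
    pattern = tuple(pattern)
    k = len(pattern)
    # Truncating sorted suffixes to length k keeps them sorted.
    prefixes = [tuple(events[i:i + k]) for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


# Hypothetical event sequence parsed from an execution log.
events = ["login", "view", "buy", "login", "view", "logout"]
sa = build_suffix_array(events)

login_view = count_pattern(events, sa, ["login", "view"])
view_buy = count_pattern(events, sa, ["view", "buy"])
```

In this example `login_view` is 2 and `view_buy` is 1. Dividing such counts by the number of windows of the same length gives relative frequencies of the kind an operational profile records.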
A Model for Sharing of Confidential Provenance Information in a Query Based System
Workflow management systems are increasingly being used to automate
scientific discovery. Provenance meta-data is collected about scientific
workflows, processes, simulations and data to add value. There is a variety of
workflow management tools that cater to this. The provenance information may
have as much value as the raw data. Typically, sensitive information produced
by computational processes or experiments is well guarded. However, this may
not necessarily be true when it comes to provenance information. The issue is
how to share confidential provenance information. We present a model for sharing
provenance information when the confidentiality level is decided by the user
dynamically. The key feature of this model is the Query Sharing concept.
We illustrate the model for workflows implemented using the provenance-enabled Kepler
system.