Meiyappan Nagappan's Research Page

His research interests are in the field of Software Engineering focusing on Software Fault Identification, Software System Log File Analysis, and Operational Profiling of Software Systems. He is particularly interested in the analysis of log files to identify abnormal behavior and help the developer locate the source of the problem. He is also interested in the application of algorithms that are traditionally used in the pure sciences domain, for solving day to day software engineering tasks. He also works in the Scientific Data Management Center (SDM) - Scientific Process Automation group as part of his RA work. You can find his Research Goals here.

Log file Abstraction
Log files contain valuable information about the execution of a system. This information is often used for debugging, operational profiling, finding anomalies, detecting security threats, measuring performance etc. The log files are usually too big for extracting this valuable information manually, even though manual perusal is still one of the more widely used techniques. Recently a variety of data mining and machine learning algorithms are being used to analyze the information in the log files. A major road block for the efficient use of these algorithms is the inherent variability present in every log line of a log file. Each log line is a combination of a static message type field and a variable parameter field. Even though both these fields are required, the analyses algorithm often requires that these be separated out, in order to find correlations in the repeating log event types. This disentangling of the message and parameter fields to find the event types is called abstraction of log lines. Each log line is abstracted to a unique ID or event type and the dynamic parameter value is extracted to give an insight on the current state of the system. In this paper we present a technique based on a clustering technique used in the Simple Log file Clustering Tool for log file abstraction. This solution is especially useful when we don't have access to the source code of the application or when the lines in the log file do not conform to a rigid structure.

Efficiently Extracting Operational Profiles from Execution Logs using Suffix Arrays
An important software reliability engineering tool is operational profiles. In this paper we propose a cost effective automated approach for creating second generation operational profiles using execution logs of a software product. Our algorithm parses the execution logs into sequences of events and produces an ordered list of all possible subsequences by constructing a suffix-array of the events. The difficulty in using execution logs is that the amount of data that needs to be analyzed is often extremely large (more than a million records per day in many applications). Our approach is very efficient. We show that our approach requires O(N) in space and time to discover all possible patterns in N events. We discuss a practical implementation of the algorithm in the context of the logs from a large cloud computing system.

A Model for Sharing of Confidential Provenance Information in a Query Based System
Workflow management systems are increasingly being used to automate scientific discovery. Provenance meta-data is collected about scientific workflows, processes, simulations and data to add value. There is a variety of workflow management tools that cater to this. The provenance information may have as much value as the raw data. Typically, sensitive information produced by a computational processes or experiments is well guarded. However, this may not necessarily be true when it comes to provenance information. The issue is how to share confidential provenance information. We present a model for sharing provenance information when the confidentiality level is decided by the user dynamically. The key feature of this model is the Query Sharing concept. We illustrate the model for workflows implemented using provenance enabled Kepler system.