DyGen: Automatic Generation of High-Coverage Tests via Mining Gigabytes of Dynamic Traces.
Unit tests of object-oriented code exercise particular sequences of method calls. A key problem when automatically generating unit tests that achieve high structural code coverage is the selection of relevant method-call sequences. However, the number of potentially relevant sequences explodes with the number of methods. To address this issue, we propose a novel approach, called DyGen, that generates tests via mining dynamic traces recorded during program executions. Typical program executions tend to exercise only happy paths that do not include error-handling code, and thus recorded traces often do not achieve high structural coverage. To increase coverage, DyGen transforms traces into parameterized unit tests (PUTs). DyGen next uses dynamic symbolic execution to generate new test inputs for the PUTs that can achieve high structural coverage of the code under test. In this paper, we use DyGen to automatically generate regression tests on a stable version of software. In our evaluations, we show that DyGen records 1.5 GB (size of corresponding C# source code) of dynamic traces and generates 500,000 regression tests, where each test exercises a unique path, on two core libraries of the .NET 2.0 framework. The generated regression tests covered 27,485 basic blocks, which is 24.3% more than the number of blocks covered by the recorded dynamic traces. These statistics show that DyGen is scalable and can be used in practice to deal with large real-world code bases.
Mining API Mapping for Language Migration.
Since the inception of programming languages, researchers and practitioners have developed various languages such as Java and C#. To address business requirements and to survive in competing markets, companies or open source organizations often have to release different versions of their projects in different languages. Manually migrating projects from one language to another (such as from Java to C#) is a tedious and error-prone task. To reduce manual effort and human errors, tools can be developed for automatic translation of projects from one language to another, but these tools require knowledge of how Application Programming Interfaces (APIs) of one language are mapped to the APIs of the other language, referred to as API mapping relations. In this paper, we propose a novel approach, called MAM (Mining API Mapping), that mines API mapping relations from one language to another using API client code. MAM accepts a set of projects with versions in two languages and mines API mapping relations between those two languages based on how APIs are used by the two versions. These mined API mapping relations assist in translation of projects from one language to another. We implemented a tool and conducted two evaluations to show the effectiveness of MAM. The results show that our tool mines 25,805 unique mapping relations of APIs between Java and C# with more than 80% accuracy. The results also show that mined API mapping relations reduce compilation errors by 54.4% and defects by 43.0% during translation of projects with an existing translation tool, called Java2CSharp. The reduction in compilation errors and defects is due to our newly mined mapping relations, which are not available in the existing translation tools.
Alattin: Mining Alternative Patterns for Detecting Neglected Conditions.
To improve software quality, static or dynamic verification tools accept programming rules as input and detect their violations in software as defects. As these programming rules are often not well documented in practice, previous work developed various approaches that mine programming rules as frequent patterns from program source code. Then these approaches use static defect-detection techniques to detect violations in source code under analysis. These existing approaches often produce many false positives due to various factors. To reduce false positives produced by these mining approaches, we develop a novel approach, called Alattin, that includes a new mining algorithm and a technique that detects neglected conditions based on our mining algorithm. Our new mining algorithm mines alternative patterns of the example form "P1 or P2", where P1 and P2 are alternative rules such as condition checks on method arguments or return values related to the same API. We conduct two evaluations to show the effectiveness of our Alattin approach. Our evaluation results show that (1) alternative patterns account for more than 40% of all mined patterns for APIs provided by six open source libraries; (2) the mining of alternative patterns helps reduce nearly 28% of false positives among detected violations.
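The idea of an alternative pattern "P1 or P2" can be illustrated with a minimal Python sketch. It is not Alattin's mining algorithm: the function names, the `min_support` threshold, and the condition-check labels are all assumptions made for illustration. Each call site is modeled as the set of condition checks observed around one use of the same API.

```python
from collections import Counter

def mine_alternative_patterns(call_sites, min_support=0.3):
    """Return sorted (p1, p2) pairs where p2 is frequent among sites lacking p1.

    call_sites: list of sets of condition-check labels (hypothetical strings).
    """
    total = len(call_sites)
    counts = Counter(check for site in call_sites for check in site)
    found = set()
    for p1, n1 in counts.items():
        if n1 / total < min_support:
            continue  # p1 itself is too rare to be part of a pattern
        # Look only at the call sites that do NOT satisfy p1.
        rest = [site for site in call_sites if p1 not in site]
        if not rest:
            continue  # p1 alone covers every site; no alternative needed
        rest_counts = Counter(check for site in rest for check in site)
        for p2, n2 in rest_counts.items():
            if n2 / len(rest) >= min_support:
                found.add(tuple(sorted((p1, p2))))
    return sorted(found)

def find_violations(call_sites, pattern):
    """Indexes of call sites that satisfy none of the alternatives."""
    return [i for i, site in enumerate(call_sites) if not (set(pattern) & site)]

# Hypothetical call sites around an iterator-style API.
sites = [
    {"it.hasNext()"}, {"it.hasNext()"}, {"it.hasNext()"},
    {"list.size() > 0"}, {"list.size() > 0"},
    set(),  # no check at all
]
patterns = mine_alternative_patterns(sites)
violating = find_violations(sites, patterns[0])
```

In this sketch, a site that checks `list.size() > 0` is not flagged merely because it lacks the more frequent `it.hasNext()` check, which is how alternative patterns can lower the false-positive count; only the site with no check at all is reported.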
MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code.
An objective of unit testing is to achieve high structural coverage of the code under test. Achieving high structural coverage of object-oriented code requires desirable method-call sequences that create and mutate objects. These sequences help generate target object states such as argument or receiver object states (target states, for short) of a method under test. Automatic generation of sequences for achieving target states is often challenging due to a large search space of possible sequences. On the other hand, code bases that use object types (such as argument or receiver object types) include sequences that can be used to assist automatic test-generation approaches in achieving target states. In this paper, we propose a novel approach, called MSeqGen, that mines code bases and extracts sequences related to receiver or argument object types of a method under test. Our approach uses these extracted sequences to enhance two state-of-the-art test-generation approaches: random testing and dynamic symbolic execution. We conduct two evaluations to show the effectiveness of our approach. Using sequences extracted by our approach, we show that a random testing approach achieves 8.69% (with a maximum of 20% for one namespace) higher branch coverage and a dynamic-symbolic-execution-based approach achieves 17.4% (with a maximum of 22.45% for one namespace) higher branch coverage than without using our approach. Such an improvement is significant as the branches that are not covered by these state-of-the-art approaches are generally quite difficult to cover.
An Empirical Study on the Maintenance of Source Code Clones.
Code cloning has been very often indicated as a bad software development practice. However, many studies appearing in the literature indicate that this is not always the case. In fact, either changes occurring in cloned code are consistently propagated, or cloning is used as a sort of templating strategy, where cloned source code fragments evolve independently. This paper (i) proposes an automatic approach to classify the evolution of source code clone fragments, and (ii) reports a fine-grained analysis of clone evolution in four different Java and C software systems, aimed at investigating to what extent clones are consistently propagated or they evolve independently. Also, the paper investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.
Mining Exception-Handling Rules as Conditional Association Rules.
Programming languages such as Java and C++ provide exception-handling constructs to handle exception conditions. Applications are expected to handle these exception conditions and take necessary recovery actions such as releasing opened database connections. Failing to take necessary recovery actions such as rolling back transactions can not only cause performance degradation, but also lead to critical issues. In this paper, we propose a novel approach that mines exception-handling rules, which describe expected behavior when exceptions occur. Existing mining approaches mine association rules of the form “FCa ⇒ FCe”, which describes that the function call FCa should be followed by the function call FCe in all paths. In this paper, we develop the first mining algorithm to mine conditional association rules of the form “(FC1c ...FCnc ) ∧ FCa ⇒ (FC1e ...FCne )”, which describe that FCa should be followed by a sequence of function calls (FC1e ...FCne ) only when FCa is preceded by the sequence (FC1c ...FCnc ). Rules of this form are required to characterize common exception-handling rules. We show the usefulness of these rules by applying them to five real-world applications to detect violations. In our evaluation, we show that our approach detects 294 real exception-handling rules in five benchmark applications comprising 285 kLOC and also finds 160 defects, of which 87 are new defects not found by a previous related approach.
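The rule form “(FC1c ...FCnc ) ∧ FCa ⇒ (FC1e ...FCne )” can be made concrete with a small Python sketch that checks one call trace against one rule. This only illustrates what the rule means, not the mining algorithm from the paper. The helper names, the representation of a trace as a flat list of call names, and the sample traces are assumptions; the JDBC-style call names are used purely as labels.

```python
def is_subsequence(needle, haystack):
    """True if the calls in needle occur in haystack in order (gaps allowed)."""
    remaining = iter(haystack)
    return all(call in remaining for call in needle)

def violates(rule, trace):
    """Check one trace against a rule (context, trigger, expected).

    The rule is violated when the trigger call FCa is preceded by the
    context sequence but not followed by the expected sequence.
    """
    context, trigger, expected = rule
    for i, call in enumerate(trace):
        if call != trigger:
            continue
        preceded = is_subsequence(context, trace[:i])
        followed = is_subsequence(expected, trace[i + 1:])
        if preceded and not followed:
            return True
    return False

# Hypothetical rule: a failed update on a statement created from a
# connection should be followed by a rollback on that connection.
rule = (
    ["Connection.createStatement"],  # FC1c ... FCnc
    "Statement.executeUpdate",       # FCa
    ["Connection.rollback"],         # FC1e ... FCne
)
good_trace = [
    "DriverManager.getConnection", "Connection.createStatement",
    "Statement.executeUpdate", "Connection.rollback", "Connection.close",
]
bad_trace = [
    "DriverManager.getConnection", "Connection.createStatement",
    "Statement.executeUpdate", "Connection.close",
]
no_context_trace = ["Statement.executeUpdate", "Connection.close"]
```

The third trace shows why the condition matters: without the preceding context sequence the rule does not apply, so the missing rollback is not reported as a violation.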
SpotWeb: Detecting Framework Hotspots and Coldspots via Mining Open Source Code on the Web
Software developers often face challenges in reusing open source frameworks due to several factors such as the framework complexity and lack of proper documentation. In this paper, we propose a code-search-engine-based approach that detects hotspots in a given framework by mining code examples gathered from open source repositories available on the web; these hotspots are API classes and methods that are frequently reused. Hotspots can serve as starting points for developers in understanding and reusing the given framework. Our approach also detects coldspots, which are API classes and methods that are rarely used. Coldspots serve as caveats for developers as there can be difficulties in finding relevant code examples and are generally less exercised compared to hotspots. We developed a tool, called SpotWeb, for frameworks or libraries written in Java and used our tool to detect hotspots and coldspots of eight widely used open source frameworks. We show the utility of our detected hotspots by comparing these hotspots with the API classes reused by a real application and compare our results with the results of a previous related approach.
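As a rough illustration of the hotspot and coldspot notions, the Python sketch below classifies API members by how often they appear in gathered code examples. It is a simplification under stated assumptions and not SpotWeb's actual analysis: the function name, both thresholds, and the `Graph.*` member names are hypothetical.

```python
from collections import Counter

def classify_api(api_members, usages, hot_threshold=3, cold_threshold=0):
    """Split API members into hotspots and coldspots by usage count.

    api_members: all classes/methods the framework exposes.
    usages: one entry per observed use in the mined code examples.
    """
    counts = Counter(usages)
    hotspots = sorted(m for m in api_members if counts[m] >= hot_threshold)
    coldspots = sorted(m for m in api_members if counts[m] <= cold_threshold)
    return hotspots, coldspots

# Hypothetical framework members and usages mined from code examples.
members = ["Graph.addNode", "Graph.addEdge", "Graph.setLayoutHint"]
observed = ["Graph.addNode"] * 4 + ["Graph.addEdge"]
hot, cold = classify_api(members, observed)
```

Members that fall between the two thresholds, such as `Graph.addEdge` here, are reported as neither a hotspot nor a coldspot.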
PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web
Programmers commonly reuse existing frameworks or libraries to reduce software development effort. One common problem in reusing frameworks or libraries is that programmers often know what type of object they need, but do not know how to get that object with a specific method sequence. To help programmers address this issue, we have developed an approach that takes queries of the form "Source object type -> Destination object type" as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object from the source object given in the query. Our approach interacts with a code search engine (CSE) to gather relevant code samples and performs static analysis over the collected samples to extract required sequences. As code samples are collected on demand through the CSE, our approach is not limited to any specific set of frameworks or libraries. We have implemented our approach with a tool called PARSEWeb, and conducted four different evaluations to show that our approach is effective in addressing programmers' queries. We also show that PARSEWeb performs better than existing related tools: Prospector and Strathcona.
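A query of the form "Source object type -> Destination object type" can be pictured as a path search over method invocations seen in code samples. The Python sketch below is a deliberately simplified stand-in, not PARSEWeb's implementation: it assumes the invocations have already been extracted as (receiver type, method, return type) triples, and all type and method names are hypothetical.

```python
from collections import deque

def suggest_sequence(invocations, source, destination):
    """Breadth-first search for a shortest method chain from source to destination.

    invocations: iterable of (input_type, method_name, output_type) triples.
    Returns the list of method names, or None if no chain is found.
    """
    graph = {}
    for in_type, method, out_type in invocations:
        graph.setdefault(in_type, []).append((method, out_type))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == destination:
            return path
        for method, out_type in graph.get(current, []):
            if out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [method]))
    return None

# Hypothetical invocations extracted from collected code samples.
samples = [
    ("Workspace", "getProject", "Project"),
    ("Project", "getFile", "SourceFile"),
    ("SourceFile", "parse", "SyntaxTree"),
    ("Project", "getName", "String"),
]
answer = suggest_sequence(samples, "Workspace", "SyntaxTree")
```

Breadth-first search is used here only because it returns a shortest chain first; ranking candidate sequences by how often they occur in the samples would be a natural refinement.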
NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code
To find defects in programs, existing approaches mine programming rules as common patterns out of program source code and classify defects as violations of these mined programming rules. However, these existing approaches often cannot surface many programming rules as common patterns because these approaches mine patterns from only one or a few project code bases. To better support static bug finding based on mining code, we develop a novel framework, called NEGWeb, for substantially expanding the mining scope to billions of lines of open source code based on a code search engine. NEGWeb detects violations related to neglected conditions around individual API calls. We evaluated NEGWeb to detect violations in local code bases or open source code bases. In our evaluation, we show that NEGWeb finds three real defects in Java code reported in the literature and also finds three previously unknown defects in a large-scale open source project called Columba (91,508 lines of Java code) that reuses 2,225 APIs. We also report a high percentage of real rules among the top 25 reported patterns mined for five popular open source applications.
UnitPlus: Assisting Developer Testing in Eclipse
In the software development life cycle, unit testing is an important phase that helps in early detection of bugs. A unit test case consists of two parts: a test input, which is often a sequence of method calls, and a test oracle, which is often in the form of assertions. The effectiveness of a unit test case depends on its test input as well as its test oracle because the test oracle helps in exposing bugs during the execution of the test input. The task of writing effective test oracles is not trivial as this task requires domain or application knowledge and also needs knowledge of the intricate details of the class under test. In addition, when developers write new unit test cases, much test code (including code in test inputs or oracles) such as method argument values is the same as some previously written test code. To assist developers in writing test code in unit test cases more efficiently, we have developed an Eclipse plugin for JUnit test cases, called UnitPlus, that runs in the background and recommends test-code pieces for developers to choose (and revise when needed) to put in test oracles or test inputs. The recommendation is based on static analysis of the class under test and already written unit test cases. We have conducted a feasibility study for our UnitPlus plugin with four Java libraries to demonstrate its potential utility.
Exploiting Code Search Engines to Improve Programmer Productivity
Code Search Engines (CSE) can serve as powerful resources of open source code, as they can search in billions of lines of open source code available on the web. The strength of CSEs can be used for several tasks like searching relevant code samples, identifying hotspots, and finding bugs. However, the major limitations in using CSEs for these tasks are that the returned samples are too many and they are often partial. Our framework addresses the preceding limitations and thereby helps in using CSEs for these tasks. We showed the effectiveness of our framework with two tools developed based on our framework.
Created on: 12th Jan 2007