Replication Package for the Paper "Towards Building a Universal Defect Prediction Model with Rank Transformed Predictors"

Abstract

Software defects can lead to undesired results. Correcting defects costs 50% to 75% of the total software development budget. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained either from a project itself (within-project) or from other projects (cross-project). A universal defect prediction model built from a large set of diverse projects would relieve the need to build and tailor prediction models for each individual project. A formidable obstacle to building a universal model is the variation in the distribution of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distribution of predictors, and to derive rank transformations using the quantiles of predictors within a cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to that of the within-project models, yields similar results when applied to five external projects (one Apache and four Eclipse projects), and performs similarly among projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work can form a basis for future work on building a universal model and lead to software support tools that incorporate it into a regular development workflow.

If anything is unclear, please do not hesitate to contact any of the authors.

Data Sources

Our empirical study uses open source projects hosted on SourceForge and GoogleCode as subject projects.
We have two data sources: the CVS repositories collected by Audris Mockus, and the dataset by D'Ambros et al.

CVS repositories collected by Audris Mockus.

This dataset was collected from SourceForge and GoogleCode. In our study, we select 1,385 projects. As the original dataset is large, we provide the list of subject projects, the raw metric values, and the converted (i.e., rank-transformed) metric values. Our primary tool for computing metrics is the commercial tool Understand.
List of subject projects: Download link
Raw metric values of subject projects: Download link
Converted metric values of subject projects: Download link
The metric files are in CSV format and include context factors, code metrics, and process metrics. The defect-proneness of each file is included as the dependent variable.
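As a quick-start illustration, the following Python sketch loads one of the metric files. The file name and column names are hypothetical placeholders; substitute the actual names from the downloaded archive.

    # A minimal sketch of loading a metric file; "metrics.csv" and the
    # "defect_prone" label column are placeholders, not the actual names.
    import pandas as pd

    df = pd.read_csv("metrics.csv")         # one row per file (module)
    y = df["defect_prone"]                  # dependent variable
    X = df.drop(columns=["defect_prone"])   # context factors + code/process metrics
    print(X.shape, y.mean())                # number of modules and defect ratio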

D'Ambros et al. dataset.

This dataset was collected by D'Ambros et al. It contains four Eclipse projects (i.e., Eclipse JDT Core, Eclipse PDE UI, Equinox Framework, and Mylyn) and one Apache project (i.e., Lucene). The dataset is publicly available at http://bug.inf.usi.ch/download.php. In our study, we select the metrics common to both our dataset and theirs, namely: ... . We also provide the raw and rank-transformed metric values of these five projects.
Raw metric values of five external projects: Download link
Converted metric values of five external projects: Download link

Experimental Results

RQ1. Can a context-aware rank transformation provide predictive power comparable to the power of log transformation?

The description of the approach can be found in the paper. Here, we provide the prediction results, i.e., the performance measures of within-project models built on log-transformed and on rank-transformed values for each project. The results can be downloaded from this link.
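For illustration, the following Python sketch builds within-project models on log-transformed and on rank-transformed predictors and compares their AUC. The learner (logistic regression), the ten-level rank, and the file and column names are our illustrative assumptions, not necessarily the exact setup of the paper.

    # A minimal sketch comparing log- and rank-transformed predictors in
    # within-project models; learner, file name, and column names are
    # illustrative assumptions (all predictor columns are assumed numeric).
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("metrics.csv")              # placeholder file name
    y = df["defect_prone"]                       # placeholder label column
    X = df.drop(columns=["defect_prone"])

    X_log = np.log1p(X)                          # log transformation
    # Rank transformation: map each value to its decile (1..10); ties are
    # broken first so that pd.qcut always finds ten distinct bins.
    X_rank = X.apply(lambda c: pd.qcut(c.rank(method="first"), 10, labels=False) + 1)

    for name, data in [("log", X_log), ("rank", X_rank)]:
        auc = cross_val_score(LogisticRegression(max_iter=1000), data, y,
                              cv=10, scoring="roc_auc").mean()
        print(name, round(auc, 3))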

RQ2. What is the performance of the universal defect prediction model?

The description of the approach can be found in the paper. Here, we provide the performance results. For RQ2.1, the performance of three different metric sets (i.e., code metrics; code and process metrics; and code, process, and context factors) can be downloaded from this link. For RQ2.2, the performance of the within-project and universal models can be downloaded from this link.

RQ3. What is the performance of the universal defect prediction model on external projects?

The description of the approach can be found in the paper. Here, we provide the performance results. The performance of the within-project and universal models on the five external projects can be downloaded from this link.

RQ4. Do context factors affect the performance of the universal defect prediction model?

The description of the approach can be found in the paper. Here, we provide the performance results. For each project, we present its context factors and the performance of the universal model on it (when it is used as the target project). The results can be downloaded from this link.
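The sketch below shows one simple way to inspect such an effect: grouping the per-project performance by a context factor. The file and column names (e.g., "language" and "auc") are hypothetical placeholders for the columns of the downloadable results.

    # A minimal sketch of summarizing performance per context factor;
    # "results.csv" and its columns are hypothetical placeholders.
    import pandas as pd

    res = pd.read_csv("results.csv")   # one row per target project
    print(res.groupby("language")["auc"].describe())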

RQ5. What predictors should be included in the universal defect prediction model?

The results are described in the paper and can be reproduced using the metrics from this link; hence, we do not present separate results for RQ5 here.

Quantiles of Numerical Context Factors

Context Factor    0%    25%     50%     75%      100%
TLOC              76    18,636  47,708  118,927  1,168,864
TNF               2     118     262     679      4,658
TNC               35    1,071   2,756   8,139    466,968
TND               1     3       10      24       1,662

(TLOC: total lines of code; TNF: total number of files; TNC: total number of commits; TND: total number of developers.)
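For example, these quantiles can be used to place a project into a quartile interval per factor. The following Python sketch does so for TLOC, using the row from the table above; the four-interval bucketing is our illustrative reading of how the quantiles partition projects.

    # A minimal sketch of bucketing a project by the TLOC quantiles above.
    import bisect

    TLOC_QUANTILES = [76, 18_636, 47_708, 118_927, 1_168_864]  # 0%..100%

    def tloc_bucket(tloc):
        # Return 1..4 for the quartile interval containing tloc.
        return bisect.bisect_left(TLOC_QUANTILES[1:-1], tloc) + 1

    print(tloc_bucket(50_000))  # -> 3 (between the 50% and 75% quantiles)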

Our Ranking Functions

TO BE UPDATED.
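Until the exact functions are posted, the following Python sketch illustrates a quantile-based ranking function in the spirit of the paper: cut points are derived from the pooled metric values of a cluster, and a raw value is mapped to a rank. The ten-level granularity and the synthetic example data are our illustrative assumptions.

    # A minimal sketch of a quantile-based ranking function; the ten rank
    # levels and the synthetic lognormal data are illustrative assumptions.
    import numpy as np

    def make_ranking_function(cluster_values, levels=10):
        # Derive the inner cut points (10%, 20%, ..., 90% for ten levels).
        cuts = np.quantile(cluster_values, np.linspace(0, 1, levels + 1)[1:-1])
        return lambda x: int(np.searchsorted(cuts, x, side="right")) + 1

    rank_loc = make_ranking_function(np.random.lognormal(8, 2, size=10_000))
    print(rank_loc(3000))  # rank (1..10) of a file with 3,000 lines of code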

Unified Model

TO BE UPDATED.
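Until the fitted model is posted, here is a minimal Python sketch of how one unified model can be fit on the pooled, rank-transformed data of all projects. The logistic-regression learner and the file and column names are illustrative assumptions, not necessarily the paper's exact configuration.

    # A minimal sketch of fitting one model on pooled rank-transformed data;
    # learner, file name, and column names are illustrative assumptions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    pooled = pd.read_csv("converted_metrics_all.csv")       # placeholder name
    y = pooled["defect_prone"]                              # placeholder label
    X = pooled.drop(columns=["defect_prone", "project"])    # placeholder columns

    universal = LogisticRegression(max_iter=1000).fit(X, y)
    # Apply to an external project after the same rank transformation:
    # probs = universal.predict_proba(X_external)[:, 1]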

Authors

Feng Zhang
(first name <at> cs.queensu.ca)
Audris Mockus
(first name <at> avaya.com)
Iman Keivanloo
(first name <dot> last name <at> queensu.ca)
Ying Zou
(first name <dot> last name <at> queensu.ca)