Micro Interaction Metrics for Defect Prediction

There is a common belief that developers' behavioral interaction patterns may affect software quality. However, widely used defect prediction metrics such as software complexity metrics, change churns, and the number of previous defects do not capture developers' direct interactions. We propose 56 novel micro interaction metrics (MIMs) that leverage developers' interaction information stored in the Mylyn data. Mylyn is an Eclipse plug-in, which captures developers' interactions such as file editing and selection events with time spent. To evaluate the performance of MIMs in defect prediction, we build defect prediction (classification and regression) models using MIMs, traditional metrics, and their combinations. Our experimental results show that MIMs significantly improve defect classification and regression accuracy.

We provide data sets (ARFF files for Weka) to reproduce our experimental results. You can also download all data sets in a single zip file: datasets.zip

Section 4.2.1 Different Subjects (right-click each link and save it)
- Subject All: MIM+CM+HM, MIM, CM+HM, CM, HM
- Subject Mylyn: MIM+CM+HM, MIM, CM+HM, CM, HM
- Subject Team: MIM+CM+HM, MIM, CM+HM, CM, HM
- Etc.: MIM+CM+HM, MIM, CM+HM, CM, HM
Section 4.2.2 Different Machine Learners
- MIM+CM+HM, MIM, CM+HM, CM, HM
Section 4.2.3 Different Split Points
- 5:5 Time Split: MIM+CM+HM, MIM, CM+HM, CM, HM
- 7:3 Time Split: MIM+CM+HM, MIM, CM+HM, CM, HM
- 8:2 Time Split: MIM+CM+HM, MIM, CM+HM, CM, HM
Section 4.3 Predicting Defect Numbers
- MIM+CM+HM, MIM, CM+HM, CM, HM
Section 4.4 Predicting CVS-log-based Defects
- MIM+CM+HM, MIM, CM+HM, CM, HM

To reproduce the experimental results with the above data sets, follow the steps below:

Download the Weka 3.6.1 archive file (download) and extract "weka.jar" from the zip file.
Run a command with a specific machine learning algorithm and one of the ARFF files above:
- java -Xmx1024m -cp weka.jar weka.classifiers.trees.J48 -t put_file_name.arff -x 10 -i -o -s 1
- For other algorithms, replace "weka.classifiers.trees.J48" with one of the following:
  - weka.classifiers.bayes.BayesNet (Please add the "-D" option for this algorithm.)
  - weka.classifiers.functions.Logistic
- Option "-x 10": 10-fold cross validation
- Option "-s 1": sets the seed for generating random folds (change 1 to another number to get different results for each trial).
- For other options, please refer to http://weka.wikispaces.com/Primer
The command output contains a lot of information provided by Weka. The value we are interested in is the F-measure of the "buggy" class: see the F-measure value of the "buggy" class under the "=== Stratified cross-validation ===" line.
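When running several trials with different seeds or classifiers, it may be convenient to generate the commands from a script. The following is a minimal Python sketch, not part of the original experimental setup. The function names `build_command` and `run_trials` are hypothetical, "weka.jar" is assumed to sit in the working directory, and placing "-D" at the end of the BayesNet command line is an assumption. The flags themselves are the ones listed in the steps above.

```python
import subprocess

WEKA_JAR = "weka.jar"  # assumption: the jar is in the current working directory

# Classifiers named on this page, mapped to any extra option they need.
CLASSIFIERS = {
    "weka.classifiers.trees.J48": [],
    "weka.classifiers.bayes.BayesNet": ["-D"],  # "-D" is requested for BayesNet
    "weka.classifiers.functions.Logistic": [],
}


def build_command(arff_file, classifier="weka.classifiers.trees.J48",
                  seed=1, folds=10):
    """Return the Weka command line as a list of arguments."""
    extra = CLASSIFIERS[classifier]
    return [
        "java", "-Xmx1024m", "-cp", WEKA_JAR, classifier,
        "-t", arff_file,      # training file, also used for cross validation
        "-x", str(folds),     # number of cross-validation folds
        "-i", "-o",
        "-s", str(seed),      # seed for generating random folds
        *extra,
    ]


def run_trials(arff_file, classifier, seeds):
    """Run one Weka process per seed and return the raw text outputs."""
    outputs = []
    for seed in seeds:
        result = subprocess.run(
            build_command(arff_file, classifier, seed),
            capture_output=True, text=True, check=True,
        )
        outputs.append(result.stdout)
    return outputs
```

For example, `run_trials("put_file_name.arff", "weka.classifiers.trees.J48", range(1, 11))` would run ten trials with seeds 1 to 10. The F-measure of the "buggy" class still has to be read from each returned output.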

* For MIMs, each metric name corresponds to its definition in the Appendix, listed here. (names of MIMs)
