Multiple Bug Detection And Effort Estimation Framework For Open-Source Projects
Date
2021-09
Publisher
University Of Ghana
Abstract
Bug reports are essential in the development and maintenance of software. Bug tracking
systems allow testers to submit bug reports, which are then analyzed and assigned to fixers
who address them. A given bug x is described as a multiple bug when it is reported by more
than two bug reporters, and as a duplicate bug when it is reported by exactly two reporters.
In a given pool of bug reports from a tracking system, estimating the effort required to
identify multiple bugs is a challenge, which motivates this study. Thus, a
plausible solution based on an effort estimation framework to detect multiple bugs will reduce
the effort software testers spend in analyzing bug reports and also improve software reliability
and productivity. Although several studies have attempted to solve the problem, there remains
a need for an effort estimation framework to detect multiple bugs in software projects,
specifically open-source projects. Two constraints arise when detecting multiple bugs in
open-source projects: (1) the large number of existing bug reports, and (2) the considerable
effort required to detect and analyze multiple bug reports. This study seeks to
develop a framework to detect multiple bugs and estimate the effort required in identifying
such bugs in open-source projects. The study implements the bugdetector tool, which uses bug
information and code features to find similar bugs. The tool first extracts features from bug
information in a bug tracking system; next, it locates bug methods in the source code and
extracts their code features. It then calculates similarities between each overridden and
overloaded method and, finally, determines which methods may cause potentially related or
similar bugs.
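
The similarity step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure the bugdetector tool uses, so cosine similarity over token counts is assumed here purely as a stand-in; the function names (tokenize, cosine_similarity, rank_similar), the 0.5 threshold, and the sample method signatures are all hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates, threshold=0.5):
    """Return (name, score) pairs scoring at least `threshold`, best first.

    `threshold` is an assumed cut-off, not a value from the study.
    """
    scored = [(name, cosine_similarity(query, text))
              for name, text in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a method known to contain a bug, compared against
# an overloaded variant and an unrelated method.
bug_method = "draw shape int x int y"
candidates = {
    "draw(Shape, int)": "draw shape int x",
    "close(File)": "close file",
}
similar = rank_similar(bug_method, candidates)
```

Under these assumptions, only the overloaded draw variant passes the threshold, which mirrors the idea that overloaded or overridden methods resembling a bug method are candidates for related bugs.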
Empirical analysis was conducted on bug reports extracted by the bugdetector tool from two
open-source projects, namely Mozilla Firefox and Eclipse. The analysis used deep learning
algorithms (LSTM, Bidirectional LSTM, and CNN) and conventional machine learning algorithms
(SVM and Random Forest). Accuracy, precision, recall, and F1-score were used to evaluate the
models' performance. The effort required to identify multiple bugs was estimated using a
proposed effort estimation metric. Empirical results show that the deep learning
method, namely the Bidirectional LSTM algorithm, yielded improved performance for multiple
bug detection across the two studied datasets. For Mozilla Firefox, the Bidirectional
LSTM yielded the best accuracy (71.09%), precision (68.30%), and recall
(45.7%). For Eclipse, the Bidirectional LSTM achieved the best accuracy
(82.6%) and F1-score (50.9%). The average effort required to detect multiple bugs
ranges from 1255.7 to 1383.2 days for the studied Eclipse bug repository, and 1049.8 to 1139.2
days for the studied Mozilla Firefox bug repository. The study concludes that the deep learning
methods detect multiple bugs in open-source projects more effectively than the conventional
machine learning approaches. The study also introduces an effort estimation metric to
compute the effort required to detect multiple bugs in open-source projects. This metric will
help software testers and fixers differentiate between severity levels of the detected bugs
based on the computed effort.
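
For readers comparing the reported scores, the four evaluation metrics named in the abstract can be computed from confusion-matrix counts as in the sketch below. These are the standard definitions; the function name classification_metrics and the counts in the example are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for an imbalanced dataset in which multiple bugs are rare.
metrics = classification_metrics(tp=40, fp=10, fn=60, tn=390)
```

With these illustrative counts, accuracy is high while recall is low, because the many correctly rejected reports dominate accuracy. This is one reason a detector can pair a high accuracy with a much lower recall or F1-score.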
Keywords: Duplicate bugs, Effort estimation, Bug detection, Deep learning, Open-source
projects.
Description
MPhil. Computer Science
Keywords
Open-Source Projects, Bug Detection, Duplicate bugs, Effort estimation, Deep learning