An Optimal Spacing Approach for Sampling Small-sized Datasets for Software Effort Estimation
Date
2023
Abstract
Context: There has been a growing research focus
on conventional machine learning techniques for software
effort estimation (SEE). However, there is a limited number
of studies that seek to assess the performance of deep
learning approaches in SEE. This is because the sizes of SEE
datasets are relatively small. Purpose: This study seeks to
define a threshold for small-sized datasets in SEE, and
investigates the performance of selected conventional
machine learning and deep learning models on small-sized
datasets. Method: Plausible SEE datasets with their number
of project instances and features are extracted from existing
literature and ranked. Eubank’s optimal spacing theory is
used to discretize the ranking of the project instances into
three classes (small, medium and large). Five conventional
machine learning models and two deep learning models are
trained on each dataset classified as small-sized using
leave-one-out cross-validation. The mean absolute error is
used to assess the prediction performance of each model.
Result: Findings from the study contradict existing
knowledge by demonstrating that deep learning models
provide improved prediction performance as compared to
the conventional machine learning models on small-sized
datasets. Conclusion: Deep learning can be adopted for SEE
with the application of regularisation techniques.
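The evaluation protocol described in the Method section (leave-one-out cross-validation scored with the mean absolute error) can be sketched as follows. This is an illustrative outline only: the toy effort values and the mean-value predictor below are assumptions for demonstration, not the conventional machine learning or deep learning models studied in the article.

```python
def loocv_mae(efforts, predict):
    """Mean absolute error under leave-one-out cross-validation.

    efforts: observed effort values, one per project instance.
    predict: function mapping the training-fold values to a
             prediction for the single held-out project.
    """
    errors = []
    for i in range(len(efforts)):
        train = efforts[:i] + efforts[i + 1:]   # all projects except i
        prediction = predict(train)             # fit/predict on this fold
        errors.append(abs(prediction - efforts[i]))
    return sum(errors) / len(errors)

# Toy baseline (an assumption, not a model from the study):
# predict the mean effort of the training fold.
mean_predictor = lambda train: sum(train) / len(train)
efforts = [120.0, 80.0, 150.0, 95.0, 110.0]
print(loocv_mae(efforts, mean_predictor))  # → 24.0
```

Leave-one-out cross-validation is a natural fit for the small-sized datasets the study targets, since every project instance is used for both training and testing without sacrificing a holdout set.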
Description
Research Article
Keywords
Deep learning, Conventional machine learning, Software effort estimation, Small-sized datasets, Optimal spacing theory