Machine Learning Algorithms on Small-Sized Datasets in Software Effort Estimation: A Comparative Study

dc.contributor.author: Abedu, S.
dc.date.accessioned: 2021-04-13T08:27:43Z
dc.date.available: 2021-04-13T08:27:43Z
dc.date.issued: 2021-01
dc.description: MPhil. Computer Science
dc.description.abstract:
Context: Software effort estimation is crucial in the software development process. Overestimating or underestimating the effort for a software project can have consequences for a company's bidding or development process. Over the years, there has been growing research interest in machine learning approaches to software effort estimation. Although deep learning has been described as the state of the art in machine learning, little has been done to assess the performance of deep learning approaches in this field.
Objective: This study defines a discretization scheme for setting a threshold for a small-sized dataset in software effort estimation. It also investigates the performance of selected machine learning models on small-sized datasets.
Method: Software effort estimation datasets, together with their numbers of project instances and features, were identified from the existing literature and ranked by number of project instances. Eubank's optimal spacing theory was used to discretize the ranking of the project instances into three classes. The performance of selected conventional machine learning models and two deep learning models was assessed on the datasets classified as small-sized. Leave-one-out cross-validation, as recommended by Kitchenham, was adopted to assess the training and validation needs of the selected models. The performance of each model on the selected datasets was measured using the mean absolute error (MAE). Robust statistical tests were conducted using Yuen's t-test and Cliff's delta effect size.
Results: The conventional machine learning models achieved better prediction performance than the deep learning models. Nonetheless, after early stopping regularisation was applied, the deep learning models achieved better prediction accuracy than the conventional machine learning models but failed to outperform the Automatically Transformed Linear Model (ATLM).
Conclusion: The study concluded that conventional machine learning approaches achieve better performance than deep learning approaches on small-sized datasets. However, applying the early stopping regularisation technique can improve the performance of deep learning models. Also, a given software effort estimation dataset can be classified as small-sized if the number of project instances in the dataset is less than 43.
Keywords: Deep Learning, Machine Learning, Software Effort Estimation, Small-Sized Dataset
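The evaluation protocol described in the Method section (leave-one-out cross-validation scored with MAE, plus Cliff's delta for effect size) can be illustrated with a minimal sketch. The toy effort values and the mean-effort baseline predictor below are hypothetical stand-ins; they are not the thesis's datasets or models.

```python
# Sketch of the abstract's evaluation protocol: leave-one-out cross-validation
# with mean absolute error (MAE), and Cliff's delta as a non-parametric effect
# size. The data and the mean-effort predictor are illustrative only.

def loocv_mae(efforts):
    """Hold out each project in turn, predict it with the mean effort of the
    remaining projects, and return the MAE over all folds."""
    errors = []
    for i, held_out in enumerate(efforts):
        train = efforts[:i] + efforts[i + 1:]
        prediction = sum(train) / len(train)  # baseline: mean of training folds
        errors.append(abs(prediction - held_out))
    return sum(errors) / len(errors)

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs; ranges from -1 to 1."""
    greater = sum(1 for x in xs for y in ys if x > y)
    lesser = sum(1 for x in xs for y in ys if x < y)
    return (greater - lesser) / (len(xs) * len(ys))

# Toy effort values (e.g. person-months); any dataset with fewer than 43
# project instances counts as "small-sized" under the study's threshold.
efforts = [12.0, 8.5, 20.0, 15.5, 9.0]
print(loocv_mae(efforts))                     # → 4.75
print(cliffs_delta([1.2, 0.9], [2.0, 1.8]))   # → -1.0 (all errors lower)
```

With leave-one-out cross-validation every instance serves as a test case exactly once, which is why it suits the small-sized datasets the study targets.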
dc.identifier.uri: http://ugspace.ug.edu.gh/handle/123456789/36174
dc.language.iso: en
dc.publisher: University of Ghana
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Software Effort Estimation
dc.subject: Small-Sized Dataset
dc.title: Machine Learning Algorithms on Small-Sized Datasets in Software Effort Estimation: A Comparative Study
dc.type: Thesis

Files

Original bundle
Name: Machine Learning Algorithms on Small-Sized Datasets in Software Effort Estimation A Comparative Study.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.6 KB
Format: Item-specific license agreed upon to submission