An automatic software vulnerability classification framework using term frequency-inverse gravity moment and feature selection

Date

2020-05-15

Publisher

Journal of Systems and Software

Abstract

Vulnerability classification is an important activity in software development and software quality maintenance. A typical vulnerability classification model involves a term selection stage, in which the relevant terms are identified via feature selection; a term-weighting stage, in which the document weights for the selected terms are computed; and a classifier learning stage. The term frequency-inverse document frequency (TF-IDF) model is the most widely used term-weighting metric for vulnerability classification. However, several issues hinder the effectiveness of the TF-IDF model for document classification. To address this problem, we propose and evaluate a general framework for vulnerability severity classification using the term frequency-inverse gravity moment (TF-IGM). Specifically, we extensively compare the term frequency-inverse gravity moment, term frequency-inverse document frequency, and information gain feature selection using five machine learning algorithms on ten vulnerable software applications containing a total of 27,248 security vulnerabilities. The experimental results show that: (i) the TF-IGM model is a promising term-weighting metric for vulnerability classification compared to the classical term-weighting metric, (ii) the effectiveness of feature selection on vulnerability classification varies significantly across the studied datasets, and (iii) feature selection improves vulnerability classification.
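The term-weighting stage described above can be illustrated with a minimal sketch of TF-IGM. It assumes the formulation commonly given in the text-classification literature, where a term's inverse gravity moment is its highest class-specific frequency divided by the rank-weighted sum of its class frequencies sorted in descending order, and the weight is tf * (1 + lambda * igm). The abstract does not state the exact variant used in the paper, so the function names, the default lambda of 7, and the toy data below are illustrative assumptions only.

```python
from collections import Counter


def igm(class_freqs):
    """Inverse gravity moment of one term (assumed formulation).

    class_freqs holds the term's document frequency in each class.
    Frequencies are sorted in descending order and ranked 1..m; the
    result is f_1 / sum(f_r * r).
    """
    ranked = sorted(class_freqs, reverse=True)
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment if moment else 0.0


def tf_igm_weights(docs, labels, lam=7.0):
    """Return one {term: weight} dict per document.

    docs is a list of token lists, labels the matching severity labels.
    lam is the adjustable coefficient (hypothetical default of 7).
    """
    classes = sorted(set(labels))
    # Class-specific document frequency of every term.
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    vocab = {t for tokens in docs for t in tokens}
    factor = {t: 1.0 + lam * igm([df[c][t] for c in classes]) for t in vocab}
    return [
        {t: tf * factor[t] for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


# Toy vulnerability descriptions (made up for illustration).
docs = [
    ["buffer", "overflow", "remote"],
    ["remote", "code", "execution"],
    ["minor", "typo", "remote"],
]
labels = ["high", "high", "low"]
weights = tf_igm_weights(docs, labels)
```

In this toy example, "buffer" appears in only one severity class and so receives a larger weight than "remote", which is spread across both classes. That is the class-distinguishing behaviour that TF-IGM is intended to capture and that TF-IDF, which ignores class labels, does not.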

Description

Research Article

Keywords

Software vulnerability, Classification, Feature selection, Machine learning algorithms, Severity, Term-weighting
