Abstract:
Software vulnerability analysis is a critical concern in the
software industry, and vulnerability classification plays a major
role in this analysis. A typical vulnerability classification model
involves three stages: term selection, in which the relevant terms
are identified via feature selection; term weighting, in which
document weights for the selected terms are computed; and
classifier learning.
Term frequency-inverse document frequency (TF-IDF) is the most
widely used term-weighting method. However, empirical evidence
shows that the effectiveness of TF-IDF is limited. This paper
introduces a new approach to vulnerability classification based on
term frequency and inverse gravity moment (TF-IGM). The proposed
method is validated through experiments using three machine
learning algorithms on ten publicly available vulnerability
datasets. The results show that TF-IGM outperforms the benchmark
method across the applications studied.
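The abstract does not state the TF-IGM formula. To illustrate how it differs from TF-IDF, the following minimal sketch assumes the commonly cited definition: for term t, let f_1 >= f_2 >= ... >= f_m be the numbers of training documents containing t in each of the m classes, sorted in descending order. Then igm(t) = f_1 / sum_r (f_r * r), and the weight is tf * (1 + lambda * igm(t)). The coefficient lambda = 7, the raw-tf variant, the tf * log(N/df) form of TF-IDF, and the toy corpus and labels are all illustrative assumptions, not the paper's configuration.

```python
import math
from collections import Counter

def class_doc_freqs(docs, labels, term):
    """Count, per class label, how many documents contain `term`."""
    counts = Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            counts[label] += 1
    return counts

def igm(docs, labels, term):
    """Inverse gravity moment of `term` (assumed standard definition)."""
    classes = sorted(set(labels))
    freqs = class_doc_freqs(docs, labels, term)
    # Class-specific frequencies sorted descending: f_1 >= f_2 >= ... >= f_m.
    f = sorted((freqs.get(c, 0) for c in classes), reverse=True)
    # Gravity moment: each frequency weighted by its rank r (1-based).
    denom = sum(fr * r for r, fr in enumerate(f, start=1))
    return f[0] / denom if denom else 0.0

def tf_igm(tokens, docs, labels, term, lam=7.0):
    """TF-IGM weight of `term` in one document; lam=7.0 is an assumed default."""
    tf = tokens.count(term)
    return tf * (1.0 + lam * igm(docs, labels, term))

def tf_idf(tokens, docs, term):
    """Plain TF-IDF weight: raw tf times log(N / df); ignores class labels."""
    tf = tokens.count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

# Hypothetical toy corpus of tokenized vulnerability descriptions.
docs = [
    ["sql", "injection", "query"],
    ["sql", "injection", "login"],
    ["buffer", "overflow", "stack"],
    ["buffer", "overflow", "heap", "query"],
]
labels = ["sqli", "sqli", "overflow", "overflow"]

for term in ("sql", "query"):
    print(term,
          round(tf_idf(docs[0], docs, term), 3),
          round(tf_igm(docs[0], docs, labels, term), 3))
```

In this toy corpus, "sql" appears only in one class while "query" is spread across both, yet both have the same document frequency, so TF-IDF assigns them identical weights (log 2). Because IGM uses the class labels, TF-IGM gives the class-concentrated term "sql" a larger weight (8.0) than "query" (about 3.33).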