Henrique Alves, Baldoino Fonseca, Nuno Antunes
In: Latin-American Symposium on Dependable Computing (LADC 2016)
Software metrics can be used as a indicator of the presence of software vulnerabilities. These metrics have been used with machine learning to predict source code prone to contain vulnerabilities. Although it is not possible to find the exact location of the flaws, the models can show which components require more attention during inspections and testing. Each new technique uses his own evaluation dataset, which many times has limited size and representativeness. In this experience report, we use a large and representative dataset to evaluate several state of the art vulnerability prediction techniques. This dataset was built with information of 2186 vulnerabilities from five widely used open source projects. Results show that the dataset can be used to distinguish which are the best techniques. It is also shown that some of the techniques can predict nearly all of the vulnerabilities present in the dataset, although with very low precisions. Finally, accuracy, precision and recall are not the most effective to characterize the effectiveness of this tools.