“…Dataset preparation: Authors either used existing labeled datasets or created their own to train ML models. Specifically, a set of studies [48,156,219,243,254,263,298] used available labeled datasets for PHP, Java, C, C++, and Android applications to train vulnerability detection models. In other cases, Russell et al. [261] extended an existing dataset with millions of C and C++ functions and then labeled it based on the output of three static analyzers (i.e., Clang, Cppcheck, and Flawfinder).…”
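The labeling step described in the excerpt can be illustrated with a minimal sketch. The excerpt does not say how the three analyzers' outputs were combined into a label, so the voting rule below is an assumption, and `FunctionRecord`, `label_function`, `label_dataset`, and `min_votes` are hypothetical names introduced only for illustration. The findings are placeholder strings, not real tool output.

```python
from dataclasses import dataclass, field

# Analyzer names come from the excerpt; everything else here is assumed.
ANALYZERS = ("clang", "cppcheck", "flawfinder")


@dataclass
class FunctionRecord:
    """One C/C++ function plus the warnings each analyzer reported for it."""
    name: str
    # Maps analyzer name -> list of warning strings (placeholders in this sketch).
    findings: dict = field(default_factory=dict)


def label_function(record, min_votes=1):
    """Return 1 (flagged) if at least `min_votes` analyzers reported a warning.

    Assumption: a simple vote count stands in for whatever rule the
    original authors used to turn analyzer output into labels.
    """
    votes = sum(1 for analyzer in ANALYZERS if record.findings.get(analyzer))
    return 1 if votes >= min_votes else 0


def label_dataset(records, min_votes=1):
    """Label every function, returning a dict of function name -> 0/1."""
    return {r.name: label_function(r, min_votes) for r in records}


records = [
    FunctionRecord(
        "copy_input",
        {"flawfinder": ["placeholder warning"], "cppcheck": ["placeholder warning"]},
    ),
    FunctionRecord("add_ints", {}),
]
labels = label_dataset(records)
```

Raising `min_votes` trades recall for precision: requiring agreement among more analyzers yields fewer, but more reliable, positive labels.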