Integrated development environment (IDE) plugins aimed at detecting web application security vulnerabilities can help developers create secure applications in the first place. Most such IDE plugins use static source code analysis approaches. Although several empirical studies have evaluated these plugins and compared their precision and recall in detecting web application security vulnerabilities, few follow-up studies have tried to understand the evaluation results. We analyzed more than 20,000 vulnerability reports based on 7,215 distinct test cases spanning 11 categories of web application vulnerabilities to understand the evaluation results of three open-source IDE plugins, namely SpotBugs, FindSecBugs, and Early Security Vulnerability Detector (ESVD), which aim to detect security vulnerabilities in Java-based web applications. Our results identify many factors besides the source code analysis approach that can dramatically bias the detection performance. Based on our insights, we improved the studied plugins. In addition, our study raises the alarm that, without solid root cause analyses, evaluations and comparisons of security vulnerability detection approaches and tools could be misleading. Thus, we propose a guideline for reporting the evaluation results of security vulnerability detection approaches.

Index Terms: Software security, Source code analysis, Vulnerability detection, Empirical study, Root cause analysis

• First, we provide insights on issues and solutions in implementing taint analysis-based security vulnerability detection tools. Some of the insights can also be generalized to other types of static source code analysis-based vulnerability detectors.
• A more significant contribution is that our deep under-