Vehicular networks, in which vehicles communicate with one another, are a major enabling technology for future cooperative and autonomous driving. The most important messages in these networks are broadcast-authenticated periodic one-hop beacons, used for safety and traffic efficiency applications such as collision avoidance and traffic jam detection. However, broadcast authenticity is not sufficient to guarantee message correctness. The goal of misbehavior detection is to analyze application data and knowledge about physical processes in these cyber-physical systems to detect incorrect messages, enabling local revocation of vehicles transmitting malicious messages. Comparative studies of detection mechanisms are rare due to the lack of a reference dataset. We take the first steps to address this challenge by introducing the Vehicular Reference Misbehavior Dataset (VeReMi) and discussing valid metrics for such comparisons. VeReMi is the first public extensible dataset, allowing anyone to reproduce the generation process, contribute attacks, and use the data to compare new detection mechanisms against existing ones. Our analysis shows that the acceptance range threshold and the simple speed check are complementary mechanisms that detect different attacks. This provides empirical support for the intuitive notion that fusion can lead to better results, and we suggest that future work should focus on effective fusion with VeReMi as an evaluation baseline.
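To make the complementarity of the two mechanisms concrete, the following is a minimal Python sketch, not the implementation evaluated here. It assumes one common reading of the two checks: the acceptance range threshold flags a beacon whose claimed position lies beyond an assumed maximum reception range from the receiver, and the simple speed check flags a beacon whose claimed speed deviates too far from the speed implied by two consecutive claimed positions. The `Beacon` type, the function names, the thresholds (400 m, 5 m/s), and the OR-based fusion rule are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Beacon:
    t: float      # reception time in seconds
    x: float      # claimed x position in metres
    y: float      # claimed y position in metres
    speed: float  # claimed speed in m/s


def art_flags(receiver_xy, beacon, max_range_m=400.0):
    """Acceptance range threshold (assumed form): flag a beacon whose
    claimed position is farther from the receiver than max_range_m."""
    dist = math.hypot(beacon.x - receiver_xy[0], beacon.y - receiver_xy[1])
    return dist > max_range_m


def ssc_flags(prev, curr, max_dev_mps=5.0):
    """Simple speed check (assumed form): flag a beacon whose claimed speed
    differs from the speed implied by the last two claimed positions."""
    dt = curr.t - prev.t
    if dt <= 0:
        return False  # no usable time difference; this sketch does not flag
    implied = math.hypot(curr.x - prev.x, curr.y - prev.y) / dt
    return abs(implied - curr.speed) > max_dev_mps


def fused_flags(receiver_xy, prev, curr):
    """Hypothetical fusion rule: flag if either detector flags."""
    return art_flags(receiver_xy, curr) or ssc_flags(prev, curr)


receiver = (0.0, 0.0)

# Claimed position is far outside the assumed range, but motion is consistent.
far_prev = Beacon(t=0.0, x=990.0, y=0.0, speed=10.0)
far_curr = Beacon(t=1.0, x=1000.0, y=0.0, speed=10.0)

# Claimed position is nearby, but it jumps 100 m in 1 s at a claimed 10 m/s.
jump_prev = Beacon(t=0.0, x=0.0, y=0.0, speed=10.0)
jump_curr = Beacon(t=1.0, x=100.0, y=0.0, speed=10.0)

for name, prev, curr in [("far", far_prev, far_curr),
                         ("jump", jump_prev, jump_curr)]:
    print(name,
          art_flags(receiver, curr),
          ssc_flags(prev, curr),
          fused_flags(receiver, prev, curr))
```

In this toy example each detector catches exactly one of the two falsified beacons, so only the fused decision flags both. A simple OR increases false positives along with detections, which is why the choice of fusion rule and evaluation metrics matters.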