A major challenge in many research areas is the reproducibility of implementations, experiments, or evaluations. New data sources and research directions complicate reproducibility even further. For example, Twitter continues to gain popularity as a source of up-to-date news and information. As a result, numerous event detection techniques have been proposed to cope with the steadily increasing rate and volume of social media data streams. Although some of these works provide their implementation or evaluate the proposed technique, it is almost impossible to reproduce their experiments. The main obstacle is that Twitter prohibits the release of the crawled datasets that researchers use in their experiments. In this work, we present a survey of the vast landscape of implementations, experiments, and evaluations presented in these research works. Furthermore, we propose a reproducibility toolkit including Twistor (Twitter Stream Simulator), which can be used to simulate an artificial Twitter data stream (including events) as input for the experiments or evaluations of event detection techniques. We further present an experimental application of the reproducibility toolkit to state-of-the-art event detection techniques.