Tests with zebrafish embryos have gained wide acceptance as an alternative test model for drug development and toxicity testing. In particular, the behavioral response of the zebrafish embryo is currently seen as a useful endpoint to diagnose neuroactive substances. Consequently, several behavioral test methods have been developed addressing various behavioral endpoints such as spontaneous tail coiling (STC), photomotor response (PMR), locomotor response (LMR) and alternating light/dark-induced locomotor response (LMR-L/D). Although these methods are distinct in their application, most of their protocols differ strongly in their experimental parameters, usually because they address different research questions. However, if a single mode of action is to be diagnosed, varying experimental parameters may cause incoherent behavioral responses (hypo- or hyperactivity) of zebrafish during toxicity assessment. This could make behavioral test results inconclusive for use within a prospective and diagnostic risk assessment framework. To investigate the influence of these parameters, we conducted a review of existing behavioral assays to address the following two questions: (1) To what extent do varying experimental parameters influence observed effects in published behavioral test methods? (2) Is the observed behavior change (hypo- or hyperactivity) of zebrafish embryos consistent with the expected mode of action of a chemical? We compiled a set of 18 substances which are anticipated to be neuroactive. We found that behavioral changes are affected not only by the chemicals themselves: variation in experimental parameters across studies also seems to have a high impact on the outcome and thus on comparability between studies. Four parameters were the most influential: exposure concentration, exposure duration, endpoint parameter and developmental stage.
Varying combinations of these parameters caused a non-reproducible outcome for the hyperactivity expected for the organophosphates chlorpyrifos and diazinon. We highlight that the STC test shows a higher capacity to predict the hyperactivity of organophosphates, while PMR and LMR-L/D were more suitable to predict the hypoactivity expected for anticonvulsants. We provide a list of recommendations which, when implemented, may help to exclude the risk of bias due to experimental parameters in studies with similar goals.