Progress in pre-clinical research is built on reproducible findings, yet reproducibility has several dimensions and even several meanings. Indeed, the terms reproducibility, repeatability, and replicability are often used interchangeably, although each has a distinct definition. Moreover, reproducibility can be discussed at the level of methods, analysis, results, or conclusions (1, 2). Despite these differences in definitions and dimensions, the main aim for an individual research group is to be able to build new studies and hypotheses on firm and reliable findings from previous experiments. In practice, this aim is often difficult to achieve. In this review, issues affecting reproducibility in the field of mouse behavioral phenotyping are discussed.

Crisis in reproducibility

Over the last ten years, the "reproducibility crisis" has often appeared in the headlines of scientific journals (3-6). Several factors have been identified as major causes of irreproducibility, including p-hacking, cherry-picking, low statistical power, publication bias, and hypothesizing after results are known (7). However, these issues mostly arise after the animal experiments are done. Many more items must be considered during the planning and running of an experiment: good experimental design includes considerations regarding randomization, blinding, details of housing, husbandry and animal care, the definition of the experimental unit, inclusion and exclusion criteria, and the choice of animal subjects (the source, health status, strain, sex, and age of the animal), among others (8-10). Guidelines and recommendations (e.g., ARRIVE, PREPARE) are available for addressing these factors (11, 12). However, although the ARRIVE guidelines have existed for ten years and are