Anomaly detection has become pervasive in modern technology, covering applications from cybersecurity, to medicine or system failure detection. Before outputting a binary outcome (i.e., anomalous or non-anomalous), most algorithms evaluate instances with outlierness scores. But what does a score of 0.8 mean? Or what is the practical difference compared to a score of 1.2? Score ranges are assumed non-linear and relative, their meaning established by weighting the whole dataset (or a dataset model). While this is perfectly true, algorithms also impose dynamics that decisively affect the meaning of outlierness scores. In this work, we aim to gain a better understanding of the effect that both algorithms and specific data particularities have on the meaning of scores. To this end, we compare established outlier detection algorithms and analyze them beyond common metrics related to accuracy. We disclose trends in their dynamics and study the evolution of their scores when facing changes that should render them invariant. For this purpose we abstract characteristic S-curves and propose indices related to discriminant power, bias, variance, coherence and robustness. We discovered that each studied algorithm shows biases and idiosyncrasies, which habitually persist regardless of the dataset used. We provide methods and descriptions that facilitate and extend a deeper understanding of how the discussed algorithms operate in practice. This information is key to decide which one to use, thus enabling a more effective and conscious incorporation of unsupervised learning in real environments.