Software engineering research is evolving, and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. This is crucial not only for research progress but also for practitioners in judging the practical significance of results. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers, producing a review protocol based on the recent state of the art in statistical analysis and on how researchers discuss practical significance. In a second phase of our method, we used the protocol as ground truth for a more extensive semi-automatic classification of papers spanning the years 2001-2015 and targeting a total of 5,196 papers. We use the results from both review processes to: i) identify and analyse the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures); and ii) create a conceptual model for a statistical analysis workflow, with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls in their use, such as the arbitrary α cut-off and neglecting to correct for multiple tests. Additionally, we discuss different techniques to further expand the statistical toolkit of ESE researchers: Bayesian data analysis, techniques to handle missing data, and causal analysis. Lastly, we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context.
1 Even though the approaches mentioned cover Software Engineering (SE) as a whole, this paper focuses on the broad branch of SE that is mainly interested in empirical data, i.e., Empirical Software Engineering (ESE).
2 Even though the common approach of judging statistical significance by the use of p-values has recently come under severe scrutiny [96], we here talk about statistical significance in a broader sense of the word.