The typical hospital Length of Stay (LOS) distribution is known to be right-skewed, to considerably vary across Diagnosis Related Groups (DRG), and to contain markedly high values, in significant proportions. These very long stays are often considered outliers, and thin-tailed statistical distributions are assumed. Moreover, modeling is typically performed by Diagnosis Related Group (DRG) and is consequently based on small empirical samples, thus justifying the previous assumption. However, resource consumption and planning occur at the level of medical specialty departments covering multiple DRG, and when considered at this decision-making scale, extreme LOS values represent a significant component of the distribution of LOS (the right tail) that determines many of its statistical properties.
Through a study of 46,364 electronic health records over four medical specialty departments (Pediatrics, Obstetrics/Gynecology, Surgery, and Rehabilitation Medicine) in the largest hospital in Thailand (Siriraj Hospital in Bangkok), we show that the distribution of LOS exhibits a tail behavior that is consistent with a subexponential distribution. We analyze some empirical properties of such a distribution that are of relevance to cost and resource planning, notably the concentration of resource consumption among a minority of admissions/patients, an increasing residual LOS, where the longer a patient has been admitted, the longer they would, counter-intuitively, be expected to remain admitted, and a slow convergence of the Law of Large Numbers, making empirical estimates of moments (e.g. mean, variance) unreliable. Consequently, we propose a novel Beta-Geometric model that shows a good fit with observed data and reproduces these empirical properties of LOS. Finally, we use our findings to make practical recommendations regarding the pricing and management of LOS.