Randomised controlled trials (RCTs) are regarded as the strongest class of evidence in evidence-based medicine. The failure of trials to demonstrate superiority of T3/T4 combination therapy over standard LT4 monotherapy has greatly influenced guidelines without resolving the ongoing debate. Recent studies have produced further evidence from the examination of homeostatic equilibria in humans and from experimental treatment protocols in animals. This has deepened a serious disagreement with the evidence from clinical trials. We contrast the weight of the statistical evidence with strong physiological counterarguments. Revisiting this controversy, we identify areas for improvement in trial design relating to the validation and sensitivity of quality-of-life (QoL) instruments, patient selection, statistical power, collider stratification bias, and heterogeneity of treatment response. Given the high individuality expressed by thyroid hormones, their interrelationships, and shifted comfort zones, the response to LT4 treatment produces a statistical amalgamation bias (Simpson's paradox), which strongly influences interpretation. In addition to drug efficacy, as tested by RCTs, efficiency in clinical practice and safety profiles require re-evaluation. Accordingly, results from RCTs remain ambiguous and should not prevail over physiologically based counterarguments. If more weight is given to other forms of valid evidence that contradict key assumptions of historic trials, current treatment options should remain open and rely on personalised biochemical treatment targets. Optimal treatment choices should be guided by the strict requirements of organisations such as the FDA, which demand that treatment effects be estimated under actual conditions of use. Various improvements in design and analysis are recommended for future randomised controlled T3/T4 combination trials.
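The amalgamation bias (Simpson's paradox) mentioned above can be illustrated with a minimal sketch. All numbers below are synthetic and chosen purely for illustration; they are not data from any trial, and the names `subjects`, `set_point`, and `slope` are hypothetical. Each of three imaginary subjects has an individual set point, and within each subject the outcome falls as the measured level rises, yet pooling all observations yields a rising trend.

```python
# Synthetic illustration of amalgamation bias (Simpson's paradox).
# Assumption: values are invented, in arbitrary units, and do not
# represent real hormone measurements or trial results.

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Three hypothetical subjects, each with an individual set point.
# Within a subject, the outcome decreases as the level increases.
subjects = {}
for set_point in (1.0, 2.0, 3.0):
    deviations = (-0.5, 0.0, 0.5)
    levels = [set_point + d for d in deviations]
    outcomes = [2.0 * set_point - d for d in deviations]
    subjects[set_point] = (levels, outcomes)

# Slope within each subject (negative for every subject).
within_slopes = [slope(xs, ys) for xs, ys in subjects.values()]

# Slope after pooling all observations and ignoring who they came from.
pooled_x = [x for xs, _ in subjects.values() for x in xs]
pooled_y = [y for _, ys in subjects.values() for y in ys]
pooled_slope = slope(pooled_x, pooled_y)

print("within-subject slopes:", within_slopes)
print("pooled slope:", pooled_slope)
```

In this toy setting the group-level trend has the opposite sign to every individual trend, which is the kind of discrepancy that can arise when responses from subjects with different set points are analysed as one homogeneous group.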