Three-arm ‘gold-standard’ non-inferiority trials are recommended for indications where only unstable reference treatments are available and the use of a placebo group can be justified ethically. For such trials, several study designs have been suggested that use the placebo group for testing ‘assay sensitivity’, that is, the ability of the trial to replicate efficacy. Should the reference fail in the given trial, non-inferiority could also be shown with an ineffective experimental treatment, and the non-inferiority claim hence becomes useless. In this article, we extend the so-called Koch-Röhmel design, in which a proof of efficacy for the experimental treatment is required in order to qualify for the non-inferiority test. While the efficacy of the experimental treatment is an indication of assay sensitivity, it does not guarantee that the reference is sufficiently efficacious for the non-inferiority claim to be meaningful. It has, therefore, been suggested to adaptively test non-inferiority only if the reference demonstrates superiority to placebo and otherwise to test δ-superiority of the experimental treatment over placebo, where δ is chosen in such a way that it provides proof of non-inferiority with regard to the reference’s historical effect. We extend this previous work by complementing its adaptive test with compatible simultaneous confidence intervals. Confidence intervals are commonly used in non-inferiority trials and are suggested by regulatory guidelines. We show how to adapt different approaches to simultaneous confidence intervals from the literature to the setting of three-arm non-inferiority trials and compare these methods in a simulation study. Finally, we apply these methods to a real clinical trial example.
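
The decision flow of the adaptive test described above can be illustrated with a minimal sketch. It is not the authors' procedure and makes several simplifying assumptions that are not taken from the article: larger outcomes are better, all three arms share a known standard error of the mean (`se_arm`), every comparison uses a one-sided normal-approximation test at the same level `alpha`, and the non-inferiority margin `nim` is an absolute difference. The function names, the parameter `delta` for the superiority margin, and the returned labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p(diff, se):
    """P-value for H0: diff <= 0 against H1: diff > 0 (normal approximation)."""
    return 1.0 - NormalDist().cdf(diff / se)


def adaptive_decision(mean_e, mean_r, mean_p, se_arm, nim, delta, alpha=0.025):
    """Sketch of the adaptive decision flow (illustrative assumptions only).

    mean_e, mean_r, mean_p: observed means of experimental, reference, placebo.
    se_arm: assumed common standard error of each arm mean.
    nim:    assumed absolute non-inferiority margin (E versus R).
    delta:  assumed superiority margin (E versus P) used when R fails.
    """
    se_diff = sqrt(2.0) * se_arm  # standard error of a difference of two arm means

    # Step 1: the experimental treatment must first show efficacy over placebo.
    if one_sided_p(mean_e - mean_p, se_diff) > alpha:
        return "no_efficacy"

    # Step 2: check whether the reference is superior to placebo in this trial.
    reference_works = one_sided_p(mean_r - mean_p, se_diff) <= alpha

    if reference_works:
        # Step 3a: non-inferiority of E versus R, H0: mean_e - mean_r <= -nim.
        if one_sided_p(mean_e - mean_r + nim, se_diff) <= alpha:
            return "non_inferiority"
    else:
        # Step 3b: delta-superiority of E over P, H0: mean_e - mean_p <= delta.
        if one_sided_p(mean_e - mean_p - delta, se_diff) <= alpha:
            return "delta_superiority"

    # Efficacy was shown in step 1, but the follow-up claim was not.
    return "efficacy_only"


if __name__ == "__main__":
    # Reference clearly beats placebo, so the non-inferiority branch is used.
    print(adaptive_decision(5.0, 5.0, 0.0, 0.5, nim=2.0, delta=2.0))
    # Reference fails against placebo, so the delta-superiority branch is used.
    print(adaptive_decision(5.0, 0.5, 0.0, 0.5, nim=2.0, delta=2.0))
```

The sketch only returns a test decision. It does not address how the error rate is controlled across the branches, nor the simultaneous confidence intervals that are the subject of this article.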