Numerical simulations of two-phase flows have become increasingly important for process intensification and optimization in industries involving microfluidic applications. Improved schemes and algorithms for the numerical simulation of systems with two immiscible fluids continuously arise and are the subject of ongoing research efforts. However, the most common approach to verification and validation (V&V) of such methods is to qualitatively compare the obtained interface shapes with those of reference solutions. For validation, the computed interface shapes are commonly compared to the bubble-shape diagrams of Clift et al. [1] and Bhaga and Weber [2], or to interfacial shapes from dam break experiments [3]. For verification, Zalesak's moving disc [4] and the well-known Rayleigh-Taylor instability are often used. Such simple or simplified test problems can be very helpful for code development, but they are clearly not sufficient for a comprehensive assessment in a V&V context.

In the absence of exact or analytical reference solutions, numerical benchmarking (i.e. code-to-code comparison) is needed to assess the accuracy of the numerical models more rigorously. An important step in this direction was the benchmark proposed by Hysing et al. [5] in 2009. It considers a single rising bubble in a liquid column in 2D and quantitatively compares benchmark quantities such as rise velocity, bubble position and circularity over time (the definitions of these quantities are sketched at the end of this section). The original paper considers level-set and ALE methods; later, Aland and Voigt [6] applied the benchmark to diffuse-interface models. Although this benchmark has been used extensively since its publication, in the authors' view it is lacking in several respects.

The benchmark idea in [5] is based on a code-to-code comparison, which has to be considered critically, given that different approaches to the numerical simulation of two-phase flows might yield apparently similar and physically plausible results while failing to reproduce experimental results quantitatively. Agreement among distinct methods can be deceptive and might be caused by (possibly different) errors and shortcomings of the underlying schemes or algorithms. Moreover, a single benchmark based on code-to-code comparison involves the risk of another, even more insidious problem: within one numerical approach, two errors might compensate each other while reproducing the results of this single benchmark in what one would call 'good agreement'.

Another aspect to be addressed here, in order to stress the need for further two-phase flow benchmarks, is the somewhat limited scope of [5]: it is only relevant for 2D codes (effectively excluding pure 3D codes) and covers only a rather small range of surface tension coefficients. However, surface tension is of major importance for all codes aiming at two-phase flow simulations, since in many applications surface tension effects are predominant or at least play a central role. A simple bubble rise scenario as used in [5] cannot assess the case of large surface tensions, since this just leads to spherical bubbles, making the surface e...
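
For reference, the benchmark quantities mentioned above can be sketched as follows, following the definitions commonly used in [5]; the notation ($\Omega_2$, $u_y$, $d_a$, $P_b$) is introduced here only for illustration and may differ from the symbols used later in this paper:
$$
\mathbf{X}_c \;=\; \frac{\int_{\Omega_2} \mathbf{x}\,\mathrm{d}x}{\int_{\Omega_2} 1\,\mathrm{d}x},
\qquad
U_c \;=\; \frac{\int_{\Omega_2} u_y\,\mathrm{d}x}{\int_{\Omega_2} 1\,\mathrm{d}x},
\qquad
c \;=\; \frac{\pi\, d_a}{P_b},
$$
where $\Omega_2$ denotes the domain occupied by the bubble, $u_y$ the vertical velocity component, $d_a$ the diameter of the circle with the same area as the bubble, and $P_b$ the bubble perimeter. Thus $\mathbf{X}_c$ is the bubble's center of mass, $U_c$ its rise velocity, and $c$ the degree of circularity, which equals one for a perfectly circular bubble and decreases as the interface deforms.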