Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performance when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window size and suffer substantial losses in power when windows are large. This issue creates a tradeoff between noise and power in empirical applications. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window size and can operate on diverse forms of input data. Through simulations, we show that they exhibit power comparable to the best-performing current methods, and retain high power regardless of window size. Moreover, these methods display considerable robustness to high mutation rates and uneven recombination landscapes, as well as to an array of other common confounding scenarios. Further, we applied these statistics to a bonobo population-genomic dataset. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and NKAIN3, associated with sexuality. Finally, we integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.

* mxd60@psu.edu

Early methods applied to this problem evaluated departures from neutral expectations of genetic diversity at a particular genomic region. For example, the Hudson-Kreitman-Aguadé (HKA) test (Hudson et al., 1987) uses a chi-square statistic to assess whether genomic regions have a higher density of polymorphic sites when compared to a putative neutral genomic background. In contrast, Tajima's D (Tajima, 1989) measures the distortion of allele frequencies from the neutral site frequency spectrum (SFS) under a model with constant population size.
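To make the second of these early statistics concrete, the following is a minimal Python sketch of Tajima's D using the standard constants from Tajima (1989). It is an illustration only and is not taken from BalLeRMix: the function name `tajimas_d` and the input format (one derived-allele count per site, in a sample of `n` chromosomes) are assumptions chosen for this example. Positive values indicate an excess of intermediate-frequency alleles, the pattern expected near a balanced polymorphism, whereas negative values indicate an excess of rare alleles.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D for one window (illustrative sketch).

    derived_counts: derived-allele count at each site (hypothetical input format).
    n: number of sampled chromosomes.
    Returns NaN when the statistic is undefined (no segregating sites, or n < 4,
    for which the variance term is zero).
    """
    # Keep only segregating sites (0 < count < n).
    seg = [k for k in derived_counts if 0 < k < n]
    S = len(seg)
    if S == 0 or n < 4:
        return float("nan")

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    variance = e1 * S + e2 * S * (S - 1)
    return (pi - S / a1) / math.sqrt(variance)


# Ten sites, each with the derived allele at frequency 5/10: D is positive.
d_intermediate = tajimas_d([5] * 10, 10)
# Ten singleton sites: D is negative.
d_singletons = tajimas_d([1] * 10, 10)
```

Because the statistic summarizes the whole window with a single number, its value depends on which sites fall inside the window, which is one way to see the window-size sensitivity discussed above.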
However, these early approaches were not tailored for balancing selection, and have limited power. Recently, novel and more powerful summary statistics (Siewert and Voight, 2017, 2018; Bitarello et al., 2018) and model-based approaches (DeGiorgio et al., 2014; Cheng and DeGiorgio, 2019) have been developed to specifically target regions under balancing selection. In general, the summary statistics capture deviations of allele frequencies from a putative equilibrium frequency of a balanced polymorphism. In particular, the non-central deviation statistic (Bitarello et al., 2018) adopts an assigned value as this putative equilibrium frequency, whereas the β and β^(2) statistics of Siewert and Voight (2017, 2018) use the frequency of the central polymorphic site instead. On the other hand, the T statistics of DeGiorgio et al. (2014) and Cheng and DeGiorgio (2019) compare the composite likelihood of the data under an explicit coalescent model of long-term balancing selection (Hudson et al., 1987; to the composite likelihood under the genome-wide distribution of variation, which is