This paper proposes a Bayesian model selection framework that uses stochastic filtering to determine optimal, parsimonious models for given building structures under ground motion excitation. Structural system identification at a regional scale after an earthquake is a monumental task because of its inherently high computational cost. This challenge is currently addressed in the literature by using simplified structural models that admit analytical solutions, such as Timoshenko or shear beams, and shear building models. However, lower computational effort usually comes with increased prediction error and, consequently, higher model uncertainty. This poses a trade-off between prediction accuracy and model complexity when selecting a model class to represent a building structure. The proposed framework selects the model class that strikes the best balance between the two. To this end, the notion of cumulative evidence is introduced here as the integral of local evidence over the ground motion duration, which is then formulated as the difference between the cumulative likelihood of the observed measurements and the cumulative penalty. The likelihood measure promotes models whose predictions better match the observations. This is counteracted by the penalty, which is devised as an "Ockham factor" and penalizes models whose higher complexity, for example a larger number of parameters, leads to higher information gains. The proposed approach also yields the best tuning parameters of the stochastic filter for each model class and the best initial values of the identification parameters. The proposed framework is verified using a synthetic example and validated using recorded data from the Millikan Library building in California and the ANX building in Japan.
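
To make the likelihood-minus-penalty structure concrete, the following is a minimal sketch in a discrete-time setting, not the paper's own formulation. The symbols are assumptions introduced here for illustration: $\mathcal{M}$ denotes a model class, $\theta$ its parameters, $y_k$ the measurement at time step $k$, and $N$ the number of steps in the ground motion record. The decomposition itself follows from Bayes' rule applied at each step, with the posterior from step $k-1$ acting as the prior at step $k$; the sum over $k$ stands in for the integral over the ground motion duration.

```latex
% Cumulative log-evidence as a sum of local (one-step) log-evidences
\ln p(y_{1:N} \mid \mathcal{M})
  = \sum_{k=1}^{N} \ln p(y_k \mid y_{1:k-1}, \mathcal{M})

% Each local log-evidence splits into a data-fit term and an
% information-gain (Ockham) penalty
\ln p(y_k \mid y_{1:k-1}, \mathcal{M})
  = \underbrace{\mathbb{E}_{p(\theta \mid y_{1:k}, \mathcal{M})}
      \big[ \ln p(y_k \mid \theta, y_{1:k-1}, \mathcal{M}) \big]}_{\text{local likelihood}}
  - \underbrace{D_{\mathrm{KL}}\!\big(
      p(\theta \mid y_{1:k}, \mathcal{M}) \,\big\|\,
      p(\theta \mid y_{1:k-1}, \mathcal{M}) \big)}_{\text{local penalty}}
```

Under these assumptions, the penalty term is the Kullback-Leibler divergence between the updated and previous parameter distributions, that is, the information gained from $y_k$. It is non-negative, so a model class that must extract more information from the data to fit it pays a larger penalty; the model class with the largest cumulative log-evidence would be the one preferred.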