Abstract. When building probabilistic relational models it is often difficult to determine what formulae or factors to include in a model. Different models make quite different predictions about how probabilities are affected by population size. We show some general patterns that hold in some classes of models for all numerical parametrizations. Given a data set, it is often easy to plot the dependence of probabilities on population size, which, together with prior knowledge, can be used to rule out classes of models, where just assessing or fitting numerical parameters will be misleading. In this paper we analyze the dependence on population for relational undirected models (in particular Markov logic networks) and relational directed models (for relational logistic regression). Finally we show how probabilities for real data sets depend on the population size.
We consider the problem of, given a probabilistic model on a set of random variables, how to add a new variable that depends on the other variables, without changing the original distribution. In particular, we consider relational models (such as Markov logic networks (MLNs)), where we cannot directly define conditional probabilities. In relational models, there may be an unbounded number of parents in the grounding, and conditional distributions need to be defined in terms of aggregators. The question we ask is whether and when it is possible to represent conditional probabilities at all in various relational models. Some aggregators have been shown to be representable by MLNs, by adding auxiliary variables; however it was unknown whether they could be defined without auxiliary variables. For other aggregators, it was not known whether they can be represented by MLNs at all. We obtained surprisingly strong negative results on the capability of flexible undirected relational models such as MLNs to represent aggregators without affecting the original model's distribution. We provide a map of what aspects of the models, including the use of auxiliary variables and quantifiers, result in the ability to represent various aggregators. In addition, we provide proof techniques which can be used to facilitate future theoretic results on relational models, and demonstrate them on relational logistic regression (RLR).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.