Previous work analyzing social networks has mainly focused on binary friendship relations. However, in online social networks the low cost of link formation can lead to networks with heterogeneous relationship strengths (e.g., acquaintances and best friends mixed together). In this case, a binary friendship indicator provides only a coarse representation of relationship information. In this work, we develop an unsupervised model to estimate relationship strength from interaction activity (e.g., communication, tagging) and user similarity. More specifically, we formulate a link-based latent variable model, along with a coordinate ascent optimization procedure for inference. We evaluate our approach on real-world data from Facebook, showing that the estimated link weights result in higher autocorrelation and lead to improved classification accuracy.
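The abstract above only names the model family, so the following is a minimal illustrative sketch, not the paper's actual formulation: it assumes a latent strength z per link with a Gaussian prior centered at a linear function of profile-similarity features, Bernoulli-observed interactions, and coordinate ascent alternating between per-link strength updates and a shared weight fit.

```python
import numpy as np

def estimate_strengths(S, Y, n_iters=50, lr=0.1, sigma2=1.0):
    """Toy coordinate-ascent sketch (hypothetical model, not the paper's).

    S : (E, d) profile-similarity features, one row per link.
    Y : (E,) binary interaction indicators.
    Assumes z_i ~ N(w^T s_i, sigma2) and y_i ~ Bernoulli(sigmoid(z_i)).
    Alternates gradient ascent on z with a least-squares refit of w.
    """
    E, d = S.shape
    w = np.zeros(d)
    z = S @ w
    for _ in range(n_iters):
        # z-step: ascend log p(y | z) + log p(z | w) for each link
        grad_z = (Y - 1.0 / (1.0 + np.exp(-z))) - (z - S @ w) / sigma2
        z += lr * grad_z
        # w-step: closed-form least-squares fit of current z on S
        w = np.linalg.lstsq(S, z, rcond=None)[0]
    return z, w
```

On synthetic links where high-similarity pairs interact and low-similarity pairs do not, the estimated strengths separate the two groups, which is the qualitative behavior the abstract describes.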
In this work, we study the problem of within-network relational learning and inference, where models are learned on a partially labeled relational dataset and then applied to predict the classes of unlabeled instances in the same graph. We categorize recent work in statistical relational learning into three alternative approaches for this setting: disjoint learning with disjoint inference, disjoint learning with collective inference, and collective learning with collective inference. Models from each of these categories have been employed previously in different settings, but to our knowledge there has been no systematic comparison of models from all three categories. In this paper, we develop a novel pseudolikelihood EM method that facilitates more general collective learning and collective inference on partially labeled relational networks. We then compare this method to competing methods from the other categories on both synthetic and real-world data. We show that there is a performance regime, when a moderate number of labeled examples is available, in which the pseudolikelihood EM approach achieves significantly higher accuracy.
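The abstract does not spell out the updates, so the following is a simplified sketch of the general pseudolikelihood-EM pattern under assumed Ising-style pairwise potentials: the E-step soft-labels unlabeled nodes from their neighbors' current beliefs, and the M-step re-estimates a single homophily parameter from the expected fraction of agreeing edges. The paper's actual model is richer than this.

```python
import numpy as np

def pseudolikelihood_em(adj, labels, n_iters=20):
    """Toy pseudolikelihood-EM sketch for within-network classification.

    adj    : (n, n) symmetric 0/1 adjacency matrix.
    labels : (n,) array with 1/0 for labeled nodes, -1 for unlabeled.
    Returns per-node P(y=1) and the learned homophily parameter h
    (assumed probability that linked nodes share a label).
    """
    n = len(labels)
    p = np.where(labels == 1, 1.0, np.where(labels == 0, 0.0, 0.5))
    h = 0.7  # initial homophily guess
    for _ in range(n_iters):
        # E-step: update beliefs of unlabeled nodes from neighbors
        for i in range(n):
            if labels[i] != -1:
                continue
            nbrs = np.nonzero(adj[i])[0]
            if len(nbrs) == 0:
                continue
            ll1 = np.sum(np.log(h) * p[nbrs] + np.log(1 - h) * (1 - p[nbrs]))
            ll0 = np.sum(np.log(h) * (1 - p[nbrs]) + np.log(1 - h) * p[nbrs])
            p[i] = 1.0 / (1.0 + np.exp(ll0 - ll1))
        # M-step: homophily = expected fraction of label-agreeing edges
        agree = p[:, None] * p[None, :] + (1 - p[:, None]) * (1 - p[None, :])
        h = np.clip((adj * agree).sum() / adj.sum(), 0.01, 0.99)
    return p, h
```

With one labeled seed per cluster in a small two-cluster graph, the soft labels of the remaining nodes converge toward their cluster's seed label, illustrating how collective learning propagates the partial labeling.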
Recent empirical evaluation has shown that the performance of collective classification models can vary based on the amount of class label information available for use during inference. In this paper, we further demonstrate that the relative performance of statistical relational models learned with different estimation methods changes as the availability of test set labels increases. We reason about the cause of this phenomenon from an information-theoretic perspective, which points to a previously unidentified consideration in the development of relational learning algorithms. In particular, we characterize the high propagation error of collective inference models that are estimated with maximum pseudolikelihood estimation (MPLE), and show how this affects performance across the spectrum of label availability when compared to MLE, which has low propagation error. Our formal study leads to a quantitative characterization that can be used to predict the confidence of local propagation for MPLE models. We use this to propose a mixture model that can learn the best trade-off between high and low propagation models. Empirical evaluation on synthetic and real-world data shows that our proposed method achieves results comparable or superior to both MPLE and low propagation models across the full spectrum of label availability.
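As a highly simplified stand-in for the trade-off the abstract describes, one could combine the predictions of a high-propagation model and a low-propagation model with a single mixture weight chosen on validation data; the paper's actual mixture model learns this trade-off in a more principled way. The function below is purely illustrative.

```python
import numpy as np

def fit_mixture_weight(p_high, p_low, y_val, grid=np.linspace(0, 1, 101)):
    """Pick the mixture weight alpha minimizing validation log-loss for
    p = alpha * p_high + (1 - alpha) * p_low.

    p_high : predicted P(y=1) from the high-propagation (e.g., MPLE) model.
    p_low  : predicted P(y=1) from the low-propagation (e.g., MLE) model.
    y_val  : 0/1 validation labels.
    """
    best_a, best_loss = 0.0, np.inf
    for a in grid:
        p = np.clip(a * p_high + (1 - a) * p_low, 1e-9, 1 - 1e-9)
        loss = -np.mean(y_val * np.log(p) + (1 - y_val) * np.log(1 - p))
        if loss < best_loss:
            best_a, best_loss = a, loss
    return best_a
```

When one component is clearly better calibrated on the validation set, the fitted weight concentrates on it, which mirrors the abstract's claim that the mixture matches the better of the two models across label-availability regimes.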
The popularity of online social networks and social media has increased the amount of linked data available in Web domains. Relational and Gaussian Markov networks have both been applied successfully for classification in these relational settings. However, since Gaussian Markov networks model joint distributions over a continuous label space, it is difficult to use them to reason about uncertainty in discrete labels. On the other hand, relational Markov networks model probability distributions over a discrete label space, but since they condition on the graph structure, the marginal probability for an instance will vary based on the structure of the subnetwork observed around the instance. This implies that the marginals will not be identical across instances and can sometimes result in poor prediction performance. In this work, we propose a novel latent relational model based on copulas which allows us to make predictions in a discrete label space while ensuring identical marginals and at the same time incorporating some desirable properties of modeling relational dependencies in a continuous space. While copulas have recently been used for descriptive modeling, they have not been used for collective classification in large-scale network data, and the associated conditional inference problem has not been considered before. We develop an approximate inference algorithm, and demonstrate empirically that our proposed Copula Latent Markov Network models based on approximate inference outperform a number of competing relational classification models over a range of real-world relational classification tasks.
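The key property the abstract highlights, discrete labels with identical marginals but continuous-space dependence, can be illustrated with a plain Gaussian copula: correlated latent Gaussians are pushed through the normal CDF and thresholded, so every node's marginal is exactly P(y=1)=p while linked nodes remain dependent. This sketch only shows sampling; the paper's Copula Latent Markov Network and its conditional inference algorithm are far more involved.

```python
import numpy as np
from math import erf, sqrt

def sample_copula_labels(corr, p_one, n_samples, seed=0):
    """Sample binary label vectors with identical marginals P(y=1)=p_one
    and dependence induced by a Gaussian copula.

    corr : (n, n) positive-definite latent correlation matrix.
    Returns an (n, n_samples) 0/1 array.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    z = L @ rng.standard_normal((corr.shape[0], n_samples))
    # standard normal CDF maps each latent variable to Uniform(0, 1)
    u = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    return (u < p_one).astype(int)
```

With a strong latent correlation, the two nodes' labels agree far more often than chance while each node's marginal stays at p_one, which is exactly the combination the abstract argues relational Markov networks cannot guarantee.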