Multiple phenomena often diffuse through a social network, sometimes in competition with one another. Product adoption and political elections are two examples where network diffusion is inherently competitive: individuals may select only one product from a set of competing products (e.g., most people need only one cell-phone provider) and, in most electoral systems, can vote for only one candidate on a slate. We introduce the weighted generalized annotated program (wGAP) framework for expressing competitive diffusion models. Applications need to reason about the eventual outcome of multiple competing diffusions, e.g., the likely number of sales of a given product or the number of people who will support a particular candidate. We formalize this need as the "most probable interpretation" (MPI) problem. We develop algorithms to solve MPI efficiently and show experimentally that they scale to graphs with millions of vertices.
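A minimal sketch of the competitive-diffusion setting this abstract describes, assuming a simple cascade-style simulation in place of the actual wGAP semantics and MPI computation (the function names and the independent-cascade update rule are illustrative, not the paper's):

```python
import random
from collections import defaultdict

# Illustrative sketch only: a Monte Carlo stand-in for reasoning about the
# outcome of two competing cascades.  The wGAP framework and the MPI problem
# from the abstract are defined over weighted annotated logic programs, not
# over this simple cascade simulation.

def simulate_competitive_cascade(edges, seeds_a, seeds_b, rounds=10, rng=None):
    """edges: dict mapping u -> list of (v, weight) with weight in [0, 1].
    Each node adopts at most one of the two competing products ('A' or 'B')."""
    rng = rng or random.Random()
    state = {}
    for s in seeds_a:
        state[s] = 'A'
    for s in seeds_b:
        state[s] = 'B'
    for _ in range(rounds):
        new_state = dict(state)
        for u, label in state.items():
            for v, w in edges.get(u, []):
                # A node that has already adopted a product never switches.
                if v not in new_state and rng.random() < w:
                    new_state[v] = label
        if new_state == state:
            break
        state = new_state
    return state

def estimate_adoption(edges, seeds_a, seeds_b, trials=1000):
    """Estimate expected adoption counts for each product by repeated simulation."""
    totals = defaultdict(float)
    for t in range(trials):
        final = simulate_competitive_cascade(edges, seeds_a, seeds_b,
                                             rng=random.Random(t))
        for label in final.values():
            totals[label] += 1.0 / trials
    return dict(totals)

if __name__ == "__main__":
    g = {1: [(2, 0.6), (3, 0.4)], 2: [(4, 0.5)], 3: [(4, 0.7)], 4: [(5, 0.9)]}
    print(estimate_adoption(g, seeds_a={1}, seeds_b={3}))
```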
A fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.
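A minimal sketch of the kind of convex MAP objective the abstract refers to: a weighted sum of hinge-loss potentials over [0,1]-valued variables, solved here with a generic bounded optimizer rather than the paper's message-passing algorithm. The toy rule encoding and all names below are illustrative assumptions, not the PSL/HL-MRF implementation:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of an HL-MRF-style MAP objective: minimize a weighted sum of
# (squared) hinge-loss potentials over variables constrained to [0, 1].
# The actual HL-MRF/PSL machinery (grounding, ADMM-based message passing,
# weight learning) is not shown here.

def hinge_map_inference(A, b, weights, n_vars, squared=True):
    """Minimize sum_j w_j * max(A_j . y + b_j, 0)^p over y in [0,1]^n_vars."""
    power = 2 if squared else 1

    def objective(y):
        slack = np.maximum(A @ y + b, 0.0)
        return float(np.sum(weights * slack ** power))

    y0 = np.full(n_vars, 0.5)
    res = minimize(objective, y0, bounds=[(0.0, 1.0)] * n_vars,
                   method="L-BFGS-B")
    return res.x

if __name__ == "__main__":
    # Variable: y[0] = Votes(bob).  Observed: Friend(alice, bob) = 1.0 and
    # Votes(alice) = 0.9.  The Lukasiewicz relaxation of the rule
    # Friend(a,b) & Votes(a) -> Votes(b) gives distance to satisfaction
    # max(0, 0.9 - y[0]), so the hinge penalizes y[0] below 0.9.
    A = np.array([[-1.0]])
    b = np.array([0.9])
    weights = np.array([5.0])
    print(hinge_map_inference(A, b, weights, n_vars=1))  # approx. [0.9]
```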
There has been extensive work in many different fields on how phenomena of interest (e.g., diseases, innovation, product adoption) "diffuse" through a social network. As social networks increasingly become a fabric of society, there is a need to make "optimal" decisions with respect to an observed model of diffusion. For example, in epidemiology, officials want to find a set of k individuals in a social network who, if treated, would minimize the spread of a disease. In marketing, campaign managers try to identify a set of k customers who, if given a free sample, would generate maximal "buzz" about the product. In this article, we first show that the well-known Generalized Annotated Program (GAP) paradigm can be used to express many existing diffusion models. We then define a class of problems called Social Network Diffusion Optimization Problems (SNDOPs). SNDOPs have four parts: (i) a diffusion model expressed as a GAP, (ii) an objective function we want to optimize with respect to a given diffusion model, (iii) an integer k > 0 describing resources (e.g., medication) that can be placed at nodes, and (iv) a logical condition VC that governs which nodes can have a resource (e.g., only children above the age of 5 can be treated with a given medication). We study the computational complexity of SNDOPs and show both NP-completeness results as well as results on complexity of approximation. We then develop an exact and a heuristic algorithm to solve a large class of SNDOP problems and show that our GREEDY-SNDOP algorithm achieves the best possible approximation ratio that a polynomial algorithm can achieve (unless P = NP). We conclude with a prototype experimental implementation to solve SNDOPs that looks at a real-world Wikipedia dataset consisting of over 103,000 edges. ACM Reference Format: Shakarian, P., Broecheler, M., Subrahmanian, V. S., and Molinaro, C. 2013. Using generalized annotated programs to solve social network diffusion optimization problems.
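An illustrative sketch of greedy seed selection in the spirit of GREEDY-SNDOP, assuming a Monte Carlo independent-cascade spread estimate in place of the paper's GAP-encoded objective; the helper names (`spread_estimate`, `vertex_condition`) are hypothetical. For monotone submodular objectives, greedy selection under a cardinality budget achieves the (1 - 1/e) guarantee the abstract alludes to:

```python
import random

# Greedy selection of k seed nodes, subject to a vertex condition VC,
# maximizing an estimated spread.  The paper's objective is defined via a
# GAP-encoded diffusion model; a simple independent-cascade estimate is
# substituted here so the example is self-contained.

def spread_estimate(edges, seeds, trials=200):
    """Expected number of reached nodes under an independent-cascade model."""
    if not seeds:
        return 0.0
    total = 0
    for t in range(trials):
        rng = random.Random(t)
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in edges.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_seed_selection(edges, candidates, k, vertex_condition=lambda v: True):
    """Repeatedly add the node with the largest marginal gain in spread."""
    chosen = set()
    allowed = [v for v in candidates if vertex_condition(v)]
    for _ in range(k):
        best, best_gain = None, 0.0
        base = spread_estimate(edges, chosen)
        for v in allowed:
            if v in chosen:
                continue
            gain = spread_estimate(edges, chosen | {v}) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break
        chosen.add(best)
    return chosen

if __name__ == "__main__":
    g = {1: [(2, 0.5), (3, 0.5)], 3: [(4, 0.5)], 4: [(5, 0.8)]}
    print(greedy_seed_selection(g, candidates=[1, 2, 3, 4, 5], k=2))
```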
The Social Semantic Web (SSW) refers to the mix of RDF data in web content and social network data associated with those who posted that content. Applications to monitor the SSW are becoming increasingly popular. For instance, marketers want to look for semantic patterns in the content of tweets and Facebook posts about their products. Such applications allow multiple users to specify patterns of interest and monitor them in real time as new data gets added to the web or to a social network. In this paper, we develop the concept of SSW view servers, from which all of these types of applications can be monitored simultaneously. The patterns of interest are views. We show that a given set of views can be compiled in multiple possible ways to take advantage of common substructures, and define the concept of an optimal merge. We develop a very fast MultiView algorithm that scalably and efficiently maintains multiple subgraph views. We show that our algorithm is correct, study its complexity, and experimentally demonstrate that our algorithm can scalably handle updates to hundreds of views on real-world SSW databases with up to 540M edges.
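A toy sketch of maintaining several graph-pattern "views" over a stream of RDF-style triples, loosely echoing the shared-substructure idea behind view merging; the class name, the restriction to two-edge path patterns, and the indexing scheme are assumptions for illustration, not the paper's MultiView algorithm or its optimal-merge construction:

```python
from collections import defaultdict

# Each view is a two-edge path pattern (?x -p1-> ?y -p2-> ?z) over labeled
# triples; views sharing a predicate reuse the same per-predicate index.
class PathViewServer:
    def __init__(self, views):
        # views: dict view_name -> (p1, p2)
        self.views = views
        self.out_by_pred = defaultdict(lambda: defaultdict(set))  # p -> s -> {o}
        self.in_by_pred = defaultdict(lambda: defaultdict(set))   # p -> o -> {s}
        self.matches = defaultdict(set)                           # view -> {(x, y, z)}

    def add_triple(self, s, p, o):
        self.out_by_pred[p][s].add(o)
        self.in_by_pred[p][o].add(s)
        for name, (p1, p2) in self.views.items():
            if p == p1:  # new edge can serve as the first hop of the pattern
                for z in self.out_by_pred[p2].get(o, ()):
                    self.matches[name].add((s, o, z))
            if p == p2:  # new edge can serve as the second hop of the pattern
                for x in self.in_by_pred[p1].get(s, ()):
                    self.matches[name].add((x, s, o))

if __name__ == "__main__":
    server = PathViewServer({"mentions_product": ("posted", "mentions"),
                             "follows_poster": ("follows", "posted")})
    server.add_triple("alice", "posted", "tweet1")
    server.add_triple("tweet1", "mentions", "phoneX")
    server.add_triple("bob", "follows", "alice")
    print(dict(server.matches))
```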
Probabilistic logic programs (PLPs) define a set of probability distribution functions (PDFs) over the set of all Herbrand interpretations of the underlying logical language. When answering a query Q, a lower and upper bound on Q is obtained by optimizing (min and max) an objective function subject to a set of linear constraints whose solutions are the PDFs mentioned above. A common critique not only of PLPs but of many probabilistic logics is that the difference between the upper bound and lower bound is large, thus often providing very little useful information in the query answer. In this paper, we provide a new method to answer probabilistic queries that computes a histogram "mapping" the probability that the objective function takes a value in a given interval, subject to the above linear constraints. This allows the system to return to the user a histogram from which the user can directly "see" the most likely probability range for the query. We prove that computing these histograms is #P-hard, and show that computing these histograms is closely related to polyhedral volume computation. We show how existing randomized algorithms for volume computation can be adapted to the computation of such histograms. A prototype experimental implementation is discussed.
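A toy sketch of the two computations the abstract describes, assuming a tiny PLP with two ground atoms a and b (so four Herbrand interpretations) and illustrative interval constraints on P(a) and P(b); the rejection sampling used for the histogram is a crude stand-in for the paper's adapted volume-computation algorithms:

```python
import numpy as np
from scipy.optimize import linprog

# Variables p[0..3] are the probabilities of the interpretations
# {}, {a}, {b}, {a,b}.  Suppose the program constrains P(a) to [0.6, 0.9]
# and P(b) to [0.3, 0.7]; the query is Q = a & b, whose probability is p[3].
A_ub = np.array([[0, -1, 0, -1],   # -(p1 + p3) <= -0.6   i.e. P(a) >= 0.6
                 [0,  1, 0,  1],   #   p1 + p3  <=  0.9
                 [0,  0, -1, -1],  # -(p2 + p3) <= -0.3
                 [0,  0,  1,  1]]) #   p2 + p3  <=  0.7
b_ub = np.array([-0.6, 0.9, -0.3, 0.7])
A_eq, b_eq = np.array([[1.0, 1.0, 1.0, 1.0]]), np.array([1.0])
query = np.array([0.0, 0.0, 0.0, 1.0])   # objective: P(a & b) = p3

# Classical lower/upper bounds on P(Q) via linear programming.
lower = linprog(query, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
upper = -linprog(-query, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
print(f"P(Q) bounds: [{lower:.2f}, {upper:.2f}]")

# Crude histogram of P(Q) over the constrained polytope: sample uniformly
# from the probability simplex and keep points satisfying the constraints.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(4), size=200000)
feasible = samples[(samples @ A_ub.T <= b_ub + 1e-9).all(axis=1)]
counts, bin_edges = np.histogram(feasible @ query, bins=10, range=(lower, upper))
for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"[{lo:.2f}, {hi:.2f}): {c / max(len(feasible), 1):.3f}")
```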