We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013, JACM 2015). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are essentially regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend regex formulas with standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models, namely patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
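To make the notions of span, regex formula, and string equality selection concrete, here is a minimal brute-force sketch. It is an illustration only, not the evaluation method studied in the paper. The function names (`unary_spanner`, `product_spanner`, `select_equal`) are hypothetical, spans are written as 0-based Python slice bounds `(i, j)` rather than the 1-based notation common in the literature, and the unary spanner is assumed to correspond to a regex formula of the shape "anything, then x bound to a match of the pattern, then anything".

```python
import re
from itertools import product


def unary_spanner(doc, pattern):
    """All spans (i, j) of doc whose substring doc[i:j] fully matches pattern.

    Toy stand-in for a regex formula that binds one variable to a match of
    `pattern` at any position of the document (hypothetical helper).
    """
    compiled = re.compile(pattern)
    n = len(doc)
    return {
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if compiled.fullmatch(doc[i:j])
    }


def product_spanner(spans_x, spans_y):
    """Relation over two variables: every pairing of an x-span with a y-span."""
    return set(product(spans_x, spans_y))


def select_equal(doc, relation):
    """String equality selection: keep tuples whose two spans cover equal
    substrings (the spans themselves may differ)."""
    return {
        (x, y)
        for (x, y) in relation
        if doc[x[0]:x[1]] == doc[y[0]:y[1]]
    }


doc = "abab"
x_spans = unary_spanner(doc, r"[ab]{2}")   # every span of length 2
pairs = product_spanner(x_spans, x_spans)  # all (x, y) combinations
equal_pairs = select_equal(doc, pairs)     # x and y spell the same string
```

Note that string equality selection compares the substrings, not the positions: on `"abab"` the spans `(0, 2)` and `(2, 4)` are different intervals that both spell `ab`, so the pair survives the selection.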
We focus on belief propagation for the assignment problem, also known as the maximum weight bipartite matching problem. We provide a constructive proof that the well-known upper bound on the number of iterations (Bayati, Shah, and Sharma, 2008) is tight up to a factor of four. Furthermore, we investigate the behavior of belief propagation when convergence is not required. We show that obtaining a sharp approximation already requires a large fraction of the iterations needed for convergence. Finally, we propose an "approximate belief propagation" algorithm for the assignment problem.
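For readers unfamiliar with the setting, the following is a rough sketch of synchronous belief propagation on a complete bipartite graph with an n-by-n weight matrix. The simplified message update used here is recalled from the max-product literature on matching and is an assumption, not something stated in this abstract; it is not the "approximate belief propagation" algorithm proposed in the paper. The names `bp_assignment` and `brute_force_assignment` and the example weights are hypothetical, and the sketch assumes n >= 2 and a unique optimal assignment.

```python
from itertools import permutations


def bp_assignment(w, iterations):
    """Run `iterations` synchronous rounds of simplified message passing.

    a[i][j] is the message from left node i to right node j,
    b[j][i] is the message from right node j to left node i.
    Returns, for each left node i, the right node it currently prefers.
    """
    n = len(w)
    a = [[w[i][j] for j in range(n)] for i in range(n)]
    b = [[w[i][j] for i in range(n)] for j in range(n)]
    for _ in range(iterations):
        # Each message is the edge weight minus the best competing offer
        # received in the previous round.
        new_a = [
            [w[i][j] - max(b[l][i] for l in range(n) if l != j)
             for j in range(n)]
            for i in range(n)
        ]
        new_b = [
            [w[i][j] - max(a[l][j] for l in range(n) if l != i)
             for i in range(n)]
            for j in range(n)
        ]
        a, b = new_a, new_b
    # Left node i picks the right node whose incoming message is largest.
    return [max(range(n), key=lambda j: b[j][i]) for i in range(n)]


def brute_force_assignment(w):
    """Reference answer by enumerating all n! assignments (small n only)."""
    n = len(w)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(w[i][p[i]] for i in range(n)),
    )
    return list(best)


weights = [
    [10, 2, 3],
    [4, 9, 1],
    [2, 5, 8],
]
estimate = bp_assignment(weights, iterations=50)
reference = brute_force_assignment(weights)
```

In this example the diagonal is the unique optimal assignment (total weight 27), so the brute-force reference gives a direct check on the message-passing estimate. The iteration count of 50 is an arbitrary generous choice for a 3-by-3 instance, not a value derived from the bound discussed in the abstract.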