In this paper, we propose a global, self-explainable solution to a prominent NLP problem: Entity Resolution (ER). We formulate ER as a graph partitioning problem. Every mention of a real-world entity is represented by a node in the graph, and the pairwise similarity scores between the mentions are used to assign each node to exactly one clique, which represents a real-world entity in the ER domain. We use the Clique Partitioning Problem (CPP), an Integer Program (IP), for this formulation and highlight the explainable nature of the method. Since CPP is NP-Hard, we introduce an efficient solution procedure, the xER algorithm, which solves CPP by finding maximal cliques in the graph and then performing generalized set packing using a novel formulation. We discuss the advantages of xER over traditional methods and report computational experiments and results from applying it to ER data sets.
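The two-stage idea (enumerate maximal cliques, then pack them) can be illustrated with a minimal Python sketch. This is not the xER algorithm or its set packing formulation, which the abstract does not specify. It assumes a hypothetical similarity threshold to build the graph, uses the basic Bron-Kerbosch recursion for clique enumeration, and replaces the generalized set packing step with an exhaustive search over whole maximal cliques, leaving uncovered mentions as singletons. All function names and the toy similarity scores are illustrative.

```python
from itertools import combinations

def build_graph(sim, threshold):
    """Adjacency sets: connect two mentions if their similarity meets the threshold."""
    nodes = sorted({m for pair in sim for m in pair})
    adj = {n: set() for n in nodes}
    for (a, b), s in sim.items():
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def clique_weight(clique, sim):
    """Sum of pairwise similarities inside one clique."""
    return sum(sim.get((a, b), sim.get((b, a), 0.0))
               for a, b in combinations(sorted(clique), 2))

def best_packing(cliques, sim):
    """Exhaustive stand-in for set packing: choose pairwise-disjoint cliques
    that maximize total within-clique similarity (only viable for tiny inputs)."""
    best, best_score = [], float("-inf")
    for k in range(len(cliques) + 1):
        for chosen in combinations(cliques, k):
            members = [m for c in chosen for m in c]
            if len(members) != len(set(members)):
                continue  # two chosen cliques share a mention
            score = sum(clique_weight(c, sim) for c in chosen)
            if score > best_score:
                best, best_score = list(chosen), score
    return best, best_score

def resolve(sim, threshold):
    """Return a partition of mentions into entities (as a set of frozensets)."""
    adj = build_graph(sim, threshold)
    chosen, _ = best_packing(maximal_cliques(adj), sim)
    covered = set().union(*chosen) if chosen else set()
    singletons = [frozenset({n}) for n in adj if n not in covered]
    return set(chosen) | set(singletons)

# Toy similarity scores (hypothetical): a3 is weakly linked to b1.
sim = {
    ("a1", "a2"): 0.9, ("a1", "a3"): 0.8, ("a2", "a3"): 0.85,
    ("a3", "b1"): 0.6, ("b1", "b2"): 0.95,
}
entities = resolve(sim, threshold=0.5)
```

In this toy input the maximal cliques are {a1, a2, a3}, {a3, b1}, and {b1, b2}; the packing step drops the weak {a3, b1} clique because it overlaps both of the stronger ones. Each selected clique can be read directly as one resolved entity, which is the sense in which a clique-based output is easy to inspect.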
We develop a Graphics Processing Unit (GPU) accelerated algorithm for the NP-Hard Multi-dimensional Assignment Problem (MAP), suitable for target tracking applications. First, the original MAP formulation with a quadratic objective function is reformulated using a creative linearization technique. This formulation lends itself well to Lagrangian Relaxation, which decomposes into pairwise Linear Assignment Problems (LAPs). These LAPs are solved in parallel, each with a recent GPU-accelerated approach. Next, we propose a dual-ascent scheme for the Lagrange multiplier updates. The advantage of this scheme is that it yields monotonically increasing lower bounds and converges in a fraction of the iterations typically needed for a subgradient method. The dual-ascent technique is also parallelized for the GPU. Finally, we develop a creative gap closure scheme that uses the M-best LAP solutions for each dimension and finds the shortest path in the resulting staged graph. The algorithm is applied to the Multi-Target Tracking problem and tested on datasets for maneuverable targets. Scaling studies show that the processing time decreases approximately linearly with the number of GPU devices. The algorithm can efficiently solve problems with up to 400 targets in 400 time-frames, which corresponds to 25 billion variables, with high accuracy.
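The final step, a shortest path through a staged graph, can be sketched with a small dynamic program in Python. This is a generic illustration and not the paper's gap closure scheme: it assumes each stage holds a few candidate solutions (for example, the M-best LAP solutions for one dimension), each with a node cost, and that moving between candidates in consecutive stages incurs an edge cost. How those costs are actually defined is not stated in the abstract, so `node_cost`, `edge_cost`, and the numbers below are hypothetical placeholders.

```python
def staged_shortest_path(node_cost, edge_cost):
    """Shortest path through a staged (layered) graph by dynamic programming.

    node_cost[t][i]    : cost of candidate i at stage t
    edge_cost[t][i][j] : cost of going from candidate i at stage t
                         to candidate j at stage t + 1
    Returns (total_cost, path), where path[t] is the candidate chosen at stage t.
    """
    best = list(node_cost[0])  # cheapest cost to reach each candidate so far
    back = []                  # back-pointers, one list per transition

    for t in range(1, len(node_cost)):
        cur, ptr = [], []
        for j, cj in enumerate(node_cost[t]):
            # Cheapest predecessor for candidate j at stage t.
            i_best = min(range(len(best)),
                         key=lambda i: best[i] + edge_cost[t - 1][i][j])
            cur.append(best[i_best] + edge_cost[t - 1][i_best][j] + cj)
            ptr.append(i_best)
        best = cur
        back.append(ptr)

    # Trace the cheapest final candidate back to the first stage.
    end = min(range(len(best)), key=best.__getitem__)
    path = [end]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[end], path

# Three stages with M = 2 candidates each (hypothetical costs).
node_cost = [[1, 2], [4, 1], [3, 3]]
edge_cost = [
    [[0, 5], [2, 1]],  # stage 0 -> stage 1
    [[1, 0], [3, 2]],  # stage 1 -> stage 2
]
total, path = staged_shortest_path(node_cost, edge_cost)
```

With T stages and M candidates per stage, this recursion costs on the order of T·M² operations, so keeping M small keeps the search cheap compared with enumerating all M^T combinations.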
Note to Practitioners: The Multi-Target Tracking (MTT) problem is a longstanding problem with many variants and solution algorithms. Still, it remains challenging, especially when a large number of targets must be tracked over many time frames and both solution speed and optimality are concerns. Many problems, including entity resolution, weapon target assignment, resource allocation, and data association, can be formulated as a MAP. Our overall algorithm, implemented with GPU acceleration, makes it possible to address large-dimensioned MAPs, e.g., a large number of observed targets over a long horizon, with around 25 billion variables. To our knowledge, no existing algorithm can handle data at this scale for either MAP or MTT.