Anshumali Shrivastava scite author profile

2015

Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., inner product between binary vectors) or set containment. Minhash has inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH), to provide a solution to this well-known problem. The new scheme utilizes asymmetric transformations to cancel the bias of traditional minhash towards smaller sets, making the final "collision probability" monotonic in the inner product. Our theoretical comparisons show that, for the task of retrieving with binary inner products, asymmetric minhash is provably better than traditional minhash and other recently proposed hashing algorithms for general inner products. Thus, we obtain an algorithmic improvement over existing approaches in the literature. Experimental evaluations on four publicly available highdimensional datasets validate our claims. The proposed scheme outperforms, often significantly, other hashing algorithms on the task of near neighbor retrieval with set containment. Our proposal is simple and easy to implement in practice.

Scalable and Sustainable Deep Learning via Randomized Hashing

Spring

2017

Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations, associated with both training and testing of deep networks, are very expensive from a computational and energy standpoint. We present a novel hashing based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropouts and randomized hashing for maximum inner product search to select the nodes with the highest activation efficiently. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while keeping on average within 1% of the accuracy of the original model. A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets.

Learning Feasibility for Task and Motion Planning in Tabletop Environments

Wells

Dantam

IEEE Robot. Autom. Lett.

et al. 2019

Task and motion planning (TMP) combines discrete search and continuous motion planning. Earlier work has shown that to efficiently find a task-motion plan, the discrete search can leverage information about the continuous geometry. However, incorporating continuous elements into discrete planners presents challenges. We improve the scalability of TMP algorithms in tabletop scenarios with a fixed robot by introducing geometric knowledge into a constraint-based task planner in a robust way. The key idea is to learn a classifier for feasible motions and to use this classifier as a heuristic to order the search for a task-motion plan. The learned heuristic guides the search towards feasible motions and thus reduces the total number of motion planning attempts. A critical property of our approach is allowing robust planning in diverse scenes. We train the classifier on minimal exemplar scenes and then use principled approximations to apply the classifier to complex scenarios in a way that minimizes the effect of errors. By combining learning with planning, our heuristic yields order-of-magnitude run time improvements in diverse tabletop scenarios. Even when classification errors are present, properly biasing our heuristic ensures we will have little computational penalty.

Fast Near Neighbor Search in High-Dimensional Binary Data

2012

Abstract. Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b-bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections.

Using Local Experiences for Global Motion Planning

Chamzas

Kavraki

2019

Sampling-based planners are effective in many real-world applications such as robotics manipulation, navigation, and even protein modeling. However, it is often challenging to generate a collision-free path in environments where key areas are hard to sample. In the absence of any prior information, sampling-based planners are forced to explore uniformly or heuristically, which can lead to degraded performance. One way to improve performance is to use prior knowledge of environments to adapt the sampling strategy to the problem at hand. In this work, we decompose the workspace into local primitives, memorizing local experiences by these primitives in the form of local samplers, and store them in a database. We synthesize an efficient global sampler by retrieving local experiences relevant to the given situation. Our method transfers knowledge effectively between diverse environments that share local primitives and speeds up the performance dramatically. Our results show, in terms of solution time, an improvement of multiple orders of magnitude in two traditionally challenging high-dimensional problems compared to state-of-the-art approaches. Fig. 1: The task of grasping the red can without removing the green cans is very challenging for traditional methods. The proposed approach efficiently solves it by using prior experiences to guide the search.