Nima Asadi scite author profile

Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures

This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore effectiveness/efficiency tradeoffs with three candidate generation approaches: postings intersection with SvS, conjunctive query evaluation with Wand, and disjunctive query evaluation with Wand. We find no significant differences in end-to-end effectiveness as measured by NDCG between conjunctive and disjunctive Wand, but conjunctive query evaluation is substantially faster. Postings intersection with SvS, while fast, yields substantially lower end-to-end effectiveness, suggesting that document and term frequencies remain important in the initial ranking stage. These findings show that conjunctive Wand is the best overall candidate generation strategy of those we examined.
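To illustrate the SvS strategy named in the abstract, here is a minimal Python sketch of small-versus-small postings intersection over sorted lists of document IDs. It is not the paper's implementation. The function name `svs_intersect` is hypothetical, postings are assumed to be uncompressed in-memory lists, and each candidate is located with a binary search (`bisect_left`) whose lower bound only moves forward.

```python
from bisect import bisect_left


def svs_intersect(postings_lists):
    """Small-versus-small (SvS) intersection of sorted docid lists.

    Lists are processed from shortest to longest. The running result, which is
    never larger than the shortest list, is intersected with each longer list
    by binary-searching for every surviving candidate.
    """
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = list(ordered[0])
    for longer in ordered[1:]:
        survivors = []
        lo = 0
        for docid in result:
            # Candidates are ascending, so the search can start past the
            # previous position in the longer list.
            lo = bisect_left(longer, docid, lo)
            if lo == len(longer):
                break  # every remaining candidate is larger than the list tail
            if longer[lo] == docid:
                survivors.append(docid)
        result = survivors
        if not result:
            break  # empty intersection; later lists cannot add anything
    return result
```

For example, `svs_intersect([[1, 3, 5, 7, 9, 11], [3, 7, 11, 15], [2, 3, 7, 8, 11]])` returns `[3, 7, 11]`. The output is an unscored set of document IDs. Unlike conjunctive or disjunctive Wand, nothing here uses term or document frequencies, which is the property the abstract links to SvS's lower end-to-end effectiveness.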

Runtime Optimizations for Tree-Based Machine Learning Models

Asadi, Lin, Vries (2014). IEEE Trans. Knowl. Data Eng.

Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression trees for learning to rank. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processors. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures. Experiments on synthetic data and on three standard learning-to-rank datasets show that our approach is significantly faster than standard implementations.

Index Terms: Web search, general information storage and retrieval, information technology and systems, scalability and efficiency, learning to rank, regression trees
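The predication and micro-batching ideas in this abstract can be sketched in Python as follows. The sketch assumes every tree is perfectly balanced and stored breadth-first in flat arrays, so node `i` has children `2*i+1` and `2*i+2`. That is one plausible cache-conscious layout, not necessarily the paper's. `CompleteTree`, `predict_batch`, and `predict_ensemble` are hypothetical names. Python still executes branches internally, so this shows only the index arithmetic that replaces the if/else. It does not reproduce the processor-level speedups the paper reports.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CompleteTree:
    """Perfectly balanced regression tree in flat, breadth-first arrays.

    Internal node i has children 2*i+1 (left) and 2*i+2 (right). The leaves
    continue the same numbering after the 2**depth - 1 internal nodes.
    """
    depth: int
    feature: List[int]       # feature index tested at each internal node
    threshold: List[float]   # split threshold at each internal node
    leaf_value: List[float]  # prediction stored at each leaf

    def __post_init__(self) -> None:
        n_internal = 2 ** self.depth - 1
        if len(self.feature) != n_internal or len(self.threshold) != n_internal:
            raise ValueError("need 2**depth - 1 internal nodes")
        if len(self.leaf_value) != n_internal + 1:
            raise ValueError("need 2**depth leaves")


def predict_batch(tree: CompleteTree, xs: Sequence[Sequence[float]]) -> List[float]:
    """Score a micro-batch, advancing all instances one level at a time."""
    idx = [0] * len(xs)
    for _ in range(tree.depth):        # outer loop: tree levels
        for k, x in enumerate(xs):     # inner loop: instances in the batch
            i = idx[k]
            # Predication: the comparison result (False/True, i.e. 0/1) is
            # added to the child index instead of branching with if/else.
            idx[k] = 2 * i + 1 + (x[tree.feature[i]] > tree.threshold[i])
    offset = 2 ** tree.depth - 1       # index of the first leaf
    return [tree.leaf_value[i - offset] for i in idx]


def predict_ensemble(trees: Sequence[CompleteTree],
                     xs: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-tree outputs, as in a gradient-boosted ensemble."""
    scores = [0.0] * len(xs)
    for tree in trees:
        for k, value in enumerate(predict_batch(tree, xs)):
            scores[k] += value
    return scores
```

Consider a tree with `depth=2`, `feature=[0, 1, 1]`, `threshold=[0.5, 0.3, 0.7]`, and `leaf_value=[1.0, 2.0, 3.0, 4.0]`. The instance `[0.9, 0.8]` goes right at both levels and reaches the last leaf, scoring `4.0`. Putting the level loop outside the instance loop interleaves independent traversals. In a compiled implementation, that interleaving lets the processor overlap several independent memory lookups.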

Fast candidate generation for two-phase document ranking

Asadi, Lin (2012)

Document vector representations for feature extraction in multi-stage document ranking

Asadi, Lin (2012). Inf Retrieval

Pseudo test collections for learning web search ranking functions

Asadi, Metzler, Elsayed, et al. (2011)

Test collections are the primary drivers of progress in information retrieval. They provide yardsticks for assessing the effectiveness of ranking functions in an automatic, rapid, and repeatable fashion and serve as training data for learning-to-rank models. However, manual construction of test collections tends to be slow, labor-intensive, and expensive. This paper examines the feasibility of constructing web search test collections in a completely unsupervised manner given only a large web corpus as input. Within our proposed framework, anchor text extracted from the web graph is treated as a pseudo query log from which pseudo queries are sampled. For each pseudo query, a set of relevant and non-relevant documents is selected using a variety of web-specific features, including spam and aggregated anchor text weights. The automatically mined queries and judgments form a pseudo test collection that can be used for training ranking functions. Experiments carried out on TREC web track data show that learning-to-rank models trained using pseudo test collections outperform an unsupervised ranking function and are statistically indistinguishable from a model trained using manual judgments, demonstrating the usefulness of our approach in extracting reasonable-quality training data "for free".
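The pipeline described above can be sketched as a toy Python example. Anchor text is aggregated into a pseudo query log, pseudo queries are sampled from it, and each query's linked documents are split into relevant and non-relevant sets. Everything specific here is an assumption rather than the paper's method:

- the hypothetical `(anchor_text, target_doc, weight)` record format;
- the spam-score dictionary and its 0.5 cutoff;
- uniform query sampling;
- the rule that the highest-weighted non-spam target is relevant and the remaining targets are non-relevant.

The paper draws on a variety of web-specific features instead.

```python
import random
from collections import defaultdict

# Hypothetical input record: (anchor_text, target_doc_id, anchor_weight).


def build_pseudo_log(anchors):
    """Aggregate anchor weights per (normalized anchor text, target doc)."""
    log = defaultdict(lambda: defaultdict(float))
    for text, doc, weight in anchors:
        query = " ".join(text.lower().split())  # lowercase, collapse whitespace
        log[query][doc] += weight
    return log


def sample_queries(log, k, seed=0):
    """Uniformly sample up to k distinct pseudo queries (assumed strategy)."""
    rng = random.Random(seed)
    return rng.sample(sorted(log), min(k, len(log)))


def judge(log, query, spam, spam_cutoff=0.5, n_rel=1):
    """Toy judging rule, not the paper's: filter spam, rank by anchor weight.

    The top n_rel surviving targets are labeled relevant and the rest
    non-relevant.
    """
    targets = [(doc, w) for doc, w in log[query].items()
               if spam.get(doc, 0.0) < spam_cutoff]  # drop likely spam pages
    targets.sort(key=lambda dw: (-dw[1], dw[0]))      # heaviest first, ties by id
    docs = [doc for doc, _ in targets]
    return docs[:n_rel], docs[n_rel:]


def build_pseudo_collection(anchors, spam, k, seed=0):
    """Map each sampled pseudo query to (relevant, non_relevant) doc lists."""
    log = build_pseudo_log(anchors)
    return {q: judge(log, q, spam) for q in sample_queries(log, k, seed)}
```

Suppose anchors with the text "apache hadoop" point at `d1` (weight 3.0), `d3` (2.0), and `d2` (1.0). That query's judgments then come out as `(['d1'], ['d3', 'd2'])`. A target whose spam score is at or above the cutoff is excluded from both lists.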

