Emerging trends and products pose a challenge to modern search engines since they must adapt to the constantly changing needs and interests of users. For example, vertical search engines, such as Amazon, eBay, Walmart, Yelp and Yahoo! Local, provide business category hierarchies for people to navigate through millions of business listings. The category information also provides important ranking features that can be used to improve search experience. However, category hierarchies are often manually crafted by some human experts and they are far from complete. Manually constructed category hierarchies cannot handle the everchanging and sometimes long-tail user information needs. In this paper, we study the problem of how to expand an existing category hierarchy for a search/navigation system to accommodate the information needs of users more comprehensively. We propose a general framework for this task, which has three steps: 1) detecting meaningful missing categories; 2) modeling the category hierarchy using a hierarchical Dirichlet model and predicting the optimal tree structure according to the model; 3) reorganizing the corpus using the complete category structure, i.e., associating each webpage with the relevant categories from the complete category hierarchy. Experimental results demonstrate that our proposed framework generates a high-quality category hierarchy and significantly boosts the retrieval performance.
Many vertical search tasks such as local search focus on specific domains. The meaning of relevance in these verticals is domain-specific and usually consists of multiple well-defined aspects (e.g., text matching and distance in local search). Thus the overall relevance between a query and a document is a tradeoff between multiple relevance aspects. Such a tradeoff can vary for different types of queries or in different contexts. In this paper, we explore these vertical-specific aspects in the learning to rank setting. We propose a novel formulation in which the relevance between a query and a document is assessed with respect to each aspect, forming the multi-aspect relevance. In order to compute a ranking function, we study two types of learning-based approaches to estimate the tradeoff between these relevance aspects: a label aggregation method and a model aggregation method. Since there are only a few aspects, a minimal amount of training data is needed to learn the tradeoff. We conduct both offline and online test experiments on a local search engine and the experimental results show that our proposed multi-aspect relevance formulation is very promising. The two types of aggregation methods perform more effectively than a set of baseline methods including a conventional learning to rank method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.