In this paper we present a method to enrich classical web searching with entity mining performed at query time. The results of entity mining (entities grouped in categories) can complement the query answers with information that is useful to the user and that can be further exploited in a faceted search-like interaction scheme. We show that entity mining over the snippets of the top hits of the answers can be performed in real time. However, mining over the snippets returns fewer entities than mining over the full contents of the hits, and for this reason we report comparative results for these two scenarios. In addition, we show how Linked Data can be exploited for specifying the entities of interest and for providing further information about the identified entities, implementing a kind of entity-based integration of documents and (semantic) data. Finally, we discuss the applicability of this approach to professional search, specifically for the domains of fisheries/aquaculture and patents.
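As a rough illustration of grouping mined entities into categories for faceted browsing, the following minimal sketch scans result snippets against a small lookup list. The dictionary-based matching, the `GAZETTEER` contents, and the `mine_entities` name are all assumptions made for this example; the abstract does not state which mining technique is used.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer (category -> entity names); illustrative values only.
GAZETTEER = {
    "Species": ["yellowfin tuna", "swordfish"],
    "Country": ["Greece", "Japan"],
}


def mine_entities(snippets):
    """Return {category: {entity: number of snippets that mention it}}."""
    found = defaultdict(lambda: defaultdict(int))
    for snippet in snippets:
        for category, names in GAZETTEER.items():
            for name in names:
                # Whole-word, case-insensitive match of the entity name.
                pattern = r"\b" + re.escape(name) + r"\b"
                if re.search(pattern, snippet, re.IGNORECASE):
                    found[category][name] += 1
    return {category: dict(entities) for category, entities in found.items()}


# Snippets of the top hits, as a search engine might return them (made up).
snippets = [
    "Yellowfin tuna catches reported by Japan rose last year.",
    "Greece and Japan signed an aquaculture agreement.",
]
facets = mine_entities(snippets)
```

Each category in `facets` could then be rendered as a facet whose entries narrow the result list when selected. The same function applied to full document contents instead of snippets would correspond to the second scenario the abstract compares.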
In recent years there has been increasing interest in providing the top search results while the user types a query letter by letter. In this paper we present and demonstrate a family of instant search applications which, apart from instantly showing the top search results, can also show various other kinds of precomputed aggregated information. This paradigm is more helpful for the end user than classic search-as-you-type, since it can combine autocompletion, search-as-you-type, results clustering, faceted search, entity mining, etc. Furthermore, apart from being helpful for the end user, it is also beneficial for the server side. However, the instant provision of such services for a large number of queries, large amounts of precomputed information, and a large number of concurrent users is challenging. We demonstrate how this can be achieved using very modest hardware. Our approach relies on (a) a partitioned trie-based index that exploits the available main memory and disk, and (b) dedicated caching techniques. We report performance results over a server running on a modest personal computer (with 3 GB main memory) that provides instant services for millions of distinct queries and terabytes of precomputed information. Furthermore, these services are tolerant to user typos and different word orders.
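To make the idea of a trie that maps typed prefixes to precomputed information more concrete, here is a minimal in-memory sketch. It is only an assumption-laden illustration: the `QueryTrie` class, its methods, and the payload format are hypothetical, and the sketch omits the partitioning across memory and disk, the caching, and the typo and word-order tolerance that the abstract mentions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.payload = None  # precomputed info stored for a complete query


class QueryTrie:
    """Hypothetical single-partition trie; everything is kept in memory."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, payload):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.payload = payload

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (query, payload) pairs that start with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.payload is not None:
                results.append((text, current.payload))
            # Push in reverse order so children are visited alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = QueryTrie()
trie.insert("tuna", ["hit1"])
trie.insert("tuna fishing", ["hit2"])
trie.insert("turbot", ["hit3"])
suggestions = trie.suggest("tu")
```

Calling `trie.suggest(...)` after each keystroke returns the stored payloads without running the query itself, which is the general reason precomputation can help the server side. A partitioned variant would presumably keep only some subtries in memory and load others from disk on demand, but that detail is not specified in the abstract.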