Abstract. Record linkage identifies multiple records referring to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches are mainly relying on similarity functions based on the surface forms of the records, and hence are not able to identify complex coreference records. This seriously limits the effectiveness of existing approaches. In this work, we propose an automatic method to extract top-k high quality transformation rules given a set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses for each record pair and generates candidate rules; the algorithm finally chooses top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has substantial advantage over the previous algorithm in both effectiveness and efficiency.
Abstract:An algorithm to synthesise a web search query from example documents is described. A user searching for information on the Web can use a rudimentary query to locate a set of potentially relevant documents. The user classifies the retrieved documents as being relevant or irrelevant to his or her needs. A query can be synthesised from these categorised documents to perform a definitive search with good recall and precision characteristics.
Abstract:Popular web search engines use Boolean queries as their main interface for users to search their information needs. The paper presents results a user survey employing volunteer web searchers to determine the effectiveness of the Boolean queries in meeting the information needs. A metric for measuring the quality of a web search query is presented. This enables us to relate attributes of the search session and the Boolean query with its success. Certain easily identified characteristics of a good web search query are identified.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.