Abstract. Linking Open Data (LOD) has become one of the most important community efforts to publish high-quality interconnected semantic data. Such data has been widely used in many applications to provide intelligent services like entity search, personalized recommendation and so on. While DBpedia, one of the LOD core data sources, contains resources described in multilingual versions and semantic data in English is proliferating, there is very few work on publishing Chinese semantic data. In this paper, we present Zhishi.me, the first effort to publish large scale Chinese semantic data and link them together as a Chinese LOD (CLOD). More precisely, we identify important structural features in three largest Chinese encyclopedia sites (i.e., Baidu Baike, Hudong Baike, and Chinese Wikipedia) for extraction and propose several data-level mapping strategies for automatic link discovery. As a result, the CLOD has more than 5 million distinct entities and we simply link CLOD with the existing LOD based on the multilingual characteristic of Wikipedia.
Recently, more and more short texts (e.g., ads, tweets) appear on the Web. Classifying short texts into a large taxonomy like ODP or Wikipedia category system has become an important mining task to improve the performance of many applications such as contextual advertising and topic detection for micro-blogging. In this paper, we propose a novel multi-stage classification approach to solve the problem. First, explicit semantic analysis is used to add more features for both short texts and categories. Second, we leverage information retrieval technologies to fetch the most relevant categories for an input short text from thousands of candidates. Finally, a SVM classifier is applied on only a few selected categories to return the final answer. Our experimental results show that the proposed method achieved significant improvements on classification accuracy compared with several existing state of art approaches.
In recent years, online shopping is becoming more and more popular. Users type keyword queries on product search systems to find relevant products, accessories, and even related products. However, existing product search systems always return very similar products on the first several pages instead of taking diversity into consideration. In this paper, we propose a novel approach to address the diversity issue in the context of product search. We transform search result diversification into a combination of diversifying product categories and diversifying product attribute values within each category. The two sub-problems are optimization problems which can be reduced into well-known NP-hard problems respectively. We further leverage greedy-based approximation algorithms for efficient product search results re-ranking.
The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents, but instead complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we demonstrate Hermes, an infrastructure for data web search. To provide an end-user oriented interface, we support expressive keyword search by translating user information needs into structured queries. We integrate heterogeneous web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local web data sources, which are then processed in a distributed way.
In recent years, more and more images have been uploaded and published on the Web. Along with text Web pages, images have been becoming important media to place relevant advertisements. Visual contextual advertising, a young research area, refers to finding relevant text advertisements for a target image without any textual information (e.g., tags). There are two existing approaches, advertisement search based on image annotation, and more recently, advertisement matching based on feature translation between images and texts. However, the state of the art fails to achieve satisfactory results due to the fact that recommended advertisements are syntactically matched but semantically mismatched. In this paper, we propose a semantic approach to improving the performance of visual contextual advertising. More specifically, we exploit a large high-quality image knowledge base (ImageNet) and a widely-used text knowledge base (Wikipedia) to build a bridge between target images and advertisements. The image-advertisement match is built by mapping images and advertisements into the respective knowledge bases and then finding semantic matches between the two knowledge bases. The experimental results show that semantic match outperforms syntactic match significantly using test images from Flickr. We also show that our approach gives a large improvement of 16.4% on the precision of the top 10 matches over previous work, with more semantically relevant advertisements recommended.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.