Abstract. Explicit Semantic Analysis (ESA) has recently been proposed as an approach to computing semantic relatedness between words (and indirectly also between texts). It thus has a natural application in information retrieval, showing the potential to alleviate the vocabulary mismatch problem inherent in standard bag-of-words models. The ESA model has also recently been extended to cross-lingual retrieval settings, which can be considered an extreme case of the vocabulary mismatch problem. The ESA approach actually represents a class of approaches and allows for various instantiations. As our first contribution, we generalize ESA in order to clearly show the degrees of freedom it provides. Second, we propose some variants of ESA along different dimensions, testing their impact on performance on a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). Our results are of interest because a systematic investigation has been missing so far and the differences between basic design choices are significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated so far, but that they can still be significantly improved by tuning the right parameters, yielding a relative improvement on the cross-lingual mate retrieval task of between 62% (Multext) and 237% (JRC-ACQUIS) with respect to the original ESA model.
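To make the idea behind ESA concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes the commonly described setup in which a text is mapped to a vector of association scores against a set of concepts (normally Wikipedia articles) and relatedness is the cosine of two such vectors. The three toy concepts, the TF-IDF variant, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy concept space; in ESA each concept is normally a Wikipedia article.
CONCEPTS = {
    "Music": "guitar piano song melody concert band song",
    "Finance": "bank money loan interest market stock bank",
    "River": "river water bank stream flow fish",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_index(concepts):
    """Return TF-IDF weights per concept: {concept: {term: weight}}."""
    tfs = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    n = len(concepts)
    # Document frequency: in how many concept texts each term occurs.
    df = Counter(term for tf in tfs.values() for term in tf)
    return {
        name: {t: c * math.log(1 + n / df[t]) for t, c in tf.items()}
        for name, tf in tfs.items()
    }

def esa_vector(text, index):
    """Map a text into concept space: one association score per concept."""
    counts = Counter(tokenize(text))
    return {
        name: sum(counts[t] * w for t, w in weights.items())
        for name, weights in index.items()
    }

def cosine(u, v):
    """Cosine similarity of two vectors that share the same concept keys."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

INDEX = build_index(CONCEPTS)

def relatedness(a, b):
    """Semantic relatedness of two texts as the cosine of their concept vectors."""
    return cosine(esa_vector(a, INDEX), esa_vector(b, INDEX))
```

In this toy space, `relatedness("loan", "stock")` is close to 1 although the two texts share no word, while `relatedness("loan", "guitar")` is 0, which illustrates how a concept-space representation can sidestep vocabulary mismatch. The choices made here (term weighting, concept set, similarity function) are examples of the degrees of freedom the abstract refers to; a cross-lingual variant would presumably build one such index per language over aligned concepts.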
Wikipedia provides a considerable amount of text for more than a hundred languages. This includes languages for which no reference corpora or other linguistic resources are easily available. We have built background language models from the content of Wikipedia in various languages. The models generated from Simple and English Wikipedia are compared to language models derived from other established corpora. The differences between the models with regard to term coverage, term distribution and correlation are described and discussed. We provide access to the full dataset and create visualizations of the language models that can be used for exploration. The paper describes the newly released dataset for 33 languages and the services that we provide on top of it.
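As a rough illustration of what comparing two background language models can involve, the sketch below builds unigram models from two toy texts and computes a term coverage figure and a correlation of term probabilities. The exact measures used in the paper are not given in the abstract, so the definitions of `coverage` and `pearson`, the function names, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter

def unigram_model(text):
    """Relative term frequencies of a lowercased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

def coverage(model, reference):
    """Fraction of the reference vocabulary that also occurs in model."""
    if not reference:
        return 0.0
    return sum(1 for t in reference if t in model) / len(reference)

def pearson(model_a, model_b):
    """Pearson correlation of term probabilities over the shared vocabulary."""
    shared = sorted(set(model_a) & set(model_b))
    if len(shared) < 2:
        return 0.0
    xs = [model_a[t] for t in shared]
    ys = [model_b[t] for t in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return cov / denom if denom else 0.0

# Hypothetical stand-ins for a Wikipedia-derived model and a reference corpus model.
wiki = unigram_model("the cat sat on the mat the end")
news = unigram_model("the dog sat on the log")
```

Here `coverage(wiki, news)` is 0.6, because three of the five distinct terms in the second text also occur in the first. On real corpora the same comparison would be run over full vocabularies, where smoothing and tokenization choices matter far more than in this toy case.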
Abstract. This PhD proposal is about the development of new methods for information access. Two new approaches are proposed: Multi-Grained Query Answering, which bridges the gap between Information Retrieval and Question Answering, and Learning-Enhanced Query Answering, which enables the improvement of retrieval performance based on the experience of previous queries and answers.
WORKSHOP SUMMARY. With the constantly increasing reach of the Web in general and Social Media in particular, more and more people of different nationalities, cultures, origins and beliefs contribute and access online information. These differences express themselves in language, habits, behavioural patterns, socio-cultural norms and values. They also strongly influence the way users provide and formulate content as well as the way they request, acquire, interpret and access information. Therefore, the detection and use of cultural differences and diversity will increasingly become a key challenge in both Information Retrieval and Knowledge Management. The aim of the DETECT workshop was to bring together researchers and practitioners dealing with intercultural, multi-lingual and multi-national information environments in distinct contexts, and to discover synergies between their research fields. The workshop program reflects the variety of aspects and applications of culture-related topics in social media. Two keynotes address the topics of diversity and ] as well as approaches for a behavioural analysis of users [6]. Thereby, the two opposite viewpoints on cultural aspects in social media are addressed: the consumer's side of searching and reading social news items and the author's side of creating and publishing such information. The wide range of contributions to DETECT'11 also reflects the various relevant technologies and research questions in the workshop's context. They range from works addressing fundamental key technologies like text and image classification, through mining patterns of common usage or user behaviour over time and space, to end-user issues of how to deal with cultural diversity on the social web. Boato et al. [1] consider the application of classifying the emotions associated with pictures. Given the recent interest in sentiment analysis against the background of social media, this topic provides insights relevant to the foundations of social media research. Nishida et al. [5] address the difficult task of classification on microblogs. Their data compression approach introduces a method that might be well suited to handling the inherent feature sparsity of Twitter and the like. Martin [4] presents an analytical tool to follow web users in order to improve the usability of web sites. Identifying common usage patterns and relating them to different cultural backgrounds is a step in the direction of tailoring online services to particular user needs. Kling and Gottron [2] automatically extract cultural regions from Flickr images annotated with geographic coordinates, which in turn makes it possible to detect culturally similar areas directly from social media data. Wijaya and Yeniterzi [8] look at the evolution of the semantics of words over time. This work adds a time component to cultural topics, which enriches social media analysis with an additional and important dimension. Welzer et al. [7] look at cultural awareness in social media and consider in particular the question of creating awareness for cul...