Abstract. Twitter has become a major outlet for news, discussion and commentary on ongoing events and trends. Effective searching of Twitter collections poses a number of issues for traditional document-based information retrieval (IR) approaches, such as limited document term statistics and spam. In this paper we propose a novel approach to pseudo-relevance feedback, based upon the temporal profiles of n-grams extracted from the top N relevance feedback tweets. A weighted graph is used to model temporal correlation between n-grams, with a PageRank variant employed to combine both pseudo-relevant document term distribution and temporal collection evidence. Preliminary experiments with the TREC Microblogging 2011 Twitter corpus indicate that, through parameter optimisation, retrieval effectiveness can be improved.
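The graph-based ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the correlation measure, the edge weighting or the PageRank variant, so the code below assumes Pearson correlation between temporal profiles (negative values clipped to zero) as edge weights and a personalised PageRank whose teleport distribution is the term distribution of the feedback tweets. The names `pearson`, `rank_expansion_terms`, `profiles` and `feedback_counts` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def rank_expansion_terms(profiles, feedback_counts, damping=0.85, iters=100):
    """Score n-grams with a personalised PageRank (an assumed variant).

    profiles: n-gram -> list of counts per time bin (temporal profile).
    feedback_counts: n-gram -> frequency in the top-N feedback tweets.
    """
    terms = sorted(profiles)
    total = sum(feedback_counts.get(t, 0) for t in terms)
    if total <= 0:
        raise ValueError("feedback_counts must cover at least one known n-gram")
    # Teleport distribution: term distribution of the pseudo-relevant tweets.
    prior = {t: feedback_counts.get(t, 0) / total for t in terms}
    # Edge weights: positive temporal correlation between n-gram profiles.
    weight = {
        u: {v: max(0.0, pearson(profiles[u], profiles[v])) for v in terms if v != u}
        for u in terms
    }
    out_sum = {u: sum(weight[u].values()) for u in terms}

    score = dict(prior)
    for _ in range(iters):
        new = {t: (1 - damping) * prior[t] for t in terms}
        for u in terms:
            for v in terms:
                if out_sum[u] > 0:
                    share = weight[u].get(v, 0.0) / out_sum[u]
                else:
                    share = prior[v]  # node with no edges: redistribute via the prior
                new[v] += damping * score[u] * share
        score = new
    return score
```

In this sketch the damping factor controls the balance between the two evidence sources: lower values keep the scores close to the feedback term distribution, higher values give more weight to temporal correlation in the collection.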
Since time is an omnipresent feature of our existence, many elements of time are embedded in information itself, and in related behaviours such as creation, seeking and utilisation. In IR, time can distinguish the interpretation of information, and influence the intentions and expectations of users' information seeking activity. Many time-based patterns and trends, namely temporal dynamics, are evident in streams of information behaviour by individuals and crowds. A temporal dynamic refers to a periodic regularity, or a one-off or irregular past, present or future of a particular element (e.g., word, topic or query popularity), driven by predictable and unpredictable time-based events and phenomena. Several challenges and opportunities related to temporal dynamics emerge in IR. This thesis explores temporal dynamics from the perspective of (i) query popularity and meaning, and (ii) word use and relationships over time. In particular, I consider how real-time temporal dynamics in information seeking should be supported for consistent user satisfaction over time, and moreover, how previously observed temporal dynamics offer a complementary dimension which can be exploited to inform more effective IR systems. Uncertainty about user expectations is a perennial problem for IR systems, further confounded by changes over time. Addressing this, IR systems can either assist the user to submit an effective query (e.g., error-free and descriptive), or better anticipate what the user is most likely to want in relevance ranking. I first explore methods that help users formulate queries through time-aware query auto-completion capable of suggesting both recently popular and consistently popular queries. I propose and evaluate several novel approaches, and demonstrate state-of-the-art performance, with improvements of up to 9.2% over existing baselines for diverse search scenarios in different languages.
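As a rough illustration of what time-aware query auto-completion might involve, the sketch below ranks completions by mixing long-term popularity with popularity inside a recent time window. It is an assumption made for illustration only, not one of the approaches proposed in the thesis; the function `suggest` and its parameters `window`, `mix` and `k` are hypothetical.

```python
from collections import Counter


def suggest(prefix, query_log, now, window, mix=0.5, k=3):
    """Rank completions of `prefix` from (timestamp, query) pairs.

    mix=0.0 uses long-term popularity only; mix=1.0 uses only the
    popularity observed within the last `window` time units.
    """
    all_counts, recent_counts = Counter(), Counter()
    for ts, query in query_log:
        if ts > now or not query.startswith(prefix):
            continue  # ignore future entries and non-matching queries
        all_counts[query] += 1
        if now - ts <= window:
            recent_counts[query] += 1

    n_all = sum(all_counts.values())
    n_recent = sum(recent_counts.values())

    def score(query):
        long_term = all_counts[query] / n_all
        recent = recent_counts[query] / n_recent if n_recent else 0.0
        return (1 - mix) * long_term + mix * recent

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(all_counts, key=lambda q: (-score(q), q))[:k]
```

With a log in which one query is frequent overall and another has only recently become popular, raising `mix` moves the recent query up the list, while `mix=0.0` reduces to ranking by overall frequency.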
Furthermore, I explore the impact of temporal dynamics on the motives behind users' information seeking, and thus how relevance itself is subject to temporal dynamics. I find that the most likely meaning of ambiguous queries is affected over short- and long-term periods (e.g., hours to months) by several periodic and one-off event-driven temporal dynamics. Finally, I find that for many event-driven multi-faceted queries, relevance can often be inferred by modelling the temporal dynamics of changes in related information. IR approaches are typically based on methods which characterize the nature of information through the statistical distributions of words and phrases. I model and exploit the temporal dimension of the collection, captured by temporal dynamics, in these established IR