We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, where the effectiveness of the ranking will be evaluated using a different assessor deemed to be authoritative. Previous studies suggest that surrogate assessments may be a reasonable proxy for authoritative assessments for this task. Nonetheless, concern persists in some application domains, such as …
Push notifications from social media provide a way to keep up to date on topics of personal interest. To be effective, notifications must strike a balance between pushing too much and pushing too little. Push too little and the user misses important updates; push too much and the user is overwhelmed by unwanted information. Using data from the TREC 2015 Microblog track, we explore simple dynamic emission strategies for microblog push notifications. The key to effective notifications lies in establishing and maintaining appropriate thresholds for pushing updates. We explore and evaluate multiple threshold-setting strategies, including purely static thresholds, dynamic thresholds without user feedback, and dynamic thresholds with daily feedback. Our best technique takes advantage of daily feedback in a simple yet effective manner, achieving the best result reported in the literature to date.
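A minimal sketch of the dynamic-threshold idea with daily feedback, for illustration only: the class name, update rule, and constants below are our assumptions, not the authors' exact method. Each candidate update is pushed if its relevance score clears the current threshold, and at the end of each day the threshold is nudged based on how many pushed updates the feedback judged relevant.

```python
# Illustrative sketch of a dynamic emission threshold with daily feedback.
# The update rule and constants are assumptions for exposition, not the
# exact method evaluated in the paper.

class DynamicThresholdPusher:
    def __init__(self, initial_threshold=0.6, step=0.05,
                 target_precision=0.7):
        self.threshold = initial_threshold
        self.step = step                      # how far to move per day
        self.target_precision = target_precision

    def should_push(self, relevance_score):
        """Emit a notification only if the score clears the threshold."""
        return relevance_score >= self.threshold

    def end_of_day_update(self, pushed, relevant):
        """Adjust the threshold from one day's feedback.

        pushed   -- number of updates pushed today
        relevant -- how many of those were judged relevant
        """
        if pushed == 0:
            # Pushed nothing all day: loosen the threshold so the
            # system does not go silent.
            self.threshold -= self.step
            return
        precision = relevant / pushed
        if precision < self.target_precision:
            self.threshold += self.step   # too much noise: be stricter
        else:
            self.threshold -= self.step   # feedback is good: push more
        self.threshold = min(max(self.threshold, 0.0), 1.0)
```

A purely static strategy corresponds to never calling `end_of_day_update`; the feedback-free dynamic variants would adjust the threshold from the score stream alone.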
We examine the effects of expanding a judged set of sentences with their duplicates from a corpus. Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection. We perform experiments in the context of the Temporal Summarization Track at TREC 2013. We find that adding duplicate sentences to the judged set does not significantly affect relative system performance. However, we do find statistically significant changes in the performance of nearly half the systems that participated in the Track. We recommend adding exact duplicate sentences to the set of relevance judgments in order to obtain a more accurate estimate of system performance.
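As a rough illustration of the expansion step (the data layout and function name are our assumptions): hash each corpus sentence by its exact text, and copy an existing judgment onto any unjudged sentence that matches a judged one verbatim.

```python
import hashlib

def expand_judgments_with_duplicates(judged, corpus):
    """Propagate judgments to exact duplicate sentences.

    judged -- dict mapping sentence text to a relevance judgment
    corpus -- iterable of (sentence_id, sentence_text) pairs

    Returns a dict mapping sentence_id to the judgment inherited
    from an exactly matching judged sentence. Layout is assumed
    for illustration, not taken from the Track's actual formats.
    """
    # Key judged sentences by a hash of their exact text; hashing
    # keeps the lookup table small for large corpora.
    def key(text):
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    judged_by_hash = {key(text): label for text, label in judged.items()}

    expanded = {}
    for sent_id, text in corpus:
        label = judged_by_hash.get(key(text))
        if label is not None:
            expanded[sent_id] = label
    return expanded
```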
How do we evaluate systems that filter social media streams and send users updates via push notifications on their mobile phones? Such notifications must be relevant, timely, and novel. In this paper, we explore various evaluation metrics for this task, focusing specifically on measuring relevance. We begin with an analysis of metrics deployed at the TREC 2015 Microblog evaluations. A simple change to the metrics, reflecting a different assumption, dramatically alters system rankings. Applying another metric, previously used in the TREC Microblog evaluations, again yields different system rankings. We find little correlation between a number of "reasonable" evaluation metrics, which suggests that system effectiveness depends on how you measure it, an undesirable state in IR evaluation. However, we argue that existing evaluation metrics can be generalized into a framework that uses the same underlying contingency table but applies different weights and penalties. Although we stop short of proposing the "one true metric", this framework can guide the future development of a family of metrics that more accurately models user needs.
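To make the contingency-table framing concrete, here is one assumed parameterization (the cell names and default weights are ours, not the paper's): a single utility function over the four cells of the table, where different weight settings recover different members of the metric family.

```python
def contingency_metric(rel_pushed, nonrel_pushed, rel_missed,
                       nonrel_suppressed,
                       w_hit=1.0, w_false_alarm=-0.5,
                       w_miss=-0.25, w_quiet=0.0):
    """Score a push-notification run from its contingency table.

    Each cell (relevant/non-relevant x pushed/suppressed) gets its
    own weight; choosing different weights yields different metrics
    in the family. These defaults are illustrative assumptions.
    """
    total = rel_pushed + nonrel_pushed + rel_missed + nonrel_suppressed
    if total == 0:
        return 0.0
    score = (w_hit * rel_pushed
             + w_false_alarm * nonrel_pushed
             + w_miss * rel_missed
             + w_quiet * nonrel_suppressed)
    return score / total
```

Under this framing, a precision-like metric weights only the pushed cells, while penalizing `rel_missed` more heavily pulls the metric toward recall.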