Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023
DOI: 10.1145/3539618.3591802

VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks

Abstract: The objective of High-Recall Information Retrieval (HRIR) is to retrieve as many relevant documents as possible for a given search topic. One approach to HRIR is Technology-Assisted Review (TAR), which uses information retrieval and machine learning techniques to aid the review of large document collections. TAR systems are commonly used in legal eDiscovery and systematic literature reviews. Successful TAR systems are able to find the majority of relevant documents using the least number of assessments. Common…
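To make the "most relevant documents for the fewest assessments" objective concrete, here is a minimal sketch (hypothetical helper, not part of VoMBaT) of recall as a function of review effort over a ranked document list:

```python
# Minimal sketch (not from the paper): recall achieved after a reviewer
# has assessed the top-k documents of a ranked list.

def recall_at_effort(ranked_labels, k):
    """ranked_labels: 0/1 relevance labels in ranked order;
    k: number of documents assessed so far."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    return sum(ranked_labels[:k]) / total_relevant

# Example: 3 relevant documents in a collection of 8.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
print(recall_at_effort(labels, 4))  # 0.666... -> 2 of 3 found after 4 assessments
```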

Cited by 4 publications (4 citation statements)
References 19 publications
“…When the screening problem is treated as a ranking task, such as screening prioritisation or stopping prediction, performance is measured in terms of rank-based metrics and metrics at a fixed cut-off, such as nDCG@n, Precision@n, and last relevant found [69,28]. On the other hand, when the screening problem is treated as a classification task, performance is measured based on the confusion matrix, and the notions of Precision and Recall are commonly used [41,59]. One challenge arising from these two distinct approaches is the difficulty of going beyond simple effectiveness measures and comparing the real-world savings for users.…”
Section: Citation Screening Automation (mentioning, confidence: 99%)
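As a rough illustration of the two evaluation views contrasted in this statement, the sketch below (hypothetical helper names, assumptions mine) computes a rank-based metric at a cut-off alongside the confusion-matrix Precision and Recall obtained when ranks 1..n are treated as "include" decisions:

```python
# Sketch: rank-based evaluation vs. confusion-matrix evaluation of the
# same ranked screening output (illustrative, not from the cited works).

def precision_at_n(ranked_labels, n):
    """Fraction of the top-n ranked documents that are relevant."""
    return sum(ranked_labels[:n]) / n

def last_relevant_rank(ranked_labels):
    """1-based rank at which the last relevant document appears."""
    return max(i + 1 for i, rel in enumerate(ranked_labels) if rel)

def confusion_precision_recall(ranked_labels, n):
    """Treat ranks 1..n as 'include', the rest as 'exclude', and derive
    Precision/Recall from the resulting confusion matrix."""
    tp = sum(ranked_labels[:n])   # included and relevant
    fp = n - tp                   # included but irrelevant
    fn = sum(ranked_labels[n:])   # relevant but excluded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 0, 1, 0, 0, 1, 0, 0]
print(precision_at_n(labels, 4))              # 0.5
print(last_relevant_rank(labels))             # 6
print(confusion_precision_recall(labels, 4))  # (0.5, 0.666...)
```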
“…We evaluate models using nDCG@10, MAP, and Recall at rank k with k in {10, 50, 100} (R@k). Additionally, we compute three measures specifically designed for the task of CS: True Negative Rate at 95% Recall (TNR@95%) [40,41], normalised Precision at 95% Recall (nP@95%) [41], and the average position at which the last relevant item is found [30,31,32], calculated as a percentage of the dataset size, where a lower value indicates better performance (Last Rel).…”
Section: Baseline Experiments (mentioning, confidence: 99%)
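A minimal sketch of two of the screening-specific measures named above, assuming TNR@95% is the fraction of irrelevant documents ranked below the cut-off at which 95% recall is reached (the authoritative definitions are in [40,41]; helper names are hypothetical):

```python
import math

def cutoff_at_recall(ranked_labels, target=0.95):
    """Smallest rank k at which recall over the ranking reaches target."""
    need = math.ceil(target * sum(ranked_labels))
    found = 0
    for k, rel in enumerate(ranked_labels, start=1):
        found += rel
        if found >= need:
            return k
    return len(ranked_labels)

def tnr_at_recall(ranked_labels, target=0.95):
    """True Negative Rate at the target-recall cut-off: the share of
    irrelevant documents the reviewer never has to assess."""
    k = cutoff_at_recall(ranked_labels, target)
    negatives = len(ranked_labels) - sum(ranked_labels)
    tn = sum(1 - rel for rel in ranked_labels[k:])
    return tn / negatives if negatives else 0.0

def last_rel_percent(ranked_labels):
    """Rank of the last relevant document as a % of collection size
    (lower is better)."""
    last = max(i + 1 for i, rel in enumerate(ranked_labels) if rel)
    return 100.0 * last / len(ranked_labels)

labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
print(tnr_at_recall(labels))      # cut-off at rank 9 -> TNR = 1/6
print(last_rel_percent(labels))   # 90.0
```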
“…Automated citation screening is an umbrella term for using NLP, machine learning, and information retrieval (IR) techniques with the goal of decreasing the time spent on manual screening. Classification approaches train a supervised model on an annotated dataset to determine whether a paper should be included in or excluded from the review [23,24].…”
Section: Automated Citation Screening (mentioning, confidence: 99%)
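A minimal sketch of such a classification approach (illustrative data and model choice, not reproduced from [23,24]): TF-IDF features feeding a logistic-regression include/exclude classifier:

```python
# Sketch: supervised include/exclude classification for citation screening.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated screening data: text plus include (1) / exclude (0).
train_texts = [
    "randomised controlled trial of drug X for hypertension",
    "case report of a rare allergic reaction",
    "systematic review protocol for statin therapy",
    "editorial commentary on publication trends",
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram/bigram features
    LogisticRegression(max_iter=1000),     # linear include/exclude classifier
)
model.fit(train_texts, train_labels)

# Score unseen candidates by predicted inclusion probability.
candidates = ["cohort study of drug X adverse events"]
print(model.predict_proba(candidates)[:, 1])
```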