Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1513
Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction

Abstract: While the fast-paced inception of novel tasks and new datasets helps foster active research in a community towards interesting directions, keeping track of the abundance of research activity in different areas on different datasets is likely to become increasingly difficult. The community could greatly benefit from an automatic system able to summarize scientific results, e.g., in the form of a leaderboard. In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting …


Cited by 57 publications (68 citation statements)
References 19 publications
“…We now evaluate the end-to-end performance of AXCELL on the results extraction task. We evaluate on two datasets: the NLP-TDMS dataset introduced in Hou et al. (2019), in order to compare our method to the state of the art, and on our PWC LEADERBOARDS dataset, which contains many more leaderboards and acts as a more challenging benchmark.…”
Section: Methods (mentioning; confidence: 99%)
“…Closer to our formulation, Hou et al. (2019) extract absolute metric values alongside the metric name, task and dataset. They also use text excerpts as well as direct tabular information to make inferences for table contents.…”
Section: Related Work (mentioning; confidence: 99%)
“…The textual entailment approach forces our model to focus on learning the similarity patterns between text and various triples. We trained our module on a dataset consisting of 332 papers in the NLP domain, and it achieves a macro-F1 score of 56.6 and a micro-F1 score of 66.0 for predicting TDM triples on a testing dataset containing 162 papers (Hou et al., 2019). In total, our system indexed 872 tasks, 345 datasets, and 62 metrics from the entire corpus.…”
Section: Ingestion Pipeline (mentioning; confidence: 99%)