2023
DOI: 10.1162/tacl_a_00543
Coreference Resolution through a seq2seq Transition-Based System

Abstract: Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and links jointly. We implement the coreference system as a transition system and use multilingual T5 as an underlying language model. We obtain state-of-the-art accuracy on the CoNLL-2012 datasets with 83.3 F1-score for English (a 2.3 higher F1-score than previous wor…
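
As a rough illustration of the transition-based seq2seq formulation the abstract describes, the sketch below processes a document one sentence at a time, feeding the previously annotated prefix plus the newest sentence to a seq2seq model that emits linking actions. This is a minimal sketch under stated assumptions: the input/output format and the `seq2seq_predict` stub are hypothetical, not the paper's exact annotation scheme.

```python
# Sketch of a transition-based seq2seq coreference loop.
# `seq2seq_predict` stands in for a finetuned encoder-decoder
# (the paper uses mT5); its I/O format here is an assumption.

def seq2seq_predict(input_text: str) -> str:
    """Hypothetical stand-in for the seq2seq model.

    Given the annotated context plus the current sentence, it would
    return transition actions such as "LINK [He] -> [John]" or "PASS".
    """
    return "PASS"  # placeholder output

def resolve_document(sentences: list[str]) -> list[str]:
    """Run the transition system over a document, sentence by sentence.

    The state is the document prefix processed so far; at each step the
    model predicts mentions and links for the newest sentence only.
    """
    annotated_prefix: list[str] = []
    actions_log: list[str] = []
    for sentence in sentences:
        # The model conditions on everything resolved so far
        # plus the incoming sentence.
        model_input = " ".join(annotated_prefix) + " ## " + sentence
        actions = seq2seq_predict(model_input)
        actions_log.append(actions)
        # In the real system the predicted links would be merged back
        # into the running annotation; here we just keep the raw text.
        annotated_prefix.append(sentence)
    return actions_log

if __name__ == "__main__":
    doc = ["John bought a dog.", "He loves the dog."]
    print(resolve_document(doc))
```

In the actual system the predicted links are presumably folded back into the state so that later sentences can attach to earlier clusters; the stub above only logs the raw actions.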

Cited by 9 publications (14 citation statements)
References 13 publications
“…As shown in Table 4, Bohnet and others [26] and Liu and others [25] reported average F1 scores of 83.3 and 82.3 for CR, respectively. However, to achieve such a high performance, they used mT5 XXL (LM with a parameter size of 13 billion) and T0 3B (LM with a parameter size of 3 billion).…”
Section: Results (mentioning)
confidence: 94%
“…Various language models (LMs), such as LSTM [18], BERT [19], SpanBERT [21], T‐zero (T0) [25], and multilingual pretrained text‐to‐text transfer transformer (mT5) [26], have been utilized to enhance the performance of CR. In CR, mention clusters in NL texts are recognized using the output embeddings of the LM.…”
Section: CR-M-SpanBERT (mentioning)
confidence: 99%
“…The current state of the art in the field is presented in [51]. This paper presents a simplified text-to-text (seq2seq) method for Coreference Resolution that synergizes with modern encoder-decoder or decoder-only models.…”
Section: Coreference Resolution (mentioning)
confidence: 99%
“…The recent work of Bohnet et al (2023) pushes the end-to-end approach even further, solving both mention detection and coreference linking jointly via a text-to-text paradigm, reaching state-of-the-art results on the CoNLL 2012 dataset (Pradhan et al, 2012). Given that our system uses the same pretrained encoder but a custom decoder designed specifically for coreference resolution instead of a general but pretrained decoder, it would be interesting to perform a direct comparison of these systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the original architecture, we employed large-sized models XLM-R large (Conneau et al., 2020) and RemBERT (Chung et al., 2021). However, even bigger models consistently deliver better performance in various applications (Kale and Rastogi, 2020; Xue et al., 2021; Rothe et al., 2021; Bohnet et al., 2023). We therefore decided to utilize the largest possible pretrained multilingual model.…”
Section: The mT5 Pretrained Models (mentioning)
confidence: 99%