“…While named entity recognition in text has been studied extensively in the NLP community (Mikheev et al., 1999; Florian et al., 2003; Nadeau and Sekine, 2007; Ratinov and Roth, 2009; Ritter et al., 2011; Lample et al., 2016; Chiu and Nichols, 2016; Akbik et al., 2019; Wang et al., 2021b; Yamada et al., 2020), relatively little work has been conducted on extracting named entities from speech (Kim and Woodland, 2000; Sudoh et al., 2006; Parada et al., 2011; Caubrière et al., 2020; Yadav et al., 2020; Shon et al., 2021). Recognizing named entities from speech is a more challenging task, commonly handled with a pipeline approach that combines an automatic speech recognition (ASR) system with a text-based NER model (Sudoh et al., 2006; Raymond, 2013; Jannet et al., 2015).…”