Dolores Lemmenmeier-Batinić scite author profile


3 Publications

3 Citation Statements Received

44 Citation Statements Given


Affiliations

University of Zurich, Zurich University of Teacher Education

Publications


Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address

Lemmenmeier-Batinić

2021

SLO2.0


This paper describes the procedure of building a TEI-XML corpus of spoken Serbian from raw transcripts. The corpus consists of semi-structured interviews, which were gathered to investigate forms of address in Serbian. The interviews were thoroughly transcribed according to the GAT transcription conventions. However, the transcription was carried out without tools that would check the validity of the GAT syntax or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the corpus into the TEI format for transcriptions of speech. Further, we enriched the corpus by tagging and lemmatising the data. Lastly, we aligned the corpus turns to the corresponding audio segments using a forced-alignment tool. In addition to presenting the main steps involved in converting the corpus to XML, this paper discusses current challenges in the processing of spoken data and the implications of data re-use for transcriptions of speech. The corpus can be used for studying Serbian from the perspective of interactional linguistics; for investigating the morphosyntax, grammar, lexicon and phonetics of spoken Serbian; for studying disfluencies; and for testing models for automatic speech recognition and forced alignment. The corpus is freely available for research purposes.


Lexical Explorer: extending access to the Database for Spoken German for user-specific purposes

Lemmenmeier-Batinić

2020

Corpora


This paper presents Lexical Explorer, a tool that allows interactive browsing and filtering of quantitative corpus information, and describes how this tool can be used to support linguistic work on corpora of spoken German. With Lexical Explorer, users can analyse quantitative corpus data by interacting with frequency tables and by obtaining customised word profiles that show a word's distribution across word-form variants, co-occurrences and metadata. Users can also inspect the corpus examples behind particular corpus counts. Lexical Explorer was developed as a prototype for user-specific corpus access and is aimed at researchers of the German lexicon in spoken interaction. Although it was developed on the basis of two small speech corpora of German, the underlying principle of the tool can be easily adapted to other corpora and other user groups. Moreover, the tool can be used to gain insights into the corpus structure and to study and verify corpus content in a transparent and user-friendly way.


Map Task Corpus of Heritage BCMS spoken by second-generation speakers in Switzerland

Lemmenmeier-Batinić

Batinić

Escher

2023

Language Resources and Evaluation


In this paper, we present a corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) spoken in German-speaking Switzerland. The corpus consists of elicited conversations between 29 second-generation speakers originating from different regions of the former Yugoslavia. In total, the corpus contains 30 turn-aligned transcripts with an average length of 6 min. It is enriched with extensive speaker metadata, annotations, and pre-calculated corpus counts. The corpus can be accessed through an interactive corpus platform that allows for browsing, querying, and filtering, as well as for creating and sharing custom annotations. The principal user groups we address are researchers of heritage BCMS and students and teachers of BCMS living in the diaspora. In addition to introducing the corpus platform and the workflows we adopted to create it, we present a case study on BCMS spoken by a pair of siblings who participated in the map task, and we discuss the advantages and challenges of using this corpus platform for linguistic research.


