All language is characterised by variation which language users employ to construct complex social identities and express social meaning. Like other machine learning technologies, speech and language technologies (re)produce structural oppression when they perform worse for marginalised language communities. Using knowledge and theories from sociolinguistics, I explore why commercial automatic speech recognition systems and other language technologies perform significantly worse for already marginalised populations, such as second-language speakers and speakers of stigmatised varieties of English in the British Isles. Situating language technologies within the broader scholarship around algorithmic bias, I consider the allocative and representational harms they can cause even (and perhaps especially) in systems which do not exhibit predictive bias, narrowly defined as differential performance between groups. This raises the question of whether addressing or "fixing" this "bias" is actually always equivalent to mitigating the harms algorithmic systems can cause, in particular to marginalised communities.
CCS Concepts: Computing methodologies → Natural language processing; Speech recognition.
The Lothian Diary Project is an interdisciplinary effort to collect self-recorded audio or video diaries of people’s experiences of COVID-19 in and around Edinburgh, Scotland. In this paper we describe how the project emerged from a desire to support community members. The diaries have been disseminated through public events, a website, an oral history project, and engagement with policymakers. The data collection method encouraged the participation of people with disabilities, racialized individuals, immigrants, and low-proficiency English/Scots speakers, all of whom are more likely to be negatively affected by COVID-19. This is of interest to sociolinguists, given that these groups have been under-represented in previous studies of linguistic variation in Edinburgh. We detail our programme of partnering with local charities to help ensure that digitally disadvantaged groups and their caregivers are represented. Accompanying survey and demographic data mean that this self-recorded speech can be used to complement existing Edinburgh speech corpora. Additional sociolinguistic goals include a narrative analysis and a stylistic analysis, to characterize how different people engage creatively with the act of creating a COVID-19 diary, especially as compared to vlogs and other video diaries.
The COVID-19 pandemic brought about a profound change to the organization of space and time in our daily lives. In this paper we analyze the self-recorded audio/video diaries made by residents of Edinburgh and the Lothian counties during the first national lockdown. We identify three ways in which diarists describe a shift in place-time, or “chronotope”, in lockdown. We argue that the act of making a diary for an audience of the future prompts diarists to contrast different chronotopes, and each of these orientations illuminates the differential impact of the COVID-19 lockdowns across the community.
The ongoing Lothian Diary Project consists of 125+ audio/video recordings collected since May 2020 from residents of Edinburgh and the Lothian counties in Scotland. The diaries comprise self-recorded monologues or semi-structured interviews in which participants discuss their experiences during different stages of the COVID-19 pandemic. Recordings were uploaded to an online survey that also collected consent, demographic information, and opinion regarding Covid-related policies. All data marked for reuse are and will be housed in the University of Edinburgh's DataShare and DataVault repositories. A partial deposit is available now and another will be made available upon completion of data collection. Data from consenting participants will form an oral history archive with Museums and Galleries, Edinburgh.
Despite the fact that variation is a fundamental characteristic of natural language, automatic speech recognition systems perform systematically worse on non-standardised and marginalised language varieties. In this paper we use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences. We believe that this is a useful perspective for speech and language technology practitioners to understand the origins and harms of algorithmic bias, and how they can mitigate it. We also propose a re-framing of language resources as (public) infrastructure which should not solely be designed for markets, but for, and with meaningful cooperation of, speech communities.