Microaggressions are subtle, often veiled, manifestations of human biases. These uncivil interactions can have a powerful negative impact on people by marginalizing minorities and disadvantaged groups. The linguistic subtlety of microaggressions in communication has made it difficult for researchers to analyze their exact nature, and to quantify and extract microaggressions automatically. Specifically, the lack of a corpus of real-world microaggressions and of well-defined criteria for annotating them has prevented researchers from addressing these problems at scale. In this paper, we devise a general but nuanced, computationally operationalizable typology of microaggressions based on a small subset of the microaggression data available to us. We then create two datasets: one with examples of diverse types of microaggressions recollected by their targets, and another with gender-based microaggressions in public conversations on social media. We introduce a new, more objective criterion for annotation and an active-learning-based procedure that increases the likelihood of surfacing posts containing microaggressions. Finally, we analyze the trends that emerge from these new datasets.
Advances in speech and language processing have enabled the creation of applications that could, in principle, accelerate the process of language documentation, as speech communities and linguists work on urgent language documentation and reclamation projects. However, such systems have yet to make a significant impact on language documentation, as resource requirements limit the broad applicability of these new techniques. We aim to exploit the framework of shared tasks to focus the technology research community on tasks which address key pain points in language documentation. Here we present initial steps in the implementation of these new shared tasks, through the creation of data sets drawn from endangered language repositories and baseline systems to perform segmentation and speaker labeling of these audio recordings, which are important enabling steps in the documentation process. This paper motivates these tasks with a use case, describes data set curation and baseline systems, and presents results on this data. We then highlight the challenges and ethical considerations in developing these speech processing tools and tasks to support endangered language documentation.
Our study tests the acoustic fidelity of remote recordings, using a large variety of stimuli and recording environments. With standard recording environments not available due to COVID-19, more studies investigate remote recordings for acoustic analyses [e.g., Guan and Li (2021); Freeman and De Decker (2021)]. High-fidelity remote recordings also support crucial uses like reaching isolated populations and more speakers. A 188-word list was constructed from each English consonant followed by each vowel. Words recorded by a male and a female speaker in a sound-attenuated booth served as input for the test recordings. Stimuli were recorded on six devices across five operating systems, four teleconferencing platforms, and three browsers, using internal and external microphones. Acoustic analysis investigates the impact of these recording configurations on features including pitch, relative intensity, vowel formant measures, spectral moments, spectral tilt, spectral rolloff, and Mel Frequency Cepstral Coefficients. Preliminary analyses found durational differences between original and test-recorded stimuli, posing challenges for automatic segmentation and alignment. Aperiodic noise was also introduced. We hypothesize further distortions in other measures. The findings from this study will allow us to identify acoustic measures which are robust across varied remote recording conditions and to highlight configurations least likely to introduce problematic artifacts.