We introduce the psychometric concepts of bias and fairness in a multimodal machine learning context for assessing individuals' hireability from prerecorded video interviews. We collected interviews from 733 participants and hireability ratings from a panel of trained annotators in a simulated hiring study, and then trained interpretable machine learning models on verbal, paraverbal, and visual features extracted from the videos to investigate unimodal versus multimodal bias and fairness. Our results demonstrate that, in the absence of any bias mitigation strategy, combining multiple modalities only marginally improves prediction accuracy, at the cost of increased bias and reduced fairness compared to the least biased and fairest unimodal predictor set (verbal). We further show that gender-norming the predictors reduces gender predictability only for the paraverbal and visual modalities, whereas removing gender-biased features can achieve gender blindness, minimal bias, and fairness (for all modalities except visual) at the cost of some prediction accuracy. Overall, the reduced-feature approach using predictors from all modalities achieved the best balance between accuracy, bias, and fairness, with the verbal modality alone performing almost as well. Our analysis highlights how optimizing model prediction accuracy in isolation and in a multimodal context can cause bias, disparate impact, and potential social harm, whereas a more holistic optimization approach based on accuracy, bias, and fairness can avoid these pitfalls.
CCS Concepts: • Applied computing → Law, social and behavioral sciences; • Information systems → Multimedia and multimodal retrieval; Content analysis and feature selection; • Computing methodologies → Artificial intelligence.
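The abstract does not describe how the two mitigation strategies it names are implemented; the sketch below is one minimal, assumed interpretation, where gender-norming is within-group standardization of each predictor and gender-biased features are those carrying detectable gender signal. All names and the threshold are hypothetical, not the authors' pipeline.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def gender_norm(features: pd.DataFrame, gender: pd.Series) -> pd.DataFrame:
    """Z-score each feature within gender groups so group means and variances match."""
    return features.groupby(gender).transform(
        lambda col: (col - col.mean()) / (col.std(ddof=0) + 1e-12)
    )

def drop_gender_predictable(features: pd.DataFrame, gender: pd.Series,
                            threshold: float = 0.01) -> pd.DataFrame:
    """Drop features whose mutual information with gender exceeds a threshold,
    keeping only features that carry little gender signal (assumed criterion)."""
    mi = mutual_info_classif(features.values, gender.values, random_state=0)
    keep = [col for col, score in zip(features.columns, mi) if score <= threshold]
    return features[keep]
```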
Background Recent advances in mobile technologies for sensing human biosignals are empowering researchers to collect real-world data outside the laboratory, in natural settings where participants can perform their daily activities with minimal disruption. These new sensing opportunities also usher in a host of challenges and constraints for both researchers and participants. Objective This viewpoint paper aims to provide a comprehensive guide to aid research teams in the selection and management of sensors before beginning, and while conducting, human behavior studies in the wild. The guide is intended to help researchers achieve satisfactory participant compliance and minimize the number of unexpected procedural outcomes. Methods This paper presents a collection of challenges, consideration criteria, and potential solutions to enable researchers to select and manage appropriate sensors for their research studies. It explains a general data collection framework suitable for use with modern consumer sensors, enabling researchers to address many of the described challenges. In addition, it describes the criteria affecting sensor selection, management, and integration that researchers should consider before beginning human behavior studies involving sensors. On the basis of a survey conducted in mid-2018, this paper further provides an organized snapshot of consumer-grade human sensing technologies that can be used for human behavior research in natural settings. Results The research team applied the collection of methods and criteria to a case study aimed at predicting the well-being of nurses and other staff in a hospital. Average daily compliance with sensor usage, measured as the presence of data covering more than half of the possible hours each day, was approximately 65%, yielding over 355,000 hours of usable sensor data across 212 participants. A total of 6 notable unexpected events occurred during the data collection period, all of which had minimal impact on the research project. Conclusions The satisfactory compliance rates and the minimal impact of unexpected events during the case study suggest that the challenges, criteria, methods, and mitigation strategies presented as a guide for researchers are helpful for sensor selection and management in longitudinal human behavior studies in the wild.
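The compliance rule stated in the Results (a day counts as compliant when sensor data covers more than half of the possible hours) lends itself to a simple computation. The sketch below illustrates that rule under assumed column names (participant_id, date, hours_with_data); it is not the authors' actual analysis code.

```python
import pandas as pd

def daily_compliance(records: pd.DataFrame, hours_per_day: float = 24.0) -> pd.Series:
    """Per-participant fraction of study days that count as compliant, i.e.
    days on which sensor data covers more than half of the possible hours."""
    compliant = records["hours_with_data"] > hours_per_day / 2
    return compliant.groupby(records["participant_id"]).mean()

# Hypothetical usage:
# records = pd.DataFrame({"participant_id": [1, 1, 2],
#                         "date": ["2018-06-01", "2018-06-02", "2018-06-01"],
#                         "hours_with_data": [14.0, 9.5, 20.0]})
# print(daily_compliance(records).mean())  # average daily compliance across participants
```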
We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors for understanding individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their natural day-to-day job settings. We collected behavioral and physiological data from n = 212 participants through Internet-of-Things Bluetooth data hubs and wearable sensors (including a wristband, a biometrics-tracking garment, a smartphone, and an audio-feature recorder), together with a battery of surveys assessing personality traits, behavioral states, job performance, and well-being over time. Beyond the default use of the data set, we envision several novel research opportunities and potential applications, including multimodal and multitask behavioral modeling, authentication through biometrics, and privacy-aware and privacy-preserving machine learning.
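As one illustration of how such a corpus might be used for behavioral modeling, the sketch below aggregates timestamped wearable samples to the participant-day level and joins them to daily survey outcomes. The file names and columns are hypothetical placeholders, not the released data format.

```python
import pandas as pd

# Hypothetical inputs; the actual corpus has its own file layout and schemas.
wearable = pd.read_csv("wearable_samples.csv", parse_dates=["timestamp"])  # id, timestamp, heart_rate
surveys = pd.read_csv("daily_surveys.csv", parse_dates=["date"])           # id, date, well_being, job_performance

# Aggregate physiological samples per participant-day, then align with the
# daily survey outcomes so behavioral states can be modeled from sensor features.
daily = (wearable
         .assign(date=wearable["timestamp"].dt.normalize())
         .groupby(["id", "date"])
         .agg(mean_hr=("heart_rate", "mean"), hr_sd=("heart_rate", "std"))
         .reset_index())
dataset = daily.merge(surveys, on=["id", "date"], how="inner")
```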
Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for assessing psychological constructs, we provide a conceptual framework for investigating and mitigating machine learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB can empirically manifest when a trained ML model produces different predicted score levels for individuals belonging to different subgroups (e.g., race, gender) despite their having the same ground-truth level for the underlying construct of interest (e.g., personality), and/or when the model yields differential predictive accuracy across the subgroups. Because the development of ML models involves both data and algorithms, both data bias and algorithm training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and the ground truth (i.e., features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas on how to mitigate them, recognizing that the development of new statistical and algorithmic procedures will need to follow. We also discuss how this framework brings clarity to MLMB but does not reduce the complexity of the issue.
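The two empirical manifestations of MLMB described above (different predicted scores despite the same ground truth, and differential predictive accuracy across subgroups) can be screened for with simple diagnostics. The following is a rough sketch under assumed array inputs, not a procedure from the paper.

```python
import numpy as np

def mlmb_diagnostics(y_true, y_pred, group):
    """Simple checks for the two manifestations of MLMB described above:
    (1) mean residual per subgroup, indicating systematically different
        predicted scores relative to the same ground truth, and
    (2) RMSE per subgroup, indicating differential predictive accuracy."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    residual = y_pred - y_true
    score_gap = {g: float(residual[group == g].mean()) for g in np.unique(group)}
    rmse = {g: float(np.sqrt(np.mean(residual[group == g] ** 2))) for g in np.unique(group)}
    return score_gap, rmse
```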