Health-related data (HRD) about individuals are increasingly generated and processed. The sources and volume of such data have grown substantially in recent years; sources include wearable devices, health-related mobile apps, and electronic health records. HRD are sensitive and have important privacy implications; hence, they hold a special status under existing privacy laws and regulations. In this work, we focus on shadow HRD: HRD generated and/or processed by individuals using general-purpose digital tools outside of a professional healthcare information system. Examples include health-related queries made by individuals on general-purpose search engines and LLM-based chatbots, or medical appointments and contact information of health professionals synced to the cloud. Such data, and the privacy risks stemming from them, are often overlooked when studying digital health. Drawing on two focus group sessions (23 participants in total), we identified and categorized a broad variety of user behaviors, including the aforementioned examples, that lead to the creation of shadow HRD. Then, informed by this categorization, we designed a questionnaire and deployed it through an online survey (300 respondents) to assess the prevalence of such behaviors among the general public, as well as user awareness of, and concerns about, the privacy risks stemming from their shadow HRD. Our findings show that most respondents adopt numerous and diverse behaviors that create shadow HRD, and that very few resort to mechanisms to protect their privacy.
CCS CONCEPTS
• Security and privacy → Human and societal aspects of security and privacy; • Applied computing → Health informatics; • Human-centered computing → Empirical studies in HCI.