A growing body of research on re-identification through app usage behavior reveals the privacy threat of sharing smartphone usage data with third parties. However, the re-identifiability of a vulnerable group such as depressed individuals remains unexplored. We fill this knowledge gap through an in-the-wild study of 100 students' PHQ-9 scale data and 7 days of logged app usage data. We quantify uniqueness and re-identifiability by examining the minimum Hamming distance between the sets of apps that students used. Our findings show that, using app usage data, every depressed and non-depressed student is re-identifiable. In fact, using only 7 hours of data from a week, on average, 91% of the depressed and 88% of the non-depressed students are re-identifiable. Moreover, data from a single app category (i.e., Tools) is enough to re-identify each depressed student. Furthermore, we find that the rate of uniqueness among the depressed students is significantly higher in some app categories. For instance, in the Social Media category the rate of uniqueness is 9% higher (P = .02, Cohen's d = 1.31), and in the Health & Fitness category it is 8% higher (P = .005, Cohen's d = 1.47), than in the non-depressed group. Our findings suggest that each depressed student has a unique app signature that makes them re-identifiable. Therefore, designers of privacy-protecting systems need to account for this uniqueness to ensure better privacy for this vulnerable group.
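To make the uniqueness measure concrete, here is a minimal sketch of a minimum Hamming distance computed over sets of used apps. It is an illustration under stated assumptions, not the authors' implementation: the user IDs and app names are invented, the distance between two users is taken to be the number of apps used by exactly one of them, and a user is treated as unique when that minimum distance to every other user is at least 1.

```python
def hamming(apps_a, apps_b):
    """Hamming distance between two app sets, viewed as binary vectors
    over all apps: the number of apps used by exactly one of the users."""
    return len(apps_a ^ apps_b)


def min_hamming_distances(users):
    """For each user, the smallest distance to any other user."""
    return {
        name: min(
            hamming(apps, other_apps)
            for other, other_apps in users.items()
            if other != name
        )
        for name, apps in users.items()
    }


# Hypothetical toy data; not taken from the study.
users = {
    "u1": {"chat", "maps", "notes"},
    "u2": {"chat", "maps", "notes"},
    "u3": {"chat", "video"},
    "u4": {"chat", "maps", "fitness"},
}

min_dist = min_hamming_distances(users)

# Assumed criterion: a user is unique (and so re-identifiable from the
# app set alone) if no other user has an identical app set.
unique_users = sorted(name for name, d in min_dist.items() if d >= 1)
uniqueness_rate = len(unique_users) / len(users)
```

In this toy example, u1 and u2 share an identical app set and so are not unique, while u3 and u4 are, which gives a uniqueness rate of 0.5.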
Background: Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial. Additionally, because they must run in the background for prolonged periods, existing systems can be resource inefficient. As a result, these systems can be infeasible in low-resource settings.
Objective: Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time. Another objective was to explain the machine learning (ML) models that were best for identifying depression.
Methods: We developed a fast tool that retrieves the past 7 days’ app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data and responses to the Patient Health Questionnaire-9. To identify depressed and nondepressed students, we developed a diverse set of ML models: linear, tree-based, and neural network–based models. We selected important features using the stable approach, along with 3 main types of feature selection (FS) approaches: filter, wrapper, and embedded methods. We developed and validated the models using the nested cross-validation method. Additionally, we explained the best ML models through the Shapley additive explanations (SHAP) method.
Results: Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model in which around 5 features selected by the all-relevant FS approach, Boruta, were used in each iteration of validation; this model showed a maximum precision of 77.4% (balanced accuracy=77.9%). Feature importance analysis suggested that app usage behavioral markers containing diurnal usage patterns were more important than aggregated data-based markers. In addition, a SHAP analysis of our best models presented behavioral markers that were related to depression. For instance, students who were not depressed spent more time on education apps on weekdays, whereas those who were depressed used a higher number of photo and video apps and also had a higher deviation in using photo and video apps over the morning, afternoon, evening, and night time periods of the weekend.
Conclusions: Due to our system’s fast and minimalistic nature, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion of the implications of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed and take steps for intervention.
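As a quick consistency check on the metrics reported in the Results above, the F1-score can be recomputed from the stated precision and the share of depressed students correctly identified. This sketch assumes that the 82.4% figure is recall (sensitivity) and uses the standard harmonic-mean definition of F1; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (assumption: 82.4% is recall).
reported_recall = 0.824
reported_precision = 0.75

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 78.5%"
```

The recomputed value matches the reported F1-score of 78.5%.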
The resource-constrained nature of developing regions and the positive impact of early intervention show the need for a minimal, faster system to identify loneliness. However, promising existing pervasive device-based systems must run in the background for prolonged periods, which can be costly in terms of resources and may not be effective for early intervention. Thus, we conducted a study (N=105) in Bangladesh, developing a minimal system that can retrieve the past 7 days’ app usage behavioral data within a second (mean=0.31 seconds, SD=1.1 seconds). Leveraging only the instantly accessed data, we developed models using features selected by 3 different methods and explored 14 diverse machine learning (ML) algorithms, including 8 tree-based algorithms. We found that the Gaussian Naïve Bayes model, built on features selected by the Information Gain filter method, can correctly identify 90.7% of lonely participants with an F1 score of 82.4%. Through SHapley Additive exPlanations (SHAP), we explained the ML models, showing how the features impacted each model’s outcome. Because it is minimal, fast, and explainable, our system can play a role in the early identification of loneliness in resource-limited settings, which may have a positive impact by reducing the loneliness rate.
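For readers unfamiliar with the classifier named above, the following is a minimal from-scratch sketch of Gaussian Naïve Bayes. It is not the authors' pipeline: the Information Gain feature selection and SHAP steps are omitted, and the two features (night-time usage minutes and social app launches), the labels, and every data value are hypothetical.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_floor=1e-9):
    """Estimate the log prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance from being exactly zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log posterior, treating the
    features as independent Gaussians within each class."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy data: [night-time usage minutes, social app launches].
X = [[120, 30], [150, 35], [130, 28], [40, 10], [55, 12], [35, 8]]
y = ["lonely", "lonely", "lonely", "not_lonely", "not_lonely", "not_lonely"]

model = fit_gnb(X, y)
prediction_a = predict_gnb(model, [140, 32])
prediction_b = predict_gnb(model, [45, 9])
```

A model of this kind stores only a mean and a variance per feature and class, which is one reason it suits a minimal, low-resource setting.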
While studies have explored the relation between Games and depression, none has used objective Games app usage data, which could provide unbiased, real-time insights. To fill this research gap, we developed an app that accurately retrieves the past 7 days’ actual app usage data. In our study (N=100), the app retrieved data on 817,404 foreground and background app events, from which we extracted behavioral markers of Games app usage. To explore the relation between Games and depression, we mined rules, performed correlation analysis, and built Bayesian networks. We found that students who spent more time and had more launches per Games app on weekends were more likely to be depressed (p<.05). Bayesian analysis showed that while some usage behavior impacts depression, depression also impacts some Games app usage behavior. Apart from raising awareness about the negative impact of Games, insights from our study can facilitate the design of systems to improve mental health.
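The abstract above does not say which rule-mining method or measures were used. As a generic illustration, the sketch below scores a single candidate rule of the form "high weekend Games time implies depressed" with the standard support, confidence, and lift measures. The field names and all records are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(records)
    with_antecedent = [r for r in records if antecedent(r)]
    with_both = [r for r in with_antecedent if consequent(r)]
    with_consequent = [r for r in records if consequent(r)]

    support = len(with_both) / n
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    base_rate = len(with_consequent) / n
    lift = confidence / base_rate if base_rate else 0.0
    return support, confidence, lift


# Hypothetical toy records; not data from the study.
records = (
    [{"high_weekend_games_time": True, "depressed": True}] * 3
    + [{"high_weekend_games_time": True, "depressed": False}] * 1
    + [{"high_weekend_games_time": False, "depressed": True}] * 1
    + [{"high_weekend_games_time": False, "depressed": False}] * 3
)

support, confidence, lift = rule_metrics(
    records,
    antecedent=lambda r: r["high_weekend_games_time"],
    consequent=lambda r: r["depressed"],
)
```

For this toy data the rule has support 0.375, confidence 0.75, and lift 1.5. A lift above 1 means the consequent is more frequent among records matching the antecedent than in the data overall.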