2014
DOI: 10.1145/2639988.2661641

Privacy, Anonymity, and Big Data in the Social Sciences

Abstract: Open data has tremendous potential for science, but, in human subjects research, there is a tension between privacy and releasing high-quality open data. Federal law governing student privacy and the release of student records suggests that anonymizing student data protects student privacy. Guided by this standard, we de-identified and released a data set from 16 MOOCs (massive open online courses) from MITx and HarvardX on the edX platform. In this article, we show that these and other de-identification proce…

Cited by 15 publications (14 citation statements). References 11 publications.

“…They highlighted the difficulty of ensuring complete anonymity of the data and prevent re-identification of participants in Big Data research, especially since high level of anonymization could cause the loss of essential information for the research project. The appropriate trade-off between ensuring maximum anonymization for participants while maintaining quality of the dataset is still hotly debated [12]. Growing research in data science strives towards developing data models to ensure maximum protection for participants [46].…”
Section: Discussion (mentioning)
confidence: 99%
“…As data stemming from human interactions is more and more available to scholars, thanks to a) the increased distribution of technological devices, b) the growing use of digital services, and c) the implementation of new digital technologies [8, 9], researchers and institutional bodies are confronted with novel ethical questions. These encompass harm, that might be caused by the linkage of publicly available datasets on research participants [10], the level of privacy users expect in digital platforms such as social media [11], the level of protection that investigators should ensure for the anonymity of their participants in research using sensing devices and tracking technologies [12], and the role of individuals in consenting in participating in large scale data studies [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…Critiques about the lack of consistent evaluation standards across American IRBs (Green et al, 2006) and the shortcomings of IRB procedures in sociological research in the US (Schrag, 2011) have been raised also before the advent of Big Data research. However, this inadequacy is becoming more problematic in the era of Big Data since the increased possibilities to store and share big datasets are boosting academic policies for data reuse and multi-university and multi-country collaboration (Daries et al, 2014; Fenner et al, 2019). Without proper guidelines and proper harmonization, global research will progressively be hindered.…”
Section: Discussion (mentioning)
confidence: 99%
“…Due to the increased use of these technologies, ethical issues that researchers and ECs are usually confronted with are becoming more complex and are challenging previous mechanisms and structures. For instance, enhanced concerns regarding data protection and privacy emerge when linkage of different digital datasets might reveal sensitive information about research participants (Boyd & Crawford, 2012), or when the quality of the dataset clashes with ensuring anonymity of research participants (Daries et al, 2014). Concerns about consent are raised when data from digital spaces (e.g., social media) are used for research purposes without the subjects’ explicit consent or awareness (Henderson et al, 2013; Xafis, 2015).…”
Section: Introduction (mentioning)
confidence: 99%