2021
DOI: 10.31235/osf.io/9etqm
Preprint

Big Data, Big Noise: The Challenge of Finding Issue Networks on the Web

Abstract: In this paper, we focus on noise in the sense of irrelevant information in a data set as a specific methodological challenge of web research in the era of big data. We empirically evaluate several methods for filtering hyperlink networks in order to reconstruct networks that contain only web pages that deal with a particular issue. The test corpus of web pages was collected from hyperlink networks on the issue of food safety in the United States and Germany. We applied three filtering strategies and evaluated …

Cited by 2 publications (4 citation statements)
References 19 publications
“…Without a proper implementation of the data minimisation principle and the specific safeguards contained in the Europol Regulation, data subjects run the risk of wrongfully being linked to a criminal activity across the EU, with all of the potential damage for their personal and family life, freedom of movement and occupation that this entails. 30 The big data challenge was the subject of further consultations between Europol and the EDPS, which resulted in, inter alia, introducing a pre-analysis of personal data received with the sole purpose of determining whether such data fall into the categories of data subjects. This, however, has raised further concerns relating to maximum retention period for data sets lacking Data Subject Categorisation, 31 and the consultations in this regard are still ongoing.…”
Section: Data Protection
Citation type: mentioning
confidence: 99%
“…29 European Data Protection Supervisor [10], Point 4.7. 30 European Data Protection Supervisor [10], Point 4.10. 31 European Data Protection Supervisor [11].…”
Section: Data Protection
Citation type: mentioning
confidence: 99%
“…Although research that exploits data sets with many variables to identify subgroups is promising, it also comes with challenges. One of the most compelling challenges, as stressed by a number of scholars (e.g., Yarkoni and Westfall, 2017; Waldherr et al., 2017; Bzdok & Meyer-Lindenberg, 2018), is that these data sets may comprise a large amount of "irrelevant variables" (Fowlkes & Mallows, 1983). They are variables that do not separate clusters well and therefore do not define cluster structure.…”
Section: Introduction
Citation type: mentioning
confidence: 99%