2021
DOI: 10.48550/arxiv.2104.03702
Preprint

Media Cloud: Massive Open Source Collection of Global News on the Open Web

Abstract: We present the first full description of Media Cloud, an open source platform based on crawling hyperlink structure in operation for over 10 years, that for many uses will be the best way to collect data for studying the media ecosystem on the open web. We document the key choices behind what data Media Cloud collects and stores, how it processes and organizes these data, and its open API access as well as user-facing tools. We also highlight the strengths and limitations of the Media Cloud collection strategy …

Cited by 5 publications (3 citation statements). References 21 publications.

“…Media Cloud is an open source platform that is used for "collect[ing] data for studying the media ecosystem on the open web" (Roberts et al 2021). The platform includes several web-based tools that operate on a stored set of media data.…”
Section: Related Datasets (mentioning; confidence: 99%)
“…Second, Media Cloud is a platform that has provided news data from a wide range of national and international outlets since 2011 (Roberts et al 2021). Similar to the NELA datasets, Media Cloud collects news article data and publication metadata, but it does not provide the full text data.…”
Section: Related Datasets and Resources (mentioning; confidence: 99%)
“…To the best of our knowledge, our data is the largest existing, publicly available collection of local news content. The closest publicly available dataset to ours is from Media-Cloud (Roberts et al 2021). While our dataset shares outlets in common with MediaCloud, the local U.S. outlets covered by the latter are embedded almost exclusively in large population centers, while our dataset covers local media outlets from both large and small population areas.…”
Section: Local News Data (mentioning; confidence: 99%)