Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) 2014
DOI: 10.3115/v1/w14-0609

Mining the Twentieth Century’s History from the Time Magazine Corpus

Abstract: In this paper we report on an explorative study of the history of the twentieth century from a lexical point of view. As data, we use a diachronic collection of 270,000+ English-language articles harvested from the electronic archive of the well-known Time Magazine (1923-2006). We attempt to automatically identify significant shifts in the vocabulary used in this corpus using efficient yet unsupervised computational methods, such as Parsimonious Language Models. We offer a qualitative interpretation…
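
The abstract names Parsimonious Language Models as the tool for surfacing vocabulary shifts. As a rough illustration only, the sketch below implements the usual EM estimation of a parsimonious model (in the formulation of Hiemstra, Robertson and Zaragoza, 2004), assuming that one time slice of the corpus (for instance, one year of articles) is the foreground and the whole corpus is the background. Terms that the background already explains well (such as function words) lose probability mass, while terms characteristic of the slice keep it. The function names `parsimonious_lm` and `top_terms`, the mixing weight `lam=0.1`, the iteration count and the pruning threshold are hypothetical choices, not the settings reported in the paper.

```python
from collections import Counter
from collections.abc import Mapping


def parsimonious_lm(
    fg_counts: Mapping[str, int],
    bg_counts: Mapping[str, int],
    lam: float = 0.1,
    iterations: int = 50,
    threshold: float = 1e-4,
) -> dict[str, float]:
    """EM estimate of a parsimonious foreground language model (sketch).

    fg_counts: term counts for the foreground slice (e.g. one year of articles).
    bg_counts: term counts for the background corpus (e.g. all years).
    lam, iterations, threshold: illustrative values, not taken from the paper.
    """
    bg_total = sum(bg_counts.values())
    p_bg = {t: c / bg_total for t, c in bg_counts.items()}

    # Start from the maximum-likelihood estimate of the foreground slice.
    fg_total = sum(fg_counts.values())
    p_fg = {t: c / fg_total for t, c in fg_counts.items() if c > 0}

    for _ in range(iterations):
        # E-step: expected share of each term's foreground count that is
        # explained by the foreground model rather than the background model.
        expected = {}
        for t, p in p_fg.items():
            fg_part = lam * p
            bg_part = (1 - lam) * p_bg.get(t, 0.0)
            expected[t] = fg_counts[t] * fg_part / (fg_part + bg_part)

        # M-step: renormalise, prune improbable terms, renormalise again.
        total = sum(expected.values())
        p_fg = {t: e / total for t, e in expected.items() if e / total >= threshold}
        if not p_fg:
            break
        z = sum(p_fg.values())
        p_fg = {t: p / z for t, p in p_fg.items()}
    return p_fg


def top_terms(p_fg: Mapping[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Return the k terms with the highest parsimonious probability."""
    return sorted(p_fg.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy counts, invented purely for illustration (not Time Magazine data).
    background = Counter({"the": 1000, "war": 20, "radio": 50, "peace": 30})
    foreground = Counter({"the": 10, "war": 5, "radio": 1})
    model = parsimonious_lm(foreground, background)
    # "war" is over-represented relative to the background, so it ranks first;
    # the frequent function word "the" is absorbed by the background model.
    print(top_terms(model, 3))
```

Comparing the top-ranked terms of successive slices (year by year, or decade by decade) is one way such a model could be used to spot lexical shifts; the pruning step is what keeps each slice's model small and focused on distinctive vocabulary.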

Cited by 22 publications (6 citation statements). References 13 publications.

“…Our approach is somewhat similar in that the corpus of historical editions of Britannica can be considered a knowledge base, with an important difference being a chronological dimension, absent from such knowledge bases as YAGO or DBpedia. An example of mining a historical corpus for trends using the frequentist approaches to vocabulary shifts as well as normalization to structured sources can be found in the recent work on newspaper and journal historical editions such as Kestemont et al (2014) and Huet et al (2013).…”
Section: Related Work (mentioning, confidence: 99%)
“…By embedding the co-occurrence structure in a low-dimensional space, it has been shown that newspaper content reflects fundamental cultural movements and understandings from the 19th century onward [20] and that these context-dependent representations are sensitive to cultural bias as reflected in newspapers [23]. In continuation of the 'Culturomics' movement that used Google Books to show how lexical variation is sensitive to events [24], a wide range of studies has demonstrated that simple word and concept frequencies are sufficient for robust offline detection of major historical events [25] and can be used to model the evolution of complicated cultural processes such as the historical interdependencies between media and politics [26]. Fluctuations of time-dependent word frequencies have been shown to discriminate between classes of events that have class-specific fractal signatures, where the social-cultural class displays non-stationary and on-off intermittent behavior [27].…”
Section: Introduction (mentioning, confidence: 99%)
“…This momentum is particularly vivid in the domain of digitized newspaper archives for which there has been a notable increase of research initiatives over the last years. Those range from individual works dedicated to the development of tools [Yang et al, 2011, Dinarelli and Rosset, 2012, Moreux, 2016, Wevers, 2019] or the usage of those tools [Kestemont et al, 2014, Lansdall-Welfare et al, 2017], to evaluation campaigns [Rigaud et al, 2019, Clausner et al, 2019], including the emergence of large consortia projects seeking to apply computational methods to historical newspapers at scale, such as ViralTexts, Oceanic Exchanges, impresso, NewsEye, and Living with Machines [Ridge et al, 2019].…”
Section: Introduction (mentioning, confidence: 99%)