2008
DOI: 10.3366/e1749503208000075

The identification of stages in diachronic data: variability-based neighbour clustering

Abstract: In this paper, we introduce a data-driven bottom-up clustering method for the identification of stages in diachronic corpus data that differ from each other quantitatively. Much like regular approaches to hierarchical clustering, it is based on identifying and merging the most cohesive groups of data points, but, unlike regular approaches to clustering, it only allows for the merging of temporally adjacent data, thus, in effect, preserving the chronological order. We exemplify the method with two case studies, one …

Cited by 187 publications (59 citation statements)
References: 8 publications
“…The same holds for all other studies in which different recordings/files are associated with quantitative data, opening up new areas of exploration also in, for instance, diachronic studies. For example, in the domain of historical linguistics, Gries and Hilpert (2008) discuss how VNC can be used to determine different historical stages in the development of the English auxiliary shall and the English present perfect.…”
Section: Results (mentioning)
confidence: 99%
“…Crucially, his corpora cover six successive 70-year periods from 1500 to 1920, but to arrive at larger sample sizes for his statistical tests, Hilpert collapsed these into three consecutive 140-year periods without testing whether this merging of data is in fact warranted (cf. Gries & Hilpert, 2008).…”
Section: Arbitrariness Problem (mentioning)
confidence: 95%
“…In Gries and Hilpert (2010) this has in fact made ordinal stages more predictable than interval-scaled exact years.) While these features motivate periodization, it is important to realize that what is needed is an approach that is objective (rather than based on subjective eyeballing), as well as data-driven and phenomenon-specific (rather than based on periods from theoretical accounts or different phenomena or even just convenient (half-)century splits).…”
Section: Motivations for VNC (mentioning)
confidence: 97%
“…This is compelling evidence that these one-year recordings represent anomalies, and given that more than 200 data points (covering more than 250 years) enter into the analysis, the exclusion of two anomalies does not damage the substance of the database much, but excludes a huge amount of noise from the data. Gries and Hilpert (2010) thus discarded these two years from the database and re-ran the VNC algorithm, which revealed another one-year anomaly cluster (1649), which was subsequently removed from the analysis. Removal of these three data points yielded the five-cluster periodization shown in Table 4. A remaining oddity of the data is the second period, which consists of only four years (1479-82) and shows a markedly lower frequency of the interdental variant than both of its neighbors.…”
Section: The Application of VNC (mentioning)
confidence: 99%
“…One such approach is variability-based neighbor clustering (VNC; see Gries & Hilpert 2008). VNC differs from traditional clustering approaches in that it only permits temporally adjacent data points to be clustered together.…”
Section: Temporally-ordered Data and the Problem of Identifying Stages (mentioning)
confidence: 99%
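
The adjacency constraint described in the statement above (only temporally neighbouring data points may be merged) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `vnc`, the `n_stages` parameter, the toy data, and the use of the population standard deviation of the pooled values as the merge criterion are all assumptions made for illustration.

```python
from statistics import pstdev


def vnc(years, values, n_stages=1):
    """Greedy bottom-up clustering that merges only temporally adjacent groups.

    Assumption: the variability of a candidate merge is the population
    standard deviation of the pooled values; other measures could be swapped in.
    """
    # Each cluster starts as a single time point: (list of years, list of values).
    clusters = [([y], [v]) for y, v in zip(years, values)]
    merges = []  # history of (left years, right years, variability at merge)
    while len(clusters) > max(n_stages, 1):
        # Score every pair of neighbours (i, i + 1); non-adjacent pairs are never considered.
        costs = [
            pstdev(clusters[i][1] + clusters[i + 1][1])
            for i in range(len(clusters) - 1)
        ]
        # Merge the neighbouring pair whose pooled values vary least.
        i = min(range(len(costs)), key=costs.__getitem__)
        left, right = clusters[i], clusters[i + 1]
        merges.append((left[0], right[0], costs[i]))
        clusters[i:i + 2] = [(left[0] + right[0], left[1] + right[1])]
    return [c[0] for c in clusters], merges


# Hypothetical toy data: period start years and a made-up relative frequency.
years = [1500, 1570, 1640, 1710, 1780, 1850]
values = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1]
stages, history = vnc(years, values, n_stages=2)
print(stages)
```

Because only neighbours are ever scored, the chronological order of the data is preserved by construction; stopping at `n_stages` clusters is one simple way to read off a periodization, whereas inspecting the recorded merge variabilities is closer to reading a dendrogram.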