The Oxford Handbook of the History of English 2012
DOI: 10.1093/oxfordhb/9780199922765.013.0014

Variability-based neighbor clustering: A bottom-up approach to periodization in historical linguistics

Abstract: One of the linguistic sub-disciplines that has benefited most from corpora is historical linguistics, and many new diachronic resources have become available especially during the last ten to fifteen years. However, the longitudinal nature and the more constrained sampling of diachronic corpora present a number of challenges for historical linguists. One central problem that has received little attention is the periodization of a linguistic phenomenon P; that is, how the development of P over time can be divid…

Cited by 53 publications (8 citation statements) · References 7 publications
“…The variable PERIOD represents larger temporal bins, which were determined by Variability-based Neighbor Clustering (VNC; Gries and Hilpert 2012). VNC clusters adjacent decades in a bottom-up fashion, based on their similarity in relative frequency, in order to avoid arbitrary periodization.…”
Section: Data (mentioning)
confidence: 99%
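The constraint described in the quotation above, that only temporally adjacent bins may merge, with merge order driven by variability, can be sketched as follows (the frequency data, the `vnc` function name, and its output format are illustrative assumptions, not Gries and Hilpert's implementation):

```python
from statistics import pstdev

def vnc(freqs):
    """Sketch of Variability-based Neighbor Clustering: start with one
    cluster per decade and repeatedly merge the pair of *temporally
    adjacent* clusters whose union has the lowest variability, so the
    merge history yields data-driven periods rather than arbitrary cuts."""
    clusters = [[f] for f in freqs]   # one cluster per decade, in order
    merges = []                       # (position, merge cost) per step
    while len(clusters) > 1:
        # cost of each possible merge = variability of the merged cluster
        costs = [pstdev(clusters[i] + clusters[i + 1])
                 for i in range(len(clusters) - 1)]
        i = costs.index(min(costs))   # cheapest *adjacent* merge only
        merges.append((i, costs[i]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return merges

# Hypothetical relative frequencies for five consecutive decades:
order = vnc([1.0, 1.5, 1.6, 4.0, 9.0])
```

The two near-identical middle decades merge first, while the outlying final decade joins last; the late, expensive merges are exactly where period boundaries would be drawn.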
“…Descriptive empirical research shows the extent to which the trajectories of individual changes can alternate between phases of rapid, abrupt change and more or less stable frequency plateaus. This variability may correspond to data granularity, on the one hand, reflecting the structure and coverage of the database, and the quantitative methods and statistical models adopted (Gries and Hilpert 2012; Nevalainen 2015b). Importantly, however, the observed variation can be a reflection of various language-external influences, which impact the pace of linguistic change either in the short term or over a longer period of time.…”
Section: Historical Comparative Approaches (mentioning)
confidence: 99%
“…
        BoilerPlates ← Conflate(Phrases)
        return BoilerPlates

    Function FindFrequentPhrases(Texts)
        Initialize dictionary Phrases(phrase, counter)
        for i ← 1 to T do …

    BoilerPlates ← FindBoilerPlates(Texts)
    Texts′ ← [Texts \ BoilerPlates]
    XtremFreqPhrases ← FindFrequentPhrases(Texts′)
    Texts′ ← ConflateFreqPhrasesIntoSingleWords(Texts′, XtremFreqPhrases)
    SkipGrams ← ComputeSkipGrams(Texts′)
    SkipGramHashes ← HashSkipGrams(SkipGrams)
    Initialize list ReusedFragments
    for i ← 1 to T do
        for j ← 1 to T do
            BaseMatches ← GetMatchingHashes(T_i, T_j, SkipGramHashes)
            FullMatches ← ExtendMatches(BaseMatches)
            Append(ReusedFragments, FullMatches)

…between the 20-gram segments) and the resulting text fragments are marked as boilerplate passages. These text fragments are ignored by the subsequent stages of the approximate-matching algorithm, allowing them to focus on the more meaningful parts of the text, without getting bogged down in these commonly-recurring exact matches.…”
Section: Identification of Boilerplate Passages (mentioning)
confidence: 99%
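The skip-gram hashing step in the listing above can be illustrated with a toy sketch (the function names and parameter defaults here are illustrative stand-ins, not the cited paper's implementation):

```python
from itertools import combinations

def skipgram_hashes(text, n=3, window=4):
    """Hash every n-token subsequence inside each window of `window`
    consecutive tokens; the skipped positions let slightly edited
    reuses of a passage still share hashes with the original."""
    tokens = text.split()
    hashes = set()
    for start in range(len(tokens) - window + 1):
        for combo in combinations(range(start, start + window), n):
            hashes.add(hash(tuple(tokens[i] for i in combo)))
    return hashes

def candidate_matches(text_a, text_b):
    """Base matches between two texts: shared skip-gram hashes,
    to be extended into full reused fragments by a later pass."""
    return skipgram_hashes(text_a) & skipgram_hashes(text_b)
```

A shared hash only proposes a match; a stage like `ExtendMatches` in the listing then grows such seeds into maximal reused fragments, skipping anything already flagged as boilerplate.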
“…Below we discuss specific instantiations of these functions. Our algorithm can be seen as a word-embedding-based variant of the Variability-based Neighbor Clustering (VNC) algorithm for periodization [26], where we replace the measure of variability by a distance measure based on word-embedding matrices. Word embeddings are attractive to use for this purpose because they provide a soft notion of language use, with similar words having similar vectors in the word embedding space [47].…”
Section: Word-embedding-based Neighbor Clustering (mentioning)
confidence: 99%
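A minimal sketch of that substitution, assuming each period is summarized by a word-embedding matrix with aligned rows (the names and the choice of Frobenius distance are assumptions for illustration, not the cited algorithm [26]):

```python
import numpy as np

def embedding_vnc(period_matrices):
    """Neighbor clustering over per-period embedding matrices: as in VNC,
    only chronologically adjacent clusters may merge, but the merge cost
    is the distance between the clusters' mean embedding matrices rather
    than a frequency-variability measure."""
    clusters = [[m] for m in period_matrices]  # chronological order kept
    merges = []
    while len(clusters) > 1:
        centroids = [np.mean(np.stack(c), axis=0) for c in clusters]
        # Frobenius distance between adjacent centroids (norm default)
        costs = [np.linalg.norm(centroids[i] - centroids[i + 1])
                 for i in range(len(centroids) - 1)]
        i = int(np.argmin(costs))
        merges.append((i, float(costs[i])))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return merges

# Four periods with a clear break between the second and third:
early = [np.zeros((3, 2)), np.full((3, 2), 0.1)]
late = [np.full((3, 2), 5.0), np.full((3, 2), 5.1)]
merge_order = embedding_vnc(early + late)
```

The cross-break merge is the last and most expensive one, so the dendrogram suggests a period boundary there, mirroring how VNC's variability measure exposes boundaries in frequency data.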