2011
DOI: 10.1515/flin.2011.002

The indeterminacy of word segmentation and the nature of morphology and syntax

Abstract: The general distinction between morphology and syntax is widely taken for granted, but it crucially depends on a cross-linguistically valid concept of '(morphosyntactic) word'. I show that there are no good criteria for defining such a concept. I examine ten criteria in some detail (potential pauses, free occurrence, mobility, uninterruptibility, non-selectivity, non-coordinatability, anaphoric islandhood, nonextractability, morphophonological idiosyncrasies, and deviations from bi-uniqueness), and I show that…

Cited by 251 publications (182 citation statements)
References 40 publications
“…In this paper, we have described the creation of SeedLing, a foundation text for a Universal Corpus, following the guidelines of Abney and Bird (2010;2011). To do this, we cleaned and standardised data from several multilingual data sources: ODIN, Omniglot, the UDHR, Wikipedia.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several years ago, Abney and Bird (2010;2011) posed the challenge of building a Universal Corpus, naming it the Human Language Project. Such a corpus would include data from all the world's languages, in a consistent structure, facilitating large-scale cross-linguistic processing.…”
Section: Introduction (mentioning)
confidence: 99%
“…Ideally, we should encode this information explicitly in a Universal Corpus, assigning a unique identifier to each morpheme (instead of, or in addition to each word). Indeed, Haspelmath (2011) argues that there is no cross-linguistically valid definition of word, which undermines the central position of words in the proposed data structure.…”
Section: Data Sources (mentioning)
confidence: 99%