2018
DOI: 10.2478/yplm-2018-0002

A cross-linguistic database of phonetic transcription systems

Abstract: Contrary to what non-practitioners might expect, the systems of phonetic notation used by linguists are highly idiosyncratic. Not only do various linguistic subfields disagree on the specific symbols they use to denote the speech sounds of languages, but also in large databases of sound inventories considerable variation can be found. Inspired by recent efforts to link cross-linguistic data with help of reference catalogues (Glottolog, Concepticon) across different resources, we present initial efforts to link…

Cited by 39 publications (29 citation statements)
References 18 publications

“…Along with the growing amount of digitally available data for the world's languages, we find a substantial increase in the application of new quantitative techniques. While most of the new methods are inspired by neighboring disciplines and general-purpose frameworks, such as evolutionary biology [1,2], machine learning [3,4], or statistical modeling [5,6], the particularities of cross-linguistic data often necessitate a specific treatment of materials (reflected in recent standardization efforts [7,8]) and methods (illustrated by the development of new algorithms tackling specifically linguistic problems [9,10]).…”
Section: Background and Summary (mentioning)
confidence: 99%
“…First, to tokenize the data (split it up into sound segments), an orthography profile, as outlined in Wu et al (2020), was used by the Cross-Linguistic Data Formats Bench (Forkel and List, 2020) on the raw data. The CLDF Bench uses CLTS, or Cross-Linguistic Transcription Systems (Anderson et al, 2018), to consolidate transcriptions of words done by different linguists.…”
Section: Methods (mentioning)
confidence: 99%
“…Since translations may lack or one concept may have been represented by more than one word form, the resulting wordlists comprise between 956 and 2,558 word forms. While word forms were provided in orthographic form or phonological transcriptions in the original data, we added phonetic transcriptions which follow the unified Broad IPA transcription system proposed by the Cross-Linguistic Transcription Systems reference catalog [33, 34] with the help of orthography profiles [35] manually compiled by reading the relevant literature for each language. Orthography profiles can be best thought of as a specific look-up table, which allows to convert transcriptions from one orthography into another one (compare the presentation in Wu et al [36] for details); while such assisted transcription can introduce noise in the data, no comparable lexical database with transcriptions and loanword annotation was available.…”
Section: Methods (mentioning)
confidence: 99%
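
The citation statements above describe orthography profiles as look-up tables used to split word forms into sound segments and convert them to a target transcription. A minimal sketch of that idea follows, assuming a greedy longest-match strategy. The `PROFILE` table, the `segment` function, and the `<?x>` marker for unknown graphemes are all hypothetical and illustrative; they are not taken from CLTS, CLDF Bench, or any published profile.

```python
# Hypothetical orthography profile: source grapheme -> target segment.
# The entries are illustrative only and do not come from a real profile.
PROFILE = {
    "sch": "ʃ",
    "ch": "x",
    "a": "a",
    "u": "u",
    "l": "l",
    "e": "ə",
    "s": "s",
    "h": "h",
    "c": "k",
}


def segment(word, profile):
    """Convert `word` to a list of segments by greedy longest match."""
    longest = max(len(key) for key in profile)
    segments = []
    i = 0
    while i < len(word):
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            # Unknown grapheme: flag it instead of guessing a value.
            segments.append("<?" + word[i] + ">")
            i += 1
    return segments


print(" ".join(segment("schule", PROFILE)))
```

Flagging unmatched graphemes, rather than dropping them silently, makes gaps in a manually compiled profile visible, which matters given the noise that the quoted passage says assisted transcription can introduce.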