2022
DOI: 10.1075/kl.20002.shi

Automatic analysis of caregiver input and child production

Abstract: The present study explores the applicability of Natural Language Processing (NLP) techniques to investigate child corpora in Korean. We employ caregiver input and child production data in the CHILDES database, currently the largest and open-access Korean child corpus data, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patter…

Cited by 2 publications (7 citation statements)
References 36 publications
“…Along with the general nature of caregiver input, the models may thus have been affected by the specific word order and/or the presence of case markers in conducting the classification, particularly as shown with the two‐argument case‐marked scrambled condition (N ACC N NOM V) and the case‐less conditions (N CASE N CASE V; N CASE V). This aligns with previous reports on language‐specific challenges for automatic processing of Korean (Kim et al., 2007; Shin, 2022a). Since we are not aware of any study on language‐specific properties and NNs’ performance on child language, this claim awaits further examination.…”
Section: Discussion; citation type: supporting
confidence: 92%
“…One is that the number of first‐noun‐as‐agent patterns (3049 instances) did not exceed that of first‐noun‐as‐theme patterns (3579 instances). The other property is that the number of nominative‐first patterns (overtly marked with the nominative case marker; 3369 instances) outnumbered that of accusative‐first patterns (overtly marked with the accusative case marker; 1989 instances) despite the generally higher omission rate of the accusative case marker than that of the nominative case marker in caregiver input (Shin, 2022a). Given these characteristics, as epoch progressed, the two models may have attended primarily to the form of a specific case marker (overtly attested in a test stimulus) rather than to the meaning/function (i.e., thematic roles) of the initial noun, possibly leading to both success in one‐argument conditions where consideration of thematic role ordering was not required but partial success in the two‐argument conditions where thematic role ordering between the two arguments should be considered.…”
Section: Results; citation type: mentioning
confidence: 99%