Exploring semantic differences between the Indonesian prefixes <i>PE-</i> and <i>PEN-</i> using a vector space model

Denistia, Karlina; Shafaei-Bajestan, Elnaz; Baayen, R. Harald

doi:10.1515/cllt-2020-0023

Cited by 10 publications

(5 citation statements)

References 26 publications


“…Shen and Baayen (2021) find that semantic transparency measured by DS is linked to the productivity of adjective-noun compounds in Mandarin. DS models used in investigating the paradigmatic relation between two Indonesian prefixes (Denistia, Shafaei-Bajestan, & Baayen, 2021) corroborated the findings of earlier corpus-based analyses. The discriminative lexicon model of Baayen, Chuang, Shafaei-Bajestan, and Blevins (2019) is a computational model of lexical processing, including morphologically complex words, that incorporates insights from distributional semantics for the representation of word meanings.…”

Section: Introduction (supporting)

confidence: 75%

Semantic properties of English nominal pluralization: Insights from word embeddings

Shafaei-Bajestan, Moradipour-Tari, Uhrig et al. 2022

Preprint

Self Cite


Semantic differentiation of nominal pluralization is grammaticalized in many languages. For example, plural markers may only be relevant for human nouns. English does not appear to make such distinctions. Using distributional semantics, we show that English nominal pluralization exhibits semantic clusters. For instance, pluralization of fruit words is more similar to one another and less similar to pluralization of other semantic classes. Therefore, reduction of the meaning shift in plural formation to the addition of an abstract plural meaning is too simplistic. A semantically informed method, called CosClassAvg, is introduced that outperforms pluralization methods in distributional semantics which assume plural formation amounts to the addition of a fixed plural vector. In comparison with our approach, a method from compositional distributional semantics, called FRACSS, predicted plural vectors that were more similar to the corpus-extracted plural vectors in terms of direction but not vector length. A modeling study reveals that the observed difference between the two predicted semantic spaces by CosClassAvg and FRACSS carries over to how well a computational model of the listener can understand previously unencountered plural forms. Mappings from word forms, represented with triphone vectors, to predicted semantic vectors are more productive when CosClassAvg-generated semantic vectors are employed as gold standard vectors instead of FRACSS-generated vectors.
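The contrast drawn in this abstract, between adding one fixed plural vector and using a semantically informed shift, can be illustrated with a small Python sketch. This is not the authors' CosClassAvg or FRACSS: the abstract does not specify either procedure, so the per-class averaging below is only an assumed stand-in for "semantically informed", and the two-dimensional vectors, class labels, and helper names are all hypothetical.

```python
from collections import defaultdict


def vec_add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def vec_sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def vec_mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical toy data: (semantic class, singular vector, plural vector).
training = [
    ("fruit", [1.0, 0.0], [1.5, 1.0]),
    ("fruit", [0.0, 1.0], [0.5, 2.0]),
    ("tool", [2.0, 2.0], [2.0, 4.0]),
    ("tool", [4.0, 0.0], [4.0, 2.0]),
]

# Baseline: a single plural shift averaged over every singular/plural pair.
fixed_shift = vec_mean([vec_sub(pl, sg) for _, sg, pl in training])

# Assumed alternative: one average shift per semantic class.
shifts_by_class = defaultdict(list)
for label, sg, pl in training:
    shifts_by_class[label].append(vec_sub(pl, sg))
class_shift = {label: vec_mean(s) for label, s in shifts_by_class.items()}

# Predict the plural of an unseen (hypothetical) fruit noun both ways.
new_singular = [3.0, 1.0]
predicted_fixed = vec_add(new_singular, fixed_shift)
predicted_by_class = vec_add(new_singular, class_shift["fruit"])
```

In this toy setup the two predictions differ because the fruit pairs and the tool pairs shift in different directions, which is the kind of class-specific clustering the abstract reports for English plurals.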


“…Based on the keywords associated with ads and the script associated with the video, the vector space model (VSM; Denistia et al, 2021) is a particularly popular choice for ad matching. In VSM, the similarity between documents $\mathit{Doc}_{x}$ and $\mathit{Doc}_{y}$, $R(\mathit{Doc}_{x}, \mathit{Doc}_{y})$, is calculated from the cosine of the angle between two vectors:

$$R(\mathit{Doc}_{x}, \mathit{Doc}_{y}) = \frac{\omega(\mathit{Doc}_{x}) \cdot \omega(\mathit{Doc}_{y})}{\lVert \omega(\mathit{Doc}_{x}) \rVert \times \lVert \omega(\mathit{Doc}_{y}) \rVert},$$

where $\omega(\mathit{Doc}_{x})$ and $\omega(\mathit{Doc}_{y})$ are the weight vectors of $\mathit{Doc}_{x}$ and $\mathit{Doc}_{y}$, respectively.…”
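The cosine measure described in the quoted passage can be sketched in a few lines of Python. This is a minimal illustration rather than code from the cited survey: the function name `cosine_similarity` and the toy weight vectors are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical term-weight vectors over a shared three-term vocabulary.
doc_x = [1.0, 2.0, 0.0]
doc_y = [2.0, 4.0, 0.0]  # same direction as doc_x, twice the length
doc_z = [0.0, 0.0, 3.0]  # shares no weighted term with doc_x

sim_parallel = cosine_similarity(doc_x, doc_y)
sim_orthogonal = cosine_similarity(doc_x, doc_z)
```

Because the measure depends only on direction, `doc_x` and `doc_y` score close to 1 despite their different lengths, while `doc_x` and `doc_z`, which share no weighted term, score 0.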

Section: Computer Science Studies (mentioning)

confidence: 99%

A survey of online video advertising

Zhang, Yan et al. 2023

WIREs Data Min & Knowl


With the development of social media and the ubiquity of the Internet, recent years have witnessed the rapid development of online video advertising among publishers and advertisers. Video advertising, as a new type of advertisement, has gained significant research attention from both academia and industry, coinciding with the ever-growing volume of online videos. In this research, we provide a comprehensive survey of online video advertising in the fields of social science and computer science. We investigate state-of-the-art articles from 1990 to the present and provide a new taxonomy of extant research topics based on these articles. We also highlight the factors that cause advertising to affect people and the most popular video advertising techniques used in computer science. Finally, on the basis of the analytics of the surveyed papers, future challenges are identified and potential solutions to these are discussed.


“…Therefore, a set of databases are needed to explore this phenomenon from the quantitative perspective. Recent studies on these prefixes conducted analyses based on corpus data (Denistia & Baayen, 2019, 2022a, 2022b; Denistia et al, 2022). Their research focused on investigating whether PE- and PEN- are allomorphs from their productivity, computational learning, and semantics distribution respectively.…”

Section: Introduction (mentioning)

confidence: 99%

“…PE-, however, is an outlier in the linearity of the base words' productivity. Apart from productivity analysis, using semantics distribution (Mikolov et al, 2013), Denistia et al (2022) measured the similarity of all possible combination between PE- and PEN-. They found that PE- and PEN- are semantically discriminable.…”

Section: Introduction (mentioning)

confidence: 99%

“…This paper provides a detailed explanation of the materials and database used in Denistia & Baayen (2019) and Denistia et al (2022). Theoretical grounding on how the information in database were classified (e.g., the classification of PE- and PEN-, allomorph of PEN-, semantics role, cosine similarity, tokens frequency in the corpus) is described.…”

Section: Introduction (mentioning)

confidence: 99%


Databases on the Indonesian Prefixes PE- and PEN-

Denistia 2023

j.lang.lit.


This paper provides the theoretical grounding in constituting databases related to PE- and PEN-, two Indonesian nominalizing prefixes, which have various meanings (e.g., patient, agent, or instrument). The first database contains the words with PE- and PEN- whereas the second database provides the cosine similarity between two words of interest. Using a written Indonesian corpus as the primary source (Leipzig Corpora Collection), the databases contain the following information: PE- or PEN- prefixes, allomorph of PEN-, base word, semantics role, morphological variation, cosine similarity, as well as the word frequency. Furthermore, this paper elaborates the theoretical consideration on how each information was cultivated. In building the databases, Indonesian morphological parser and Word to Vector were used to analyze the Indonesian morphological status and to put the words in the corpus into a vector. In addition, manual verification for the data against the Indonesian comprehensive dictionary was also conducted. In the end, the databases are available for free so that the data could be used as materials for a corpus-based analysis on Indonesian morphology. This research shed light to a careful and thorough classification of the open-access databases of PE- and PEN- from their allomorphs, base word, semantics role, and morphological variation. The information provided in this article is hoped to be contributive in Indonesian morphology specifically, and other linguistics fields (e.g., corpus linguistics and quantitative linguistics) in general.
