Proceedings of the Second Workshop on Subword/Character LEvel Models 2018
DOI: 10.18653/v1/w18-1204
Addressing Low-Resource Scenarios with Character-aware Embeddings

Abstract: Most modern approaches to computing word embeddings assume the availability of text corpora with billions of words. In this paper, we explore a setup where only corpora with millions of words are available, and many words in any new text are out of vocabulary. This setup is both of practical interest, modeling the situation for specific domains and low-resource languages, and of psycholinguistic interest, since it corresponds much more closely to the actual experiences and challenges of human language learning…
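Character-aware setups like the one the abstract describes are commonly realized fastText-style: a word's vector is built from its character n-grams, so an out-of-vocabulary word still receives a representation through the n-grams it shares with known words. A minimal sketch, where the hashing trick and the random lookup table are our simplifications standing in for trained parameters, not the paper's exact model:

```python
import hashlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of `word`, with boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, dim=50, buckets=1000, seed=0):
    """Average of hashed n-gram vectors. An unseen word still gets an
    embedding from the n-grams it shares with in-vocabulary words."""
    # Toy random table; in a trained model these rows are learned.
    table = np.random.default_rng(seed).standard_normal((buckets, dim))
    grams = char_ngrams(word)
    # Stable hash (Python's built-in hash() is salted per process).
    idx = [int(hashlib.md5(g.encode()).hexdigest(), 16) % buckets
           for g in grams]
    return table[idx].mean(axis=0)
```

Because the word vector is a mean over shared n-gram vectors, morphologically related words (e.g. a word and its plural) land near each other even when one of them was never seen in training.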

Cited by 4 publications (2 citation statements)
References 12 publications
“…This method may be agnostic to the order of characters if the n-gram length is short. Others have used RNNs [24], [41] or CNNs [24], [70], [71] to better incorporate word morphology information into word embeddings. Unlike English words, which are linear sequences of characters, logographs are recursive structures of subunits.…”
Section: Incorporating Morphology Into Embeddings
confidence: 99%
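The order-sensitivity caveat in the statement above can be checked directly: with n = 1 a bag of character n-grams cannot distinguish anagrams, while n = 2 already can. A small illustration (the function name is ours, not from the cited work):

```python
from collections import Counter

def ngram_bag(word, n):
    """Multiset of character n-grams (no boundary markers, for clarity)."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))

# With n = 1, anagrams collapse to the same representation...
assert ngram_bag("listen", 1) == ngram_bag("silent", 1)
# ...while n = 2 already tells them apart via their bigrams.
assert ngram_bag("listen", 2) != ngram_bag("silent", 2)
```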
“…and are accumulated to create an embedding of the full word. In this way, more robust representations emerge for rarely used or unknown words (Papay et al., 2018). Within each subgroup, the 10 nearest neighbor terms of interest are identified in the word embeddings, so that each word identified as distinctive of both genres is visible together with the terms that most frequently co-occur with it in the texts. To contrast the terms of each subcorpus, we next illustrate the ten closest terms per subcorpus, along with the similarity of each, where the maximum possible similarity is 1. The keyword honor, which appears not only in the comedies but also in the tragedies, shows no neighbor terms in common with the tragedy subgroup when evaluated within the comedy subgroup; nor were any found for the word hado.…”
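The neighbor lists this statement describes are the standard cosine-similarity nearest neighbors in embedding space, where the maximum similarity of 1 corresponds to vectors pointing in the same direction. A minimal sketch with toy two-dimensional vectors (the vocabulary and values are hypothetical, chosen only to show the ranking):

```python
import numpy as np

def nearest_neighbors(query, vocab_vectors, k=10):
    """Return the k vocabulary words closest to `query` by cosine
    similarity; the maximum possible similarity is 1.0."""
    q = vocab_vectors[query]
    sims = {}
    for word, v in vocab_vectors.items():
        if word == query:
            continue
        sims[word] = float(np.dot(q, v)
                           / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(sims.items(), key=lambda kv: -kv[1])[:k]

# Toy vocabulary: "gloria" points almost the same way as "honor".
vecs = {
    "honor":  np.array([1.0, 0.0]),
    "gloria": np.array([0.9, 0.1]),
    "hado":   np.array([0.0, 1.0]),
}
print(nearest_neighbors("honor", vecs, k=2))
```

Running the study's comparison then amounts to computing these lists separately per subcorpus and inspecting which neighbors the two genre subgroups share.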
unclassified