2023
DOI: 10.3390/app132212155
|View full text |Cite
|
Sign up to set email alerts
|

esCorpius-m: A Massive Multilingual Crawling Corpus with a Focus on Spanish

Asier Gutiérrez-Fandiño,
David Pérez-Fernández,
Jordi Armengol-Estapé
et al.

Abstract: In recent years, transformer-based models have played a significant role in advancing language modeling for natural language processing. However, they require substantial amounts of data and there is a shortage of high-quality non-English corpora. Some recent initiatives have introduced multilingual datasets obtained through web crawling. However, there are notable limitations in the results for some languages, including Spanish. These datasets are either smaller compared to other languages or suffer from lowe… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 41 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?