2021
DOI: 10.18485/infotheca.2021.21.2.3

Annotation of the Serbian ELTeC Collection

Cited by 6 publications
(5 citation statements)
References 0 publications
“…Models trained using the spaCy library, downloaded for each language from the corresponding repository https://spacy.io/models, were used to label the named entities. For the Italian language, the it_core_news_sm-3.4.0 model was downloaded, trained on an automatically created corpus for the recognition of named entities, WikiNER, based on the text and structure of Wikipedia (Nothman et al 2013), while for the Serbian language, a model trained on the corpus of old Serbian novels, SrpCNNER (Šandrih Todorović et al 2021), was implemented, downloaded from the European Language Grid (ELG) platform. In addition to recognizing named entities, the goal of the application was also to link them with items in the open knowledge base Wikidata.…”
Section: Service Implementation (mentioning)
confidence: 99%
“…In addition, entity labels of the PERS class, which mark persons, were set as the basic tag into which corresponding labels from other models were mapped: PER (Italian), PRS (Swedish), PERSON (Macedonian), persNAME (Polish). The label NORP (nationalities or religious or political groups) from the Japanese and Finnish models, and NAT_REL_POL from the Romanian one, are mapped into the label of the class DEMO, which marks demonyms and ethnic relations (Stanković et al 2021). Since some language models have a much richer set of named entity classes (for example, English has 18 classes and Romanian 16), a column with a list of ignored labels is defined in the configuration file.…”
Section: Service Implementation (mentioning)
confidence: 99%
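
The label harmonization described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: only the label names (PER, PRS, PERSON, persNAME, NORP, NAT_REL_POL, PERS, DEMO) come from the quoted text, with the Polish label written without its line-break hyphen. The names `LABEL_MAP`, `IGNORED_LABELS`, and `normalize_entities`, and the specific ignored labels, are assumptions made for the example.

```python
# Hypothetical mapping from model-specific labels to the shared tag set.
# Only the label names themselves are taken from the quoted statement.
LABEL_MAP = {
    "PER": "PERS",          # Italian
    "PRS": "PERS",          # Swedish
    "PERSON": "PERS",       # Macedonian
    "persNAME": "PERS",     # Polish (spelled as in the quoted statement)
    "NORP": "DEMO",         # Japanese, Finnish
    "NAT_REL_POL": "DEMO",  # Romanian
}

# Assumed stand-in for the "list of ignored labels" column in the
# configuration file; the source does not say which labels are ignored.
IGNORED_LABELS = {"CARDINAL", "ORDINAL"}


def normalize_entities(entities, label_map=LABEL_MAP, ignored=IGNORED_LABELS):
    """Map (text, label) pairs to the shared tag set, dropping ignored labels."""
    normalized = []
    for text, label in entities:
        if label in ignored:
            continue  # label is excluded by configuration
        # Labels without a mapping entry are passed through unchanged.
        normalized.append((text, label_map.get(label, label)))
    return normalized
```

A flat dictionary keeps the sketch short; a per-language mapping would be needed if two models used the same label name with different meanings.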
“…Building on GDEX, Stanković et al (2019) adopted machine learning to identify good candidate examples for Serbian. First, they analyzed lexical and syntactic features in a corpus compiled of illustrative examples from the five digitized volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary.…”
Section: Pedagogical Corpora and Language Examples (mentioning)
confidence: 99%
“…Old Serbian novels from the 1840s to the 1920s are collected in SrELTeC (SrELTeC, Table 1) and have been digitally preserved as part of the COST action CA16204 (Stanković et al, 2021). ELTeC's section for Serbian contains 120 novels (Odebrecht et al, 2021).…”
Section: Monolingual Corpora (mentioning)
confidence: 99%