2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
DOI: 10.1109/jcdl.2017.7991557

Descriptor-Invariant Fusion Architectures for Automatic Subject Indexing

Abstract: Documents indexed with controlled vocabularies enable users of libraries to discover relevant documents, even across language barriers. Due to the rapid growth of scientific publications, digital libraries require automatic methods that index documents accurately, especially with regard to explicit or implicit concept drift, that is, with respect to new descriptor terms and new types of documents, respectively. This paper first analyzes architectures of related approaches on automatic indexing. We show that their de…

Cited by 8 publications (10 citation statements)
References: 18 publications
“…As a consequence, the performance of these methods largely depends on the availability of appropriate training examples and the stability of term and concept distributions, whereas lexical methods require vocabularies that exhaustively cover the domain. When concept drift occurs, that is, if observed terms and the set of relevant concepts differ between training data and new data, both types of indexing approaches decrease considerably in performance [8]. Interestingly, since these algorithms merely learn to assign recognized subjects of the controlled vocabulary, they will silently fail to assign relevant subjects not covered by the controlled vocabulary, and moreover they are unable to recognize and represent the loss in document-level content representation.…”
Section: Discussion
confidence: 99%
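The silent-miss behaviour described in this statement can be made concrete with a short sketch. The vocabulary and helper function below are purely hypothetical, not code from the paper; they only illustrate that a vocabulary-bound indexer has no channel for reporting relevant subjects it cannot represent.

# Minimal sketch (hypothetical names): an indexer restricted to a controlled
# vocabulary can only ever assign labels from that vocabulary, so relevant
# subjects outside it are dropped without any error signal.
CONTROLLED_VOCABULARY = {"labour market", "monetary policy", "inflation"}

def index_document(predicted_subjects):
    """Keep only predictions that exist in the controlled vocabulary."""
    assigned = [s for s in predicted_subjects if s in CONTROLLED_VOCABULARY]
    missed = [s for s in predicted_subjects if s not in CONTROLLED_VOCABULARY]
    # 'missed' is silently discarded in a naive pipeline; nothing records
    # that the document-level content representation has degraded.
    return assigned, missed

assigned, missed = index_document(["inflation", "cryptocurrency regulation"])
print(assigned)  # ['inflation']
print(missed)    # ['cryptocurrency regulation'] -- lost without warning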
“…Regarding economics, we use three datasets, which comprise roughly 20,000 (T20k), 60,000 (T60k), and 400,000 documents (T400k), respectively. Each document is associated with several descriptors from the STW Thesaurus for Economics (STW), for instance 5.89 on average for T400k. Both the STW and EUROVOC comprise thousands of concepts, yielding challenging multi-label classification tasks.…”
Section: Setup
confidence: 99%
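As a rough illustration of the multi-label setup these datasets induce, the sketch below uses toy documents and scikit-learn's MultiLabelBinarizer; the tooling and the example descriptors are assumptions for illustration, not the authors' pipeline or data. It builds the document-by-descriptor target matrix and computes the average number of labels per document, the statistic quoted above (5.89 for T400k).

# Minimal sketch with toy data: turning per-document descriptor assignments
# from a large thesaurus into a sparse multi-label target matrix.
from sklearn.preprocessing import MultiLabelBinarizer

# Each entry lists the thesaurus descriptors assigned to one document; real
# data would draw roughly six labels per document from thousands of concepts.
doc_descriptors = [
    ["labour market", "unemployment"],
    ["monetary policy", "inflation", "central banks"],
    ["inflation", "unemployment"],
]

mlb = MultiLabelBinarizer(sparse_output=True)
Y = mlb.fit_transform(doc_descriptors)   # shape: (n_docs, n_descriptors)
print(Y.shape, Y.sum() / Y.shape[0])     # average labels per document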
“…We also perform the split between training set and test set along the time axis (see Figure 5). This is challenging because label annotations suffer from concept drift over time [39]. We use documents from the years 2012 and 2013 as the test set to obtain a train-test ratio similar to the scenario in Section 5.1.…”
Section: Subject Label Recommendation
confidence: 99%
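A time-based split of this kind can be sketched as follows. The field names and toy records are hypothetical; only the idea of holding out the most recent years as test data, so that concept drift separates training and test annotations, comes from the quoted passage.

# Minimal sketch: split a corpus on the time axis so the test set contains
# only the most recent years, exposing the indexer to concept drift.
def temporal_split(documents, test_years=(2012, 2013)):
    """documents: iterable of dicts with a 'year' key and arbitrary payload."""
    train = [d for d in documents if d["year"] not in test_years]
    test = [d for d in documents if d["year"] in test_years]
    return train, test

docs = [{"year": 2009, "id": 1}, {"year": 2012, "id": 2}, {"year": 2013, "id": 3}]
train, test = temporal_split(docs)
print(len(train), len(test))  # 1 2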
“…Then, the fusion layer (below) is responsible for combining these predictions. The most interesting property of this layer is the descriptor-invariant decision function [32], i.e., a function that allows predictions to be made for all descriptors, including unseen ones. Optionally, the fusion module may additionally consult the knowledge base or the professionally indexed documents for its decisions and use a descriptor-specific fusion component.…”
Section: Fusion Architectures
confidence: 99%
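The sketch below illustrates the idea of a descriptor-invariant decision function as described in this statement: each base indexer contributes a confidence score per (document, descriptor) candidate, and a single decision function over those scores decides whether to assign the descriptor. Because the function sees only the scores, not the descriptor identity, it applies equally to descriptors unseen at fusion-training time. The weighted-sum combination, the weights, and the threshold are assumptions for illustration, not the authors' actual fusion component.

# Minimal sketch (hypothetical fusion rule): combine per-descriptor scores
# from several base methods with one descriptor-agnostic decision function.
from typing import Dict, List

def fuse(candidates: Dict[str, List[float]],
         weights: List[float],
         threshold: float = 0.5) -> List[str]:
    """candidates: descriptor -> one confidence score per base method."""
    assigned = []
    for descriptor, scores in candidates.items():
        combined = sum(w * s for w, s in zip(weights, scores))
        if combined >= threshold:  # same decision rule for every descriptor
            assigned.append(descriptor)
    return assigned

# Two base methods, e.g. a lexical matcher and a statistical classifier;
# in a real system the weights would be learned from professionally indexed documents.
print(fuse({"inflation": [0.9, 0.7], "new-2022-concept": [0.8, 0.0]},
           weights=[0.5, 0.5]))  # ['inflation']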