2021
DOI: 10.1162/qss_a_00152

Identifying constitutive articles of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task

Abstract: Cumulative dissertations are doctoral theses comprised of multiple published articles. For studies of publication activity and citation impact of early career researchers it is important to identify these articles and link them to their associated theses. Using a new benchmark dataset, this paper reports on experiments of measuring the bilingual textual similarity between, on the one hand, titles and keywords of doctoral theses, and, on the other hand, articles’ titles and abstracts. The tested methods are cos…

Cited by 4 publications (4 citation statements)
References 38 publications
“…FastText: is a free public-source library created by the Facebook AI Research team for learning WE and classification [44,45]. For each word, FastText generates a word vector that contains both the term's meaning and its context in the document.…”
Section: Feature Extraction
Citation type: mentioning
confidence: 99%
“…For the first stage, the goal is to rule out as many unlikely matches as possible while discarding as few actual matches as possible with relatively simple rules. We only briefly summarize this stage, as a detailed description is already available in Donner (2021b). Scopus records are filtered (a) by author name similarity for all authors affiliated with Germany and (b) publication year range.…”
Section: Filtering Of Candidate Matches
Citation type: mentioning
confidence: 99%
“…The second criterion is textual similarity. As this is a considerably complex task in its own right, a separate study for the selection of suitable methods was carried out in Donner (2021b) and the reader is referred to this study for a detailed description of the methods. To summarize, the difficulties relate to the sparsity of the textual information, the bilingual nature of the texts, and domain specificity of the texts.…”
Section: Content Similarity Criteria
Citation type: mentioning
confidence: 99%
“…In a jobmatching model, understanding context, relationships between words, and deeper meanings are required [19][20] [21]. The SI often has to limit the number of dimensions (semantic concepts) used to represent documents (difficult to interpret) [22][23] [24][25], Sensitivity to changes in documents [14] [26], and cognitive abilities [27] [16], or aspects of the job candidate's personality.…”
Section: Introduction
Citation type: mentioning
confidence: 99%