Integrating Query Performance Prediction in Term Scoring for Diachronic Thesaurus

Liebeskind, Chaya; Dagan, Ido

doi:10.18653/v1/w15-3714

Cited by 2 publications

(2 citation statements)

References 20 publications


“…We plan to investigate additional aggregation methods and explore the impact of the individual models on the combined system to improve our system results. We also plan to try our system on other languages of different families, such as Semitic languages (Liebeskind and Liebeskind, 2020) and use LSC models to construct diachronic thesaurus, which bridges the lexical gap between modern and ancient language (Zohar et al, 2013; Liebeskind and Dagan, 2015; Liebeskind et al, 2016; Liebeskind et al, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

JCT at SemEval-2020 Task 1: Combined Semantic Vector Spaces Models for Unsupervised Lexical Semantic Change Detection

Amar, Liebeskind

2020

Proceedings of the Fourteenth Workshop on Semantic Evaluation

Self Cite


In this paper, we present our contribution in SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection, where we systematically combine existing models for unsupervised capturing of lexical semantic change across time in text corpora of German, English, Latin and Swedish. In particular, we analyze the score distribution of existing models. Then we define a general classification threshold, adjust it independently to each of the models and measure the models' score certainty. Finally, using both the threshold and score certainty, we aggregate the models for the two sub-tasks: binary classification and ranking.


“…Responsa documents present various arguments by citing earlier sources, such as the Talmud and its commentators, legal codes, and earlier responses [Koppel, 2011]. Our corpus, used for previous IR and NLP research [Choueka, 1972, Fraenkel, 1976, Choueka et al, 1987, HaCohen-Kerner et al, 2008, Koppel, 2011, Zohar et al, 2013, Liebeskind and Dagan, 2015], contains 76,710 articles and approximately 100 million word tokens. Koppel [2011] emphasized another characteristic of Responsa: the Responsa corpus was intended as a source of information and not a source of language use.…”

Section: The Responsa Corpus and Diachronic Tasks (mentioning)

confidence: 99%

Deep Learning for Period Classification of Historical Hebrew Texts

Liebeskind, Liebeskind

2020

Journal of Data Mining & Digital Humanities

Self Cite


In this study, we address the interesting task of classifying historical texts by their assumed period of writing. This task is useful in digital humanity studies where many texts have unidentified publication dates. For years, the typical approach for temporal text classification was supervised using machine-learning algorithms. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor to transform the raw text into a feature vector from which the classifier could learn to classify any unseen valid input. Recently, deep learning has produced extremely promising results for various tasks in natural language processing (NLP). The primary advantage of deep learning is that human engineers did not design the feature layers, but the features were extrapolated from data with a general-purpose learning procedure. We investigated deep learning models for period classification of historical texts. We compared three common models: paragraph vectors, convolutional neural networks (CNN) and recurrent neural networks (RNN), and conventional machine-learning methods. We demonstrate that the CNN and RNN models outperformed the paragraph vector model and the conventional supervised machine-learning algorithms. In addition, we constructed word embeddings for each time period and analyzed semantic changes of word meanings over time.
