Abstract: In this study, we address the interesting task of classifying historical texts by their assumed period of writing. This task is useful in digital humanities studies, where many texts have unidentified publication dates. For years, the typical approach to temporal text classification was supervised, using machine-learning algorithms. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor that transforms the raw text into a feature vector from which the clas…
“…Traditional machine learning methods focus on statistical features and learning models, such as Naïve Bayes (Boldsen and Wahlberg, 2021), SVM (Garcia-Fernandez et al., 2011) and Random Forests (Ciobanu et al., 2013). Recent studies turn to deep learning methods, and experiments show their superior performance compared to traditional machine learning approaches (Kulkarni et al., 2018; Liebeskind and Liebeskind, 2020; Yu and Huangfu, 2019; Ren et al., 2022). Pre-trained models are also leveraged to represent texts for the dating task, such as Sentence-BERT (Massidda, 2020; Tian and Kübler, 2021) and RoBERTa.…”
Section: Related Work
mentioning, confidence: 99%
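For illustration only, here is a minimal sketch of such a classical supervised baseline for text dating (not the pipelines of the cited papers): TF-IDF features paired with the classifiers named above. The toy documents and period labels are placeholder assumptions.

# Minimal sketch of a classical supervised baseline for text dating:
# TF-IDF features fed to the classifier families named above.
# The toy corpus and period labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

texts = ["thou art most welcome", "the train departs at noon"]   # documents
periods = ["1600-1700", "1900-2000"]                              # period labels

for clf in (MultinomialNB(), LinearSVC(), RandomForestClassifier()):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(texts, periods)
    print(type(clf).__name__, model.predict(["whence comest thou"]))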
“…One is to learn word representations from diachronic documents. Current research on word representation either learns static word embeddings over the whole corpus (Liebeskind and Liebeskind, 2020; Yu and Huangfu, 2019), or learns dynamic word representations using pre-trained models (Tian and Kübler, 2021). However, neither of these takes into account the relation between time and word meaning.…”
Section: Introduction
mentioning, confidence: 99%
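As a hedged illustration of the static-versus-temporal contrast drawn above (not a method from the cited works), one simple way to expose diachronic change with static embeddings is to train a separate Word2Vec model per time slice, so the same word can receive different vectors in different periods. The tokenized slices below are toy placeholders.

# Sketch: one static Word2Vec model per time slice, so a word's vector can
# differ across periods. The tokenized slices are toy placeholders.
from gensim.models import Word2Vec

slices = {
    "19th_century": [["the", "carriage", "arrived"], ["a", "gay", "gathering"]],
    "21st_century": [["the", "car", "arrived"], ["a", "fun", "gathering"]],
}

models = {
    period: Word2Vec(sentences=sents, vector_size=50, window=3, min_count=1, epochs=20)
    for period, sents in slices.items()
}

# The vector of "gathering" now depends on which period's model is queried.
for period, model in models.items():
    print(period, model.wv["gathering"][:3])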
“…Initial work on neural network-based document modeling employs convolutional neural networks (CNNs) or recurrent neural networks (RNNs) (Liebeskind and Liebeskind, 2020; Yu and Huangfu, 2019), while recent research turns to pre-trained models such as BERT (Tian and Kübler, 2021) or RoBERTa. However, these studies treat time only as a prediction target rather than as a variable in modeling, which does not help capture the temporal characteristics of diachronic documents.…”
Automatic text dating (ATD) is a challenging task since explicit temporal mentions usually do not appear in texts. Existing state-of-the-art approaches learn word representations via language models, whereas most of them ignore the diachronic change of words, which may limit the effectiveness of text modeling. Meanwhile, few of them consider text modeling for long diachronic documents. In this paper, we present a time-aware language model named TALM, which learns temporal word representations by transferring language models of general domains to those of time-specific ones. We also build a hierarchical modeling approach that represents diachronic documents by encoding them with temporal word representations. Experiments on a Chinese diachronic corpus show that our model effectively captures implicit temporal information of words, and outperforms state-of-the-art approaches in historical text dating as well. Our code is available at: https://github.com/coderlihong/text-dating.
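The authors' implementation is in the linked repository; the following is only a hedged PyTorch sketch of the two ideas named in the abstract: a period embedding that conditions word representations on time, and a hierarchical words-to-sentences-to-document encoder ending in a period classifier. The dimensions, GRU encoders, and toy inputs are assumptions for illustration, not the TALM architecture.

# Hedged sketch (not the authors' TALM code): period-conditioned word
# embeddings plus a hierarchical document encoder for period classification.
import torch
import torch.nn as nn

class HierarchicalDater(nn.Module):
    """Words -> sentence vectors -> document vector -> period logits."""

    def __init__(self, vocab_size, num_periods, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.period_emb = nn.Embedding(num_periods, dim)  # time as a modeling variable
        self.sent_enc = nn.GRU(dim, dim, batch_first=True)
        self.doc_enc = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, num_periods)

    def forward(self, doc_ids, period_id=None):
        # doc_ids: (num_sentences, max_words) token ids for one document.
        x = self.word_emb(doc_ids)
        if period_id is not None:
            # When the period is known (e.g. while adapting word representations
            # to a time slice), condition the word vectors on it.
            x = x + self.period_emb(period_id).view(1, 1, -1)
        _, sent_h = self.sent_enc(x)                   # (1, num_sentences, dim)
        sent_vecs = sent_h.squeeze(0)                  # (num_sentences, dim)
        _, doc_h = self.doc_enc(sent_vecs.unsqueeze(0))
        return self.classifier(doc_h.squeeze())        # logits over periods

# Toy usage: a 3-sentence document with 7 tokens per sentence, 5 candidate periods.
model = HierarchicalDater(vocab_size=1000, num_periods=5)
doc = torch.randint(0, 1000, (3, 7))
print(model(doc).shape)  # torch.Size([5])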
“…Existing NLP studies on historical documents primarily focus on tasks such as spelling normalization [18], [23], machine translation [24], and sequence labelling, including part-of-speech tagging [25] and named entity recognition [19], [26]. Recently, the success of deep neural networks has introduced new applications in this domain, including sentiment analysis [27], information retrieval [28], event extraction [29], [30], and text classification [31]. However, only a limited amount of research has been conducted on historical text summarization.…”
Section: Historical Natural Language Processing Applications
mentioning
In recent years, pre-trained language models (PLMs) have shown remarkable advancements in the extractive summarization task across diverse domains. However, there remains a lack of research specifically in the historical domain. In this paper, we propose a novel method for extractive historical single-document summarization that leverages the potential of a domain-aware historical bidirectional language model, pre-trained on a large-scale historical corpus. Subsequently, we fine-tune the language model specifically for the task of extractive historical single-document summarization. One major challenge for this task is the lack of annotated datasets for historical summarization. To address this issue, we construct a dataset by collecting archived historical documents from the Centre Virtuel de la Connaissance sur l'Europe (CVCE) group at the University of Luxembourg. Furthermore, to better learn the structural features of the input documents, we use a sentence position embedding mechanism that enables the model to learn the positional information of sentences. The overall experimental results on this historical dataset show that our method outperforms recent state-of-the-art methods in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1 scores. To the best of our knowledge, this is the first work on extractive historical text summarization.
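The following is a hedged sketch (not the paper's model) of the sentence position embedding idea: sentence vectors, assumed here to be precomputed by some pre-trained encoder, are combined with a learned embedding of each sentence's position before a binary head scores the sentence for inclusion in the extractive summary. The dimensions and toy input are assumptions.

# Hedged sketch of an extractive head with sentence position embeddings.
# Sentence vectors from any PLM are assumed to be precomputed.
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    def __init__(self, dim=768, max_sentences=512):
        super().__init__()
        self.pos_emb = nn.Embedding(max_sentences, dim)  # sentence position embeddings
        self.scorer = nn.Linear(dim, 1)

    def forward(self, sent_vecs):
        # sent_vecs: (num_sentences, dim) sentence representations from a PLM.
        positions = torch.arange(sent_vecs.size(0), device=sent_vecs.device)
        scores = self.scorer(sent_vecs + self.pos_emb(positions)).squeeze(-1)
        return scores  # one "belongs in the summary" logit per sentence

# Toy usage: 10 sentences with 768-dim vectors; pick the top-3 as the summary.
head = ExtractiveHead()
scores = head(torch.randn(10, 768))
print(scores.topk(3).indices)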
Research in computational textual aesthetics has shown that there are textual correlates of preference in prose texts. The present study investigates whether textual correlates of preference vary across different time periods (contemporary texts versus texts from the 19th and early 20th centuries). Preference is operationalized in different ways for the two periods, in terms of canonization for the earlier texts, and through sales figures for the contemporary texts. As potential textual correlates of preference, we measure degrees of (un)predictability in the distributions of two types of low-level observables, parts of speech and sentence length. Specifically, we calculate two entropy measures, Shannon Entropy as a global measure of unpredictability, and Approximate Entropy as a local measure of surprise (unpredictability in a specific context). Preferred texts from both periods (contemporary bestsellers and canonical earlier texts) are characterized by higher degrees of unpredictability. However, unlike canonicity in the earlier texts, sales figures in contemporary texts are reflected in global (text-level) distributions only (as measured with Shannon Entropy), while surprise in local distributions (as measured with Approximate Entropy) does not have an additional discriminating effect. Our findings thus suggest that there are both time-invariant correlates of preference, and period-specific correlates.
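As a hedged illustration of the two measures named above (not the study's exact settings), Shannon entropy can be computed over a categorical distribution such as part-of-speech frequencies, and Approximate Entropy over a sequence such as sentence lengths. The tolerance r, embedding dimension m, and toy data below are assumptions.

# Shannon entropy over a categorical distribution and Approximate Entropy
# over a sequence; parameter choices and toy data are assumptions.
import numpy as np

def shannon_entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))           # bits

def approximate_entropy(seq, m=2, r=None):
    x = np.asarray(seq, dtype=float)
    if r is None:
        r = 0.2 * x.std()                     # a common default tolerance
    def phi(m):
        n = len(x) - m + 1
        windows = np.array([x[i:i + m] for i in range(n)])
        # fraction of windows within tolerance r of each template window
        counts = [
            np.mean(np.max(np.abs(windows - w), axis=1) <= r) for w in windows
        ]
        return np.mean(np.log(counts))
    return phi(m) - phi(m + 1)

pos_counts = [120, 80, 60, 25, 15]            # e.g. NOUN, VERB, ADJ, ADV, PRON counts
sentence_lengths = [12, 31, 9, 27, 14, 40, 8, 22, 16, 35, 11, 29]
print(shannon_entropy(pos_counts))
print(approximate_entropy(sentence_lengths))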