2008
DOI: 10.1080/01969720801944299

Words as Classifiers of Documents According to Their Historical Period and the Ethnic Origin of Their Authors

Cited by 16 publications
(4 citation statements)
References 10 publications
“…For instance, Forman (2003), in his study on feature selection metrics for TC, claimed that stop words are ambiguous and therefore should be removed. However, HaCohen-Kerner et al (2008) demonstrated that the use of word unigrams including stop words lead to improved TC results compared to the results obtained using word unigrams excluding stop words. In our system, we applied various combinations of preprocessing types depending on the language.…”
Section: Text Preprocessing
Type: mentioning
confidence: 96%
“…For instance, Forman (2003), in his study on feature selection metrics for TC, claimed that stop words are ambiguous and therefore should be removed. However, HaCohen-Kerner et al (2008) demonstrated that the use of word unigrams including stop words lead to improved TC results compared to the results obtained using word unigrams excluding stop words. In our system, we applied various combinations of preprocessing types depending on the language.…”
Section: Text Preprocessing
Type: mentioning
confidence: 96%
“…Most of the studies use a relatively small number of the following components: datasets, ML methods, preprocessing methods, and combinations of these. Moreover, portions of the conclusions of these studies seemingly contradict each other (e.g., stopword removal improves classification accuracy [15,17,21] or does not improve classification accuracy [16,31,37]). Table 1 summarizes the attributes of twelve studies described in this section that addressed preprocessing for TC.…”
Section: Plos One
Type: mentioning
confidence: 99%
“…Other studies that are related to document classification and address the challenges of Hebrew involve the classification of Hebrew-Aramaic documents according to style (Koppel et al, 2006; Mughaz, 2003); authorship verification, including forgeries and pseudonyms (Koppel et al, 2003, 2004); and classification of texts according to their ethnic origin and their historical period (HaCohen-Kerner, Beck, Yehudai & Mughaz, 2006; HaCohen-Kerner, Mughaz et al, 2008; HaCohen-Kerner, Beck, Yehudai, Rosenstein & Mughaz, 2010).…”
Section: Related Work
Type: mentioning
confidence: 99%