Proceedings of TextGraphs-11: The Workshop on Graph-Based Methods For Natural Language Processing 2017
DOI: 10.18653/v1/w17-2408

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

Abstract: In this paper, we present an empirical study of email classification into two main categories, "Business" and "Personal". We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For this classification task, we extract social network features from the graphs in addition to lexical features from email…

Cited by 13 publications (16 citation statements)
References: 9 publications

“…The vec (word embeddings) feature was used to compute an average vector for the entire text. Computing an average of word vectors has been shown effective in other document classification tasks (Alkhereyf and Rambow, 2017). However, clearly such a vector loses a lot of information about a text, and more fine-grained modeling is needed.…”
Section: Discussion (mentioning)
confidence: 99%
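
The averaging idea in the statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `average_vector`, the toy two-dimensional `TOY_EMBEDDINGS` table, and the choice to skip out-of-vocabulary tokens are hypothetical and are not taken from the cited paper or from the citing work.

```python
from typing import Dict, List, Sequence


def average_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the mean embedding of the in-vocabulary tokens.

    Tokens missing from `embeddings` are skipped (an assumption of this
    sketch); a text with no known tokens yields the zero vector.
    """
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = embeddings.get(tok.lower())
        if vec is None:
            continue  # out-of-vocabulary token, ignored
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]


# Hypothetical toy embeddings, two dimensions, for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "meeting": [1.0, 0.0],
    "budget": [0.0, 1.0],
    "dinner": [-1.0, 0.0],
}

business = average_vector("Meeting about the budget".split(), TOY_EMBEDDINGS, dim=2)
mixed = average_vector("meeting dinner".split(), TOY_EMBEDDINGS, dim=2)
print(business)  # [0.5, 0.5]
print(mixed)     # [0.0, 0.0]
```

In this toy setup the second text averages to the zero vector, the same result an empty text would give, which is one small way to see the information loss the statement mentions.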
“…Since Enron is the only freely available email data set, many researchers have worked on it with different tasks. To our knowledge, the previous efforts most closely related to our research are [13], [2], and [3]. They all have worked on the same problem: classification of emails into Business or Personal category and used the same Enron data set for training and testing.…”
Section: Related Work (mentioning)
confidence: 99%
“…Email processing has been an active research topic with the earlier works focusing on email classification (Cohen and others, 1996; Whittaker and Sidner, 1996; Brutlag and Meek, 2000; Manco et al., 2002; Klimt and Yang, 2004; Alkhereyf and Rambow, 2017). This was later followed by work on intent classification (Cohen et al., 2004), searching (Soboroff et al., 2006; Minkov et al., 2008), clustering (Huang and Mitchell, 2008), and summarization (Muresan et al., 2001; Lam, 2002; Newman and Blitzer, 2003; Nenkova and Bagga, 2004; Corston-Oliver et al., 2004; Rambow et al., 2004; Carenini et al., 2007; Ulrich et al., 2008).…”
Section: Related Work (mentioning)
confidence: 99%
“…Email processing has been an active research topic with the earlier works focusing on email classification (Cohen and others, 1996; Whittaker and Sidner, 1996; Brutlag and Meek, 2000; Manco et al., 2002; Klimt and Yang, 2004; Alkhereyf and Rambow, 2017).…”
Section: Related Work (mentioning)
confidence: 99%