Proceedings of the 26th International Conference on World Wide Web 2017
DOI: 10.1145/3038912.3052631

Template Induction over Unstructured Email Corpora

Abstract: Unsupervised template induction over email data is a central component in applications such as information extraction, document classification, and auto-reply. The benefits of automatically generating such templates are known for structured data, e.g., machine-generated HTML emails. However, much less work has been done on performing the same task over unstructured email data. We propose a technique for inducing high-quality templates from plain-text emails at scale based on the suffix array data structure. We ev…
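
The abstract names the suffix array as the underlying data structure but, being truncated here, gives no algorithmic detail. The Python sketch below is therefore only an illustration of the general idea, not the authors' method: it builds a token-level suffix array over a small corpus and reports the longest token runs shared by different emails, which is one plausible way to surface the fixed text of a template. The function names (`build_suffix_array`, `fixed_phrases`), the `<EOD…>` sentinel tokens, and the sample emails are all assumptions made for this example, and the naive construction would not scale to a real corpus.

```python
from typing import List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted by suffix (naive, toy-sized input only)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(tokens: List[str], i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes starting at i and j."""
    n = 0
    while i + n < len(tokens) and j + n < len(tokens) and tokens[i + n] == tokens[j + n]:
        n += 1
    return n


def is_sublist(small: Tuple[str, ...], big: Tuple[str, ...]) -> bool:
    """True if `small` occurs as a contiguous run inside `big`."""
    k = len(small)
    return any(big[s:s + k] == small for s in range(len(big) - k + 1))


def fixed_phrases(docs: List[str], min_len: int = 2) -> List[Tuple[str, ...]]:
    """Find maximal token runs of at least `min_len` tokens shared across documents."""
    tokens: List[str] = []
    doc_of: List[int] = []  # document index for every token position
    for d, text in enumerate(docs):
        # A unique sentinel per document (hypothetical convention) keeps
        # matches from running across document boundaries.
        words = text.split() + [f"<EOD{d}>"]
        tokens.extend(words)
        doc_of.extend([d] * len(words))

    sa = build_suffix_array(tokens)
    found = set()
    # Suffixes that share a long prefix end up adjacent in the suffix array.
    for a, b in zip(sa, sa[1:]):
        n = common_prefix_len(tokens, a, b)
        if n >= min_len and doc_of[a] != doc_of[b]:
            found.add(tuple(tokens[a:a + n]))

    # Drop phrases that are contained in a longer shared phrase.
    maximal = [p for p in found if not any(p != q and is_sublist(p, q) for q in found)]
    return sorted(maximal)


emails = [
    "Hi Alice your order 123 has shipped today",
    "Hi Bob your order 456 has shipped today",
]
print(fixed_phrases(emails, min_len=2))
```

On the two sample emails the shared runs are "your order" and "has shipped today"; the names and order numbers between them are the variable slots a template would leave open. A production system would need a linear-time suffix array construction and a threshold on how many emails must share a phrase, neither of which is shown here.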

Citations: cited by 17 publications (9 citation statements)
References: 21 publications
“…
Second-level | Examples
information retrieval (1884)
  entity ranking (6) | [8], [21], [168]
  XML retrieval (5) | [69], [74], [118]
  evaluation (35) | [52], [162], [290]
  user activity tracking (2) | [18], [27]
  search (76) | [17], [27], [126]
  recommendation (27) | [2], [41], [223]
  structure analysis (22) | [246], [266], [269]
  query analysis (302) | [19], [135], [330]
  filtering (15) | [35], [102], [368]
  interactive retrieval (12) | [138], [142], [284]
  unstructured information retrieval (107) | [90], [204], [359]
  efficiency and scalability (361) | [297], [354], [363]
cluster/topic analysis (1624)
  community discovery (6) | [215], [313], [374]
  text segmentation (26) | [66], [174], [326]
  topic analysis (556) | [11], [26], [88]
  contextual text mining (2) | [234],…”
Section: First-level (mentioning)
confidence: 99%
“…There is a large body of work on information extraction from the web, whether directly, through template extraction [6,7,16,30,41], or through the more general idea of region extraction and classification [3,5,9,10,11]. Template extraction techniques have received a lot of attention recently to improve the performance of search engines, clustering, and classification of web pages.…”
Section: Related Work, 2.1 Information Extraction (mentioning)
confidence: 99%
“…The general problem of information extraction from web pages has been studied extensively in recent years. Of these, template induction (or wrapper induction) [7,24,30] has proven to be successful for extracting relations from web pages. However, these techniques do not scale to the whole web as obtaining accurate ground truth for all the event domains is expensive.…”
Section: Introduction (mentioning)
confidence: 99%
“…For emails, multiple algorithms for template induction have been described [2,4] along with applications like email threading [2] and hierarchical classification [44]. A technique has also been suggested for plain text emails [35] where data is not explicitly structured.…”
Section: Related Work (mentioning)
confidence: 99%