2005
DOI: 10.1007/s10588-005-5380-5

Email Surveillance Using Non-negative Matrix Factorization

Abstract: In this study, we apply a non-negative matrix factorization approach for the extraction and detection of concepts or topics from electronic mail messages. For the publicly released Enron electronic mail collection, we encode sparse term-by-message matrices and use a low rank non-negative matrix factorization algorithm to preserve natural data non-negativity and avoid subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in topic detection and mes…
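The factorization described in the abstract can be illustrated with a minimal sketch. It assumes the Lee-Seung multiplicative update rules, swapped in here for illustration and not necessarily the algorithm used in the paper. It also assumes a tiny dense matrix rather than a sparse Enron-scale one. The names `nmf`, `matmul`, `transpose`, and `frobenius_error` are hypothetical helpers, not taken from the source.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(a, w, h):
    """Frobenius norm of A - WH, the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for ra, rb in zip(a, wh) for x, y in zip(ra, rb)) ** 0.5


def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative term-by-message matrix A (m x n) as W H.

    W (m x rank) holds non-negative basis vectors (candidate topics) and
    H (rank x n) holds non-negative encodings of each message.
    Assumption: Lee-Seung multiplicative updates, chosen for brevity.
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h
```

Because every update multiplies non-negative factors by non-negative ratios, W and H stay non-negative throughout, so topics combine additively rather than through the subtractive interactions the abstract attributes to principal component analysis. Calling `nmf(a, rank=2)` on a small matrix whose rows are terms and whose columns are messages returns the two factors; the largest entries in each column of W then suggest the terms that characterize each topic.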

Citations: cited by 104 publications (58 citation statements)
References: 13 publications (29 reference statements)
“…Next, to gain intuition for the (non-)uniformity properties of statistical leverage scores in a typical application, consider a term-document matrix derived from the publicly-released Enron electronic mail collection [169], which is typical of the type of data set to which SVD-based latent semantic analysis (LSA) methods [170] have been applied. This is a 65,031 × 92,133 matrix, as described in [169], and let us choose the rank parameter as k = 10.…”
Section: More Recent Perspectives On Statistical Leverage
type: mentioning
confidence: 99%
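For orientation, the statement above can be read against the commonly used definition of rank-k statistical leverage scores. The notation below (an m × n matrix A, its truncated SVD factors, and the scores ℓ_i) is chosen here for illustration and is not quoted from the citing paper.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\ell_i = \bigl\| (U_k)_{i,:} \bigr\|_2^2, \quad i = 1, \dots, m, \qquad
\sum_{i=1}^{m} \ell_i = \| U_k \|_F^2 = k .
```

Under this definition the scores are uniform when every ℓ_i equals k/m, and non-uniform when a few rows carry most of the mass. Scores for columns are defined the same way using the rows of V_k.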
“…Berry and Browne [3] apply nonnegative matrix factorizations to discover concepts and topics in the Enron corpus. They discuss results of topic detection and message clustering in the context of published Enron business practices and activities.…”
Section: Enron Data and Social Network Analysis
type: mentioning
confidence: 99%
“…Extracting topics from a temporally changing text collection has received some attention lately, for instance by [7] and also touched by [8]. These works investigate text streams that contain documents that can be assigned a timestamp y.…”
Section: Temporal Topic Detection
type: mentioning
confidence: 99%
“…The approaches to temporal topic detection presented in [7] and [8] employ latent factor methods to extract distinct topics for each time interval, and then compare the found topics at succeeding time intervals to link the topics over time to form temporal topics. We extract topics from each sub-collection C k using a PLSA-model [5].…”
Section: C_k With Parameters (H = {P(t|z)_k, P(d|z)_k, P(z)_k} Need
type: mentioning
confidence: 99%
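The linking step described in the statement above (extract topics per time interval, then compare topics at succeeding intervals) can be sketched as follows. This is a minimal illustration under stated assumptions: topics are represented as term-weight dictionaries, similarity is measured by cosine similarity, and a link is kept only above a fixed threshold. None of these choices, nor the names `cosine` and `link_topics`, come from the cited works.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topics given as {term: weight} dicts."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(weight * weight for weight in p.values()))
    norm_q = math.sqrt(sum(weight * weight for weight in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def link_topics(earlier, later, threshold=0.5):
    """Link each topic of one interval to its best match in the next.

    Returns (earlier_index, later_index, similarity) triples.
    Assumption: the threshold value 0.5 is arbitrary and illustrative.
    """
    links = []
    for i, topic in enumerate(earlier):
        scores = [cosine(topic, candidate) for candidate in later]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            links.append((i, best, scores[best]))
    return links
```

Chaining the links returned for each pair of succeeding intervals yields sequences of matched topics, which is one simple way to form a temporal topic. A topic with no link above the threshold can be treated as ending in that interval, and an unmatched topic in the later interval as newly emerging.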