2014 13th International Conference on Machine Learning and Applications 2014
DOI: 10.1109/icmla.2014.101

Iterative Hard Thresholding for Keyword Extraction from Large Text Corpora

Abstract: To better understand and analyze text corpora, such as the news, it is often useful to extract keywords that are meaningfully associated with a given topic. A corpus of documents labeled by their topic can be used to approach this as a learning problem. We consider this problem through the lens of statistical text analysis, using bag-of-words frequencies as features for a sparse linear model. We demonstrate, through numerical experiments, that iterative hard thresholding (IHT) is a practical and effective algo…
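A minimal sketch of the pipeline the abstract outlines: bag-of-words term frequencies as features, and iterative hard thresholding to fit a linear model with at most s nonzero weights, whose surviving vocabulary entries serve as candidate keywords. The squared loss, the ±1 topic labels, the step size μ = 1/‖X‖_F², the whitespace tokenizer, the six-document corpus, and the helper names (`build_bow`, `hard_threshold`, `iht`) are all illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter


def build_bow(docs):
    """Turn each document into a term-frequency row over a shared, sorted vocabulary."""
    vocab = sorted({tok for text in docs for tok in text.lower().split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for text in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(text.lower().split()).items():
            row[index[tok]] = float(count)
        rows.append(row)
    return rows, vocab


def hard_threshold(v, s):
    """H_s(v): keep the s entries of largest magnitude and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: -abs(v[j]))[:s])
    return [x if j in keep else 0.0 for j, x in enumerate(v)]


def predict(X, w):
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def squared_loss(X, y, w):
    """f(w) = 0.5 * ||y - Xw||^2 (assumed loss)."""
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, predict(X, w)))


def iht(X, y, s, iters=200):
    """Iterative hard thresholding: w <- H_s(w + mu * X^T (y - Xw))."""
    n_docs, n_feats = len(X), len(X[0])
    # Assumed step size: 1/||X||_F^2 <= 1/||X||_2^2, so f(w) never increases.
    mu = 1.0 / sum(a * a for row in X for a in row)
    w = [0.0] * n_feats
    for _ in range(iters):
        residual = [yi - pi for yi, pi in zip(y, predict(X, w))]
        # X^T residual is the negative gradient of f at w.
        neg_grad = [sum(X[i][j] * residual[i] for i in range(n_docs)) for j in range(n_feats)]
        w = hard_threshold([wj + mu * gj for wj, gj in zip(w, neg_grad)], s)
    return w


# Hypothetical toy corpus: +1 = "sports" topic, -1 = everything else.
docs = [
    "the team won the match in overtime",
    "fans cheered as the team scored a goal",
    "the striker scored twice in the match",
    "the senate passed the budget bill",
    "voters rejected the tax bill in the election",
    "the senate debated the election results",
]
labels = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

X, vocab = build_bow(docs)
w = iht(X, labels, s=4)
# Nonzero weights are the selected keywords; positive ones lean toward the topic.
keywords = [(vocab[j], round(w[j], 3)) for j in range(len(w)) if w[j] != 0.0]
print(keywords)
```

Because μ is no larger than 1/‖X‖₂², the squared loss never increases from one iteration to the next, so the sketch needs no line search; a realistic corpus would call for sparse matrices rather than nested lists.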

Cited by 1 publication (1 citation statement); references 24 publications.
“…Second, due to the myriad of text-based content, the presence of largely disjoint Twitter "interest groups", and Twitter's skewed follow graph (such that most nodes have only a few edges while some have tens of millions), most Machine Learning algorithms at Twitter naturally operate on sparse data. This can make training models particularly difficult [27] [28] and force teams to rely on algorithms like binning or feature hashing [25].…”
Section: Unique Challenges of Twitter Data (mentioning)