Proceedings of the 2009 SIAM International Conference on Data Mining 2009
DOI: 10.1137/1.9781611972795.99
Straightforward Feature Selection for Scalable Latent Semantic Indexing

Abstract: Latent Semantic Indexing (LSI) has been shown to be effective on many small-scale text collections. However, there is little evidence of its effectiveness on unsampled large-scale text corpora, due to its high computational complexity. In this paper, we propose a straightforward feature selection strategy, named Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocessing step so that LSI can be efficiently approximated on large-scale text corpora. We formulate LSI as a continuous optimization problem…
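The preprocessing idea in the abstract can be sketched as follows: shrink the term-document matrix by keeping only the highest-scoring terms, then run ordinary SVD-based LSI on the reduced matrix. This is a minimal illustration, not the paper's method — FSLSI derives its selection criterion from the LSI objective itself, whereas the stand-in criterion below (total term weight) and all function names are hypothetical.

```python
import numpy as np

def select_features(X, k):
    """Keep the k terms with the largest total weight across documents.

    X: term-document matrix (terms x docs). Total-weight ranking is a
    placeholder criterion; the FSLSI paper optimizes the LSI objective
    to pick features instead.
    """
    mass = X.sum(axis=1)               # total weight of each term
    keep = np.argsort(mass)[::-1][:k]  # indices of the top-k terms
    return X[keep, :], keep

def lsi(X, rank):
    """Rank-r LSI via truncated SVD of the (reduced) term-document matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Toy term-document matrix: 5 terms x 4 documents.
X = np.array([[2., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.],
              [0., 0., 0., 1.],
              [3., 2., 2., 2.]])

X_sel, kept = select_features(X, k=3)  # drop the 2 lightest terms
U, s, Vt = lsi(X_sel, rank=2)          # 2-dimensional latent space
docs = (np.diag(s) @ Vt).T             # document coordinates in latent space
```

Because the SVD now runs on a k-row matrix instead of the full vocabulary, its cost scales with the reduced dimension, which is what makes the approximation feasible on large corpora.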

Cited by 7 publications (3 citation statements)
References 12 publications
“…(e.g., [2,1,11]). Other approaches for increasing the computational performance of LSA include alternatives to the SVD for dimensionality reduction (e.g., [12]) and feature selection to reduce the size of the term-document matrices (e.g., [14]). …”
Section: Related Work
confidence: 99%
“…They showed that SDD-based LSI does as well as SVD-based LSI in terms of retrieval performance while requiring only very little storage (one-twentieth) and time (one-half) to answer a query. Yan et al. (2009) formulated LSI as a continuous optimization problem and made it effective on large scale collections. Hofmann (1999) presented a novel approach called Probabilistic Latent Semantic Indexing (PLSI).…”
Section: Literature Review
confidence: 99%
“…Kontostathis and Pottenger (2006) presented a mathematical proof that the SVD algorithm can encapsulate term co‐occurrence information. Yan, Yan, Liu, and Chen (2009) formulated LSI as a continuous optimization problem and developed a feature selection algorithm for LSI by optimizing its objective function in terms of discrete optimization. Li and colleagues (Li & Kwong, 2007b; Li, Kwong, & Lee, in press) examined the latent semantic structure of a dataset from a dual perspective; namely, they simultaneously considered the term space and the document space, and then derived a unified kernel function for a class of vector space models from this new viewpoint.…”
Section: Introduction and Previous Research
confidence: 99%