2020
DOI: 10.1109/tpds.2020.2979702

SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Abstract: Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and a large number of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time …
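The abstract's dense-versus-sparse distinction can be illustrated with a toy sketch (the sizes and the counting scheme below are illustrative assumptions, not the paper's implementation): a dense sampler stores one counter per topic and scans all K of them per token, while a sparsity-aware sampler only touches the topics that actually occur in a document.

```python
from collections import Counter
import random

random.seed(0)
K = 1000        # number of topics (illustrative)
doc_len = 50    # tokens in one document (illustrative)

# Dense document-topic counts: one slot per topic, mostly zeros.
topics = [random.randrange(K) for _ in range(doc_len)]
dense_counts = [0] * K
for z in topics:
    dense_counts[z] += 1

# Sparse view: only the topics that actually occur in the document.
sparse_counts = Counter(topics)

# A dense kernel scans all K entries per token; a sparse kernel touches
# only the topics present in the document (at most doc_len of them).
print(len(dense_counts), len(sparse_counts))
```

When K greatly exceeds the document length, the sparse representation shrinks the per-token work from O(K) to roughly O(number of distinct topics in the document), which is the asymptotic gap the abstract alludes to.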

Cited by 12 publications (26 citation statements)
References 25 publications
“…As shown in Table Ⅳ, the most frequent words in the 2019 symposium were 'ai', 'student', and 'K-12', in that order, with more education-related words than in the 2018 symposium ('learning', 'education', 'teacher', 'curriculum', etc.). [Flattened word-frequency table; recoverable counts: computer (19), working (18), session (16), service (15), science (15), program (14), system (14), teacher (14), data (…).] As the numbers for ecraftlearn, aiall, and k were removed during preprocessing, they are enclosed in parentheses.…”
Section: Results
confidence: 99%
“…Second, topic modeling is a text mining technique used to discover the hidden semantic structure of text. It is useful for exploring topics, or changes in topic trends over time, in large amounts of unstructured data such as social media posts and newspaper articles [14]-[16].…”
Section: Analysis Methods
confidence: 99%
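The "hidden semantic structure" the citing work describes is typically recovered by a sampler like the one SaberLDA accelerates. The sketch below is a toy collapsed Gibbs sampler for LDA in plain Python; the corpus, hyperparameters, and iteration count are illustrative assumptions, not taken from any of the cited systems.

```python
import random
from collections import defaultdict

random.seed(0)

docs = [
    "student teacher curriculum education student".split(),
    "gpu kernel memory bandwidth gpu".split(),
    "education student learning teacher".split(),
    "gpu sparse kernel topic gpu".split(),
]
K, alpha, beta = 2, 0.1, 0.01          # topics and Dirichlet priors
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# One topic assignment per token, plus the count tables Gibbs needs.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]               # document-topic counts
nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
nk = [0] * K                                # topic totals
for di, d in enumerate(docs):
    for wi, w in enumerate(d):
        t = z[di][wi]
        ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1

for _ in range(200):                        # Gibbs sweeps
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]                   # remove current assignment
            ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            # Full conditional p(z = k | everything else)
            weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) /
                       (nk[k] + V * beta) for k in range(K)]
            t = random.choices(range(K), weights)[0]
            z[di][wi] = t                   # record new assignment
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Most frequent words per topic after sampling.
for k in range(K):
    top = sorted(nkw[k], key=nkw[k].get, reverse=True)[:3]
    print(k, top)
```

The inner loop over K in the full conditional is exactly the per-token cost that dense GPU implementations pay; sparsity-aware samplers restrict it to topics with nonzero counts.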
“…Due to the low system overhead, the throughput can be very high [38], but again they cannot handle large B. This category also includes some recent GPU-based systems such as SaberLDA [16] and BIDMach [40].…”
Section: Scalable Systems For Flat Models
confidence: 99%
“…For example, online advertisement systems extract topics from billions of search queries [34], and recommendation systems [1] need to handle millions of users and items. Various efforts have been made to develop scalable topic modeling systems, including asynchronous distributed data-parallel training [1,17], hybrid data-and-model-parallel training [37,36], embarrassingly parallel BSP training [10,38,39], and GPU-accelerated training [40,16]. These topic modeling systems mainly handle partitioning the data and model, and synchronizing the count matrix across machines.…”
Section: Introduction
confidence: 99%
“…• Expectation-Maximization (EM) techniques [10] are also applicable; they converge to a maximum a posteriori (MAP) approximation of the posterior.…”
Section: Introduction
confidence: 99%
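The MAP-convergence property mentioned above can be checked on a toy model. The sketch below runs EM on a two-component mixture of unigrams with a Dirichlet prior on the word distributions (a simpler stand-in for LDA; the data and hyperparameters are illustrative assumptions) and asserts the defining behavior: the log-posterior never decreases across iterations.

```python
import math
import random

random.seed(1)
V, K = 4, 2
docs = [[0, 0, 1], [0, 1, 1], [2, 3, 3], [2, 2, 3]]  # word ids
alpha = 1.5   # Dirichlet prior on topic-word distributions (illustrative)

# Random initialization of mixing weights and word distributions.
pi = [0.5, 0.5]
phi = [[random.random() + 0.5 for _ in range(V)] for _ in range(K)]
for k in range(K):
    s = sum(phi[k]); phi[k] = [p / s for p in phi[k]]

def log_posterior():
    # Dirichlet log-prior (up to a constant) plus the data log-likelihood.
    lp = sum((alpha - 1) * math.log(p) for row in phi for p in row)
    for d in docs:
        like = sum(pi[k] * math.prod(phi[k][w] for w in d) for k in range(K))
        lp += math.log(like)
    return lp

prev = -math.inf
for _ in range(50):
    # E-step: responsibility of each component for each document.
    r = []
    for d in docs:
        joint = [pi[k] * math.prod(phi[k][w] for w in d) for k in range(K)]
        s = sum(joint); r.append([j / s for j in joint])
    # M-step: MAP update adds the prior's pseudo-counts (alpha - 1).
    pi = [sum(r[i][k] for i in range(len(docs))) / len(docs) for k in range(K)]
    for k in range(K):
        counts = [alpha - 1] * V
        for i, d in enumerate(docs):
            for w in d:
                counts[w] += r[i][k]
        s = sum(counts); phi[k] = [c / s for c in counts]
    lp = log_posterior()
    assert lp >= prev - 1e-9   # EM monotonically improves the objective
    prev = lp
```

Each sweep climbs the (log) posterior, so the procedure settles at a local MAP estimate rather than producing samples from the full posterior, which is the trade-off against Gibbs-sampling approaches.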