Proceedings of the 8th Workshop on General Purpose Processing Using GPUs 2015
DOI: 10.1145/2716282.2716289
Stochastic gradient descent on GPUs

Abstract: Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-free graphs is challenging. This work examines several synchronization strategies for SGD, ranging from simple locking to conflict-free scheduling. We observe that static schedules do not yield better performance despite eliminating the need to perform conflict …
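The abstract contrasts simple locking with conflict-free (static) scheduling for parallel SGD updates on irregular data. The sketch below is our own minimal Python illustration of the locking strategy in a matrix-factorization setting; the names (users, items, K, LR, REG) and the per-row/per-column locks are assumptions for illustration, not the paper's GPU implementation, which would express the same conflicts with atomics or a precomputed conflict-free schedule.

# Illustrative sketch only (not the paper's implementation): edge-level SGD
# for matrix factorization, where each rating (u, i, r) updates one user
# row and one item row. Conflicting updates are serialized with locks,
# mirroring the "simple locking" strategy mentioned in the abstract.
import threading
import numpy as np

K, LR, REG = 16, 0.01, 0.05                      # assumed hyperparameters
users = np.random.rand(1000, K)                  # user factor matrix
items = np.random.rand(500, K)                   # item factor matrix
user_locks = [threading.Lock() for _ in range(len(users))]
item_locks = [threading.Lock() for _ in range(len(items))]

def sgd_update(u, i, r):
    """One SGD step on a single rating; locks are always taken in the
    same order (user, then item), so concurrent workers cannot deadlock."""
    with user_locks[u], item_locks[i]:
        err = r - users[u] @ items[i]
        pu, qi = users[u].copy(), items[i].copy()
        users[u] += LR * (err * qi - REG * pu)
        items[i] += LR * (err * pu - REG * qi)

A conflict-free (static) schedule removes the locks entirely by only running updates that touch disjoint rows and columns in the same round; the DSGD-style stratification sketched further below is one such schedule.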

Cited by 32 publications (15 citation statements)
References 9 publications
“…There are several existing works that have compared GPU performance with CPUs. For example, it is reported that a GPU implementation of Stochastic Gradient Descent (SGD) performs as well as 14 cores on a 40-core CPU system [20]. For the Single-Source Shortest Path (SSSP) problem, it is reported that an efficient serial implementation can outperform highly parallel GPU implementations for high-diameter or scale-free graphs [21].…”
Section: Methods
confidence: 99%
“…DSGD (Distributed SGD) partitions the ratings matrix into several blocks and updates a set of independent blocks concurrently [8]. Kaleem et al. show that parallel SGD can run efficiently on a GPU, and their GPU implementation is comparable to a 14-thread CPU implementation [51]. Jinoh et al. propose MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSD disks) as well as shared-memory platforms.…”
Section: Related Work
confidence: 99%
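The DSGD idea quoted above (partition the ratings matrix into blocks and update mutually independent blocks concurrently) can be pictured with the toy sketch below. This is our own illustration under assumed names (R, U, V, P, K, LR, REG), not the code of [8] or [51]: the blocks grouped into one "stratum" share no rows or columns, so they could be dispatched to different workers or GPU thread blocks without synchronization.

# Toy sketch of DSGD-style stratification (our illustration, not the cited
# papers' code). The ratings matrix is split into a P x P grid of blocks;
# the P blocks processed together in one stratum share no rows or columns,
# so their SGD updates never conflict and can run in parallel.
import numpy as np

P, K, LR, REG = 4, 16, 0.01, 0.05          # assumed partition count and hyperparameters
R = np.random.rand(8, 8)                   # dense toy ratings matrix
U = np.random.rand(R.shape[0], K)          # user factors
V = np.random.rand(R.shape[1], K)          # item factors
row_blocks = np.array_split(np.arange(R.shape[0]), P)
col_blocks = np.array_split(np.arange(R.shape[1]), P)

for shift in range(P):                     # one stratum per value of shift
    for b in range(P):                     # these P blocks are mutually independent
        rb, cb = row_blocks[b], col_blocks[(b + shift) % P]
        for u in rb:
            for i in cb:
                err = R[u, i] - U[u] @ V[i]
                pu, qi = U[u].copy(), V[i].copy()
                U[u] += LR * (err * qi - REG * pu)
                V[i] += LR * (err * pu - REG * qi)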
“…Unlike the Sigmoid and tanh functions, in which the gradient may vanish, PENLU does not exhibit this phenomenon because it has no right-saturation property and its derivative does not approach 0. Using the back-propagation SGD algorithm (Kaleem et al., 2015), parameters such as β and α are optimized so that the unit can switch freely between an exponential unit and a rectifier unit, making both linear and nonlinear adjustment possible. This design makes PENLU more flexible than ReLU, PReLU and ELU, which can be regarded as special cases of PENLU.…”
Section: Parametric Exponential Nonlinear Unit
confidence: 99%
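The quoted description of PENLU (learnable α and β, no right saturation, ReLU/PReLU/ELU as special cases) can be pictured with the schematic activation below. This is explicitly not the exact PENLU formula from the cited paper, only an assumed ELU-like parameterization showing how α and β let the negative branch interpolate between a rectifier (α = 0) and an exponential unit (α = 1, β = 1) while the positive branch stays linear and never saturates.

# Schematic only: an ELU-family activation with learnable alpha and beta.
# This is NOT the exact PENLU definition from the cited paper; it merely
# illustrates the behavior described in the quote: identity for x >= 0 (no
# right saturation), exponential branch for x < 0, with ReLU recovered at
# alpha = 0 and ELU at alpha = 1, beta = 1.
import numpy as np

def penlu_like(x, alpha=1.0, beta=1.0):
    x = np.asarray(x, dtype=float)
    neg = alpha * (np.exp(x / beta) - 1.0)    # exponential branch for x < 0
    return np.where(x >= 0.0, x, neg)         # linear branch for x >= 0

def penlu_like_grad(x, alpha=1.0, beta=1.0):
    x = np.asarray(x, dtype=float)
    dneg = (alpha / beta) * np.exp(x / beta)  # derivative of the exponential branch
    return np.where(x >= 0.0, 1.0, dneg)      # derivative is 1 for x >= 0, never 0 there

In practice α and β would be treated as trainable parameters and updated by the same SGD pass as the network weights, which is the point the quoted passage makes about optimizing them with back-propagation.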