2012 SC Companion: High Performance Computing, Networking Storage and Analysis 2012
DOI: 10.1109/sc.companion.2012.113

Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data

Cited by 10 publications
(9 citation statements)
References 5 publications
“…Our results contrast with the recommendation given by Schmidt et al, 2012c, Schmidt et al, 2012a to partition data with square blocking factors. The reason for that is likely due to the fact that the column dimension of the blocking factors was chosen equal to the number of variables related to each variable Yj, thus facilitating the computation of the distributed matrix algebra operations considered in the PLS algorithm.…”
Section: Results (contrasting)
confidence: 99%
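
The blocking factors discussed in the statement above can be illustrated with a minimal sketch. It assumes the standard two-dimensional block-cyclic layout used by ScaLAPACK-style distributed linear algebra, with zero-based indices and the first block placed on process (0, 0). The function names, matrix size, and process grid are hypothetical and chosen only for illustration; this is not the citing authors' code or the pbdR API.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Return the (process row, process column) that owns global entry (i, j).

    mb x nb is the blocking factor and nprow x npcol is the process grid.
    Blocks are dealt out cyclically along each dimension.
    """
    return (i // mb) % nprow, (j // nb) % npcol


def load_per_process(nrows, ncols, mb, nb, nprow, npcol):
    """Count how many matrix entries each process in the grid holds."""
    counts = Counter()
    for i in range(nrows):
        for j in range(ncols):
            counts[owner(i, j, mb, nb, nprow, npcol)] += 1
    return counts


# Hypothetical 8 x 6 matrix on a 2 x 2 process grid.
# Square 2 x 2 blocks spread both rows and columns over the grid.
square = load_per_process(8, 6, mb=2, nb=2, nprow=2, npcol=2)

# Blocks as wide as the whole matrix (nb = 6) keep every row intact
# on a single process column, so only two of the four processes hold data.
wide = load_per_process(8, 6, mb=2, nb=6, nprow=2, npcol=2)

print(dict(square))
print(dict(wide))
```

In this toy case the square blocking spreads entries across all four processes, while the wide blocking keeps all columns of a row on one process. Keeping a row's variables together is one plausible reason a non-square blocking factor could suit an algorithm that works on groups of columns, although the trade-off depends on the operations involved.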
“…Therefore, to properly establish the communicator – that is the object “to define which collection of processes may communicate with each other” – is of paramount importance. Since pbdR is focused on the SPMD programming paradigm (Chen et al, 2012a, Schmidt et al, 2012c, Ostrouchov et al, 2013), users need to initialize the communicator(s) at the beginning of a script with the instruction init(). This enables the initialization of the processors (or task IDs) “to specify the source and destination of messages”.…”
Section: Introduction (mentioning)
confidence: 99%
“…Methods previously discussed can be adopted, such as cluster analysis and OLAP. Statistical computing packages like R (language) can be useful as well (Schmidt, Ostrouchov, Chen, & Patel, 2012). Some data mining tools and software providers include: enterprise miner from SAS, intelligent miner from IBM, setminer from SGI, clementine from SPSS, DB miner from DB Miner Technology Inc., PRW from Unica Technolgies Inc., Darwin from thinking machines, greenplum from EMC, etc.…”
Section: Expansion Of Current Ais (mentioning)
confidence: 99%
“…However, this approach does not exploit efficient memory sharing in the cloud. To solve the low programmability of traditional distributed computing environments, pbdR [9] tightly couples R with the MPI libraries, which enables developing high-level distributed data parallelism in R and also utilizing HPC platforms, but suffers the fault tolerance problems. SparkR [10] is an R package that provides a lightweight frontend to use Apache Spark from R. It exposes the low-level Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.…”
Section: Related Work (mentioning)
confidence: 99%