Xinghao Pan scite author profile

We introduce and analyze stochastic optimization methods where the input to each update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyzing asynchronous implementations of stochastic optimization algorithms, by viewing them as serial methods operating on noisy inputs. Using our perturbed iterate framework, we provide new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates. We then apply our framework to develop and analyze KroMagnon: a novel, parallel, sparse stochastic variance-reduced gradient (SVRG) algorithm. We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.
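As a toy illustration of the "serial method operating on noisy inputs" view described in the abstract above, the sketch below runs plain SGD on a small least-squares problem but evaluates each stochastic gradient at a perturbed copy of the iterate. The objective, step size, uniform noise model, and the names `perturbed_sgd` and `make_data` are assumptions chosen for illustration; they are not taken from the paper, and this is neither Hogwild! nor KroMagnon.

```python
import random


def make_data(true_x, n=200, seed=1):
    """Build a noiseless least-squares dataset (a, b) with b = a . true_x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in true_x]
        b = sum(ai * ti for ai, ti in zip(a, true_x))
        data.append((a, b))
    return data


def perturbed_sgd(data, dim, step=0.05, noise_bound=0.01, iters=5000, seed=0):
    """Serial SGD whose gradient is evaluated at a perturbed iterate.

    Assumed update rule (illustrative): x <- x - step * grad f_i(x + n),
    where every coordinate of n lies in [-noise_bound, noise_bound].
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    for _ in range(iters):
        a, b = rng.choice(data)
        # Bounded perturbation standing in for the staleness an
        # asynchronous reader might observe.
        x_hat = [xi + rng.uniform(-noise_bound, noise_bound) for xi in x]
        # Gradient of 0.5 * (a . x - b)^2, evaluated at the perturbed point.
        residual = sum(ai * xh for ai, xh in zip(a, x_hat)) - b
        x = [xi - step * residual * ai for xi, ai in zip(x, a)]
    return x
```

Calling `perturbed_sgd(make_data([1.0, -2.0, 0.5]), dim=3)` should return a vector close to the true one; setting `noise_bound=0.0` recovers ordinary serial SGD, which makes it easy to compare the two runs.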


MLI: An API for Distributed Machine Learning

Sparks, Talwalkar, Smith et al. 2013

133


MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity and highly competitive performance and scalability.

    function [U, V] = ALS_matlab(M, U, V, k, lambda, maxiter)

      % Initialize variables
      [m,n] = size(M);
      lambI = lambda * eye(k);
      for q = 1:m
        Vinds{q} = find(M(q,:) ~= 0);
      end
      for q=1:n
        Uinds{q} = find(M(:,q) ~= 0);
      end

      % ALS main loop
      for iter=1:maxiter
        parfor q=1:m
          Vq = V(Vinds{q},:);
          U(q,:) = (Vq' * Vq + lambI) \ (Vq' * M(q,Vinds{q})');
        end
        parfor q=1:n
          Uq = U(Uinds{q},:);
          V(q,:) = (Uq' * Uq + lambI) \ (Uq' * M(Uinds{q},q));
        end
      end
    end

    object BroadcastALS {
      def train(trainData: MLTable, k: Int, lambda: Double,
          maxIter: Int): (LocalMatrix, LocalMatrix) = {
        val ctx = trainData.context
        val m = trainData.numRows
        val n = trainData.numCols
        val trainDataTrans = trainData.transpose
        val lambI = LocalMatrix.eye(k) * lambda
        // Initialize U and V matrices randomly
        val U0 = LocalMatrix.rand(m, k)
        val V0 = LocalMatrix.rand(n, k)
        (0 until maxIter).foldLeft((U0, V0))((UV, iterNum) => {
          val U = UV._1
          val V = UV._2
          // Broadcast V
          val V_b = ctx.broadcast(V)
          // Update U matrix
          val newU = computeFactor(trainData, V_b, lambI)
          // Broadcast U
          val U_b = ctx.broadcast(newU)
          // Update V matrix
          val newV = computeFactor(trainDataTrans, U_b, lambI)
          (newU, newV)
        })
      }

      def computeFactor(trainData: MLTable, fixedFactor: Broadcast[LocalMatrix],
          lambI: LocalMatrix): LocalMatrix = {
        trainData.map(localALS(_, fixedFactor.value, lambI)).toLocalMatrix
      }

      def localALS(trainDataPart: MLRow, Y: LocalMatrix, lambI: LocalMatrix) = {
        val tuple = trainDataPart.tuple
        val Yq = Y.getRows(tuple.nonZeroIndices)
        val resultMat = ((Yq.transpose times Yq) + lambI).solve(Yq.transpose times tuple.nonZeroProjection)
        resultMat.toVector
      }
    }

Fig. A9: Matrix Factorization via ALS code in MATLAB (top) and MLI (bottom).
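For readers without MATLAB or MLI at hand, the alternating structure of the ALS listing in Fig. A9 can be restated as a small serial sketch in plain Python. This is an illustrative restatement under stated assumptions, not the authors' implementation: the sparse matrix is assumed to be a list of per-row dicts mapping column index to observed value, and the names `solve`, `update_factor`, and `als` are hypothetical. Each update solves the same regularized normal equations, (Y'Y + lambda*I) x = Y'r, restricted to the observed entries.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def update_factor(ratings, fixed, k, lam):
    """Recompute one factor matrix while the other (`fixed`) is held constant.

    `ratings` is a list of dicts {index into fixed: observed value}.
    """
    new_factor = []
    for row in ratings:
        # Start from lam * I, then accumulate Y'Y and Y'r over observed entries.
        A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, val in row.items():
            y = fixed[idx]
            for i in range(k):
                b[i] += y[i] * val
                for j in range(k):
                    A[i][j] += y[i] * y[j]
        new_factor.append(solve(A, b))
    return new_factor


def als(row_ratings, n_cols, k=2, lam=0.1, max_iter=20, seed=0):
    """Alternate between updating U (rows) and V (columns)."""
    rng = random.Random(seed)
    # Build the column-wise view, playing the role of the transposed data.
    col_ratings = [dict() for _ in range(n_cols)]
    for r, row in enumerate(row_ratings):
        for c, val in row.items():
            col_ratings[c][r] = val
    # Only V needs a random start here: U is recomputed before first use.
    V = [[rng.random() for _ in range(k)] for _ in range(n_cols)]
    U = []
    for _ in range(max_iter):
        U = update_factor(row_ratings, V, k, lam)
        V = update_factor(col_ratings, U, k, lam)
    return U, V
```

The two inner loops over rows are independent of each other, which is the property the MATLAB `parfor` and the MLI `map` over rows both exploit; this sketch simply runs them one after another.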


City-scale traffic estimation from a roving sensor network

et al. 2012


Traffic congestion, volumes, origins, destinations, routes, and other road-network performance metrics are typically collected through survey data or via static sensors such as traffic cameras and loop detectors. This information is often out of date, difficult to collect and aggregate, difficult to analyze and quantify, or all of the above. In this paper we conduct a case study demonstrating that it is possible to accurately infer traffic volume from data collected by a roving sensor network of taxi probes that log their locations and speeds at regular intervals. Our model and inference procedures can be used to analyze traffic patterns and conditions from historical data, as well as to infer current patterns and conditions from data collected in real time. As such, our techniques provide a powerful new sensor network approach for traffic visualization, analysis, and urban planning.


Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

Mania, Pan, Papailiopoulos et al. 2015

Preprint


Twitter Homophily: Network Based Prediction of User’s Occupation

Pan, Bhardwaj, Lu et al. 2019


In this paper, we investigate the importance of social network information compared to content information in the prediction of a Twitter user's occupational class. We show that the content information of a user's tweets, the profile descriptions of a user's follower/following community, and the user's social network provide useful information for classifying a user's occupational group. In our study, we extend an existing dataset for this problem, and we achieve significantly better performance by using social network homophily that has not been fully exploited in previous work. In our analysis, we found that by using the graph convolutional network to exploit social homophily, we can achieve competitive performance on this dataset with just a small fraction of the training data.


