Publications

Provable distributed stochastic gradient descent with delayed updates

SIAM International Conference on Data Mining (SDM)

Publication date: April 29, 2021

Hongchang Gao, Gang Wu, Ryan A. Rossi

Distributed stochastic gradient descent is widely used for training large-scale machine learning models, but communication latency can slow its convergence. To mitigate this issue, [25] proposed a distributed stochastic gradient descent method with delayed updates. Although this method improves empirical performance, its existing theoretical analysis is incomplete and suboptimal. In this paper, we provide a tighter and more thorough convergence analysis of this method under mild conditions. In particular, our results achieve tighter convergence rates and reveal how much communication delay the method can tolerate without hampering the convergence rate. To the best of our knowledge, this is the first work to achieve these tighter convergence rates.
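To make the idea of delayed updates concrete, here is a minimal sketch of SGD in which each gradient is computed on parameters that are several steps old, mimicking the effect of communication latency. The least-squares objective, the fixed delay tau, and all hyperparameters are illustrative assumptions, not the paper's actual algorithm or experimental setup.

```python
# Minimal sketch of SGD with delayed (stale) gradient updates.
# The problem, delay model, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/2n) * ||A x - b||^2.
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def stochastic_grad(x, batch_size=16):
    """Gradient of the least-squares loss on a random mini-batch."""
    idx = rng.choice(n, size=batch_size, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch_size

def delayed_sgd(tau=4, lr=0.05, iters=500):
    """SGD where each gradient is computed on parameters that are tau steps old.

    This mimics communication latency: by the time a worker's gradient
    reaches the server, the parameters have already moved on.
    """
    x = np.zeros(d)
    history = [x.copy()]                      # past iterates, to look up stale ones
    for t in range(iters):
        stale_x = history[max(0, t - tau)]    # parameters from tau steps ago
        g = stochastic_grad(stale_x)          # gradient w.r.t. the stale iterate
        x = x - lr * g                        # server applies the delayed gradient
        history.append(x.copy())
    return x

x_delayed = delayed_sgd(tau=4)
x_fresh = delayed_sgd(tau=0)
print("error with delay tau=4:", np.linalg.norm(x_delayed - x_true))
print("error with no delay   :", np.linalg.norm(x_fresh - x_true))
```

With a moderate delay, the iterates in this toy example still converge toward the solution, which reflects the abstract's question of how much delay can be tolerated without hurting the convergence rate.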


Research Area: AI & Machine Learning