SAAGs: biased stochastic variance reduction methods for large-scale learning

Vinod Kumar Chauhan*, Anuj Sharma, Kalpana Dahiya

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Stochastic approximation is one of the effective approaches for dealing with large-scale machine learning problems, and recent research has focused on reducing the variance caused by noisy approximations of the gradients. In this paper, we propose novel variants of SAAG-I and SAAG-II (Stochastic Average Adjusted Gradient) (Chauhan et al. 2017), called SAAG-III and SAAG-IV, respectively. Unlike SAAG-I, SAAG-III sets the starting point to the average of the previous epoch; unlike SAAG-II, SAAG-IV sets the snap point and the starting point to the average and the last iterate of the previous epoch, respectively. To determine the step size, we use Stochastic Backtracking-Armijo line Search (SBAS), which performs the line search only on a selected mini-batch of data points. Since backtracking line search is not suitable for large-scale problems, and the constants used to find the step size, such as the Lipschitz constant, are not always available, SBAS could be very effective in such cases. We extend SAAGs (I, II, III and IV) to solve non-smooth problems and design two update rules, one for smooth and one for non-smooth problems. Moreover, our theoretical results prove linear convergence of SAAG-IV, in expectation, for all four combinations of smoothness and strong convexity. Finally, our experimental studies demonstrate the efficacy of the proposed methods against state-of-the-art techniques.
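
The idea of running the line search on a mini-batch only can be illustrated with a minimal sketch. This is not the authors' implementation of SBAS: the L2-regularised least-squares loss, the function names, and the default constants (`alpha0`, `shrink`, `c`) are assumptions chosen for illustration. The sketch shrinks a trial step size until the Armijo sufficient-decrease condition holds on the sampled batch, so no Lipschitz constant is needed.

```python
def batch_loss(w, batch, lam):
    """L2-regularised least-squares loss averaged over one mini-batch (assumed model)."""
    data = sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in batch)
    return data / (2 * len(batch)) + 0.5 * lam * sum(wi * wi for wi in w)


def batch_grad(w, batch, lam):
    """Gradient of batch_loss with respect to w."""
    g = [lam * wi for wi in w]
    for x, y in batch:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / len(batch)
    return g


def armijo_step(w, batch, lam, alpha0=1.0, shrink=0.5, c=1e-4, max_backtracks=50):
    """Backtracking Armijo line search evaluated only on `batch`.

    Returns (alpha, w_new). All defaults are illustrative assumptions.
    """
    g = batch_grad(w, batch, lam)
    g_sq = sum(gj * gj for gj in g)
    f0 = batch_loss(w, batch, lam)
    alpha = alpha0
    trial = list(w)
    for _ in range(max_backtracks):
        trial = [wi - alpha * gi for wi, gi in zip(w, g)]
        # Armijo condition: sufficient decrease of the mini-batch loss.
        if batch_loss(trial, batch, lam) <= f0 - c * alpha * g_sq:
            break
        alpha *= shrink  # step too long for this batch: shrink and retry
    return alpha, trial
```

In a stochastic method, `armijo_step` would be called once per sampled mini-batch, so each line search costs a few loss evaluations on that batch rather than on the full data set.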
Original language: English
Pages (from-to): 3331-3361
Number of pages: 31
Journal: Applied Intelligence
Volume: 49
Issue number: 9
Early online date: 5 Apr 2019
Publication status: Published - 1 Sept 2019

Keywords

  • stochastic gradient descent
  • stochastic optimization
  • variance reduction
  • strongly convex
  • smooth and non-smooth
  • SGD
  • large-scale learning
