
Faster learning by reduction of data access time

Vinod Kumar Chauhan*, Anuj Sharma, Kalpana Dahiya

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

A major challenge in machine learning today is the ‘Big Data’ challenge. In big data problems, a large number of data points, a large number of features per data point, or both make model training very slow. Training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second component, i.e., learning from the data. In this paper, we propose one possible solution for handling big data problems in machine learning. The idea is to reduce training time by reducing data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To demonstrate the effectiveness of the proposed sampling techniques, we use empirical risk minimization, a commonly used machine learning problem, for the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batch SGD), each with two step-size determination techniques, namely constant step size and backtracking line search. Theoretical results prove that systematic and cyclic sampling achieve convergence similar, in expectation, to that of the widely used random sampling technique. Experimental results on benchmark datasets demonstrate the efficacy of the proposed sampling techniques and show up to six times faster training.
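The three mini-batch selection schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the exact selection rules, so this sketch assumes the textbook definitions: random sampling draws scattered indices, cyclic/sequential sampling walks through contiguous blocks in storage order, and systematic sampling picks a random starting point and then follows a fixed stride. The function names and the `stride` parameter are hypothetical and are not taken from the paper.

```python
import random


def random_batch(n, batch_size, rng):
    """Random sampling: indices scattered across the dataset (no repeats)."""
    return rng.sample(range(n), batch_size)


def cyclic_batch(n, batch_size, step):
    """Cyclic/sequential sampling: the step-th contiguous block, wrapping at n."""
    start = (step * batch_size) % n
    return [(start + j) % n for j in range(batch_size)]


def systematic_batch(n, batch_size, rng, stride=1):
    """Systematic sampling (assumed textbook form): random start, fixed stride.

    With stride=1 the batch is a contiguous block beginning at a random index.
    """
    start = rng.randrange(n)
    return [(start + j * stride) % n for j in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, batch_size = 10, 4
    print("random    :", random_batch(n, batch_size, rng))
    print("cyclic    :", [cyclic_batch(n, batch_size, s) for s in range(3)])
    print("systematic:", systematic_batch(n, batch_size, rng))
```

Under these assumptions, the cyclic and stride-1 systematic batches touch adjacent storage locations, whereas a random batch touches scattered ones. That adjacency is the property the abstract relies on to reduce data access time.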
Original language: English
Pages (from-to): 4715-4729
Number of pages: 15
Journal: Applied Intelligence
Volume: 48
Issue number: 12
Early online date: 24 Jul 2018
Publication status: Published - 1 Dec 2018

Keywords

  • systematic sampling
  • random sampling
  • cyclic sampling
  • big data
  • large-scale learning
  • stochastic learning
  • empirical risk minimization
