On expected error of randomized Nyström kernel regression


Aleksandar Trokicić, Branimir Todorović




Kernel methods are a class of machine learning algorithms that learn and discover patterns in a high-dimensional (possibly infinite-dimensional) feature space obtained by an often nonlinear mapping of the input space. A major drawback of kernel methods is their time complexity: for a data set of n input points, a kernel method requires O(n^3) time, which is intractable for large data sets. The method based on random Nyström features is an approximation method that reduces the time complexity to O(np^2 + p^3), where p is the number of randomly selected input points. The O(p^3) term comes from the spectral decomposition that must be performed on the p × p Gram matrix of the selected points, and if p is large even this approximate algorithm is time consuming. In this paper we apply the randomized SVD method instead of the spectral decomposition and further reduce the time complexity. The input parameters of the randomized SVD algorithm are the p × p Gram matrix and a number m < p. In this case the time complexity is O(nm^2 + p^2 m + m^3), and linear regression is performed on m-dimensional random features. We prove that the error of a predictor learned via this method is almost the same in expectation as the error of the kernel predictor. Additionally, we show empirically that this predictor outperforms the one that uses only the Nyström method.
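To make the described pipeline concrete, the following is a minimal NumPy sketch of Nyström features computed with a randomized SVD of the p × p landmark Gram matrix, followed by linear (ridge) regression on the resulting m-dimensional features. The RBF kernel, uniform landmark sampling, the ridge regularizer, and all function names are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        # Gaussian (RBF) kernel matrix between rows of X and Y (an assumed kernel choice).
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * sq)

    def randomized_svd(A, m, oversample=10, seed=0):
        # Basic randomized SVD: sketch the range of A with a Gaussian test matrix,
        # then take an exact SVD of the small projected matrix.
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((A.shape[1], m + oversample))
        Q, _ = np.linalg.qr(A @ Omega)                 # orthonormal basis for the range of A
        U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        U = Q @ U_small
        return U[:, :m], s[:m], Vt[:m, :]

    def nystrom_randomized_features(X, landmarks, m, gamma=1.0):
        # m-dimensional Nystrom features: K_nl U S^{-1/2}, where U, S come from a
        # randomized SVD of the p x p landmark Gram matrix W (replacing a full
        # spectral decomposition of W).
        W = rbf_kernel(landmarks, landmarks, gamma)    # p x p Gram matrix
        K_nl = rbf_kernel(X, landmarks, gamma)         # n x p cross-kernel matrix
        U, s, _ = randomized_svd(W, m)
        s = np.maximum(s, 1e-12)                       # guard against tiny singular values
        return K_nl @ (U / np.sqrt(s))                 # n x m feature matrix

    # Usage sketch: ridge regression on the m-dimensional random features.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
    p, m, lam = 100, 20, 1e-3
    landmarks = X[rng.choice(len(X), p, replace=False)]
    Z = nystrom_randomized_features(X, landmarks, m)
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    y_hat = Z @ w

Forming the cross-kernel matrix and projecting costs O(nm^2 + p^2 m) and the small SVD costs O(m^3), matching the complexity stated above; only the m × m system is solved in the regression step.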