News for August 2022

A eerily quiet month, this August of ’22. We only found a single paper to cover!

Training Overparametrized Neural Networks in Sublinear Time by Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo (arXiv). Think of a classification problem where the inputs are in \(\mathbb{R}^d\). We have \(n\) such points (with their true labels, as training data) and wish to train a Neural Network. A two layer Rectified Linear Unit (ReLU) Neural Network (NN) works as follows. The first layer has \(m\) vertices, where each vertex has vector weight \(\vec{w}_i \in \mathbb{R}^d\). The second “hidden layer” has \(m\) vertices, each with a scalar weight \(a_1, a_2, \ldots, a_m\). This network is called overparametrized when \(m \gg n\). The output of this NN on input vector \(\vec{x}\) is (up to scaling) \(\sum_{i \leq m} a_i \phi(\vec{w_i} \cdot \vec{x})\) (where \(\phi\) is a thresholded linear function). Observe that to compute the value on a single input takes \(O(md)\) time, so the total time to compute all values on \(n\) training inputs takes \(O(mnd)\) time. The training is done by gradient descent methods; given a particular setting of weights, we compute the total loss, and then modify the weights along the gradient. Previous work showed how a single iteration can be done in time \(O(mnd + n^3)\). When \(m \gg n^2\), this can be thought of as linear in computing the loss function (which requires evaluating the NN on all the \(n\) points). This paper shows how to implement a single iteration in \(O(m^{1-\alpha}nd + n^3)\) time, for some \(\alpha > 0\). Hence, the time for an iteration is sublinear in the trivial computation. The techniques used are sparse recovery methods and random projections.

Property Testing Review

The latest in property testing and sublinear time algorithms

Leave a Reply Cancel reply