# News for August 2022

An eerily quiet month, this August of ’22. We only found a single paper to cover!

Training Overparametrized Neural Networks in Sublinear Time by Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo (arXiv). Think of a classification problem where the inputs are in $$\mathbb{R}^d$$. We have $$n$$ such points (with their true labels, as training data) and wish to train a Neural Network. A two-layer Rectified Linear Unit (ReLU) Neural Network (NN) works as follows. The first layer has $$m$$ vertices, where each vertex has a weight vector $$\vec{w}_i \in \mathbb{R}^d$$. The second “hidden layer” has $$m$$ vertices, each with a scalar weight $$a_1, a_2, \ldots, a_m$$. This network is called overparametrized when $$m \gg n$$. The output of this NN on input vector $$\vec{x}$$ is (up to scaling) $$\sum_{i \leq m} a_i \phi(\vec{w}_i \cdot \vec{x})$$ (where $$\phi$$ is a thresholded linear function). Observe that computing the value on a single input takes $$O(md)$$ time, so computing the values on all $$n$$ training inputs takes $$O(mnd)$$ time. The training is done by gradient descent methods: given a particular setting of weights, we compute the total loss and then modify the weights along the gradient. Previous work showed how a single iteration can be done in $$O(mnd + n^3)$$ time. When $$m \gg n^2$$, this is essentially linear in the time to compute the loss function (which requires evaluating the NN on all $$n$$ points). This paper shows how to implement a single iteration in $$O(m^{1-\alpha}nd + n^3)$$ time, for some $$\alpha > 0$$. Hence, the time for an iteration is sublinear in the cost of the trivial computation. The techniques used are sparse recovery methods and random projections.
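To make the $$O(mnd)$$ cost concrete, here is a minimal sketch (in Python/NumPy, with made-up sizes $$n, d, m$$ chosen only for illustration) of evaluating such a two-layer ReLU network on all $$n$$ training points. The $$1/\sqrt{m}$$ normalization is just one common choice of the scaling the text leaves unspecified; the matrix product in the forward pass is exactly the trivial computation that the paper's per-iteration time is sublinear in.

```python
import numpy as np

# Illustrative (hypothetical) sizes: n training points in R^d, m hidden vertices,
# with m >> n in the overparametrized regime.
n, d, m = 100, 20, 10_000

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))        # training inputs, one point per row
W = rng.standard_normal((m, d))        # weight vector w_i of each first-layer vertex (rows)
a = rng.choice([-1.0, 1.0], size=m)    # scalar weights a_1, ..., a_m of the second layer

def relu(z):
    # phi: the thresholded linear (ReLU) activation
    return np.maximum(z, 0.0)

def forward(X, W, a):
    # X @ W.T touches all n*m*d entries, so this step costs O(mnd) time --
    # evaluating the network on every training point.
    hidden = relu(X @ W.T)              # shape (n, m): phi(w_i . x) for each point and vertex
    return hidden @ a / np.sqrt(m)      # sum_i a_i * phi(w_i . x), up to scaling

preds = forward(X, W, a)
print(preds.shape)                      # (n,): one prediction per training point
```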