# News for April 2017

April was a busy month for property testing. A mixture of distribution testing, junta testing, Fourier transforms, matrix problems, and graph testing. Let’s go!
(Update: thanks to Omri Ben Eliezer for correcting some errors in our post. Also, he pointed out that we missed posting about by a property testing paper by Gishboliner-Shapira that appeared in Nov 2016.)

Fourier-Based Testing for Families of Distributions, by Clement Canonne, Ilias Diakonikolas, and Alistair Stewart (ECCC). Readers of PTReview will be familiar with great progress made over the past few years in distribution testing. We wish to determine some property $$\mathcal{P}$$ of a discrete distribution $$D$$ of support size $$n$$, using iid samples from the distribution. The challenge is that the number of samples should be $$o(n)$$. This paper gives a general framework for any property $$\mathcal{P}$$, such that all members (distributions) have sparse Fourier transforms. One of the main tools is a tester for determining if $$D$$ has small Fourier support (and find the restriction to this support). This is a really neat unifying methods that subsumes much of the past work. Moreover, this method leads to new testers for a variety of classes of distributions, including that of multinomial distributions.

Testing from One Sample: Is the casino really using a riffle shuffle, by Constantinos Daskalakis, Nishanth Dikkala, and Nick Gravin (arXiv). This paper introduces a novel twist on distribution testing: suppose $$\mathcal{D}$$ was generated through a Markov Chain, and we only get to see a sequence of samples generated through a random walk. Observe that we do not have the classic iid assumption any more. What can be said now? There is a challenge in defining “closeness” of distributions, which is done by looking at the variation distance between sequences generated through random walks of a carefully chosen length. The focus is on identity testing in two different regimes. First, with no assumptions on the Markov chain, there are general results that take $$O(n)$$ queries (note that the Markov chain has size $$O(n^2)$$). Secondly, for special “sparse” Markov Chains such as various models of card shuffling, one can get the number of samples to be logarithmic in size of the state space. Thus, one effectively gets to see a single shuffle process (which is like a walk in a Markov Chain), and can still decide if it is generating a uniform distribution.

Settling the query complexity of non-adaptive junta testing, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten, and Jinyu Xie (arXiv). Ah, progress on a classic open problem! A function $$f:\{0,1\}^n \to \{0,1\}$$ is a $$k$$-junta if it only depends on $$k$$ variables. After a long line of work, Blais provided a non-adaptive $$\widetilde{O}(k^{3/2})$$ tester and in a separate result of Blais gave an adaptive (ignoring $$\epsilon$$ factors) $$\widetilde{O}(k)$$ tester. Previous to this paper, the best non-adaptive lower bound was essentially $$\Omega(k)$$, ignoring polylog factors. This paper finally provides a matching non-adaptive lower bound of $$\Omega(k^{3/2})$$, up to polylog factors! This basically settles the knowledge (adaptive and non-adaptive) of this problem, up to polylogs.

An Adaptive Sublinear-Time Block Sparse Fourier Transform, by Volkan Cevher, Michael Kapralov, Jonathan Scarlett, and Amir Zandieh (arXiv).
(We should have caught this when it was first published in Feb 2017, but at least we caught the update!) Consider a vector $$X$$ whose Fourier transform we wish to compute. Suppose there are only $$k$$ non-zero coefficients (typically, this is what we care about). We can actually compute these coefficients using $$O(k\log n)$$ queries, which is strongly sublinear when $$k \ll n$$. An important question is when the sample complexity can go below $$k\log n$$, which is optimal in general. Often sparsity Fourier coefficients have structure that can be exploited. This paper studies block-sparsity, where all non-zero coefficients occur in $$k_0$$ contiguous blocks (in Fourier space) of size $$k_1$$. Thus, the total sparsity is $$k = k_0k_1$$. This paper designs an algorithm with adaptive query complexity $$O(k_0k_1 + k_0\log n)$$, thereby circumventing the $$k\log n$$ barrier for $$k_0 \ll k$$. Interestingly, the paper gives lower bounds showing that adaptivity is necessary for removing the $$\log n$$ factor.

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, by Cameron Musco and David P. Woodruff (arXiv). Few problems occur as often as computing a low-rank approximation of a matrix $$A$$. Formally, one wishes to compute a rank $$k$$-matrix $$B$$ such that $$\|A-B\|_F$$ is minimized. SVD is the obvious method to do this, but is computationally expensive. There are a number of sampling methods, but all of them require reading all of $$A$$. This paper gives a truly sublinear-time algorithm (for small $$k$$) for PSD matrices. Formally, the algorithm presented here requires $$O(n \cdot \mathrm{poly}(k))$$ running time, where $$n$$ is the dimension size of $$A$$. It is well-known from previous work that there exist a subset of $$k$$ columns that form a good low-rank approximation. Yet finding them requires reading the whole matrix to compute various sampling strategies. The intuition here is that large entries cannot “hide” in arbitrary locations in a PSD matrix.

Testing hereditary properties of ordered graphs and matrices, by Noga Alon, Omri Ben-Eliezer, and Eldar Fischer (arXiv). Classic results by Alon-Shapira and Alon-Fischer-Newman-Shapira have shown that hereditary graph properties are testable in constant time. There is a deep connection between testing such properties and the existence of subgraph removal lemmas. The latter assert that, given some family $$\mathcal{F}$$ of forbidden subgraphs, if an $$n$$-vertex graph is $$\epsilon$$-far from being $$\mathcal{F}$$-free, then it must contain $$\delta n^q$$ copies of some $$F \in \mathcal{F}$$ (where $$F$$ has $$q$$ vertices, $$\delta$$ is a function only of $$\epsilon$$). This paper generalizes these theorems to general matrices over finite domains and ordered edge-colored graphs, where vertices in the graphs have a fixed ordering. Thus, one does not require the symmetry (over isomorphism) of graph properties for testability. These theorems lead to testability results for large classes of properties defined through forbidden substructures.

Removal Lemmas with Polynomial Bounds, by Lior Gishboliner and Asaf Shapira (arXiv). (Paper originally appeared in Nov 2016, but we missed it.) As mentioned above, there is a close connection between property testing of dense graphs and graph removal lemmas. Yet the property testers that result from these lemmas, like the celebrated triangle removal lemma, have terrible tower-like dependences on the parameter $$\epsilon$$. When can we get polynomial dependences on $$\epsilon$$ (which is called “easy testability”)? This fundamental question is answered in this paper. Consider the case when $$\mathcal{F}$$ is finite. If $$\mathcal{F}$$ contains a bipartite graph, the complement of a bipartite graph, and a split graph, then there are polynomially bounded graph removal lemmas (and hence easy property testers) for this family. Furthermore, one cannot drop any of these requirements! (A split graph is one where vertices can be partitioned into a set spanning a clique, and a set spanning an independent set.) There are generalizations to infinite families of graphs. An important corollary of this result is a proof of easy testability of semi-algebraic properties. This captures properties of graphs where vertices are points in Euclidean space and edges are present when the points satisfy some polynomial constraint (in the coordinates).