Author Archives: Seshadhri

News for April 2017

April was a busy month for property testing. A mixture of distribution testing, junta testing, Fourier transforms, matrix problems, and graph testing. Let’s go!
(Update: thanks to Omri Ben Eliezer for correcting some errors in our post. Also, he pointed out that we missed posting about by a property testing paper by Gishboliner-Shapira that appeared in Nov 2016.)

Fourier-Based Testing for Families of Distributions, by Clement Canonne, Ilias Diakonikolas, and Alistair Stewart (ECCC). Readers of PTReview will be familiar with great progress made over the past few years in distribution testing. We wish to determine some property \(\mathcal{P}\) of a discrete distribution \(D\) of support size \(n\), using iid samples from the distribution. The challenge is that the number of samples should be \(o(n)\). This paper gives a general framework for any property \(\mathcal{P}\), such that all members (distributions) have sparse Fourier transforms. One of the main tools is a tester for determining if \(D\) has small Fourier support (and find the restriction to this support). This is a really neat unifying methods that subsumes much of the past work. Moreover, this method leads to new testers for a variety of classes of distributions, including that of multinomial distributions.

Testing from One Sample: Is the casino really using a riffle shuffle, by Constantinos Daskalakis, Nishanth Dikkala, and Nick Gravin (arXiv). This paper introduces a novel twist on distribution testing: suppose \(\mathcal{D}\) was generated through a Markov Chain, and we only get to see a sequence of samples generated through a random walk. Observe that we do not have the classic iid assumption any more. What can be said now? There is a challenge in defining “closeness” of distributions, which is done by looking at the variation distance between sequences generated through random walks of a carefully chosen length. The focus is on identity testing in two different regimes. First, with no assumptions on the Markov chain, there are general results that take \(O(n)\) queries (note that the Markov chain has size \(O(n^2)\)). Secondly, for special “sparse” Markov Chains such as various models of card shuffling, one can get the number of samples to be logarithmic in size of the state space. Thus, one effectively gets to see a single shuffle process (which is like a walk in a Markov Chain), and can still decide if it is generating a uniform distribution.

Settling the query complexity of non-adaptive junta testing, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten, and Jinyu Xie (arXiv). Ah, progress on a classic open problem! A function \(f:\{0,1\}^n \to \{0,1\}\) is a \(k\)-junta if it only depends on \(k\) variables. After a long line of work, Blais provided a non-adaptive \(\widetilde{O}(k^{3/2})\) tester and in a separate result of Blais gave an adaptive (ignoring \(\epsilon\) factors) \(\widetilde{O}(k)\) tester. Previous to this paper, the best non-adaptive lower bound was essentially \(\Omega(k)\), ignoring polylog factors. This paper finally provides a matching non-adaptive lower bound of \(\Omega(k^{3/2})\), up to polylog factors! This basically settles the knowledge (adaptive and non-adaptive) of this problem, up to polylogs.

An Adaptive Sublinear-Time Block Sparse Fourier Transform, by Volkan Cevher, Michael Kapralov, Jonathan Scarlett, and Amir Zandieh (arXiv).
(We should have caught this when it was first published in Feb 2017, but at least we caught the update!) Consider a vector \(X\) whose Fourier transform we wish to compute. Suppose there are only \(k\) non-zero coefficients (typically, this is what we care about). We can actually compute these coefficients using \(O(k\log n)\) queries, which is strongly sublinear when \(k \ll n\). An important question is when the sample complexity can go below \(k\log n\), which is optimal in general. Often sparsity Fourier coefficients have structure that can be exploited. This paper studies block-sparsity, where all non-zero coefficients occur in \(k_0\) contiguous blocks (in Fourier space) of size \(k_1\). Thus, the total sparsity is \(k = k_0k_1\). This paper designs an algorithm with adaptive query complexity \(O(k_0k_1 + k_0\log n)\), thereby circumventing the \(k\log n\) barrier for \(k_0 \ll k\). Interestingly, the paper gives lower bounds showing that adaptivity is necessary for removing the \(\log n\) factor.

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, by Cameron Musco and David P. Woodruff (arXiv). Few problems occur as often as computing a low-rank approximation of a matrix \(A\). Formally, one wishes to compute a rank \(k\)-matrix \(B\) such that \(\|A-B\|_F\) is minimized. SVD is the obvious method to do this, but is computationally expensive. There are a number of sampling methods, but all of them require reading all of \(A\). This paper gives a truly sublinear-time algorithm (for small \(k\)) for PSD matrices. Formally, the algorithm presented here requires \(O(n \cdot \mathrm{poly}(k))\) running time, where \(n\) is the dimension size of \(A\). It is well-known from previous work that there exist a subset of \(k\) columns that form a good low-rank approximation. Yet finding them requires reading the whole matrix to compute various sampling strategies. The intuition here is that large entries cannot “hide” in arbitrary locations in a PSD matrix.

Testing hereditary properties of ordered graphs and matrices, by Noga Alon, Omri Ben-Eliezer, and Eldar Fischer (arXiv). Classic results by Alon-Shapira and Alon-Fischer-Newman-Shapira have shown that hereditary graph properties are testable in constant time. There is a deep connection between testing such properties and the existence of subgraph removal lemmas. The latter assert that, given some family \(\mathcal{F}\) of forbidden subgraphs, if an \(n\)-vertex graph is \(\epsilon\)-far from being \(\mathcal{F}\)-free, then it must contain \(\delta n^q\) copies of some \(F \in \mathcal{F}\) (where \(F\) has \(q\) vertices, \(\delta\) is a function only of \(\epsilon\)). This paper generalizes these theorems to general matrices over finite domains and ordered edge-colored graphs, where vertices in the graphs have a fixed ordering. Thus, one does not require the symmetry (over isomorphism) of graph properties for testability. These theorems lead to testability results for large classes of properties defined through forbidden substructures.

Removal Lemmas with Polynomial Bounds, by Lior Gishboliner and Asaf Shapira (arXiv). (Paper originally appeared in Nov 2016, but we missed it.) As mentioned above, there is a close connection between property testing of dense graphs and graph removal lemmas. Yet the property testers that result from these lemmas, like the celebrated triangle removal lemma, have terrible tower-like dependences on the parameter \(\epsilon\). When can we get polynomial dependences on \(\epsilon\) (which is called “easy testability”)? This fundamental question is answered in this paper. Consider the case when \(\mathcal{F}\) is finite. If \(\mathcal{F}\) contains a bipartite graph, the complement of a bipartite graph, and a split graph, then there are polynomially bounded graph removal lemmas (and hence easy property testers) for this family. Furthermore, one cannot drop any of these requirements! (A split graph is one where vertices can be partitioned into a set spanning a clique, and a set spanning an independent set.) There are generalizations to infinite families of graphs. An important corollary of this result is a proof of easy testability of semi-algebraic properties. This captures properties of graphs where vertices are points in Euclidean space and edges are present when the points satisfy some polynomial constraint (in the coordinates).

News for January 2017

2017 starts off rather slow for property testing. Though, we have an intriguing paper to report – an experimental analysis of a classic sublinear algorithm.

Evaluating a Sublinear-time Algorithm for the Minimum Spanning Tree Weight Problem, by Gabriele Santi and Leonardo De Laurentiis (arXiv). The Chazelle-Rubinfeld-Trevisan Minimum Spanning Tree algorithm is a classic in sublinear algorithms. This algorithms provides a \((1+\varepsilon)\) approximation to the MST in time independent of the number of vertices (although it does depend on the average degree). But how this compare with Prim’s algorithm on real instances, in a real (not theoretical) computer? This intriguing paper does a detailed experimental comparison. Having done experimental graph algorithms myself, I can attest to the various challenges: how to choose a test set of graphs? How to set error parameters? Can data structure optimization on the coding side beat asymptotic improvements? This paper does a series of experiments on synthetic graph generators (such as Erdős-Rényi, Barabási-Albert, Watts-Strogatz models). They do validate the basic CRT algorithm at scale, by showing that it is faster than Prim for graphs with more than a million edges. Their experiments suggest that the sublinear-time algorithm gives little benefits when \(\varepsilon \leq 0.2\). The paper has many experiments for a variety of settings, and the authors do a comprehensive study of the various parameters. I’d definitely recommend to anyone interested in exploring how property testing might influence algorithms in the real world.

News for October 2016

Alas, a rather dry month for property testing. We did find one quantum computing result based on the classic linearity testing theorem.

Robust Self-Testing of Many-Qubit States, by Anand Natarajan and Thomas Vidick (arXiv). (Frankly, my understanding of quantum computation is poor, and this summary may reflect that. Then again, some searching on Google and Wikipedia have definitely broadened my horizons.) One of the key concepts in quantum computation is the notion of entanglement. This allows for correlations between (qu)bits of information beyond what can be classically achieved. Given some device with supposed quantum properties (such as sets of entangled bits), is there a way of verification by measuring various outcomes of the device? This is referred to as self-testing of quantum states. This paper proves such a result for a set of \(n\) EPR pairs, which one can think of \(n\) pairs of “entangled” qubits. The interest to us property testers is the application of a quantum version of the seminar Blum, Luby, Rubinfeld linearity test.

News for July 2016

After a few slow months, property testing is back with a bang! This month we have a wide range on results ranging from classic problems like junta testing to new models of property testing.

Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism, by Eric Blais, Clément L. Canonne, Talya Eden, Amit Levi, and Dana Ron (arXiv). The problem of junta testing requires little introduction. Given a boolean function \(f:\{-1,+1\}^n \mapsto \{-1,+1\}\), a \(k\)-junta only depends on \(k\) of the input variables. A classic problem has been that of property testing \(k\)-juntas, and a rich line of work (almost) resolves the complexity to be \(\Theta(k/\epsilon)\). But what of tolerant testing, where we want to tester to accept functions close to being a junta? Previous work has shown that existing testers are tolerant, but with extremely weak parameters. This paper proves: there is a \(poly(k/\epsilon)\)-query tester that accepts every function \(\epsilon/16\)-close to a \(k\)-junta and rejects every function \(\epsilon\)-far from being a \(4k\)-junta. Note that the “gap” between the junta classes is a factor of 4, and this is the first result to achieve a constant gap. The paper also gives testers when we wish to reject functions far from being a \(k\)-junta (exactly matching the definition of tolerant testng), but the tester has an exponential dependence on \(k\). The results have intriguing connections with isomorphism testing, and there is a neat use of constrained submodular minimization.

Testing Pattern-Freeness, by Simon Korman and Daniel Reichman (arXiv). Consider a string \(I\) of length \(n\) and a “pattern” \(J\) of length \(k\), both over some alphabet \(\Sigma\). Our aim is to property test if \(I\) is \(J\)-free, meaning that \(I\) does not contains \(J\) as a substring. This problem can also be generalized to the 2-dimensional setting, where \(I\) and \(J\) are matrices with entries in \(\Sigma\). This formulation is relevant in image testing, where we need to search for a template pattern in a large image. The main results show that these testing problems can be solved in time \(O(1/\epsilon)\), with an intriguing caveat. If the pattern \(J\) has at least \(3\) distinct symbols, the result holds. If \(J\) is truly binary, then \(J\) is not allowed to be in a specified set of forbidden patterns. The main tool is a modification lemma that shows how to “kill” a specified occurrence of \(J\) without introducing a new one. This lemma is not true for the forbidden patterns, resulting in the dichotomy of the results.

Erasure-Resilient Property Testing, by Kashyap Dixit, Sofya Raskhodnikova, Abhradeep Thakurta, and Nithin Varma (arXiv). Property testing begins with a query model for \(f: \mathcal{D} \mapsto \mathcal{R}\), so we can access any \(f(x)\). But what if an adversary corrupted some fraction of the input? Consider monotonicity testing on the line, so \(f: [n] \mapsto \mathbb{R}\). We wish to test if \(f\) is \(\epsilon\)-far from being monotone, but the adversary corrupts/hides an \(\alpha\)-fraction of the values. When we query such a position, we receive a null value. We wish to accept if there exists some “filling” of the corrupted values that makes \(f\) monotone, and reject if all “fillings” keep \(f\) far from monotone. It turns out the standard set of property testers are not erasure resilient, and fail on this problem. This papers gives erasure resilient testers for properties like monotonicity, Lipschitz continuity, and convexity. The heart of the techniques involves a randomized binary tree tester for the line that can avoid the corrupted points.

Partial Sublinear Time Approximation and Inapproximation for Maximum Coverage, by Bin Fu (arXiv). Consider the classic maximum coverage problem. We have \(m\) sets \(A_1, A_2, \ldots, A_m\) and wish to pick the \(k\) of them with the largest union size. The query model allows for membership queries, cardinality queries, and generation of random elements from a set. Note that the size of the input can be thought of as \(\sum_i |A_i|\). Can one approximate this problem without reading the full input? This paper gives a \((1-1/e)\)-factor approximation that runs in time \(poly(k)m\log m\), and is thus sublinear in the input size. The algorithm essentially implements a greedy approximation algorithm in sublinear time. It is shown the the linear dependence on \(m\) is necessary: there is no constant factor approximation that runs in \(q(n) m^{1-\delta}\) (where \(n\) denotes the maximum cardinality, \(q(\cdot)\) is an arbitrary function, and \(\delta > 0\)).

Local Testing for Membership in Lattices, by Karthekeyan Chandrasekaran, Mahdi Cheraghchi, Venkata Gandikota, and Elena Grigorescu (arXiv). Inspired by the theory of locally testable codes, this paper introduces local testing in lattices. Given a set of basis vectors \(b_1, b_2, \ldots\) in \(\mathbb{Z}^n\), the lattice \(L\) is the set of all integer linear combinations of the basic vectors. This is the natural analogue of linear error correcting codes (where everything is done over finite fields). Given some input \(t \in \mathbb{Z}^n\), we wish to determine if \(t \in L\), or is far (defined through some norm) from all vectors in \(L\), by querying a constant number of coordinates in \(t\). We assume that \(L\) is fixed, so it can be preprocessed arbitrarily. This opens up a rich source of questions, and this work might be seen as only the first step in this direction. The papers shows a number of results. Firstly, there is a family of “code formula” lattices for which testers exists (with almost matching lower bounds). Furthermore, with high probability over random lattices, testers do not exist. Analogous to testing codes, if a constant query lattice tester exists, then there exists a canonical constant query tester.

News for June 2016

This month isn’t the most exciting for property testing. Just one paper to report (thanks to Clément for catching!), of which only a minor part is actually related to property testing.

Approximating the Spectral Sums of Large-scale Matrices using Chebyshev Approximation, by Insu Han, Dmitry Malioutov, Haim Avron, and Jinwoo Shin (arXiv). The main results of the paper deal with approximating the trace of matrix functions. Consider symmetric matrix \(M \in \mathbb{R}^{d\times d}\), with eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_d\). The paper gives new algorithms to approximate \(\sum_i f(\lambda_i)\). Each “operation” of the algorithm is a matrix-vector product, and the aim is to minimize the number of such operations. The property testing application is as follows. Suppose we wish to test if \(M\) is positive-definite (all \(\lambda_i > 0\)). We consider a matrix \(\epsilon\)-far from being positive-definite if the smallest eigenvalue is less than \(-\epsilon \|M\|_F = -\epsilon \sum_i \lambda^2_i\). The main theorems in this paper yield a testing algorithm (under this definition of distance) that makes \(o(d)\) matrix-vector products. While this is not exactly sublinear under a standard query access model, it is relevant when \(M\) is not explicitly represented and the only access is through such matrix-vector product queries.

News For February 2016

News for February 2016! Another take on distribution testing, exciting stuff about generating random graphs, and distributed property testing.

Sublinear Random Access Generators for Preferential Attachment Graphs by Guy Even, Reut Levi, Moti Medina, and Adi Rosén (ArXiv). A major aspect of research in “network science” and applied large graph analysis is random graph models. A classic example is the Barabasi-Albert preferential attachment (PA) model, that attempts to model graphs that appear in the real-world. The model is inherently sequential, in that vertices “arrive” in order and connect to other vertices according to a specified distribution. It was not clear how to generate a PA graph in parallel, or how one can construct a local view of a PA graph without constructing all of it. Not clear, that is, until this paper. This really cool result shows the following. Suppose we want to construct an \(n\) vertex PA graph. One can basically query the adjacency list of any vertex in \(poly(\log n)\) time, without constructing the graph! The PA graph is implicitly constructed through the queries. This is really interesting because PA graphs (at least conceptually) are created sequentially, and the order of vertex arrival fundamentally affects the graph structure.

 

The Uniform Distribution is Complete with Respect to Testing Identity to a Fixed Distribution by Oded Goldreich (ECCC) . This is a follow up to a recent result by Diakonikolas and Kane (covered last month!). This paper reduces testing equality (of a distribution) to a fixed distribution, to the special case to when the fixed distribution is uniform. The notions of reduction in distribution testing were exploited by Diakonikolas and Kane to get optimal testers. While this new paper does not give new bounds, it gives a clean and simplified approach for testing distribution equality. This paper came out of Oded’s lecture notes, and has a nice expository feel to it.

 

Fast Distributed Algorithms for Testing Graph Properties by Keren Censor-Hillel, Eldar Fischer, Gregory Schwartzman, and Yadu Vasudev (ArXiv) . Connections between distributed algorithms and property testing have been unearthed in the past (most explicitly by Parnas and Ron). This paper directly solves graph property testing problems in the distributed setting. For dense graph testing, it is known that any (constant-query) testing algorithm algorithm can be made non-adaptive, and thus can be simulated in the distributed setting. The results for testing bipartiteness in the sparse graph model are quite interesting, because existing property testers use random walks. This paper gives a distributed implementation of multiple random walks from all vertices, and controls the total congestion of the implementation. This leads to a \(O(\log n)\)-round bipartiteness tester.

Lecture Notes on Property Testing

(Update: Krzysztof pointed out the resources at sublinear.info, which we forgot to mention. It makes for sense for all of this information to be there. We’ve added links there, and would urge our readers to post more information there. Beats having it in half-baked blogposts!)

Despite the relative maturity of property testing (more than a decade old!), we still lack a good textbook on the subject. At least, I definitely feel the need when I’m talking to interested students. I end up pointing to a bunch of papers, all with their own notation. We now have numerous courses being taught, so that certainly helps. And Oded Goldreich just pointed out his notes. So here’s a summary of stuff that I have found. Please let us know if you have other sources, including your own courses!

Oded Goldreich’s lecture notes
Ronitt Rubinfeld’s collection of surveys
Sofya Raskhodnikova’s course notes at Penn State
Eric Price’s course notes at UT Austin
Grigory Yaroslavtsev’s crash course at the University of Buenos Aires
Rocco Servedio’s course notes at Columbia

News for November 2015

(Updating post with an additional paper that we missed in our first posting. Sorry! Feel free to email us at little.oh.of.n@gmail.com to inform us of papers we should mention.)

We’ve got exciting new in November! Optimal results for testing of monotone conjunctions, a new lower bound for monotonicity testing (yay!), and new lower bounds for Locally Testable Codes.

Tight Bounds for the Distribution-Free Testing of Monotone Conjunctions by Xi Chen and Jinyu Xie (arxiv). Consider functions \(f: \{0,1\}^n \rightarrow \{0,1\}\) (amen), and the basic property of being a monotone conjunction. Our notion of distance is with respect to a distribution \(\mathcal{D}\), so the distance between functions \(f\) and \(g\) is \(\mathop{Pr}_{x \sim \mathcal{D}} [f(x) \neq g(x)]\). In the distribution-free testing model, the tester does not know the distribution \(\mathcal{D}\), but is allowed samples from \(\mathcal{D}\). When \(\mathcal{D}\) is uniform, this coincides with standard property testing. There can be significant gaps between standard and distribution-free testing, evidenced by conjunctions. Parnas, Ron, and Samorodnitsky proved that monotone conjunctions can be tested in the vanilla setting in time independent of \(\mathcal{n}\), while Glasner-Servedio prove a \(\Omega(n^{1/5})\) lower bound for distribution-free testing. This paper provides a one-sided, adaptive distribution-free tester that makes \(\widetilde{O}(n^{1/3})\) queries, and they also prove a matching (up to polylog terms and \(\epsilon\)-dependencies) two-sided, adaptive lower bound. This is a significant improvement on the previous upper bound of \(\widetilde{O}(\sqrt{n})\) of Dolev-Ron, as well as over the previous lower bound. Furthermore, the results of this paper hold for general conjunctions.

A Polynomial Lower Bound for Testing Monotonicity by Aleksandrs Belovs and Eric Blais (arxiv). Surely the reader needs little introduction to testing monotonicity of Boolean functions, and a previous Open Problem post should help. A quick summary: we want to test monotonicity of \(f:\{0,1\}^n \rightarrow \{0,1\}\). The best upper bound is the \(\widetilde{O}(\sqrt{n})\) non-adaptive, one-sided tester of Khot et al. There is a matching, non-adaptive lower bound (up to polylog terms) by Chen et al, implying the best-known adaptive lower bound of \(\Omega(\log n)\). Can adaptivity help? This paper proves an adaptive lower bound of \(\Omega(n^{1/4})\). Exciting! The approach of Chen et al is to focus on monotonicity of linear threshold functions (technically, regular LTFS, where the weights are bounded). The authors show that this property can be tested in \(\mathop{poly}(\log n)\) time, shooting down hopes of extending the LTF approach for adaptive lower bounds. The key insight is to work with the distribution of Talagrand’s random DNF instead, which is the most noise sensitive DNF. (Talagrand strikes again. He helps the upper bound, he helps the lower bound.) Perturbations of this DNF lead to non-monotone functions, the paper proves that these distributions cannot be distinguished in \(o(n^{1/4})\) queries.

Lower bounds for constant query affine-invariant LCCs and LTCs by Arnab Bhattacharyya and Sivakanth Gopi (arxiv). Think of any code as a set/property of codewords in the domain \(\Sigma^N\). Such a code is an \(r\)-LTC if it has an \(r\)-query property tester. An \(r\)-LCC is has the property that, given any \(x \in \Sigma^n\) sufficiently close to a codeword, one can determine any coordinate of the codeword using \(r\)-queries. A fundamental question is to understand the length of LCCs and LTCS, or alternately (fixing \(\Sigma^N\)) determining the largest possible set of codewords. Existing constructions typically have much symmetry, either linear or affine invariance. (Check out Arnab Bhattacharyya’s survey on affine invariant properties for more details.) It is convenient to think of any codeword as a function \(f: [N] \rightarrow \Sigma\), and furthermore, think of \([N]\) as a vector space \(\mathbb{K}^n\). The best known construction of Guo, Kopparty, and Sudan yields (affine-invariant) LCCs of size \(\exp(\Theta(n^{r-1}))\) and LTCs of size \(\exp(\Theta(n^{r-2}))\) (many dependences on \(\mathbb{K}\), the rate, etc. are hidden in the big-Oh). This paper show that these bounds are actually the best achievable by any affine-invariant code. (Previous lower bounds of Ben-Sasson and Sudan only held for linear codes.) The intriguing and wide-open question is to prove such lower bounds without affine invariance.