News for May 2017

May brought us a wealth of new papers, including four (!) on distributed property testing.

Distributed Property Testing for Subgraph-Freeness Revisited, by Orr Fischer, Tzlil Gonen, and Rotem Oshman (arXiv). This paper studies subgraph-freeness testing in the CONGEST model of distributed computation. This problem was previously studied — this paper provides a number of new algorithms, which either improve upon the running time of prior work or are the first non-trivial results for these problems. Some cases they study include testing for freeness of \(k\)-cycles, constant-size trees, and \(k\)-cliques. For the latter case, they prove that a dependence on \(\varepsilon\) is necessary.

Faster and Simpler Distributed Algorithms for Testing and Correcting Graph Properties in the CONGEST-Model, by Guy Even, Reut Levi, and Moti Medina (arXiv). This paper also studies graph property testing in the CONGEST model. There appears to be some overlap in problems with the previous paper, including testing subgraph-freeness for cycles and constant size trees. As the title suggests, they also study graph property correction in the case of \(k\)-cycles, and other properties such as bipartiteness.

K-Monotonicity is Not Testable on the Hypercube, by Elena Grigorescu, Akash Kumar, and Karl Wimmer (arXiv, ECCC). Recent work introduced the concept of \(k\)-monotonicity, where a function may switch value between \(0\) to \(1\) at most \(k\)-times on any ascending chain. This generalizes the common notion of monotonicity, which corresponds to \(k = 1\). Since \(1\)-monotonicity on the hypercube is testable in \(O(\sqrt{n})\) queries, it was conjectured that \(k\)-monotonicity may be tested in \(poly(n^k)\) queries. This paper disproves this conjecture, showing that even \(2\)-monotonicity requires a number of queries which is exponential in \(\sqrt{n}\).

The coin problem for product tests, by Chin Ho Lee and Emanuele Viola (ECCC). The coin problem asks, if \(f\) is a product of \(k\) functions on disjoint inputs of length \(n\) bits, what is the smallest \(\varepsilon\) such that \(f\) can distinguish between an input of \(m = nk\) fair bits versus an input of \(m\) \(\varepsilon\)-biased bits. This paper proves tight bounds in a few cases of interest — when the range of \(f\) is \(\{0, 1\}\) or \(\{\pm 1\}\), the answer is \(\Theta(1/\sqrt{n\log k})\), while if the range is the set of unit-norm complex numbers, the answer is \(\Theta(1/\sqrt{nk})\).

Distributed Testing of Conductance, by Hendrik Fichtenberger and Yadu Vasudev (arXiv). Another paper on testing in the CONGEST model — this one studies the problem of testing conductance. In particular, they give an \(O(\log n)\) round two-sided tester which distinguishes between graphs with conductance at least \(\Phi\) and graphs which are \(\varepsilon\)-far from having conductance at least \(\Omega(\Phi)\). They also prove a lower bound of \(\Omega(\log n)\), even in the (easier) LOCAL model.

On The Multiparty Communication Complexity of Testing Triangle-Freeness, by Orr Fischer, Shay Gershtein, and Rotem Oshman (arXiv). The final paper in May on distributed graph property testing — in contrast to the other papers, this paper considers multiparty communication complexity in the coordinator model. A graph is divided up among \(k\) players, and each player can only communicate with a coordinator and not each other. The authors show upper bounds on the communication, in both the vanilla and the simultaneous model (which allows only \(1\) round of communication). They also show a near-matching lower bound for simultaneous protocols on graphs of constant degree.

Exponentially Small Soundness for the Direct Product Z-test, by Irit Dinur and Inbal Livni Navon (ECCC). This paper studies the problem of direct product testing, in which one tests whether the output of a function on a coordinate depends only on the input to that coordinate (approximately). The authors investigate the Z-test of Impagliazzo, Kabanets, and Wigderson, and show that it achieves the optimal soundness of \(\varepsilon \geq \exp(-O(k))\).

News for April 2017

April was a busy month for property testing. A mixture of distribution testing, junta testing, Fourier transforms, matrix problems, and graph testing. Let’s go!
(Update: thanks to Omri Ben Eliezer for correcting some errors in our post. Also, he pointed out that we missed posting about by a property testing paper by Gishboliner-Shapira that appeared in Nov 2016.)

Fourier-Based Testing for Families of Distributions, by Clement Canonne, Ilias Diakonikolas, and Alistair Stewart (ECCC). Readers of PTReview will be familiar with great progress made over the past few years in distribution testing. We wish to determine some property \(\mathcal{P}\) of a discrete distribution \(D\) of support size \(n\), using iid samples from the distribution. The challenge is that the number of samples should be \(o(n)\). This paper gives a general framework for any property \(\mathcal{P}\), such that all members (distributions) have sparse Fourier transforms. One of the main tools is a tester for determining if \(D\) has small Fourier support (and find the restriction to this support). This is a really neat unifying methods that subsumes much of the past work. Moreover, this method leads to new testers for a variety of classes of distributions, including that of multinomial distributions.

Testing from One Sample: Is the casino really using a riffle shuffle, by Constantinos Daskalakis, Nishanth Dikkala, and Nick Gravin (arXiv). This paper introduces a novel twist on distribution testing: suppose \(\mathcal{D}\) was generated through a Markov Chain, and we only get to see a sequence of samples generated through a random walk. Observe that we do not have the classic iid assumption any more. What can be said now? There is a challenge in defining “closeness” of distributions, which is done by looking at the variation distance between sequences generated through random walks of a carefully chosen length. The focus is on identity testing in two different regimes. First, with no assumptions on the Markov chain, there are general results that take \(O(n)\) queries (note that the Markov chain has size \(O(n^2)\)). Secondly, for special “sparse” Markov Chains such as various models of card shuffling, one can get the number of samples to be logarithmic in size of the state space. Thus, one effectively gets to see a single shuffle process (which is like a walk in a Markov Chain), and can still decide if it is generating a uniform distribution.

Settling the query complexity of non-adaptive junta testing, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten, and Jinyu Xie (arXiv). Ah, progress on a classic open problem! A function \(f:\{0,1\}^n \to \{0,1\}\) is a \(k\)-junta if it only depends on \(k\) variables. After a long line of work, Blais provided a non-adaptive \(\widetilde{O}(k^{3/2})\) tester and in a separate result of Blais gave an adaptive (ignoring \(\epsilon\) factors) \(\widetilde{O}(k)\) tester. Previous to this paper, the best non-adaptive lower bound was essentially \(\Omega(k)\), ignoring polylog factors. This paper finally provides a matching non-adaptive lower bound of \(\Omega(k^{3/2})\), up to polylog factors! This basically settles the knowledge (adaptive and non-adaptive) of this problem, up to polylogs.

An Adaptive Sublinear-Time Block Sparse Fourier Transform, by Volkan Cevher, Michael Kapralov, Jonathan Scarlett, and Amir Zandieh (arXiv).
(We should have caught this when it was first published in Feb 2017, but at least we caught the update!) Consider a vector \(X\) whose Fourier transform we wish to compute. Suppose there are only \(k\) non-zero coefficients (typically, this is what we care about). We can actually compute these coefficients using \(O(k\log n)\) queries, which is strongly sublinear when \(k \ll n\). An important question is when the sample complexity can go below \(k\log n\), which is optimal in general. Often sparsity Fourier coefficients have structure that can be exploited. This paper studies block-sparsity, where all non-zero coefficients occur in \(k_0\) contiguous blocks (in Fourier space) of size \(k_1\). Thus, the total sparsity is \(k = k_0k_1\). This paper designs an algorithm with adaptive query complexity \(O(k_0k_1 + k_0\log n)\), thereby circumventing the \(k\log n\) barrier for \(k_0 \ll k\). Interestingly, the paper gives lower bounds showing that adaptivity is necessary for removing the \(\log n\) factor.

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, by Cameron Musco and David P. Woodruff (arXiv). Few problems occur as often as computing a low-rank approximation of a matrix \(A\). Formally, one wishes to compute a rank \(k\)-matrix \(B\) such that \(\|A-B\|_F\) is minimized. SVD is the obvious method to do this, but is computationally expensive. There are a number of sampling methods, but all of them require reading all of \(A\). This paper gives a truly sublinear-time algorithm (for small \(k\)) for PSD matrices. Formally, the algorithm presented here requires \(O(n \cdot \mathrm{poly}(k))\) running time, where \(n\) is the dimension size of \(A\). It is well-known from previous work that there exist a subset of \(k\) columns that form a good low-rank approximation. Yet finding them requires reading the whole matrix to compute various sampling strategies. The intuition here is that large entries cannot “hide” in arbitrary locations in a PSD matrix.

Testing hereditary properties of ordered graphs and matrices, by Noga Alon, Omri Ben-Eliezer, and Eldar Fischer (arXiv). Classic results by Alon-Shapira and Alon-Fischer-Newman-Shapira have shown that hereditary graph properties are testable in constant time. There is a deep connection between testing such properties and the existence of subgraph removal lemmas. The latter assert that, given some family \(\mathcal{F}\) of forbidden subgraphs, if an \(n\)-vertex graph is \(\epsilon\)-far from being \(\mathcal{F}\)-free, then it must contain \(\delta n^q\) copies of some \(F \in \mathcal{F}\) (where \(F\) has \(q\) vertices, \(\delta\) is a function only of \(\epsilon\)). This paper generalizes these theorems to general matrices over finite domains and ordered edge-colored graphs, where vertices in the graphs have a fixed ordering. Thus, one does not require the symmetry (over isomorphism) of graph properties for testability. These theorems lead to testability results for large classes of properties defined through forbidden substructures.

Removal Lemmas with Polynomial Bounds, by Lior Gishboliner and Asaf Shapira (arXiv). (Paper originally appeared in Nov 2016, but we missed it.) As mentioned above, there is a close connection between property testing of dense graphs and graph removal lemmas. Yet the property testers that result from these lemmas, like the celebrated triangle removal lemma, have terrible tower-like dependences on the parameter \(\epsilon\). When can we get polynomial dependences on \(\epsilon\) (which is called “easy testability”)? This fundamental question is answered in this paper. Consider the case when \(\mathcal{F}\) is finite. If \(\mathcal{F}\) contains a bipartite graph, the complement of a bipartite graph, and a split graph, then there are polynomially bounded graph removal lemmas (and hence easy property testers) for this family. Furthermore, one cannot drop any of these requirements! (A split graph is one where vertices can be partitioned into a set spanning a clique, and a set spanning an independent set.) There are generalizations to infinite families of graphs. An important corollary of this result is a proof of easy testability of semi-algebraic properties. This captures properties of graphs where vertices are points in Euclidean space and edges are present when the points satisfy some polynomial constraint (in the coordinates).

News for March 2017

March was a relatively good month for property testing, with 3 papers — including a foray in differential privacy.

Optimal Unateness Testers for Real-Valued Functions: Adaptivity Helps, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and C. Seshadhri (arXiv). Testing unateness (that is, whether a function is monotone after a rotation of the hypercube) has recently received significant attention. Understanding the power of adaptivity in property testing, on the other hand, has been a long-standing question, which has sparked much interest over the past decade. In this paper, the authors obtain optimal results for testing unateness of real-valued functions \(f\colon \{0,1\}^d\to \mathbb{R}\) and \(f\colon [n]^d\to \mathbb{R}\) (up to the dependence on the proximty parameter \(\varepsilon\)), and show that — unlike what happens for monotonicity testing of real-valued functions — adaptivity helps. Namely, adaptive testing of unatess of functions \(f\colon \{0,1\}^d\to \mathbb{R}\) has query complexity \(O(d/\varepsilon)\), but non-adaptive testers require \(\Omega(d\log d)\) queries.
This work is a merged, and significantly improved version of two papers we covered a few months back, by the same authors.

Switching from function to distribution testing, last month saw two independent samples appear online:

Near-Optimal Closeness Testing of Discrete Histogram Distributions, by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). Closeness testing of discrete distributions is the question of testing, given sample access to two unknown distributions \(p,q\) over a known discrete domain \([n] \stackrel{\rm def}{=} \{1,\dots,n\}\), whether \(p=q\) or \(d_{\rm TV}(p,q)> \varepsilon\) (are the two distributions the same, or differ significantly in total variation distance?). This quite fundamental question is by now fully understood in the general case, where \(p,q\) are allowed to be arbitrary distributions over ; but what if one has some additional knowledge about them? Specifically, what if we were guaranteed that both distributions have some “nice structure” — e.g., that they are piecewise constant with \(k\) pieces (\(k\)-histograms)? Leveraging machinery the authors developed in a series of previous work (for the continuous case), the authors show that in this case, the sample complexity of “testing closeness under structural assumptions” is quite different than in the general case; and that the resulting (near tight) bound is a rather intricate tradeoff between the three parameters \(n,k,\varepsilon\).

Priv’IT: Private and Sample Efficient Identity Testing, by Bryan Cai, Constantinos Daskalakis, and Gautam Kamath (arXiv). Distribution testing is a vibrant field; differential privacy (DP) is an incredibly active and topical area. What about differentially private distribution testing — a.k.a. testing the data without learning too much about any single sample? In this paper, the authors address the task of performing identity testing of distributions (“given samples from an unknown arbitrary distribution, decide whether it matches a fixed known probability distribution/model”) in the DP setting, focusing on getting the best of both worlds: how to guarantee privacy without sacrificing efficiency. Not only do they obtain asymptotically near-optimal bounds in both the testing and differential privacy parameters — they also run some experiments to validate their approach empirically. (Plus, the title is rather cute.)

Usual disclaimer: if we forgot or misrepresented your paper, please signal it in the comments — we’ll address it as quickly as we can.

News for February 2017

This February, we had four exciting testing papers, covering domains from Boolean functions to distributions to graphs. Read on to catch up on the latest results!

Property Testing of Joint Distributions using Conditional Samples, by Rishiraj Bhattacharyya and Sourav Chakraborty (arXiv). This paper focuses on the problems of identity testing and independence testing for multivariate distributions in the conditional sampling model. Multivariate distribution testing has not previously been considered in the conditional sampling model. While previous algorithms for identity testing would generally carry over to this setting, the authors restrict themselves to algorithms where the queries are on subcubes of the domain. Interestingly, the authors show that the complexity of these problems is polynomial in the dimension (and that a polynomial dependence is necessary). That is, despite the “simple” structure of the queries, they are still powerful enough to avoid the curse of dimensionality which is present in vanilla multivariate distribution testing.

An Improved Dictatorship Test with Perfect Completeness, by Amey Bhangale, Subhash Khot, and Devanathan Thiruvenkatachari (arXiv, ECCC). A function is a dictator if it depends on exactly one variable. This paper provides an improved tester with perfect completeness for this property. In particular, they give a (non-adaptive) \(k\)-query test which never rejects a dictator function, and accepts a far-from-dictator function with probability at most \(\frac{2k +1}{2^k}\). This improves upon the previous best result, which accepts a far-from-dictator function with probability at most \(\frac{2k+3}{2^k}\).

An Adaptivity Hierarchy Theorem for Property Testing, by Clément L. Canonne and Tom Gur (arXiv, ECCC). This paper investigates the power of adaptivity in property testing. We most frequently think of algorithms as either adaptive or non-adaptive. In the former, an algorithm may specify each query after viewing the results of all previous queries, while in the latter, all the queries must be specified in advance. The authors focus on the natural interpolation, in which the algorithm is allowed to make queries in \(k\) rounds, where the queries in a round may depend on the result of queries in all previous rounds. They show that the power of an algorithm can shift dramatically with just one more round of adaptivity — there exist properties which can be tested with \(\tilde O(k)\) queries with \(k\) rounds of adaptivity, but require \(\Omega(n)\) queries with \(k-1\) rounds of adaptivity.

Beyond Talagrand Functions: New Lower Bounds for Testing Monotonicity and Unateness, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). The main result in this paper is an improved lower bound for the well-studied problem of testing monotonicity of Boolean functions. The authors show that any two-sided and adaptive algorithm requires \(\tilde \Omega(n^{1/3})\) samples to test whether a function is monotone or far from monotone. This improves upon the previous result of \(\tilde \Omega(n^{1/4})\) by Belovs and Blais, which was the first to show that adaptive monotonicity testing requires polynomially-many queries. The construction of Belovs and Blais uses Talagrand’s random DNFs, for which they also showed \(\tilde O(n^{1/4})\) queries are sufficient. This paper analyzes an extension of this family, denoted as two-level Talagrand functions. The result leaves open the tantalizing question: does adaptivity help with monotonicity testing? Or does the complexity remain \(\tilde \Theta(\sqrt{n})\)? The authors also prove lower bounds for the problem of testing unateness, a problem of recent interest and a natural generalization of monotonicity.

News for January 2017

2017 starts off rather slow for property testing. Though, we have an intriguing paper to report – an experimental analysis of a classic sublinear algorithm.

Evaluating a Sublinear-time Algorithm for the Minimum Spanning Tree Weight Problem, by Gabriele Santi and Leonardo De Laurentiis (arXiv). The Chazelle-Rubinfeld-Trevisan Minimum Spanning Tree algorithm is a classic in sublinear algorithms. This algorithms provides a \((1+\varepsilon)\) approximation to the MST in time independent of the number of vertices (although it does depend on the average degree). But how this compare with Prim’s algorithm on real instances, in a real (not theoretical) computer? This intriguing paper does a detailed experimental comparison. Having done experimental graph algorithms myself, I can attest to the various challenges: how to choose a test set of graphs? How to set error parameters? Can data structure optimization on the coding side beat asymptotic improvements? This paper does a series of experiments on synthetic graph generators (such as Erdős-Rényi, Barabási-Albert, Watts-Strogatz models). They do validate the basic CRT algorithm at scale, by showing that it is faster than Prim for graphs with more than a million edges. Their experiments suggest that the sublinear-time algorithm gives little benefits when \(\varepsilon \leq 0.2\). The paper has many experiments for a variety of settings, and the authors do a comprehensive study of the various parameters. I’d definitely recommend to anyone interested in exploring how property testing might influence algorithms in the real world.

News for December 2016

December was indeed a merry month for property testing, with seven new papers appearing online.* Let’s hope the trend continues!

Cube vs. Cube Low Degree Test, by Amey Bhangale, Irit Dinur, and Inbal Livni Navon (ECCC). This work provides a new and improved analysis of the “low-degree test” of Raz and Safra, tightening the dependence on the alphabet size of the soundness parameter. Specifically, the goal is as follows: given query access to the encoding of a function \(f\colon \mathbb{F}^m\to \mathbb{F}\) as a “cube table,” decide whether \(f\) is a low-degree polynomial (i.e., of degree at most \(d\)). With direct applications to PCP theorems, this question has a long history; here, the authors focus on a very simple and natural test, introduced by Raz–Safra and Arora–Sudan. In particular, they improve the soundness guarantee, which previously required the error to be at least \(\textrm{poly}(d)/\mathbb{F}^{1/8}\), to obtain a dependence on the field size which is only \(\mathbb{F}^{-1/2}\).

Robust Multiplication-based Tests for Reed-Muller Codes, by Prahladh Harsha and Srikanth Srinivasan (arXiv). Given a function \(f\colon \mathbb{F}_q^n\to\mathbb{F}_q\) purported to be a degree-\(d\) polynomial, deciding whether this is indeed the case is a question with applications to hardness of approximation and — as the title strongly hints— to testing of Reed—Muller codes. Here, the authors generalize and improve on a test originally due to Dinur and Guruswami, improving on the soundness: that is, they show a robust version of the soundness guarantee of this multiplication test, answering a question left open by Dinur and Guruswami.

A Note on Testing Intersection of Convex Sets in Sublinear Time, by Israela Solomon (arXiv). In this note, the author addresses the following testing question: given \(n\) convex sets in \(\mathbb{R}^d\), distinguish between the case where (i) the intersection is non-empty, and (ii) even after removing any \(\varepsilon n\) sets, the intersection is still empty. Through the use of a generalization of Helly’s Theorem due Katchalski and Liu (1979), the author then provides and analyzes an algorithm for this question, with query complexity \(O(\log 1/(d\varepsilon))\).

A Characterization of Constant-Sample Testable Properties, by Eric Blais and Yuichi Yoshida (arXiv). A lot of work in property testing has been to understand which properties can be locally tested, i.e. admit testers with constant query complexity. Here, the authors tackle — and answer — the variant of this question in the sample-based testing model, that is restricting the ability of the algorithm to only obtain the value of the function on independently and uniformly distributed locations. Namely, they provide a full characterization of the properties of Boolean-valued function which are testable with constant sample complexity, resolving the natural question of whether most properties are easily tested from samples only. As it turns out, only those properties which are (essentially) \(O(1)\)-part-symmetric have this nice testability — where a property \(\mathcal{P}\) is \(k\)-part-symmetric if one can divide the input variables in \(k\) blocks, such that membership to \(\mathcal{P}\) is invariant by permutations inside each block.

The last three are all works on distribution testing, and all concerned with the high-dimensional case. Indeed, most of the distribution testing literature so far has been focusing in probability distributions over arbitrary discrete sets or the one-dimension ordered line; however, when trying to use these results in the (arbitrary) high-dimensional case on is subjected to the curse of dimensionality. Can one leverage (presumed) structure to reduce the cost, and obtain computationally and sample-efficient testers?

Testing Ising Models, by Costis Daskalakis, Nishanth Dikkala, and Gautam Kamath (arXiv). In this paper, the authors take on the above question in the context of Markov Random Fields, and more specifically in the Ising model. Modeling the (unknown) high-dimensional distributions as Ising models with some promise on the parameters, they tackle the two questions of identity and independence testing; respectively, “is the unknown Ising model equal to a fixed, known reference distribution?” and “is the high-dimensional, a priori complex distribution a product distribution?”. They obtain both testing algorithms and lower bounds for these two problems, where the soundness guarantee sought is in terms of the symmetrized Kullback—Leibler divergence (a notion of distance more stringent that the statistical distance).

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing, by Costis Daskalakis and Qinxuan Pan (arXiv). Another natural model to study structured high-dimensional distribution is that of Bayesian networks, that is directed graphs providing a succinct description of the dependencies between coordinates. This paper studies the closeness testing problem (testing if two unknown distributions are equal or far, with regard to the usual statistical distance) for Bayesian networks, parameterized by the dimension, alphabet size, and (promise on the) maximum in-degree of the unknown Bayes net. At its core is a new inequality relating the (squared) Hellinger distance of two Bayesian networks to the sum of the (squared) Hellinger distances between their marginals.

Testing Bayesian Networks, by Clément Canonne, Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv). Tackling the testing of Bayesian networks as well, this second paper considers two of the most standard testing questions — identity (one-unknown) and closeness (two-unknown) testing — under various assumptions, in order to pinpoint the exact sample complexity in each case. Specifically the goal is to see when (and under which natural restrictions) does testing become easier than learning for Bayesian networks, focusing on the dimension and maximum in-degree as parameters.

* As usual, if we forgot one or you find imprecisions in our review of a paper, please let us now in the comments below.

News for November 2016

November was quite eventful for property testing, with six exciting new results for you to peruse.

Alice and Bob Show Distribution Testing Lower Bounds (They don’t talk to each other anymore.), by Eric Blais, Clément L. Canonne, and Tom Gur (ECCC). The authors examine distribution testing lower bounds through the lens of communication complexity, a-la Blais, Brody, and Matulef, who previously showed such a connection for property testing lower bounds in the Boolean function setting. In this work, the authors’ main result involves testing identity to a specific distribution \(p\). While Valiant and Valiant showed tight bounds involving the \(\ell_{2/3}\)-quasinorm of \(p\), this paper gives tight bounds using a different quantity, namely Peetre’s \(K\)-functional. Their techniques also give lower bounds for several other properties (some old and some new), including monotonicity, sparse symmetric support, and \(k\)-juntas in the PAIRCOND model.

Fast property testing and metrics for permutations, by Jacob Fox and Fan Wei (arXiv). This paper proves a general testing result for permutations. In particular, it shows that any hereditary property of permutations is two-sided testable with respect to the rectangular distance with a constant number of queries. While in many such testing results on combinatorial objects (such as graphs), a “constant number of queries” may be exorbitantly large (due to complexities arising from an application of the strong regularity lemma), surprisingly, the complexity obtained in this paper is polynomial in \(1/\varepsilon\).

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation, by Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh (arXiv, ECCC). There has been a considerable deal of work recently on estimating several symmetric distribution properties, namely support size, support coverage, entropy, and distance to uniformity. One drawback of these results is that, despite the similarities between these properties, seemingly different techniques are required to obtain optimal rates for each. This paper shows that one concept, pattern maximum likelihood (PML), unifies them all. A PML distribution of a multiset of samples is any distribution which maximizes the likelihood of observing the multiplicities of the multiset, after discarding the labels on the elements. This can behave quite differently from the sequence maximum likelihood (SML), or empirical distribution. In particular, if a multiset of samples on support \(\{x, y\}\) is \(\{x, y, x\}\), then the SML is \((2/3, 1/3)\), while the PML is \((1/2, 1/2)\). The main result of this paper is, if one can approximate PML, then applying the plug-in estimator gives the optimal sample complexity for all of the aforementioned properties. The one catch is that efficient approximation of the PML is currently open. Consider the gauntlet thrown to all our readers!

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures, by Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv, ECCC). While the main focus of this work is on lower bounds for distribution estimation in the statistical query (SQ) model, this paper also has some interesting lower bounds for multivariate testing problems. Namely, they show that it is impossible to achieve a sample complexity which is significantly sublinear in the dimension for either of the following two problems:

  • Given samples from an \(n\)-dimensional distribution \(D\), distinguish whether \(D = \mathcal{N}(0,I)\) or \(D\) is \(\varepsilon/100\)-close to any \(\mathcal{N}(\mu, I)\) where \(\|\mu\|_2 \geq \varepsilon\).
  • Given samples from an \(n\)-dimensional distribution \(D\), distinguish whether \(D = \mathcal{N}(0,I)\) or \(D\) is a mixture of \(k\) Gaussians with almost non-overlapping components.

Collision-based Testers are Optimal for Uniformity and Closeness, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv, ECCC). In the TCS community, the seminal results in the field of distribution testing include the papers of Goldreich and Ron and Batu et al., which study uniformity testing and \(\ell_2\)-closeness testing (respectively) using collision based testers. While these testers appeared to be lossy, subsequent results have attained tight upper and lower bounds for these problems. As suggested by the title, this paper shows that collision-based testers actually achieve the optimal sample complexities for uniformity and \(\ell_2\)-closeness testing.

Testing submodularity and other properties of valuation functions, by Eric Blais, and Abhinav Bommireddi (arXiv). This paper studies the query complexity of several properties which have been studied in the context of valuation functions in algorithmic game theory. These properties are real-valued functions over the Boolean hypercube, and include submodularity, additivity, unit-demand, and much more. The authors show that, for constant \(\varepsilon\) and any \(p \geq 1\), these properties are constant-sample testable. Their results are obtained via an extension of the testing by implicit learning method of Diakonikolas et al.

News for October 2016

Alas, a rather dry month for property testing. We did find one quantum computing result based on the classic linearity testing theorem.

Robust Self-Testing of Many-Qubit States, by Anand Natarajan and Thomas Vidick (arXiv). (Frankly, my understanding of quantum computation is poor, and this summary may reflect that. Then again, some searching on Google and Wikipedia have definitely broadened my horizons.) One of the key concepts in quantum computation is the notion of entanglement. This allows for correlations between (qu)bits of information beyond what can be classically achieved. Given some device with supposed quantum properties (such as sets of entangled bits), is there a way of verification by measuring various outcomes of the device? This is referred to as self-testing of quantum states. This paper proves such a result for a set of \(n\) EPR pairs, which one can think of \(n\) pairs of “entangled” qubits. The interest to us property testers is the application of a quantum version of the seminar Blum, Luby, Rubinfeld linearity test.

News for September 2016

September has just concluded, and with it the rise of Fall in property testing: no less than 5 papers* this month.

Removal Lemmas for Matrices, by Noga Alon and Omri Ben-Eliezer (arXiv). The graph removal lemma, and its many variants and generalizations, is a cornerstone in graph property testing (among others), and is a fundamental ingredient in many graph testing results. In its simplest form, it states that  for any fixed \(h\)-vertex pattern \(H\) and \(\varepsilon > 0\),  there is some constant \(\delta=\delta(\varepsilon)\) such that any graph \(G\) on \(n\) vertices containing at most \(\delta n^h\) copies of \(H\) can be “fixed” (made \(H\)-free) by removing at most \(\varepsilon n^2\) edges.
The authors introduce and establish several analogues of this result, in the context of matrices. These matrix removal lemmas have direct implications in property testing (discussed in Section 1.2); for instance, they imply constant-query testing of \(F\)-freeness of (binary) matrices, where \(F\) is any family of binary matrices closed under row permutations.

Improving and extending the testing of distributions for shape-restricted properties, by Eldar Fischer, Oded Lachish, and Yadu Vasudev (arXiv). Testing membership to a class (family) is a very natural question in distribution testing, and one which has received significant attention lately. In this paper, the authors build on, improve, and generalize the techniques of Canonne, Diakonikolas, Gouleakis, and Rubinfeld (2016) to obtain a generic, one-size-fits-all algorithm for this question. They also consider the same question in the “conditional oracle” setting, for which their ideas yield significantly more sample-efficient algorithms. At the core of their approach is a better and simpler learning procedure for families of distributions that enjoy some decomposability property.

The Bradley―Terry condition is \(L_1\)–testable, by Agelos Georgakopoulos and  Konstantinos Tyros (arXiv). An incursion in probability models: say that the directed weighted graph \(((V,E),w)\) on \(n\) vertices, with \(w\colon E_n\to[0,1]\) such that \(w(x,y)=w(y,x)\) for all \((x,y)\in V^2\), satisfies the Bradley—Terry condition if there exists an assignment of probabilities \((a_x)_{x\in V}\) such that

\(\qquad\displaystyle \frac{w(x,y)}{w(y,x)} = \frac{a_x}{a_y}\)

for all \((x,y)\in V^2\). This condition captures whether such a graph corresponds to a Bradley—Terry tournament.
This paper considers this characterization from a property testing point of view: that is, it addresses the task of testing, in the \(L_1\)-sense of Berman, Raskhodnikova, and Yaroslavtsev (2012), if a \(((V,E),w)\) satisfies the Bradley—Terry condition (or is far from any such graph).

On SZK and PP, by Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan (arXiv, ECCC). Another (unexpected) connection! This work proves query and communication complexity separations between two complexity classes, \(\textsf{NISZK}\) (non-interactive statistical zero knowledge) and \(\textsf{UPP}\) (a generalization of \(\textsf{PP}\) with unbounded coin flips, and thus arbitrary gap in the error probability). While I probably cannot even begin to list all that goes over my head in this paper, the authors show in Section 2.3.2 some implications of their results for property testing, for instance in distribution testing. In more detail, their results yield some lower bounds on the sample complexity of (tolerant) property testers with success probability arbitrarily close to \(1/2\) (as opposed to the “usual” setting of \(1/2+\Omega(1)\)).

Quantum conditional query complexity, by Imdad S. B. Sardharwalla, Sergii Strelchuk, and Richard Jozsa (arXiv). Finally, our last paper of the list deals with both quantum algorithms and distribution testing, by introducing a new distribution testing model in a quantum setting, the quantum conditional oracle. This model, the quantum analogue of the conditional query oracle of Chakraborty et al. and Canonne et al., allows the (quantum) testing algorithm to get samples from the probability distribution to test, conditioning on chosen subsets of the domain.
In this setting, they obtain speedups over both (i)  quantum “standard” oracle and (ii)  “classical” conditional query testing for many distribution testing problems; and also consider applications to quantum testing of Boolean functions, namely balancedness.

* As usual, in case we missed or misrepresented any, please signal it in the comments.

News for August 2016

This summer is a very exciting one for property testing, with seven (!) papers uploaded during the month of August. In particular, there have been a number of results on testing unateness.

An \(\tilde O(n)\) Queries Adaptive Tester for Unateness, by Subhash Khot and Igor Shinkar (arXivECCC). A Boolean function is unate if, for each \(i \in [n]\), it is monotone non-increasing or monotone non-decreasing in coordinate \(i\). This is a generalization of monotonicity, one of the most studied problems in the property testing literature. This paper gives an adaptive algorithm for testing if a Boolean function is unate. The algorithm requires \(\tilde O(n)\) queries, improving on the \(O(n^{1.5})\) query tester of Goldreich et al. from 2000.

A \(\tilde O(n)\) Non-Adaptive Tester for Unateness, by Deeparnab Chakrabarty and C. Seshadhri (arXivECCC). The unateness testing continues! This paper also gives a \(\tilde O(n)\) algorithm for this problem, but improves upon the previous work by making non-adaptive queries. The algorithm and analysis are very clean and elegant — with a theorem by Chakrabarty et al. in place, they take up just a single page!

Testing Unateness of Real-Valued Functions, by Roksana Baleshzar, Meiram Murzabulatov, Ramesh Krishnan S. Pallavoor, and Sofya Raskhodnikova (arXiv). The final paper this month on testing unateness, this work studies real-valued functions on the hypergrid \([n]^d\) (as opposed to Boolean functions on the hypercube). They give an adaptive algorithm requiring \(O\left(\frac{d \log (\max (d,n))}{\varepsilon}\right)\) queries, generalizing the above result by Khot and Shinkar, which works only for Boolean functions on the hypercube and requires \(O\left(\frac{d \log d}{\varepsilon}\right)\) queries. Additionally, they prove lower bounds for this setting (of \(\Omega(\min(d, |R|^2))\), where \(R\) is the range of the function) and the Boolean hypercube setting (of \(\Omega(\sqrt{d}/\varepsilon)\)). The latter lower bound leaves open a tantalizing question: does a \(O(\sqrt{d})\) query algorithm exist? In other words, is testing unateness no harder than testing monotonicity?

Testing k-Monotonicity, by Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, Karl Wimmer (arXivECCC). And now, a different generalization of monotonicity! This paper studies testing \(k\)-monotonicity, where a Boolean function over a poset \(\mathcal{D}\) is \(k\)-monotone if it alternates between the values \(0\) and \(1\) at most \(k\) times on any ascending chain in \(\mathcal{D}\). Classical monotonicity is then \(1\)-monotone in this definition. On the hypercube domain, the authors prove a separation between testing monotonicity and testing \(k\)-monotonicity and a separation between testing and learning. They also present a tolerant test for \(k\)-monotonicity over the hypergrid \([n]^d\) with complexity independent of \(n\).

The Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal (arXiv). The dictionary learning problem has enjoyed extensive study in computer science. In this problem, given a matrix \(Y\), one wishes to construct a decomposition \(Y = AX\) such that each column of \(X\) is \(k\)-sparse. In the settings typically studied, such a decomposition is guaranteed to exist. This paper initiates the dictionary testing problem, in which one wishes to test whether \(Y\) admits such a decomposition, or is far from it. The authors provide a very elegant characterization — it admits a decomposition iff the columns of \(Y\) have a sufficiently narrow Gaussian width.

The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs, by Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan (arXiv). This paper provides streaming algorithms for estimating the size of the maximum matching in graphs with some underlying sparsity. They give a one-pass algorithm for insert-only and dynamic streams with bounded arboricity, and a two-pass algorithm for random order streams with bounded degree. Interestingly, for the latter result, they use local exploration techniques, which have seen previous use in the property testing literature.

Testing Assignments to Constraint Satisfaction Problems, by Hubie Chen, Matt Valeriote, and Yuichi Yoshida (arXiv). Given a CSP from a finite relational structure \(A\) and query access to an assignment, is the CSP satisfied? This paper shows a dichotomy theorem which characterizes which \(A\) admit a constant-query test. They go on to show a trichotomy theorem for when instances may include existentially quantified variables, classifying which \(A\) admit a constant-query test, which admit only a sublinear-query test, and which require a linear number of queries.

Minimizing Quadratic Functions in Constant Time, by Kohei Hayashi and Yuichi Yoshida (arXiv). This paper investigates the problem of (approximately) minimizing quadratic functions in high-dimensions. Prior approaches to this problem scale poorly with the dimension — at least linearly, if not worse. This work uses a sampling-based method to avoid paying this cost: the resulting time complexity is completely independent of the dimension!