News for February 2018

February had a flurry of conference deadlines, and they seem to have produced six papers for us to enjoy, including three on estimating symmetric properties of distributions.

Locally Private Hypothesis Testing, by Or Sheffet (arXiv). We now have a very mature understanding of the sample complexity of distributional identity testing — given samples from a distribution \(p\), is it equal to, or far from, some model hypothesis \(q\)? Recently,  several papers have studied this problem under the additional constraint of differential privacy. This paper strengthens the privacy constraint to local privacy, where each sample is locally noised before being provided to the testing algorithm.

Distribution-free Junta Testing, by Xi Chen, Zhengyang Liu, Rocco A. Servedio, Ying Sheng, and Jinyu Xie (arXiv). Testing whether a function is a \(k\)-junta is very well understood — when done with respect to the uniform distribution. In particular, the adaptive complexity of this problem is \(\tilde \Theta(k)\), while the non-adaptive complexity is \(\tilde \Theta(k^{3/2})\). This paper studies the more challenging task of distribution-free testing, where the distance between functions is measured with respect to some unknown distribution. The authors show that, while the adaptive complexity of this problem is still polynomial (at \(\tilde O(k^2)\)), the non-adaptive complexity becomes exponential: \(2^{\Omega(k/3)}\). In other words, there’s a qualitative gap between the adaptive and non-adaptive complexity, which does not appear when testing with respect to the uniform distribution.

The Vertex Sample Complexity of Free Energy is Polynomial, by Vishesh Jain, Frederic Koehler, and Elchanan Mossel (arXiv). This paper studies the classic question of estimating (the logarithm of) the partition function of a Markov Random Field, a highly-studied topic in theoretical computer science and statistical physics. As the title suggests, the authors show that the vertex sample complexity of this quantity is polynomial. In other words, randomly subsampling a \(\mathrm{poly}(1/\varepsilon)\)-size graph and computing its free energy gives a good approximation to the free energy of the overall graph. This is in contrast to more general graph properties, for the vertex sample complexity is super-exponential in \(1/\varepsilon\).

Entropy Rate Estimation for Markov Chains with Large State Space, by Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, and Tiancheng Yu (arXiv). Entropy estimation is now quite well-understood when one observes independent samples from a discrete distribution — we can get by with a barely-sublinear sample complexity, saving a logarithmic factor compared to the support size. This paper shows that these savings can also be enjoyed in the case where we observe a sample path of observations from a Markov chain.

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance, by Yanjun Han, Jiantao Jiao, and Tsachy Weissman (arXiv). Speaking more generally of the above problem: there has been significant work into estimating symmetric properties of distributions, i.e., those which do not change when the distribution is permuted. One natural method for estimating such properties is to estimate the sorted distribution, then apply the plug-in estimator for the quantity of interest. The authors give an improved estimator for the sorted distribution, improving on the results of Valiant and Valiant.

INSPECTRE: Privately Estimating the Unseen, by Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang (arXiv). One final work in this area — this paper studies the estimation of symmetric distribution properties (including entropy, support size, and support coverage), but this time while maintaining differentially privacy of the sample. By using estimators for these tasks with low sensitivity, one can additionally obtain privacy for a low or no additional cost over the non-private sample complexity.

News for January 2018

And now, for the first papers of 2018! It’s a slow start with only four papers (or technical three “standard property testing” papers, and one non-standard paper).

Adaptive Boolean Monotonicity Testing in Total Influence Time, by Deeparnab Chakrabarty and C. Seshadhri (arXiv ECCC). The problem of testing monotonicity of Boolean functions \(f:\{0,1\}^n \to \{0,1\}\) has seen a lot of progress recently. After the breakthrough results of Khot-Minzer-Safra giving a \(\widetilde{O}(\sqrt{n})\) non-adaptive tester, Blais-Belovs proved the first polynomial lower bound for adaptive testers, recently improved to \(O(n^{1/3})\) by Chen, Waingarten, and Xi. The burning question: does adaptivity help? This result shows gives an adaptive tester that runs in \(O(\mathbf{I}(f))\), the total influence of \(f\). Thus, we can beat these lower bounds (and the non-adaptive complexity) for low influence functions.

Adaptive Lower Bound for Testing Monotonicity on the Line, by Aleksandrs Belovs (arXiv). More monotonicity testing! But this time on functions \(f:[n] \to [r]\). Classic results on property testing show that monotonicity can be tested in \(O(\varepsilon^{-1}\log n)\) time. A recent extension of these ideas by Pallavoor-Raskhodnikova-Varma replace the \(\log n\) with \(\log r\), an improvement for small ranges. This paper proves an almost matching lower bound of \((\log r)/(\log\log r)\). The main construction can be used to give a substantially simpler proof of an \(\Omega(d\log n)\) lower bound for monotonicity testing on hypergrids \(f:[n]^d \to \mathbb{N}\). The primary contribution is giving explicit lower bound constructions and avoiding Ramsey-theoretical arguments previously used for monotonicity lower bounds.

Earthmover Resilience and Testing in Ordered Structures, by Omri Ben-Eliezer and Eldar Fischer (arXiv). While there has been much progress on understanding the constant-time testability of graphs, the picture is not so clear for ordered structures (such as strings/matrices). There are a number of roadblocks (unlike the graph setting): there are no canonical testers for, say, string properties, there are testable properties that are not tolerant testable, and Szemeredi-type regular partitions may not exist for such properties. The main contribution of this paper is to find a natural, useful condition on ordered properties such that the above roadblocks disappear hold, and thus we have strong testability results. The paper introduces the notion of Earthmover Resilient properties (ER). Basically, a graph properties is a property of symmetric matrices that is invariant under permutation of base elements (rows/columns). An ER property is one that is invariant under mild perturbations of the base elements. The natural special cases of ER properties are those over strings and matrices, and it is includes all graph properties as well as image properties studied in this context. There are a number of characterization results. Most interestingly, for ER properties of images (binary matrices) and edge-colored ordered graphs, the following are equivalent: existence of canonical testers, tolerant testability, and regular reducibility.

Nondeterminisic Sublinear Time Has Measure 0 in P, by John Hitchcock and Adewale Sekoni (arXiv). Not your usual property testing paper, but on sublinear (non-deterministic) time nonetheless. Consider the complexity class of \(NTIME(n^\delta)\), for \(\delta < 1\). This paper shows that this complexity class is a "negligible" fraction of \(P\). (The analogous result was known for \(\alpha < 1/11\) by Cai-Sivakumar-Strauss.) This requires a technical concept of measure for languages and complexity classes. While I don’t claim to understand the details, the math boils down to understanding the following process. Consider some language \(\mathcal{L}\) and a martingale betting process that repeatedly tries to guess the membership of strings \(x_1, x_2, \ldots\) in a well-defined order. If one can define such a betting process with a limited computational resource that has unbounded gains, then \(\mathcal{L}\) has measure 0 with respect to that (limited) resource.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up on that trend! (And, of course, if we missed any, please point it out in the comments.The blame would be on our selves from the past year.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of a following flavor: two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell,k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell,k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, it implies the existence of a removal lemma with a
tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain constant-time and query algorithm for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e. the eigenvalues of its Laplacian, given random query access to the nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv) This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e. the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the sequence observed. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open problem 84), and one which would have direct implications to tolerant testing and estimation of symmetric properties of distributions.

News for November 2017

A quiet month, with only two papers. Perhaps the calm before the storm? Please let us know in the comments if something slipped under our radar.

Agreement tests on graphs and hypergraphs, by Irit Dinur, Yuval Filmus, and Prahladh Harsha (ECCC). This work looks at agreement tests and agreement theorems, which argue that if one checks if a number of local functions agree, then there exists a global function which agrees with most of them. This work extends previous work on direct product testing to local functions of higher degree, which corresponds to agreement tests on graphs and hypergraphs.

Testing Conditional Independence of Discrete Distributions, by Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv). This paper focuses on testing whether a bivariate discrete distribution has independent marginals, conditioned on the value of a tertiary discrete random variable. More precisely, given realizations of \((X, Y, Z)\), test if \(X \perp Y \mid Z\). Unconditional independence testing (corresponding to the case when \(Z\) is constant) has been extensively studied by the community, with tight upper and lower bounds showing that the sample complexity has two regimes, depending on the tradeoff between the support size and the accuracy desired. This paper shows gives upper and lower bounds for this more general problem, showing a rich landscape depending on the relative value of the parameters.

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than one year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the Holiday season, or for perusing on the beach if you live in the south hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

News for October 2017

After the flood of papers over the past few months, it’s reduced to “merely” six papers this October. And only two papers on distribution testing 🙂 (Update: Gautam pointed out another paper that also does distribution testing under Wasserstein distance.)

Proofs of Proximity for Distribution Testing, by Alessandro Cheisa and Tom Gur (ECCC). Take 1 on distribution testing. As usual, we have a distribution \(\mathcal{D}\) over \([n]\), a property \(\mathbf{P}\) of distributions, and a proximity parameter \(\varepsilon > 0\). But our tester is now aided by a purported proof (or a prover). If \(\mathcal{D} \in \mathbf{P}\), there must exist a proof/prover strategy that makes the tester accept. If \(\mathcal{D}\) is \(\varepsilon\)-far from \(\mathbf{P}\), for any proof/prover strategy, the tester must reject with high probability. This paper studies a number of settings: deterministic vs randomized testers, proof (a la \(\mathbb{NP}\) or \(\mathbb{MA}\)) vs provers (a la \(\mathbb{IP}\)). There are a number of very intriguing results, so here are some highlights. For every property, there is a near-linear proof that allows for \(O(\sqrt{n})\) testers. For every property, either the proof length or the tester (sample) complexity is at least \(\Omega(\sqrt{s})\), where \(s\) is the optimal complexity of a vanilla tester. But there exist prover strategies that can beat this lower bound.

Wasserstein Identity Testing, by Shichuan Deng, Wenzheng Li, and Xuan Wu (arXiv). For Take 2 on distribution testing, this paper considers the problem on continuous distributions. In all results on distribution testing, the sample complexity depends on the support size. This breaks down in this setting, so this paper focuses on identity testing according to Wasserstein distance (as opposed to \(\ell_p\)-norms). A previous paper of Do Ba-Nguyen-Nguyen-Rubinfeld also considers the same problem, where the domain is assumed to be \([\Delta]^d\). In this paper, we assume that the domain \(X\) is endowed with a metric space, to allow for the definition of Wasserstein/earthmover distance between distributions. The final result is technical, depends on the size of nets in \(X\), and is shown to be optimal for testing equality with a known distribution \(p\). The paper also gives an instance optimal for (almost) all \(p\), a la Valiant-Valiant for the discrete case.

Improved Bounds for Testing Forbidden Order Patterns, by Omri Ben-Elizer and Clément Canonne (arXiv). Any function from \(f:D \to \mathbb{R}\), where \(D\) is a total order, can be thought to induce a permutation, based on the order of function values. Consider \(f:[n] \to \mathbb{R}\) and permutation \(\pi \in S_k\). The property of \(\pi\)-free permutations is the set of \(f\) such that no restriction of \(f\) (to a subdomain of size \(k\)) induces \(\pi\). Newman et al proved that this property can be tested non-adaptively in (ignoring \(\varepsilon\)) \(O(n^{1-1/k})\) samples. Furthermore, for non-monotone \(\pi\), there is a non-adaptive \(\Omega(\sqrt{n})\) lower bound. This paper has a number of results that shed significant light on the non-adaptive complexity. The upper bound is improved to \(O(n^{1-1/(k-1)})\), and there exist a class of permutations (for every \(k\)) for which this is the optimal complexity. Furthermore, for a random \(\pi\), the paper shows a lower bound of \(\Omega(n^{1-1/(k-3)})\). There is an intriguing conjecture on the optimal complexity for any \(k\), that has an intricate dependence on the structure of \(k\). On the adaptive side, there is an interesting hierarchy for testing \((1,3,2)\)-freeness, depending on the number of rounds of adaptivity. There is an \(O(n^{1/(r+1)})\) tester with \(r\)-rounds of adaptivity, and any such tester requires \(O(n^{1/2^{r+3}})\) queries.

A \(o(d)\cdot \mathrm{poly}\log n\) Monotonicity Tester for Boolean Functions
over the Hypergrid \([n]^d\)
, by Hadley Black, Deeparnab Chakrabarty, and C. Seshadhri (arXiv). Monotonicity testing has seen much progress in the past few years. This paper focuses on monotonicity testing over the hypergrid, of functions \(f:[n]^d \to \{0,1\}\). For \(n=2\) (the hypercube), the result of Khot-Minzer-Safra gives a \(\widetilde{\sqrt{d}}\) time (non-adaptive) tester. Previously, the best known tester for general \(n\) took \(O(d\log d)\) queries, by Berman-Raskhodnikova-Yaroslavtsev. This paper breaks the barrier of \(d\) queries for the hypergrid, by giving a \(O(d^{5/6}\log n)\) time tester. The main technical ingredient is a new isoperimetric inequality for the directed “augmented” hypergrid, where pairs differing on one coordinate by a power of 2 are also connected. The intriguing open question that remains is the existence of testers with query complexity sublinear in \(d\) and independent of \(n\).

Provable and practical approximations for the degree distribution using sublinear graph samples, by Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri (arXiv). The degree distribution of a simple, undirected graph \(G\) is the sequence \(\{n_d\}\), where \(n_d\) is the number of vertices of degree \(d\). The properties of the degree distribution have played a significant role in network science and the mathematical study of real-world graphs. It is often the first quantity computed in a “data analysis” of a massive graph. This paper addresses the problem of getting (bicriteria) estimates for all values of the degree distribution, in sublinear time. The main result gives sublinear time algorithms for computing degree distributions with sufficiently “fat” tails, as measured by certain fatness indices. For the important special case, when the degree distribution is a power law (\(n_d \propto 1/d^{\gamma}\)), this result yields strictly sublinear algorithms. Interestingly, the main result involves the h-index of the degree sequence, inspired by bibliographic metrics. This problem is practically important, and the paper demonstrates the quality of the sublinear algorithm in practice.

On the Complexity of Sampling Vertices Uniformly from a Graph, by Flavio Chierichetti and Shahrzad Haddadan (arXiv). This paper isn’t your traditional property testing paper, but is very relevant to those of us interested in graph property testing. One common query model in graph property testing is access to uniform random vertices. In practice though (think of a graph like the Facebook friendship network or the web graph), this is quite challenging. We typically have access to a few seed vertices, and we can “crawl” the graph. A natural approach is to perform a random walk (in the hope of mixing quickly) to generate random vertices. We might attempt rejection sampling or Metropolis-Hastings on top of that to get uniform random vertices. A recent result of Chierichetti et al give an algorithm for this problem using \(O(t_{mix} d_{avg})\) samples, where \(t_{mix}\) is the mixing time (of the random walk on the graph) and \(d_{avg}\) is the average degree. This paper proves that this bound is optimal, for most reasonable choices of these parameters.

FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,\({}^{(\ast)}\) but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,\({}^{(\dagger)}\) on Saturday 14th (gird your loins — they last the full day!):

  • Linear Sketching as a Tool for Everything
  • Frontiers in Distribution Testing
  • Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers\({}^{(\ddagger)}\) (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

\({}^{(\ast)}\) Hint: look at Sessions 7A and 10B.
\({}^{(\dagger)}\) Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
\({}^{(\ddagger)}\) A lineup of stellar speakers.

News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graphs pattern (such as triangle, to name the most famous), that is removal results with bounds on the number of copies of the pattern that is not mind-blowingly huge like a tower of \(\varepsilon\), is a classic and longstanding problem. This work makes significant progress for the last remaining case, i.e. for the pattern \(C_4\): providing bounds that are merely exponential in \(\varepsilon\).

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid \(\{0,1\}^n\) (over a field \(\mathbb{F}\supseteq \{0,1\}\)). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e. the property of a degree-\(d\)-polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, that is that they require sample complexity \(n^{\Omega(d)}\) — while learning the polynomial can be done with \(n^{O(d)}\) samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on \(\varepsilon\) for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability \(\delta\): if one can achieve \(1/3\), then simple repetition can bring it down to \(\delta\) at the mild price of a \(\log(1/\delta)\) factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative \(\sqrt{\log(1/\delta)}\). And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). While I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; and, for fear of botching the job, decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic \(k\)-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic \(k\)-cycle removal lemma, which held for any \(k\geq 3\) and abelian group \(G\); however, the bounds in this lemma were quite large — i.e., tower-ype. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group \(\mathbb{F}_p^n\) (where \(p\geq 2\) is any fixed prime, and \(k\geq 3\)).

Update (10/04): Finally, a paper we covered last summerThe Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, went under signficant changes. Now titled Testing Sparsity over Known and Unknown Bases, it now includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix \(A \in \mathbb{R}^{d \times m}\) and unknown input vector \(y \in \mathbb{R}^d\), does \(y\) equal \(Ax\) for some \(k\)-sparse vector \(x\), or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling into the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis — similar in goal and spirit to property testing.

News for August 2017

This month has been comparatively slow, with only five papers. I suppose we’re lucky that five papers is considered to be a slow month!

Local Testing and Decoding of High-Rate Error-Correcting Codes, by Swastik Kopparty and Shubhangi Saraf (ECCC). This is a survey article, covering recent results in locally testable, correctable, and decodable codes. A good place to start for any who have fallen behind on the recent literature.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv). Distribution testing results are  often stated in the regime where the probability of failure is some constant, i.e. \(\delta = 1/3\). These can be boosted to arbitrarily high probability of success at a multiplicative cost of \(\log (1/\delta)\) using the “median trick” — repeat the test \(\log (1/\delta)\) times and choose the majority result. This paper shows that this method is suboptimal for identity testing with small values of \(\delta\). In particular, they give upper and lower bounds for the entire tradeoff curve between \(n, \varepsilon\), and \(\delta\).

Sample Efficient Estimation and Recovery in Sparse FFT via Isolation on Average, by Michael Kapralov (arXiv). The Sparse Fourier Transform problem has seen a wealth of study in the past few years, with various tradeoffs between sample and time complexity. This paper gives the first result achieving the optimal sample complexity of \(O(k \log n)\) while maintaining a time complexity which is sublinear in the length of the signal, \(n\). In order to obtain this, the author introduces a new technique for analyzing noisy hashing schemes.

Generalized Uniformity Testing, by Tuğkan Batu and Clément L. Canonne (arXiv). Uniformity testing is one of the most studied problems in distribution testing, and it is well known that the complexity is \(\Theta(\sqrt{n})\). This paper studies a slightly different question: given samples from a distribution \(p\), is it uniform over some (unknown) subset of its domain? The authors give upper and lower bounds for this question, showing that the complexity is \(\Theta(1/\|p\|_3)\). Interestingly, when \(p\) is uniform over a set of size \(\Omega(n)\), the complexity is \(\Theta(n^{2/3})\), which is more than the \(\Theta(\sqrt{n})\) cost of vanilla uniformity testing.

Boolean Unateness Testing with \(\tilde O(n^{3/4})\) Adaptive Queries, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). At this point, we have a fairly mature understanding of testing monotonicity over the Boolean hypercube: for non-adaptive algorithms, the complexity is \(\tilde \Theta(\sqrt{n})\). There exists a gap for adaptive algorithms — the best known lower bound is \(\tilde \Omega(n^{1/3})\), while the best upper bound is the same as for the non-adaptive case \(\tilde O(n^{1/2})\). However, many conjecture that adaptivity does not help for monotonicity testing. More recently, there has been study of testing unateness — a function is unate if it is monotone non-decreasing or non-increasing with respect to each coordinate. Interestingly, this work proves an adaptive upper bound of \(\tilde O(n^{2/3})\) which beats the lower bound of \(\tilde \Omega(n)\) for non-adaptive algorithms, thus showing that adaptivity does help for unateness testing.

News for July 2017

We’re seeing lots of papers in the summer. I guess the heat and sun (and more time?) is bringing out the good stuff. Distribution testing, from testing to streaming, hypergraph testing, and bunch of papers on graph testing. And just for the record: in what follows, \(n\) is the support size in distribution testing, it’s the number of vertices in graph testing, and it’s the length when the input is a string. And \(m\) is the number of edges when the input is a graph. Amen. (Update: added a paper on testing Dyck languages. 11 papers this month, I think that’s a PTReview record.)

Which Distribution Distances are Sublinearly Testable?, by Constantinos Daskalakis, Gautam Kamath, and John Wright (Arxiv). Distribution testing is seeing much progress these days. Many questions in this area can be formulated as: given a known distribution \(q\), how many samples from an unknown distribution \(p\) are required to determine the distance between \(p\) and \(q\)? For the “identity testing” setting, we are given discrete distribution \(q\), two distance measures \(d_1, d_2\), and proximity parameters \(\varepsilon_1, \varepsilon_2\). How many samples from unknown distribution \(p\) are required to distinguish \(d_1(p,q) \leq \varepsilon_1\) from \(d_2(p,q) \geq \varepsilon_2\)? Observe the asymmetric nature of the question. The choices for the distributions are most of the standard ones: TV, KL-divergence, Hellinger, \(\chi^2\), and \(\ell_2\). And the paper basically completes the entire picture by giving matching upper and lower bounds (in terms of the support size \(n\)) and showing certain settings that are untestable with sublinear sample complexity. The results also consider the “equivalence/closeness testing” case, where \(q\) is also unknown. There are many interesting aspects of this collection of results. For example, when \(d_1\) is KL-divergence instead of \(\chi^2\), the complexity can jump from \(\Theta(\sqrt{n})\) to \(\Theta(n/\log n)\).

We have two independent papers with similar results (and titles!).

Differentially Private Identity and Closeness Testing of Discrete Distributions, by Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld (arXiv). Differentially Private Testing of Identity and Closeness of Discrete Distributions, by Jayadev Acharya, Ziteng Sun, and Huanyu Zhang (arXiv). Both these papers study the problem of identity testing over TV-distance, in the differentially private setting. In essence, the distribution testing algorithm should not be able to distinguish between two “neighboring” input distributions \(p, p’\). The main result, achieved in both papers, is such an algorithm for identity (and closeness) testing whose sample complexity is quite close to the best non-differentially private algorithms. At a high level, the approaches used are also similar, specifically using a result of Goldreich on black-box reductions from identity to uniformity testing, and designing a private version of Paninski’s uniformity tester.

Estimating parameters associated with monotone properties, by Carlos Hoppen, Yoshiharu Kohayakawa, Richard Lang, Hanno Lefmann, and Henrique Stagni (arXiv). The study of monotone graph properties has rich history in graph property testing. This paper focuses on constant-query estimation of monotone graph parameters. A monotone graph property is one that is closed under subgraphs, and a monotone parameter is one that is non-increasing for subgraphs. Given a family of (constant-size) graphs \(\mathcal{F}\), a graph is \(\mathcal{F}\)-free if it contains no subgraph in \(\mathcal{F}\). For any graph \(G\), define the parameter \(z_\mathcal{F}\) to be (essentially) the log of the number of \(\mathcal{F}\)-free spanning subgraphs of \(G\). This parameter has been studied in the context of subgraph counting problems. This papers studies the complexity of estimating \(z_\mathcal{F}\) up to additive error \(\varepsilon\). Most results of this flavor typically involve complexities that are towers in \(1/\varepsilon\), but this result builds connections with the breakthrough subgraph removal lemma of Fox, to yield complexities that are towers in \(\log(1/\varepsilon)\).

Testable Bounded Degree Graph Properties Are Random Order Streamable, by Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, and Christian Sohler (arXiv). This paper proves a nice connection between sublinear and streaming algorithms. A number of properties, such as connectivity, subgraph-freeness, minor-closed properties, are constant-time testable in the bounded-degree graph setting. The main result here shows that any such property can also be tested using constant space (in terms of words) when the graph is a random order stream. In contrast, existing work by Huang and Peng proves \(\Omega(n)\) space lower bounds for some of these properties for adversarial edge order. The main technical ingredient is an algorithm that maintains distributions of constant-sized rooted subgraphs in the random-order streaming setting.

On Testing Minor-Freeness in Bounded Degree Graphs With One-Sided Error, by Hendrik Fichtenberger, Reut Levi, Yadu Vasudev, and Maximilian Wötzel (arXiv). A major open problem in testing of bounded-degree graphs is the one-sided testability of minor-closed properties. Benjamini, Schramm, and Shapira first proved that minor-closed properties are two-sided testable, and conjectured that the one-sided complexity is \(O(\sqrt{n})\). Czumaj et al prove this result for the specific property of being \(C_k\)-minor free (where \(C_k\) is the \(k\)-cycle). This paper gives a \(O(n^{2/3})\) algorithm for testing whether a graph is \(k \times 2\)-grid minor-free. The latter class includes graphs that are cycles with non-intersecting chords and cactus graphs. A common approach in one-sided graph testing is to use the existence of a “nice” decomposition
of the graph into small well-behaved pieces, and argue that the tester only has to look within these pieces. This result employs a recent partitioning scheme of Lenzen and Levi into pieces of size \(O(n^{1/3})\). Unfortunately, this partitioning may remove many edges, and the tester is forced to find minors among these cut edges. The combinatorics of the \(k \times 2\)-grid minor ensures that any such large cut always contains such a minor, and it can be detected efficiently.

Testing bounded arboricity, by Talya Eden, Reut Levi, and Dana Ron (arXiv). The arboricity of a graph is the minimum number of spanning forests that a graph can be decomposed into. It is a 2-approximation of the maximum average degree of a subgraph, and thus is a measure of whether “local” density exists. For example, the arboricity of all minor-closed graphs is constant. The main result in this paper is a property tester for arboricity. Formally, the paper provides an algorithm that accepts if the graph is \(\varepsilon\)-close to having arboricity \(\alpha\), and rejects if the graph is \(c\varepsilon\)-far from having arboricity \(3\alpha\) (for large constant \(c\)). Ignoring dependencies on \(\varepsilon\), the running time is \(O(n/\sqrt{m} + n\alpha/m)\), which is proven to be optimal. The result builds on distributed graph algorithms for forest decompositions.The constant of \(3\) can be brought down arbitrarily close to \(2\), and it is an intriguing question if it can be reduced further.

On Approximating the Number of k-cliques in Sublinear Time, by Talya Eden, Dana Ron, and C. Seshadhri (arXiv). This paper studies the classic problem of approximating the number of \(k\)-cliques of a graph, in the adjacency list representation. The main result is an algorithm that \((1+\varepsilon)\)-approximates \(C_k\) (the number of \(k\)-cliques) in time \(O(n/C^k_k + m^{k/2}/C_k)\). (This expression ignored polylogs and \(\varepsilon\).) Somewhat surprisingly, this strange expression is the optimal complexity. It is a generalization of previous result of Goldreich-Ron (average degree/edge counting, \(k=2\)) and Eden et al (triangle counting, \(k=3\)). The algorithm is quite intricate, and is achieved by building on various sampling techniques in these results.

A characterization of testable hypergraph properties, by Felix Joos, Jaehoon Kim, Daniela Kühn, and Deryk Osthus (arXiv). A fundamental result of Alon, Fischer, Newman, and Shapira characterized all constant-time testable (dense) graph properties, through a connection to the regularity lemma. This paper essentially extends that characterization to \(k\)-uniform hypergraph properties. Essentially, the main results shows the testability of property \(\bf{P}\) is equivalent to \(\bf{P}\) being “regular-reducible”. This is a rather involved definition that captures properties where hypergraph regularity can be used for testing. Furthermore, this can also be used to prove that any testable property also has a constant-time distance estimation algorithm.

Distributed Testing of Conductance, by Hendrik Fichtenberger and Yadu Vasudev (arXiv). There has been recent interest in property testing in the distributed graph framework. The problem studied here in graph conductance. Specifically, we wish to accept a graph with conductance \(\Phi\) and reject a graph that is far from having conductance \(\Phi^2\). The setting is the CONGEST framework, where each vertex has a processor and can only communicate \(O(\log n)\) amount of data to its neighbors in a single round. The main result gives an algorithm with \(O(\log n)\) rounds, which is shown to be optimal. Standard testing algorithms for conductance basically perform a collection of random walks from some randomly chosen seeds vertices. An important insight in this paper is that testing algorithms for conductance can be simulated in the CONGEST model without having to maintain all the random walks explicitly. It suffices to transfer a small amount of information that tracks overall statistics of the walks, which suffice for the testing problem.

Improved bounds for testing Dyck languages, by Eldar Fischer, Frédéric Magniez, Tatiana Starikovskaya (arXiv). Testing membership in languages is a classic problem in property testing. Fix a language \(L\) over an alphabet \(\Sigma\). Given a string \(x\) of length \(n\), the tester should accept if \(x \in L\) and reject if \(x\) is far (in terms of Hamming distance) from \(L\). This paper focuses the simple, yet fundamental setting of \(L\) being a Dyck language, the set of strings of perfectly balanced parentheses. Let \(D_k\) be the Dyck language using \(k\) types of parentheses. The classic paper of Alon et al on regular language testing also gives a constant time tester for \(k=1\), and subsequent work of Parnas, Ron, Rubinfeld proves, for \(k \geq 2\), testing complexity bounds of \(O(n^{2/3})\) and \(\Omega(n^{1/11})\). This paper gives significant improvements in testing complexity: an upper bound of \(O(n^{2/5+\delta})\) (for any \(\delta > 0\)) and a lower bound of \(\Omega(n^{1/5})\). The lower bound proof introduces some new techniques for proving that similarity of transcript distributions of a deterministic algorithm over two distributions of inputs (the key step in any Yao’s minimax lemma application). The proof constructs a third transcript distribution using a special “wild” transcript that represents the (low probability) event of the algorithm “learning” too much. A careful construction of this distribution can used to upper bound the distance between the original transcript distributions.