# FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,$${}^{(\ast)}$$ but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,$${}^{(\dagger)}$$ on Saturday 14th (gird your loins — they last the full day!):

• Linear Sketching as a Tool for Everything
• Frontiers in Distribution Testing
• Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers$${}^{(\ddagger)}$$ (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

$${}^{(\ast)}$$ Hint: look at Sessions 7A and 10B.
$${}^{(\dagger)}$$ Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
$${}^{(\ddagger)}$$ A lineup of stellar speakers.

# News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graphs pattern (such as triangle, to name the most famous), that is removal results with bounds on the number of copies of the pattern that is not mind-blowingly huge like a tower of $$\varepsilon$$, is a classic and longstanding problem. This work makes significant progress for the last remaining case, i.e. for the pattern $$C_4$$: providing bounds that are merely exponential in $$\varepsilon$$.

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid $$\{0,1\}^n$$ (over a field $$\mathbb{F}\supseteq \{0,1\}$$). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e. the property of a degree-$$d$$-polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, that is that they require sample complexity $$n^{\Omega(d)}$$ — while learning the polynomial can be done with $$n^{O(d)}$$ samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on $$\varepsilon$$ for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability $$\delta$$: if one can achieve $$1/3$$, then simple repetition can bring it down to $$\delta$$ at the mild price of a $$\log(1/\delta)$$ factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative $$\sqrt{\log(1/\delta)}$$. And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). While I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; and, for fear of botching the job, decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic $$k$$-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic $$k$$-cycle removal lemma, which held for any $$k\geq 3$$ and abelian group $$G$$; however, the bounds in this lemma were quite large — i.e., tower-ype. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group $$\mathbb{F}_p^n$$ (where $$p\geq 2$$ is any fixed prime, and $$k\geq 3$$).

Update (10/04): Finally, a paper we covered last summerThe Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, went under signficant changes. Now titled Testing Sparsity over Known and Unknown Bases, it now includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix $$A \in \mathbb{R}^{d \times m}$$ and unknown input vector $$y \in \mathbb{R}^d$$, does $$y$$ equal $$Ax$$ for some $$k$$-sparse vector $$x$$, or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling into the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis — similar in goal and spirit to property testing.

# News for August 2017

This month has been comparatively slow, with only five papers. I suppose we’re lucky that five papers is considered to be a slow month!

Local Testing and Decoding of High-Rate Error-Correcting Codes, by Swastik Kopparty and Shubhangi Saraf (ECCC). This is a survey article, covering recent results in locally testable, correctable, and decodable codes. A good place to start for any who have fallen behind on the recent literature.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv). Distribution testing results are  often stated in the regime where the probability of failure is some constant, i.e. $$\delta = 1/3$$. These can be boosted to arbitrarily high probability of success at a multiplicative cost of $$\log (1/\delta)$$ using the “median trick” — repeat the test $$\log (1/\delta)$$ times and choose the majority result. This paper shows that this method is suboptimal for identity testing with small values of $$\delta$$. In particular, they give upper and lower bounds for the entire tradeoff curve between $$n, \varepsilon$$, and $$\delta$$.

Sample Efficient Estimation and Recovery in Sparse FFT via Isolation on Average, by Michael Kapralov (arXiv). The Sparse Fourier Transform problem has seen a wealth of study in the past few years, with various tradeoffs between sample and time complexity. This paper gives the first result achieving the optimal sample complexity of $$O(k \log n)$$ while maintaining a time complexity which is sublinear in the length of the signal, $$n$$. In order to obtain this, the author introduces a new technique for analyzing noisy hashing schemes.

Generalized Uniformity Testing, by Tuğkan Batu and Clément L. Canonne (arXiv). Uniformity testing is one of the most studied problems in distribution testing, and it is well known that the complexity is $$\Theta(\sqrt{n})$$. This paper studies a slightly different question: given samples from a distribution $$p$$, is it uniform over some (unknown) subset of its domain? The authors give upper and lower bounds for this question, showing that the complexity is $$\Theta(1/\|p\|_3)$$. Interestingly, when $$p$$ is uniform over a set of size $$\Omega(n)$$, the complexity is $$\Theta(n^{2/3})$$, which is more than the $$\Theta(\sqrt{n})$$ cost of vanilla uniformity testing.

Boolean Unateness Testing with $$\tilde O(n^{3/4})$$ Adaptive Queries, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). At this point, we have a fairly mature understanding of testing monotonicity over the Boolean hypercube: for non-adaptive algorithms, the complexity is $$\tilde \Theta(\sqrt{n})$$. There exists a gap for adaptive algorithms — the best known lower bound is $$\tilde \Omega(n^{1/3})$$, while the best upper bound is the same as for the non-adaptive case $$\tilde O(n^{1/2})$$. However, many conjecture that adaptivity does not help for monotonicity testing. More recently, there has been study of testing unateness — a function is unate if it is monotone non-decreasing or non-increasing with respect to each coordinate. Interestingly, this work proves an adaptive upper bound of $$\tilde O(n^{2/3})$$ which beats the lower bound of $$\tilde \Omega(n)$$ for non-adaptive algorithms, thus showing that adaptivity does help for unateness testing.

# News for July 2017

We’re seeing lots of papers in the summer. I guess the heat and sun (and more time?) is bringing out the good stuff. Distribution testing, from testing to streaming, hypergraph testing, and bunch of papers on graph testing. And just for the record: in what follows, $$n$$ is the support size in distribution testing, it’s the number of vertices in graph testing, and it’s the length when the input is a string. And $$m$$ is the number of edges when the input is a graph. Amen. (Update: added a paper on testing Dyck languages. 11 papers this month, I think that’s a PTReview record.)

Which Distribution Distances are Sublinearly Testable?, by Constantinos Daskalakis, Gautam Kamath, and John Wright (Arxiv). Distribution testing is seeing much progress these days. Many questions in this area can be formulated as: given a known distribution $$q$$, how many samples from an unknown distribution $$p$$ are required to determine the distance between $$p$$ and $$q$$? For the “identity testing” setting, we are given discrete distribution $$q$$, two distance measures $$d_1, d_2$$, and proximity parameters $$\varepsilon_1, \varepsilon_2$$. How many samples from unknown distribution $$p$$ are required to distinguish $$d_1(p,q) \leq \varepsilon_1$$ from $$d_2(p,q) \geq \varepsilon_2$$? Observe the asymmetric nature of the question. The choices for the distributions are most of the standard ones: TV, KL-divergence, Hellinger, $$\chi^2$$, and $$\ell_2$$. And the paper basically completes the entire picture by giving matching upper and lower bounds (in terms of the support size $$n$$) and showing certain settings that are untestable with sublinear sample complexity. The results also consider the “equivalence/closeness testing” case, where $$q$$ is also unknown. There are many interesting aspects of this collection of results. For example, when $$d_1$$ is KL-divergence instead of $$\chi^2$$, the complexity can jump from $$\Theta(\sqrt{n})$$ to $$\Theta(n/\log n)$$.

We have two independent papers with similar results (and titles!).

Differentially Private Identity and Closeness Testing of Discrete Distributions, by Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld (arXiv). Differentially Private Testing of Identity and Closeness of Discrete Distributions, by Jayadev Acharya, Ziteng Sun, and Huanyu Zhang (arXiv). Both these papers study the problem of identity testing over TV-distance, in the differentially private setting. In essence, the distribution testing algorithm should not be able to distinguish between two “neighboring” input distributions $$p, p’$$. The main result, achieved in both papers, is such an algorithm for identity (and closeness) testing whose sample complexity is quite close to the best non-differentially private algorithms. At a high level, the approaches used are also similar, specifically using a result of Goldreich on black-box reductions from identity to uniformity testing, and designing a private version of Paninski’s uniformity tester.

Estimating parameters associated with monotone properties, by Carlos Hoppen, Yoshiharu Kohayakawa, Richard Lang, Hanno Lefmann, and Henrique Stagni (arXiv). The study of monotone graph properties has rich history in graph property testing. This paper focuses on constant-query estimation of monotone graph parameters. A monotone graph property is one that is closed under subgraphs, and a monotone parameter is one that is non-increasing for subgraphs. Given a family of (constant-size) graphs $$\mathcal{F}$$, a graph is $$\mathcal{F}$$-free if it contains no subgraph in $$\mathcal{F}$$. For any graph $$G$$, define the parameter $$z_\mathcal{F}$$ to be (essentially) the log of the number of $$\mathcal{F}$$-free spanning subgraphs of $$G$$. This parameter has been studied in the context of subgraph counting problems. This papers studies the complexity of estimating $$z_\mathcal{F}$$ up to additive error $$\varepsilon$$. Most results of this flavor typically involve complexities that are towers in $$1/\varepsilon$$, but this result builds connections with the breakthrough subgraph removal lemma of Fox, to yield complexities that are towers in $$\log(1/\varepsilon)$$.

Testable Bounded Degree Graph Properties Are Random Order Streamable, by Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, and Christian Sohler (arXiv). This paper proves a nice connection between sublinear and streaming algorithms. A number of properties, such as connectivity, subgraph-freeness, minor-closed properties, are constant-time testable in the bounded-degree graph setting. The main result here shows that any such property can also be tested using constant space (in terms of words) when the graph is a random order stream. In contrast, existing work by Huang and Peng proves $$\Omega(n)$$ space lower bounds for some of these properties for adversarial edge order. The main technical ingredient is an algorithm that maintains distributions of constant-sized rooted subgraphs in the random-order streaming setting.

On Testing Minor-Freeness in Bounded Degree Graphs With One-Sided Error, by Hendrik Fichtenberger, Reut Levi, Yadu Vasudev, and Maximilian Wötzel (arXiv). A major open problem in testing of bounded-degree graphs is the one-sided testability of minor-closed properties. Benjamini, Schramm, and Shapira first proved that minor-closed properties are two-sided testable, and conjectured that the one-sided complexity is $$O(\sqrt{n})$$. Czumaj et al prove this result for the specific property of being $$C_k$$-minor free (where $$C_k$$ is the $$k$$-cycle). This paper gives a $$O(n^{2/3})$$ algorithm for testing whether a graph is $$k \times 2$$-grid minor-free. The latter class includes graphs that are cycles with non-intersecting chords and cactus graphs. A common approach in one-sided graph testing is to use the existence of a “nice” decomposition
of the graph into small well-behaved pieces, and argue that the tester only has to look within these pieces. This result employs a recent partitioning scheme of Lenzen and Levi into pieces of size $$O(n^{1/3})$$. Unfortunately, this partitioning may remove many edges, and the tester is forced to find minors among these cut edges. The combinatorics of the $$k \times 2$$-grid minor ensures that any such large cut always contains such a minor, and it can be detected efficiently.

Testing bounded arboricity, by Talya Eden, Reut Levi, and Dana Ron (arXiv). The arboricity of a graph is the minimum number of spanning forests that a graph can be decomposed into. It is a 2-approximation of the maximum average degree of a subgraph, and thus is a measure of whether “local” density exists. For example, the arboricity of all minor-closed graphs is constant. The main result in this paper is a property tester for arboricity. Formally, the paper provides an algorithm that accepts if the graph is $$\varepsilon$$-close to having arboricity $$\alpha$$, and rejects if the graph is $$c\varepsilon$$-far from having arboricity $$3\alpha$$ (for large constant $$c$$). Ignoring dependencies on $$\varepsilon$$, the running time is $$O(n/\sqrt{m} + n\alpha/m)$$, which is proven to be optimal. The result builds on distributed graph algorithms for forest decompositions.The constant of $$3$$ can be brought down arbitrarily close to $$2$$, and it is an intriguing question if it can be reduced further.

On Approximating the Number of k-cliques in Sublinear Time, by Talya Eden, Dana Ron, and C. Seshadhri (arXiv). This paper studies the classic problem of approximating the number of $$k$$-cliques of a graph, in the adjacency list representation. The main result is an algorithm that $$(1+\varepsilon)$$-approximates $$C_k$$ (the number of $$k$$-cliques) in time $$O(n/C^k_k + m^{k/2}/C_k)$$. (This expression ignored polylogs and $$\varepsilon$$.) Somewhat surprisingly, this strange expression is the optimal complexity. It is a generalization of previous result of Goldreich-Ron (average degree/edge counting, $$k=2$$) and Eden et al (triangle counting, $$k=3$$). The algorithm is quite intricate, and is achieved by building on various sampling techniques in these results.

A characterization of testable hypergraph properties, by Felix Joos, Jaehoon Kim, Daniela Kühn, and Deryk Osthus (arXiv). A fundamental result of Alon, Fischer, Newman, and Shapira characterized all constant-time testable (dense) graph properties, through a connection to the regularity lemma. This paper essentially extends that characterization to $$k$$-uniform hypergraph properties. Essentially, the main results shows the testability of property $$\bf{P}$$ is equivalent to $$\bf{P}$$ being “regular-reducible”. This is a rather involved definition that captures properties where hypergraph regularity can be used for testing. Furthermore, this can also be used to prove that any testable property also has a constant-time distance estimation algorithm.

Distributed Testing of Conductance, by Hendrik Fichtenberger and Yadu Vasudev (arXiv). There has been recent interest in property testing in the distributed graph framework. The problem studied here in graph conductance. Specifically, we wish to accept a graph with conductance $$\Phi$$ and reject a graph that is far from having conductance $$\Phi^2$$. The setting is the CONGEST framework, where each vertex has a processor and can only communicate $$O(\log n)$$ amount of data to its neighbors in a single round. The main result gives an algorithm with $$O(\log n)$$ rounds, which is shown to be optimal. Standard testing algorithms for conductance basically perform a collection of random walks from some randomly chosen seeds vertices. An important insight in this paper is that testing algorithms for conductance can be simulated in the CONGEST model without having to maintain all the random walks explicitly. It suffices to transfer a small amount of information that tracks overall statistics of the walks, which suffice for the testing problem.

Improved bounds for testing Dyck languages, by Eldar Fischer, Frédéric Magniez, Tatiana Starikovskaya (arXiv). Testing membership in languages is a classic problem in property testing. Fix a language $$L$$ over an alphabet $$\Sigma$$. Given a string $$x$$ of length $$n$$, the tester should accept if $$x \in L$$ and reject if $$x$$ is far (in terms of Hamming distance) from $$L$$. This paper focuses the simple, yet fundamental setting of $$L$$ being a Dyck language, the set of strings of perfectly balanced parentheses. Let $$D_k$$ be the Dyck language using $$k$$ types of parentheses. The classic paper of Alon et al on regular language testing also gives a constant time tester for $$k=1$$, and subsequent work of Parnas, Ron, Rubinfeld proves, for $$k \geq 2$$, testing complexity bounds of $$O(n^{2/3})$$ and $$\Omega(n^{1/11})$$. This paper gives significant improvements in testing complexity: an upper bound of $$O(n^{2/5+\delta})$$ (for any $$\delta > 0$$) and a lower bound of $$\Omega(n^{1/5})$$. The lower bound proof introduces some new techniques for proving that similarity of transcript distributions of a deterministic algorithm over two distributions of inputs (the key step in any Yao’s minimax lemma application). The proof constructs a third transcript distribution using a special “wild” transcript that represents the (low probability) event of the algorithm “learning” too much. A careful construction of this distribution can used to upper bound the distance between the original transcript distributions.

# News for June 2017

Happy Fourth of July to everyone! Last month was prolific for property testing, as we counted no less than nine papers making their way online!*

Parameterized Property Testing of Functions, by Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and Nithin Varma. (ECCC) (A preliminary version was published in ITCS’17). In property testing, here of functions, it is standard to parameterize the query complexity by the two “obvious” parameters — namely, the domain size parameter $$n$$, and the proximity parameter $$\varepsilon$$. While this often leads to a good understanding of the landscape, sometime a more fine-grained analysis may be useful, to capture the complexity of the question in terms of the setting-specific “right” parameters. This work initiates this line of inquiry for functions $$f\colon [n] \to \mathbb{R}$$, showing examples where classic lower bounds can be circumvented by focusing on a more relevant parameters of the problem.

A Lower Bound for Nonadaptive, One-Sided Error Testing of Unateness of Boolean Functions over the Hypercube, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, C. Seshadhri (arXiv). A Boolean function $$f\colon \{0,1\}^n\to \{0,1\}$$ is said to be unate if it is monotone (either increasing or decreasing) in each coordinate; or, equivalently, if there exists $$\sigma\in\{0,1\}^n$$ such that $$x\mapsto f(x\oplus\sigma)$$ is monotone. As reported in the previous months, there has been a significant amount of work lately on testing unateness of functions (including real-valued). In this short paper, the authors improve a lower bound of Chen et al. (2017) on non-adaptive, one-sided unateness testing of Boolean functions, bringing it from $$\Omega(\frac{n}{\log^2 n})$$ to $$\Omega(\frac{n}{\log n})$$ — leaving only a gap of $$\log^2 n$$ between known upper and lower bounds.

Adaptivity is exponentially powerful for testing monotonicity of halfspaces, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, and Erik Waingarten (arXiv:). While monotonicity testing of general Boolean functions has been widely studied these past years, testing monotonicity of functions promised to belong to a specific class (here, halfspaces — linear threshold functions) is much less understood. In this work, the authors show that testing monotonicity of halfspaces can done adaptively with $$\mathrm{poly}(\log n, 1/\varepsilon)$$ queries. Since the $$\Omega(n^{1/2})$$ non-adaptive lower bound for general monotonicity testing relied on instances that were halfspaces, it applies here as well — showing an exponential gap between adaptive and non-adaptive testing in this case. “It’s as extreme as it gets!”

Testing Piecewise Functions, by Steve Hanneke and Liu Yang (arXiv). Generalizing the concept of “unions of intervals,” the authors consider here the following general question: fix a class of functions $$\mathcal{H}\subseteq \mathcal{Y}^{\mathcal{X}}$$, where $$\mathcal{X},\mathcal{Y}$$ are “nice” spaces (typically $$\mathcal{X}=\mathbb{R}$$) and parameter $$k$$ and $$\varepsilon$$. The goal is to test whether (i) one can partition $$\mathcal{X}$$ into $$k$$ pieces and find $$h_1,\dots,h_k\in\mathcal{H}$$, such that $$f=h_i$$ on the $$i$$-th piece; or (ii) $$f$$ is $$\varepsilon$$-far (for a notion of distance depending on the measure on $$\mathcal{X}$$) from every such “$$k$$-piecewise function.”
They give upper bounds (as well as a lower bound) for the query complexity of this question in both the active testing and the sample-based testing settings (for the latter, considering the class $$\mathcal{H}$$ of constant functions).

Sample-based high-dimensional convexity testing, by Xi Chen, Adam Freilich, Rocco A. Servedio, and Timothy Sun (arXiv). In the setting considered in this paper, there exists an unknown set $$S\subseteq \mathbb{R}^n$$, and the algorithm is provided with samples $$\langle x, S(x)\rangle$$, where $$x$$ is distributed according to a standard Gaussian distribution and the label $$S(x)$$ is $$1$$ iff $$x\in S$$. The goal is to test whether $$S$$ is convex (or $\varepsilon$-far from convex under the standard Gaussian distribution). The authors then show near-matching upper and lower bounds: one-sided testing of convexity in this testing has sample complexity $$2^{\tilde{\Theta}(n)}$$, while two-sided testing has sample complexity $$2^{\tilde{\Theta}(\sqrt{n})}$$.

Distributed Detection of Cycles, by Pierre Fraigniaud and Dennis Olivetti (arXiv); and Distributed Subgraph Detection, by Pierre Fraigniaud, Pedro Montealegre, Dennis Olivetti, Ivan Rapaport, and Ioan Todinca (arXiv). Improving the understanding of distributed property testing of graphs in the CONGEST model (already touched upon last month), these two works tackle the question of distributed testing of subgraph freeness. The first considers testing cycle-freeness, showing that $$C_k$$-freeness can be tested in the CONGEST model in a constant number of rounds (and with one-sided error). The second considers the more general question testing $$H$$-freeness of graphs, for a larger class of subgraphs that includes cycles. Specifically, they show that for any pattern $$H$$ consisting of a couple of nodes, connected in an arbitrary manner to a tree, $$H$$-freeness can be tested (with one-sided error) in $$O(1)$$ rounds.

Hypothesis Testing For Densities and High-Dimensional Multinomials: Sharp Local Minimax Rates, by Sivaraman Balakrishnan and Larry Wasserman (arXiv). In distribution testing, the question of testing identity to a known discrete distribution $$p$$ is almost fully settled, with near tight “instance-optimal” bounds. But what about the related question of testing identity to $$p$$ (not necessarily discrete) under the additional promise that both $$p$$ and the unknown distribution $$q$$ both belong to a specific class $$C$$ of distributions, instead of arbitrary? This is the problem this work tackles, for the specific case of $$C$$ being (i) the class of multinomial distributions, and (ii) the class of (continuous) Lipschitz distributions. In both cases, the authors establish some near-tight bounds, both of an “instance-optimal” flavor.

On Axis-Parallel Tests for Tensor Product Codes, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). Low-degree testing has its roots in the origins of property testing, and its connection to the PCP theorem; and has been a fruitful and rich line of work since. Here, the authors analyze a type of low-degree test introduced in 2006 by Ben-Sasson and Sudan, which constrains the tester to only considers restrictions along axis-parallel hyperplanes; and establish new results on these class of (weaker but more general) testers.

* As usual, if you find a paper we forgot or misrepresented, please signal it in the comments below.

# News for May 2017

May brought us a wealth of new papers, including four (!) on distributed property testing.

Distributed Property Testing for Subgraph-Freeness Revisited, by Orr Fischer, Tzlil Gonen, and Rotem Oshman (arXiv). This paper studies subgraph-freeness testing in the CONGEST model of distributed computation. This problem was previously studied — this paper provides a number of new algorithms, which either improve upon the running time of prior work or are the first non-trivial results for these problems. Some cases they study include testing for freeness of $$k$$-cycles, constant-size trees, and $$k$$-cliques. For the latter case, they prove that a dependence on $$\varepsilon$$ is necessary.

Faster and Simpler Distributed Algorithms for Testing and Correcting Graph Properties in the CONGEST-Model, by Guy Even, Reut Levi, and Moti Medina (arXiv). This paper also studies graph property testing in the CONGEST model. There appears to be some overlap in problems with the previous paper, including testing subgraph-freeness for cycles and constant size trees. As the title suggests, they also study graph property correction in the case of $$k$$-cycles, and other properties such as bipartiteness.

K-Monotonicity is Not Testable on the Hypercube, by Elena Grigorescu, Akash Kumar, and Karl Wimmer (arXiv, ECCC). Recent work introduced the concept of $$k$$-monotonicity, where a function may switch value between $$0$$ to $$1$$ at most $$k$$-times on any ascending chain. This generalizes the common notion of monotonicity, which corresponds to $$k = 1$$. Since $$1$$-monotonicity on the hypercube is testable in $$O(\sqrt{n})$$ queries, it was conjectured that $$k$$-monotonicity may be tested in $$poly(n^k)$$ queries. This paper disproves this conjecture, showing that even $$2$$-monotonicity requires a number of queries which is exponential in $$\sqrt{n}$$.

The coin problem for product tests, by Chin Ho Lee and Emanuele Viola (ECCC). The coin problem asks, if $$f$$ is a product of $$k$$ functions on disjoint inputs of length $$n$$ bits, what is the smallest $$\varepsilon$$ such that $$f$$ can distinguish between an input of $$m = nk$$ fair bits versus an input of $$m$$ $$\varepsilon$$-biased bits. This paper proves tight bounds in a few cases of interest — when the range of $$f$$ is $$\{0, 1\}$$ or $$\{\pm 1\}$$, the answer is $$\Theta(1/\sqrt{n\log k})$$, while if the range is the set of unit-norm complex numbers, the answer is $$\Theta(1/\sqrt{nk})$$.

Distributed Testing of Conductance, by Hendrik Fichtenberger and Yadu Vasudev (arXiv). Another paper on testing in the CONGEST model — this one studies the problem of testing conductance. In particular, they give an $$O(\log n)$$ round two-sided tester which distinguishes between graphs with conductance at least $$\Phi$$ and graphs which are $$\varepsilon$$-far from having conductance at least $$\Omega(\Phi)$$. They also prove a lower bound of $$\Omega(\log n)$$, even in the (easier) LOCAL model.

On The Multiparty Communication Complexity of Testing Triangle-Freeness, by Orr Fischer, Shay Gershtein, and Rotem Oshman (arXiv). The final paper in May on distributed graph property testing — in contrast to the other papers, this paper considers multiparty communication complexity in the coordinator model. A graph is divided up among $$k$$ players, and each player can only communicate with a coordinator and not each other. The authors show upper bounds on the communication, in both the vanilla and the simultaneous model (which allows only $$1$$ round of communication). They also show a near-matching lower bound for simultaneous protocols on graphs of constant degree.

Exponentially Small Soundness for the Direct Product Z-test, by Irit Dinur and Inbal Livni Navon (ECCC). This paper studies the problem of direct product testing, in which one tests whether the output of a function on a coordinate depends only on the input to that coordinate (approximately). The authors investigate the Z-test of Impagliazzo, Kabanets, and Wigderson, and show that it achieves the optimal soundness of $$\varepsilon \geq \exp(-O(k))$$.

# News for April 2017

April was a busy month for property testing. A mixture of distribution testing, junta testing, Fourier transforms, matrix problems, and graph testing. Let’s go!
(Update: thanks to Omri Ben Eliezer for correcting some errors in our post. Also, he pointed out that we missed posting about by a property testing paper by Gishboliner-Shapira that appeared in Nov 2016.)

Fourier-Based Testing for Families of Distributions, by Clement Canonne, Ilias Diakonikolas, and Alistair Stewart (ECCC). Readers of PTReview will be familiar with great progress made over the past few years in distribution testing. We wish to determine some property $$\mathcal{P}$$ of a discrete distribution $$D$$ of support size $$n$$, using iid samples from the distribution. The challenge is that the number of samples should be $$o(n)$$. This paper gives a general framework for any property $$\mathcal{P}$$, such that all members (distributions) have sparse Fourier transforms. One of the main tools is a tester for determining if $$D$$ has small Fourier support (and find the restriction to this support). This is a really neat unifying methods that subsumes much of the past work. Moreover, this method leads to new testers for a variety of classes of distributions, including that of multinomial distributions.

Testing from One Sample: Is the casino really using a riffle shuffle, by Constantinos Daskalakis, Nishanth Dikkala, and Nick Gravin (arXiv). This paper introduces a novel twist on distribution testing: suppose $$\mathcal{D}$$ was generated through a Markov Chain, and we only get to see a sequence of samples generated through a random walk. Observe that we do not have the classic iid assumption any more. What can be said now? There is a challenge in defining “closeness” of distributions, which is done by looking at the variation distance between sequences generated through random walks of a carefully chosen length. The focus is on identity testing in two different regimes. First, with no assumptions on the Markov chain, there are general results that take $$O(n)$$ queries (note that the Markov chain has size $$O(n^2)$$). Secondly, for special “sparse” Markov Chains such as various models of card shuffling, one can get the number of samples to be logarithmic in size of the state space. Thus, one effectively gets to see a single shuffle process (which is like a walk in a Markov Chain), and can still decide if it is generating a uniform distribution.

Settling the query complexity of non-adaptive junta testing, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten, and Jinyu Xie (arXiv). Ah, progress on a classic open problem! A function $$f:\{0,1\}^n \to \{0,1\}$$ is a $$k$$-junta if it only depends on $$k$$ variables. After a long line of work, Blais provided a non-adaptive $$\widetilde{O}(k^{3/2})$$ tester and in a separate result of Blais gave an adaptive (ignoring $$\epsilon$$ factors) $$\widetilde{O}(k)$$ tester. Previous to this paper, the best non-adaptive lower bound was essentially $$\Omega(k)$$, ignoring polylog factors. This paper finally provides a matching non-adaptive lower bound of $$\Omega(k^{3/2})$$, up to polylog factors! This basically settles the knowledge (adaptive and non-adaptive) of this problem, up to polylogs.

An Adaptive Sublinear-Time Block Sparse Fourier Transform, by Volkan Cevher, Michael Kapralov, Jonathan Scarlett, and Amir Zandieh (arXiv).
(We should have caught this when it was first published in Feb 2017, but at least we caught the update!) Consider a vector $$X$$ whose Fourier transform we wish to compute. Suppose there are only $$k$$ non-zero coefficients (typically, this is what we care about). We can actually compute these coefficients using $$O(k\log n)$$ queries, which is strongly sublinear when $$k \ll n$$. An important question is when the sample complexity can go below $$k\log n$$, which is optimal in general. Often sparsity Fourier coefficients have structure that can be exploited. This paper studies block-sparsity, where all non-zero coefficients occur in $$k_0$$ contiguous blocks (in Fourier space) of size $$k_1$$. Thus, the total sparsity is $$k = k_0k_1$$. This paper designs an algorithm with adaptive query complexity $$O(k_0k_1 + k_0\log n)$$, thereby circumventing the $$k\log n$$ barrier for $$k_0 \ll k$$. Interestingly, the paper gives lower bounds showing that adaptivity is necessary for removing the $$\log n$$ factor.

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices, by Cameron Musco and David P. Woodruff (arXiv). Few problems occur as often as computing a low-rank approximation of a matrix $$A$$. Formally, one wishes to compute a rank $$k$$-matrix $$B$$ such that $$\|A-B\|_F$$ is minimized. SVD is the obvious method to do this, but is computationally expensive. There are a number of sampling methods, but all of them require reading all of $$A$$. This paper gives a truly sublinear-time algorithm (for small $$k$$) for PSD matrices. Formally, the algorithm presented here requires $$O(n \cdot \mathrm{poly}(k))$$ running time, where $$n$$ is the dimension size of $$A$$. It is well-known from previous work that there exist a subset of $$k$$ columns that form a good low-rank approximation. Yet finding them requires reading the whole matrix to compute various sampling strategies. The intuition here is that large entries cannot “hide” in arbitrary locations in a PSD matrix.

Testing hereditary properties of ordered graphs and matrices, by Noga Alon, Omri Ben-Eliezer, and Eldar Fischer (arXiv). Classic results by Alon-Shapira and Alon-Fischer-Newman-Shapira have shown that hereditary graph properties are testable in constant time. There is a deep connection between testing such properties and the existence of subgraph removal lemmas. The latter assert that, given some family $$\mathcal{F}$$ of forbidden subgraphs, if an $$n$$-vertex graph is $$\epsilon$$-far from being $$\mathcal{F}$$-free, then it must contain $$\delta n^q$$ copies of some $$F \in \mathcal{F}$$ (where $$F$$ has $$q$$ vertices, $$\delta$$ is a function only of $$\epsilon$$). This paper generalizes these theorems to general matrices over finite domains and ordered edge-colored graphs, where vertices in the graphs have a fixed ordering. Thus, one does not require the symmetry (over isomorphism) of graph properties for testability. These theorems lead to testability results for large classes of properties defined through forbidden substructures.

Removal Lemmas with Polynomial Bounds, by Lior Gishboliner and Asaf Shapira (arXiv). (Paper originally appeared in Nov 2016, but we missed it.) As mentioned above, there is a close connection between property testing of dense graphs and graph removal lemmas. Yet the property testers that result from these lemmas, like the celebrated triangle removal lemma, have terrible tower-like dependences on the parameter $$\epsilon$$. When can we get polynomial dependences on $$\epsilon$$ (which is called “easy testability”)? This fundamental question is answered in this paper. Consider the case when $$\mathcal{F}$$ is finite. If $$\mathcal{F}$$ contains a bipartite graph, the complement of a bipartite graph, and a split graph, then there are polynomially bounded graph removal lemmas (and hence easy property testers) for this family. Furthermore, one cannot drop any of these requirements! (A split graph is one where vertices can be partitioned into a set spanning a clique, and a set spanning an independent set.) There are generalizations to infinite families of graphs. An important corollary of this result is a proof of easy testability of semi-algebraic properties. This captures properties of graphs where vertices are points in Euclidean space and edges are present when the points satisfy some polynomial constraint (in the coordinates).

# News for March 2017

March was a relatively good month for property testing, with 3 papers — including a foray in differential privacy.

Optimal Unateness Testers for Real-Valued Functions: Adaptivity Helps, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and C. Seshadhri (arXiv). Testing unateness (that is, whether a function is monotone after a rotation of the hypercube) has recently received significant attention. Understanding the power of adaptivity in property testing, on the other hand, has been a long-standing question, which has sparked much interest over the past decade. In this paper, the authors obtain optimal results for testing unateness of real-valued functions $$f\colon \{0,1\}^d\to \mathbb{R}$$ and $$f\colon [n]^d\to \mathbb{R}$$ (up to the dependence on the proximty parameter $$\varepsilon$$), and show that — unlike what happens for monotonicity testing of real-valued functions — adaptivity helps. Namely, adaptive testing of unatess of functions $$f\colon \{0,1\}^d\to \mathbb{R}$$ has query complexity $$O(d/\varepsilon)$$, but non-adaptive testers require $$\Omega(d\log d)$$ queries.
This work is a merged, and significantly improved version of two papers we covered a few months back, by the same authors.

Switching from function to distribution testing, last month saw two independent samples appear online:

Near-Optimal Closeness Testing of Discrete Histogram Distributions, by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). Closeness testing of discrete distributions is the question of testing, given sample access to two unknown distributions $$p,q$$ over a known discrete domain $$[n] \stackrel{\rm def}{=} \{1,\dots,n\}$$, whether $$p=q$$ or $$d_{\rm TV}(p,q)> \varepsilon$$ (are the two distributions the same, or differ significantly in total variation distance?). This quite fundamental question is by now fully understood in the general case, where $$p,q$$ are allowed to be arbitrary distributions over ; but what if one has some additional knowledge about them? Specifically, what if we were guaranteed that both distributions have some “nice structure” — e.g., that they are piecewise constant with $$k$$ pieces ($$k$$-histograms)? Leveraging machinery the authors developed in a series of previous work (for the continuous case), the authors show that in this case, the sample complexity of “testing closeness under structural assumptions” is quite different than in the general case; and that the resulting (near tight) bound is a rather intricate tradeoff between the three parameters $$n,k,\varepsilon$$.

Priv’IT: Private and Sample Efficient Identity Testing, by Bryan Cai, Constantinos Daskalakis, and Gautam Kamath (arXiv). Distribution testing is a vibrant field; differential privacy (DP) is an incredibly active and topical area. What about differentially private distribution testing — a.k.a. testing the data without learning too much about any single sample? In this paper, the authors address the task of performing identity testing of distributions (“given samples from an unknown arbitrary distribution, decide whether it matches a fixed known probability distribution/model”) in the DP setting, focusing on getting the best of both worlds: how to guarantee privacy without sacrificing efficiency. Not only do they obtain asymptotically near-optimal bounds in both the testing and differential privacy parameters — they also run some experiments to validate their approach empirically. (Plus, the title is rather cute.)

Usual disclaimer: if we forgot or misrepresented your paper, please signal it in the comments — we’ll address it as quickly as we can.

# News for February 2017

This February, we had four exciting testing papers, covering domains from Boolean functions to distributions to graphs. Read on to catch up on the latest results!

Property Testing of Joint Distributions using Conditional Samples, by Rishiraj Bhattacharyya and Sourav Chakraborty (arXiv). This paper focuses on the problems of identity testing and independence testing for multivariate distributions in the conditional sampling model. Multivariate distribution testing has not previously been considered in the conditional sampling model. While previous algorithms for identity testing would generally carry over to this setting, the authors restrict themselves to algorithms where the queries are on subcubes of the domain. Interestingly, the authors show that the complexity of these problems is polynomial in the dimension (and that a polynomial dependence is necessary). That is, despite the “simple” structure of the queries, they are still powerful enough to avoid the curse of dimensionality which is present in vanilla multivariate distribution testing.

An Improved Dictatorship Test with Perfect Completeness, by Amey Bhangale, Subhash Khot, and Devanathan Thiruvenkatachari (arXiv, ECCC). A function is a dictator if it depends on exactly one variable. This paper provides an improved tester with perfect completeness for this property. In particular, they give a (non-adaptive) $$k$$-query test which never rejects a dictator function, and accepts a far-from-dictator function with probability at most $$\frac{2k +1}{2^k}$$. This improves upon the previous best result, which accepts a far-from-dictator function with probability at most $$\frac{2k+3}{2^k}$$.

An Adaptivity Hierarchy Theorem for Property Testing, by Clément L. Canonne and Tom Gur (arXiv, ECCC). This paper investigates the power of adaptivity in property testing. We most frequently think of algorithms as either adaptive or non-adaptive. In the former, an algorithm may specify each query after viewing the results of all previous queries, while in the latter, all the queries must be specified in advance. The authors focus on the natural interpolation, in which the algorithm is allowed to make queries in $$k$$ rounds, where the queries in a round may depend on the result of queries in all previous rounds. They show that the power of an algorithm can shift dramatically with just one more round of adaptivity — there exist properties which can be tested with $$\tilde O(k)$$ queries with $$k$$ rounds of adaptivity, but require $$\Omega(n)$$ queries with $$k-1$$ rounds of adaptivity.

Beyond Talagrand Functions: New Lower Bounds for Testing Monotonicity and Unateness, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). The main result in this paper is an improved lower bound for the well-studied problem of testing monotonicity of Boolean functions. The authors show that any two-sided and adaptive algorithm requires $$\tilde \Omega(n^{1/3})$$ samples to test whether a function is monotone or far from monotone. This improves upon the previous result of $$\tilde \Omega(n^{1/4})$$ by Belovs and Blais, which was the first to show that adaptive monotonicity testing requires polynomially-many queries. The construction of Belovs and Blais uses Talagrand’s random DNFs, for which they also showed $$\tilde O(n^{1/4})$$ queries are sufficient. This paper analyzes an extension of this family, denoted as two-level Talagrand functions. The result leaves open the tantalizing question: does adaptivity help with monotonicity testing? Or does the complexity remain $$\tilde \Theta(\sqrt{n})$$? The authors also prove lower bounds for the problem of testing unateness, a problem of recent interest and a natural generalization of monotonicity.

# News for January 2017

2017 starts off rather slow for property testing. Though, we have an intriguing paper to report – an experimental analysis of a classic sublinear algorithm.

Evaluating a Sublinear-time Algorithm for the Minimum Spanning Tree Weight Problem, by Gabriele Santi and Leonardo De Laurentiis (arXiv). The Chazelle-Rubinfeld-Trevisan Minimum Spanning Tree algorithm is a classic in sublinear algorithms. This algorithms provides a $$(1+\varepsilon)$$ approximation to the MST in time independent of the number of vertices (although it does depend on the average degree). But how this compare with Prim’s algorithm on real instances, in a real (not theoretical) computer? This intriguing paper does a detailed experimental comparison. Having done experimental graph algorithms myself, I can attest to the various challenges: how to choose a test set of graphs? How to set error parameters? Can data structure optimization on the coding side beat asymptotic improvements? This paper does a series of experiments on synthetic graph generators (such as Erdős-Rényi, Barabási-Albert, Watts-Strogatz models). They do validate the basic CRT algorithm at scale, by showing that it is faster than Prim for graphs with more than a million edges. Their experiments suggest that the sublinear-time algorithm gives little benefits when $$\varepsilon \leq 0.2$$. The paper has many experiments for a variety of settings, and the authors do a comprehensive study of the various parameters. I’d definitely recommend to anyone interested in exploring how property testing might influence algorithms in the real world.