Author Archives: Clement Canonne

News for September 2018

Last month was slower than usual, with only one property testing paper*

Property testing and expansion in cubical complexes, by David Garber, Uzi Vishne (arXiv). Consider the question of testing if an arbitrary function \(f\colon V\times V \to\{-1,1\}\) is of the form \(f(x,y) = h(x)h(y)\) for some \(h\colon V\to\{-1,1\}\). An intuitive one-sided test, shown to work by Lubotzky and Kaufman (2014), is to pick uniformly random \(x,y,z\in V\) and check that \(f(x,y)f(y,z)f(z,x)=1\). This paper considers the high-dimensional generalization of testing the property that a function\(f\colon V\times V \times V\times V \to\{-1,1\}\) is of the form \(f(w,x,y,z) = \alpha\cdot h(w,x)h(y,x)h(y,z) h(w,z)\), for some \(h\colon V\times V\to\{-1,1\}\) and sign \(\alpha\in\{-1,1\}\). The authors derive necessary and sufficient conditions for testability of this property, by formulating it in the language of incidence geometry and exploiting this connection.

* If we missed any, please let us know in the comments.

News for June 2018

The summer gets off to a flying start, with three property testing papers, spanning differential privacy, distribution testing, and juntas in Gaussian space!

On closeness to \(k\)-wise uniformity, by Ryan O’Donnell and Yu Zhao (arXiv)
In this paper, the authors consider the following structural question about probability distributions over the Boolean hypercube \(\{-1,1\}^n\): ” what is the relation between total variation distance \(\delta\) to \(k\)-wise independence, and bound \(\varepsilon\) on the Fourier coefficients of the distribution on degrees up to \(k\)?”

While this question might seem a bit esoteric at first glance, it has direct and natural applications to derandomization, and of course to distribution testing (namely, to test \(k\)-wise independence and its generalization, \((\varepsilon, k)\)-wise independence of distributions over the hypercube).

The main contribution here is to improve (by a \((\log n)^{O(k)}\) factor) the bounds on \(\delta (n,k,\varepsilon)\) over the previous work by Alon et al. [AAK+07], making them either tight (for \(k\) even) or near-tight. To do so, the authors introduce a new hammer to the game, using linear programming duality in the proof of both their upper and lower bounds.

Property Testing for Differential Privacy, by Anna Gilbert and Audra McMillan (arXiv)
Differential privacy, as introduced by Dwork et al., needs no introduction. Property testing, especially on this website, needs even less. What about a combination of the two? Namely, given black-box access to an algorithm claiming to perform a differentially private computation, how to test whether this is indeed the case?

Introducing and considering this quite natural question for the first time, this work shows, roughly speaking, that testing differential privacy is hard. Specifically, they show that for many notions of differential privacy (pure, approximate, and their distributional counterparts), testing is either impossible or possible but not with a sublinear number of queries (even when the tester is provided with side information about the black-box). In other terms, as the authors put it: trusting the privacy of an algorithm “requires compromise by either the verifier or algorithm owner” (and, in the latter case, even then it’s not a simple matter).

Is your data low-dimensional?, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv)
(Well, is it?) To state it upfront, I am biased here, as it is a problem I was very eager to see investigated to begin with. To recap, the question is as follows: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on “few” (i.e., \(k \ll n\)) variables?”

This is the continuous, Gaussian version of the (quite famous) junta testing problem, which has gathered significant attention over the past years (the Gaussian version has, to the best of my knowledge, never been investigated). Now, the above formulation has a major flaw: specifically, it is uninteresting. In Gaussian space*, who cares about the particular basis I expressed my input vector in? So a more relevant question, and that that the authors tackle, is the more robust and natural one: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on a low-dimensional linear combination of the variables?” Or, put differently, does all the relevant information for \(f\) live in a low-dimensional subspace?

De, Mossel, and Neeman show how can do this, non-adaptively, with a query complexity independent of the dimension \(n\) (hurray!), but instead polynomial in \(k\), the distance parameter \(\varepsilon\), and the surface area \(s\) of \(f\). And since this last parameter may seem quite arbitrary, they also proceed to show that a polynomial dependence in this \(s\) is indeed required.

*”In Gaussian space, no one can hear you change basis?”

News for March 2018

March has been a relatively slow month for property testing, with 3 works appearing online. (If you notice we missed some, please leave a comment pointing it out)

Edge correlations in random regular hypergraphs and applications to subgraph testing, by Alberto Espuny Díaz, Felix Joos, Daniela Kühn, and Deryk Osthus (arXiv). While testing subgraph-freness in the dense graph model is now well-understood, after a series of works culminating in a complete characterization of the testing problems which admit constant-query testers, the corresponding question for hypergraphs is far from resolved. In this paper, the authors develop new techniques for the study of study of random regular hypergraphs, which imply new testing results for subhypergraph-freeness testing, improving on the state-of-the-art for some parameter regimes (e.g., when the input graph satisfies some average-degree condition).

Back from hypergraphs to graphs, we also have:
The Subgraph Testing Model, by Oded Goldreich and Dana Ron (ECCC). Here, the authors introduce a new model for property testing of graphs, where the goal is to test if an unknown subgraph \(F\) of an explicitly given graph \(G=(V,E)\) satisfies the desired property. The testing algorithm is provided access to \(F\) via membership queries, i.e., through query access to the indicator function \(\mathbf{1}_F\colon E \to \{0,1\}\). (In some very hazy sense, this is reminiscent of the active learning or testing frameworks, where one gets more or less free access to unlabeled data but pays to see their label.) As a sample of the results obtained, the paper establishes that this new model and the bounded-degree graph model are incomparable: there exist properties easier to test in one model than the other, and vice-versa — and some properties equally easy to test in both.

And finally, to drive home the point that “models matter a lot,” we have our third paper:
Every set in P is strongly testable under a suitable encoding, by Irit Dinur, Oded Goldreich, and Tom Gur (ECCC). It is known that the choice of representation of the objects has a large impact in property testing: for instance, the complexity of testing a given property can change drastically between the dense and bounded-degree graph models. This work provides another example of such a strong dependence on the representation: while membership to some sets in \(P\) is known to be hard to test, the authors here prove that, for every set \(S\in P\), there exists a (polynomial-time, invertible) encoding \(E_S\colon \{0,1\}^\ast\to \{0,1\}^\ast\) such that testing membership to \(S\) under this encoding is easy. (They actually show even stronger a statement: namely, that under this encoding the set admits a “proximity-oblivious tester,” that is a constant-query testing algorithm which rejects with probability function of the distance to \(S\).)

Finally, on a non-property testing note: Edith Cohen, Vitaly Feldman, Omer Reingold, and Ronitt Rubinfeld recently wrote a pledge for inclusiveness in the TCS community, available here: https://www.gopetition.com/petitions/a-pledge-for-inclusiveness-in-toc.html
If you haven’t seen it already, we encourage you to read it.

Update: Fixed a mistake in the overview of the second paper; as pointed out by Oded in the comments, the main comparison was between the new model and the bounded-degree graph model, not the dense graph one.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up on that trend! (And, of course, if we missed any, please point it out in the comments.The blame would be on our selves from the past year.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of a following flavor: two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell,k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell,k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, it implies the existence of a removal lemma with a
tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain constant-time and query algorithm for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e. the eigenvalues of its Laplacian, given random query access to the nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv) This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e. the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the sequence observed. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open problem 84), and one which would have direct implications to tolerant testing and estimation of symmetric properties of distributions.

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than one year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the Holiday season, or for perusing on the beach if you live in the south hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,\({}^{(\ast)}\) but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,\({}^{(\dagger)}\) on Saturday 14th (gird your loins — they last the full day!):

  • Linear Sketching as a Tool for Everything
  • Frontiers in Distribution Testing
  • Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers\({}^{(\ddagger)}\) (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

\({}^{(\ast)}\) Hint: look at Sessions 7A and 10B.
\({}^{(\dagger)}\) Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
\({}^{(\ddagger)}\) A lineup of stellar speakers.

News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graphs pattern (such as triangle, to name the most famous), that is removal results with bounds on the number of copies of the pattern that is not mind-blowingly huge like a tower of \(\varepsilon\), is a classic and longstanding problem. This work makes significant progress for the last remaining case, i.e. for the pattern \(C_4\): providing bounds that are merely exponential in \(\varepsilon\).

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid \(\{0,1\}^n\) (over a field \(\mathbb{F}\supseteq \{0,1\}\)). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e. the property of a degree-\(d\)-polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, that is that they require sample complexity \(n^{\Omega(d)}\) — while learning the polynomial can be done with \(n^{O(d)}\) samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on \(\varepsilon\) for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability \(\delta\): if one can achieve \(1/3\), then simple repetition can bring it down to \(\delta\) at the mild price of a \(\log(1/\delta)\) factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative \(\sqrt{\log(1/\delta)}\). And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). While I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; and, for fear of botching the job, decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic \(k\)-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic \(k\)-cycle removal lemma, which held for any \(k\geq 3\) and abelian group \(G\); however, the bounds in this lemma were quite large — i.e., tower-ype. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group \(\mathbb{F}_p^n\) (where \(p\geq 2\) is any fixed prime, and \(k\geq 3\)).

Update (10/04): Finally, a paper we covered last summerThe Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, went under signficant changes. Now titled Testing Sparsity over Known and Unknown Bases, it now includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix \(A \in \mathbb{R}^{d \times m}\) and unknown input vector \(y \in \mathbb{R}^d\), does \(y\) equal \(Ax\) for some \(k\)-sparse vector \(x\), or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling into the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis — similar in goal and spirit to property testing.

News for June 2017

Happy Fourth of July to everyone! Last month was prolific for property testing, as we counted no less than nine papers making their way online!*

Parameterized Property Testing of Functions, by Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and Nithin Varma. (ECCC) (A preliminary version was published in ITCS’17). In property testing, here of functions, it is standard to parameterize the query complexity by the two “obvious” parameters — namely, the domain size parameter \(n\), and the proximity parameter \(\varepsilon\). While this often leads to a good understanding of the landscape, sometime a more fine-grained analysis may be useful, to capture the complexity of the question in terms of the setting-specific “right” parameters. This work initiates this line of inquiry for functions \(f\colon [n] \to \mathbb{R}\), showing examples where classic lower bounds can be circumvented by focusing on a more relevant parameters of the problem.

A Lower Bound for Nonadaptive, One-Sided Error Testing of Unateness of Boolean Functions over the Hypercube, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, C. Seshadhri (arXiv). A Boolean function \(f\colon \{0,1\}^n\to \{0,1\}\) is said to be unate if it is monotone (either increasing or decreasing) in each coordinate; or, equivalently, if there exists \(\sigma\in\{0,1\}^n\) such that \(x\mapsto f(x\oplus\sigma)\) is monotone. As reported in the previous months, there has been a significant amount of work lately on testing unateness of functions (including real-valued). In this short paper, the authors improve a lower bound of Chen et al. (2017) on non-adaptive, one-sided unateness testing of Boolean functions, bringing it from \(\Omega(\frac{n}{\log^2 n})\) to \(\Omega(\frac{n}{\log n})\) — leaving only a gap of \(\log^2 n\) between known upper and lower bounds.

Adaptivity is exponentially powerful for testing monotonicity of halfspaces, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, and Erik Waingarten (arXiv:). While monotonicity testing of general Boolean functions has been widely studied these past years, testing monotonicity of functions promised to belong to a specific class (here, halfspaces — linear threshold functions) is much less understood. In this work, the authors show that testing monotonicity of halfspaces can done adaptively with \(\mathrm{poly}(\log n, 1/\varepsilon)\) queries. Since the \(\Omega(n^{1/2})\) non-adaptive lower bound for general monotonicity testing relied on instances that were halfspaces, it applies here as well — showing an exponential gap between adaptive and non-adaptive testing in this case. “It’s as extreme as it gets!”

Testing Piecewise Functions, by Steve Hanneke and Liu Yang (arXiv). Generalizing the concept of “unions of intervals,” the authors consider here the following general question: fix a class of functions \(\mathcal{H}\subseteq \mathcal{Y}^{\mathcal{X}}\), where \(\mathcal{X},\mathcal{Y}\) are “nice” spaces (typically \(\mathcal{X}=\mathbb{R}\)) and parameter \(k\) and \(\varepsilon\). The goal is to test whether (i) one can partition \(\mathcal{X}\) into \(k\) pieces and find \(h_1,\dots,h_k\in\mathcal{H}\), such that \(f=h_i\) on the \(i\)-th piece; or (ii) \(f\) is \(\varepsilon\)-far (for a notion of distance depending on the measure on \(\mathcal{X}\)) from every such “\(k\)-piecewise function.”
They give upper bounds (as well as a lower bound) for the query complexity of this question in both the active testing and the sample-based testing settings (for the latter, considering the class \(\mathcal{H}\) of constant functions).

Sample-based high-dimensional convexity testing, by Xi Chen, Adam Freilich, Rocco A. Servedio, and Timothy Sun (arXiv). In the setting considered in this paper, there exists an unknown set \(S\subseteq \mathbb{R}^n\), and the algorithm is provided with samples \(\langle x, S(x)\rangle\), where \(x\) is distributed according to a standard Gaussian distribution and the label \(S(x)\) is \(1\) iff \(x\in S\). The goal is to test whether \(S\) is convex (or $\varepsilon$-far from convex under the standard Gaussian distribution). The authors then show near-matching upper and lower bounds: one-sided testing of convexity in this testing has sample complexity \(2^{\tilde{\Theta}(n)}\), while two-sided testing has sample complexity \(2^{\tilde{\Theta}(\sqrt{n})}\).

Distributed Detection of Cycles, by Pierre Fraigniaud and Dennis Olivetti (arXiv); and Distributed Subgraph Detection, by Pierre Fraigniaud, Pedro Montealegre, Dennis Olivetti, Ivan Rapaport, and Ioan Todinca (arXiv). Improving the understanding of distributed property testing of graphs in the CONGEST model (already touched upon last month), these two works tackle the question of distributed testing of subgraph freeness. The first considers testing cycle-freeness, showing that \(C_k\)-freeness can be tested in the CONGEST model in a constant number of rounds (and with one-sided error). The second considers the more general question testing \(H\)-freeness of graphs, for a larger class of subgraphs that includes cycles. Specifically, they show that for any pattern \(H\) consisting of a couple of nodes, connected in an arbitrary manner to a tree, \(H\)-freeness can be tested (with one-sided error) in \(O(1)\) rounds.

Hypothesis Testing For Densities and High-Dimensional Multinomials: Sharp Local Minimax Rates, by Sivaraman Balakrishnan and Larry Wasserman (arXiv). In distribution testing, the question of testing identity to a known discrete distribution \(p\) is almost fully settled, with near tight “instance-optimal” bounds. But what about the related question of testing identity to \(p\) (not necessarily discrete) under the additional promise that both \(p\) and the unknown distribution \(q\) both belong to a specific class \(C\) of distributions, instead of arbitrary? This is the problem this work tackles, for the specific case of \(C\) being (i) the class of multinomial distributions, and (ii) the class of (continuous) Lipschitz distributions. In both cases, the authors establish some near-tight bounds, both of an “instance-optimal” flavor.

On Axis-Parallel Tests for Tensor Product Codes, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). Low-degree testing has its roots in the origins of property testing, and its connection to the PCP theorem; and has been a fruitful and rich line of work since. Here, the authors analyze a type of low-degree test introduced in 2006 by Ben-Sasson and Sudan, which constrains the tester to only considers restrictions along axis-parallel hyperplanes; and establish new results on these class of (weaker but more general) testers.

* As usual, if you find a paper we forgot or misrepresented, please signal it in the comments below.

News for March 2017

March was a relatively good month for property testing, with 3 papers — including a foray in differential privacy.

Optimal Unateness Testers for Real-Valued Functions: Adaptivity Helps, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and C. Seshadhri (arXiv). Testing unateness (that is, whether a function is monotone after a rotation of the hypercube) has recently received significant attention. Understanding the power of adaptivity in property testing, on the other hand, has been a long-standing question, which has sparked much interest over the past decade. In this paper, the authors obtain optimal results for testing unateness of real-valued functions \(f\colon \{0,1\}^d\to \mathbb{R}\) and \(f\colon [n]^d\to \mathbb{R}\) (up to the dependence on the proximty parameter \(\varepsilon\)), and show that — unlike what happens for monotonicity testing of real-valued functions — adaptivity helps. Namely, adaptive testing of unatess of functions \(f\colon \{0,1\}^d\to \mathbb{R}\) has query complexity \(O(d/\varepsilon)\), but non-adaptive testers require \(\Omega(d\log d)\) queries.
This work is a merged, and significantly improved version of two papers we covered a few months back, by the same authors.

Switching from function to distribution testing, last month saw two independent samples appear online:

Near-Optimal Closeness Testing of Discrete Histogram Distributions, by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). Closeness testing of discrete distributions is the question of testing, given sample access to two unknown distributions \(p,q\) over a known discrete domain \([n] \stackrel{\rm def}{=} \{1,\dots,n\}\), whether \(p=q\) or \(d_{\rm TV}(p,q)> \varepsilon\) (are the two distributions the same, or differ significantly in total variation distance?). This quite fundamental question is by now fully understood in the general case, where \(p,q\) are allowed to be arbitrary distributions over ; but what if one has some additional knowledge about them? Specifically, what if we were guaranteed that both distributions have some “nice structure” — e.g., that they are piecewise constant with \(k\) pieces (\(k\)-histograms)? Leveraging machinery the authors developed in a series of previous work (for the continuous case), the authors show that in this case, the sample complexity of “testing closeness under structural assumptions” is quite different than in the general case; and that the resulting (near tight) bound is a rather intricate tradeoff between the three parameters \(n,k,\varepsilon\).

Priv’IT: Private and Sample Efficient Identity Testing, by Bryan Cai, Constantinos Daskalakis, and Gautam Kamath (arXiv). Distribution testing is a vibrant field; differential privacy (DP) is an incredibly active and topical area. What about differentially private distribution testing — a.k.a. testing the data without learning too much about any single sample? In this paper, the authors address the task of performing identity testing of distributions (“given samples from an unknown arbitrary distribution, decide whether it matches a fixed known probability distribution/model”) in the DP setting, focusing on getting the best of both worlds: how to guarantee privacy without sacrificing efficiency. Not only do they obtain asymptotically near-optimal bounds in both the testing and differential privacy parameters — they also run some experiments to validate their approach empirically. (Plus, the title is rather cute.)

Usual disclaimer: if we forgot or misrepresented your paper, please signal it in the comments — we’ll address it as quickly as we can.

News for December 2016

December was indeed a merry month for property testing, with seven new papers appearing online.* Let’s hope the trend continues!

Cube vs. Cube Low Degree Test, by Amey Bhangale, Irit Dinur, and Inbal Livni Navon (ECCC). This work provides a new and improved analysis of the “low-degree test” of Raz and Safra, tightening the dependence on the alphabet size of the soundness parameter. Specifically, the goal is as follows: given query access to the encoding of a function \(f\colon \mathbb{F}^m\to \mathbb{F}\) as a “cube table,” decide whether \(f\) is a low-degree polynomial (i.e., of degree at most \(d\)). With direct applications to PCP theorems, this question has a long history; here, the authors focus on a very simple and natural test, introduced by Raz–Safra and Arora–Sudan. In particular, they improve the soundness guarantee, which previously required the error to be at least \(\textrm{poly}(d)/\mathbb{F}^{1/8}\), to obtain a dependence on the field size which is only \(\mathbb{F}^{-1/2}\).

Robust Multiplication-based Tests for Reed-Muller Codes, by Prahladh Harsha and Srikanth Srinivasan (arXiv). Given a function \(f\colon \mathbb{F}_q^n\to\mathbb{F}_q\) purported to be a degree-\(d\) polynomial, deciding whether this is indeed the case is a question with applications to hardness of approximation and — as the title strongly hints— to testing of Reed—Muller codes. Here, the authors generalize and improve on a test originally due to Dinur and Guruswami, improving on the soundness: that is, they show a robust version of the soundness guarantee of this multiplication test, answering a question left open by Dinur and Guruswami.

A Note on Testing Intersection of Convex Sets in Sublinear Time, by Israela Solomon (arXiv). In this note, the author addresses the following testing question: given \(n\) convex sets in \(\mathbb{R}^d\), distinguish between the case where (i) the intersection is non-empty, and (ii) even after removing any \(\varepsilon n\) sets, the intersection is still empty. Through the use of a generalization of Helly’s Theorem due Katchalski and Liu (1979), the author then provides and analyzes an algorithm for this question, with query complexity \(O(\log 1/(d\varepsilon))\).

A Characterization of Constant-Sample Testable Properties, by Eric Blais and Yuichi Yoshida (arXiv). A lot of work in property testing has been to understand which properties can be locally tested, i.e. admit testers with constant query complexity. Here, the authors tackle — and answer — the variant of this question in the sample-based testing model, that is restricting the ability of the algorithm to only obtain the value of the function on independently and uniformly distributed locations. Namely, they provide a full characterization of the properties of Boolean-valued function which are testable with constant sample complexity, resolving the natural question of whether most properties are easily tested from samples only. As it turns out, only those properties which are (essentially) \(O(1)\)-part-symmetric have this nice testability — where a property \(\mathcal{P}\) is \(k\)-part-symmetric if one can divide the input variables in \(k\) blocks, such that membership to \(\mathcal{P}\) is invariant by permutations inside each block.

The last three are all works on distribution testing, and all concerned with the high-dimensional case. Indeed, most of the distribution testing literature so far has been focusing in probability distributions over arbitrary discrete sets or the one-dimension ordered line; however, when trying to use these results in the (arbitrary) high-dimensional case on is subjected to the curse of dimensionality. Can one leverage (presumed) structure to reduce the cost, and obtain computationally and sample-efficient testers?

Testing Ising Models, by Costis Daskalakis, Nishanth Dikkala, and Gautam Kamath (arXiv). In this paper, the authors take on the above question in the context of Markov Random Fields, and more specifically in the Ising model. Modeling the (unknown) high-dimensional distributions as Ising models with some promise on the parameters, they tackle the two questions of identity and independence testing; respectively, “is the unknown Ising model equal to a fixed, known reference distribution?” and “is the high-dimensional, a priori complex distribution a product distribution?”. They obtain both testing algorithms and lower bounds for these two problems, where the soundness guarantee sought is in terms of the symmetrized Kullback—Leibler divergence (a notion of distance more stringent that the statistical distance).

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing, by Costis Daskalakis and Qinxuan Pan (arXiv). Another natural model to study structured high-dimensional distribution is that of Bayesian networks, that is directed graphs providing a succinct description of the dependencies between coordinates. This paper studies the closeness testing problem (testing if two unknown distributions are equal or far, with regard to the usual statistical distance) for Bayesian networks, parameterized by the dimension, alphabet size, and (promise on the) maximum in-degree of the unknown Bayes net. At its core is a new inequality relating the (squared) Hellinger distance of two Bayesian networks to the sum of the (squared) Hellinger distances between their marginals.

Testing Bayesian Networks, by Clément Canonne, Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv). Tackling the testing of Bayesian networks as well, this second paper considers two of the most standard testing questions — identity (one-unknown) and closeness (two-unknown) testing — under various assumptions, in order to pinpoint the exact sample complexity in each case. Specifically the goal is to see when (and under which natural restrictions) does testing become easier than learning for Bayesian networks, focusing on the dimension and maximum in-degree as parameters.

* As usual, if we forgot one or you find imprecisions in our review of a paper, please let us now in the comments below.