Author Archives: Clement Canonne

News for December 2018

Happy new year, and best wishes to those close and \(\varepsilon\)-far! December concluded the year with 4 new preprints, spanning quite a lot of the property testing landscape:

Testing Stability Properties in Graphical Hedonic Games, by Hendrik Fichtenberger and Anja Rey (arXiv). The authors of this paper consider the problem of deciding whether a given hedonic game possesses some “coalition stability” in a property testing framework. Namely, recall that a hedonic game is a game where players (nodes) form coalitions (subsets of nodes) based on their individual preferences and local information about the considered coalition, thus resulting in a partition of the original graph.
Several notions exist to evaluate how good such a partition is, based on how “stable” the given coalitions are. This work focuses on hedonic games corresponding to bounded-degree graphs, introducing and studying the property testing question of deciding (for several such notions of stability) whether a given game admits a stable coalition structure, or is far from admitting such a partition.

Spectral methods for testing cluster structure of graphs, by Sandeep Silwal and Jonathan Tidor (arXiv). Staying among bounded-degree graphs, we turn to testing clusterability of graphs, the focus of this paper. Given an \(n\)-node graph \(G\) of degree at most \(d\) and parameters \(k, \phi\), say that \(G\) is \((k, \phi)\)-clusterable if it can be partitioned into \(k\) parts, each of inner conductance at least \(\phi\).
Analyzing properties of a random walk on \(G\), this work gives a bicriterion guarantee (\((k, \phi)\)-clusterable vs. \(\varepsilon\)-far from \((k, \phi^\ast)\)-clusterable, where \(\phi^\ast \approx \varepsilon^2\phi^2\)) for the case \(k=2\), improving on previous work by Czumaj, Peng, and Sohler’15.
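
To make the central definition concrete, here is a minimal brute-force sketch (not the paper's random-walk algorithm) that computes the inner conductance of a candidate cluster in a tiny graph; it assumes the networkx package, and the graph and partition are made-up examples.

```python
import itertools
import networkx as nx  # assumed dependency; any adjacency-list representation would do

def conductance(H, S, V):
    """Conductance of the cut (S, V \\ S) inside the graph H: cut edges / min volume."""
    S, rest = set(S), set(V) - set(S)
    cut = sum(1 for u, v in H.edges(V) if (u in S) != (v in S))
    vol = lambda T: sum(H.degree(x) for x in T)
    return cut / max(1, min(vol(S), vol(rest)))

def inner_conductance(G, C):
    """Inner conductance of cluster C: worst cut inside the induced subgraph G[C] (tiny C only)."""
    C = list(C)
    H = G.subgraph(C)
    worst = 1.0
    for r in range(1, len(C) // 2 + 1):
        for S in itertools.combinations(C, r):
            worst = min(worst, conductance(H, S, C))
    return worst

# Toy example: two triangles joined by a single edge, a natural "(2, phi)-clusterable" graph.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])
for C in [{0, 1, 2}, {3, 4, 5}]:
    print(sorted(C), inner_conductance(G, C))   # each triangle has inner conductance 1
```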

We then switch from graphs to probability distributions with our third paper:

Inference under Information Constraints I: Lower Bounds from Chi-Square Contraction, by Jayadev Acharya, Clément Canonne, and Himanshu Tyagi (arXiv). (Disclaimer: I’m one of the authors.) In this paper, the first of an announced series of three, the authors generalize the settings of two previous works we covered here and there, considering the general question of distribution testing and learning when the \(n\) i.i.d. samples are distributed among \(n\) players, each of whom can only communicate their sample to the central algorithm subject to some pre-specified local information constraint (e.g., privacy, noise, or a communication budget). This paper develops a general lower bound framework to study such questions, with a systematic focus on the power of public vs. private randomness between the \(n\) parties, and instantiates it to obtain tight bounds in the aforementioned locally private and communication-limited settings. (Spoiler: public randomness strictly helps, but not always.)

Finally, after games, graphs, and distributions, our fourth paper of the month concerns testing of functions:

Partial Function Extension with Applications to Learning and Property Testing, by Umang Bhaskar and Gunjan Kumar (arXiv). This work focuses on a problem quite related to property testing, that of partial function extension: given as input \(n\) point/value pairs from a purported function on a domain \(X\) of size \(|X| > n\), one is tasked with deciding whether there exists (resp., with finding) a function \(f\) on \(X\) consistent with these \(n\) values which further satisfies a specific property, such as linearity or convexity. This is indeed very reminiscent of property testing, where one gets to query these \(n\) points and must decide (approximate) consistency with such a well-behaved function. Here, the authors study the computational hardness of this partial function extension problem, specifically for properties such as subadditivity and XOS (a subclass of subadditivity); and as corollaries obtain new property testers for the classes of subadditive and XOS functions.
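
As a toy illustration of the extension problem, and only for the simplest property mentioned above (linearity, rather than the subadditive and XOS properties the paper actually studies), the following sketch checks whether \(n\) point/value pairs are consistent with some linear function via least squares; the function name and tolerance are illustrative choices.

```python
import numpy as np

def extends_to_linear(points, values, tol=1e-9):
    """Do the given point/value pairs admit a consistent linear function f(x) = <a, x>?
    points: (n, d) array of query points; values: length-n array of observed values."""
    A = np.asarray(points, dtype=float)
    b = np.asarray(values, dtype=float)
    a, *_ = np.linalg.lstsq(A, b, rcond=None)        # best linear fit in the least-squares sense
    return bool(np.allclose(A @ a, b, atol=tol)), a  # consistent iff the fit is exact

# Consistent example: values of f(x) = 2*x1 - x2 at three points.
ok, a = extends_to_linear([[1, 0], [0, 1], [1, 1]], [2, -1, 1])
print(ok, a)   # True, recovers the coefficients (2, -1)
# Inconsistent example: no single linear function matches both pairs.
print(extends_to_linear([[1, 0], [2, 0]], [1, 3])[0])   # False
```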

As usual, if you know of some work we missed from last December, let us know in the comments!

News for September 2018

Last month seemed slower than usual: we initially spotted only one property testing paper, but it turns out we had missed a few, so hopefully this list is now up to date.

Property testing and expansion in cubical complexes, by David Garber, Uzi Vishne (arXiv). Consider the question of testing if an arbitrary function \(f\colon V\times V \to\{-1,1\}\) is of the form \(f(x,y) = h(x)h(y)\) for some \(h\colon V\to\{-1,1\}\). An intuitive one-sided test, shown to work by Lubotzky and Kaufman (2014), is to pick uniformly random \(x,y,z\in V\) and check that \(f(x,y)f(y,z)f(z,x)=1\). This paper considers the high-dimensional generalization of testing the property that a function \(f\colon V\times V \times V\times V \to\{-1,1\}\) is of the form \(f(w,x,y,z) = \alpha\cdot h(w,x)h(y,x)h(y,z) h(w,z)\), for some \(h\colon V\times V\to\{-1,1\}\) and sign \(\alpha\in\{-1,1\}\). The authors derive necessary and sufficient conditions for testability of this property, by formulating it in the language of incidence geometry and exploiting this connection.
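
A minimal sketch of the one-sided test described above, assuming query access to \(f\) as a Python function over a finite vertex set (the number of repetitions is an illustrative choice, not the bound from the analysis):

```python
import random

def product_form_test(f, V, reps=50):
    """One-sided test for f(x, y) = h(x) h(y): whenever f has this form, the product
    f(x, y) f(y, z) f(z, x) = h(x)^2 h(y)^2 h(z)^2 equals 1 for every x, y, z."""
    for _ in range(reps):
        x, y, z = (random.choice(V) for _ in range(3))
        if f(x, y) * f(y, z) * f(z, x) != 1:
            return False   # found a witness of violation: reject
    return True            # never caught a violation: accept

V = list(range(20))
# A genuine product function always passes...
h = {v: random.choice([-1, 1]) for v in V}
print(product_form_test(lambda x, y: h[x] * h[y], V))        # True
# ...while a function far from product form is rejected with high probability.
table = {(x, y): random.choice([-1, 1]) for x in V for y in V}
print(product_form_test(lambda x, y: table[(x, y)], V))      # almost surely False
```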

Local Computation Algorithms for the Lovász Local Lemma, by Dimitris Achlioptas, Themis Gouleakis and Fotis Iliopoulos (arXiv). There has been significant work in the past decade on constructive versions of the Lovász Local Lemma (LLL), since the seminal work of Moser-Tardos. This paper designs new Local Computation Algorithms (LCAs) for the LLL. It’s best to consider the problem of \(k\)-SAT. Consider a \(k\)-CNF \(\phi\) with \(n\) variables, \(m\) clauses, where every variable is in at most \(d\) clauses. By the LLL, if \(d \leq 2^k/ke\), then \(\phi\) is satisfiable. An LCA would compute any bit of a satisfying assignment, by making sublinear queries into \(\phi\). This was first studied by Rubinfeld-Tamir-Vardi-Xie. Their LCA would make polylogarithmically many queries, but requires a stronger condition than what the LLL achieves. This paper gives the first sublinear LCA with precisely the LLL conditions, though the number of queries is \(n^\beta\) (for \(\beta \lt 1\)). The main result is an LCA for an abstract LLL formulation, that also leads to LCAs for graph coloring. Roughly speaking, for a graph with maximum degree \(\Delta\) where all neighborhoods are sufficiently far from cliques, the LLL shows that the chromatic number bound of \(\Delta + 1\) can be beaten. This result gives an LCA for graph coloring under these LLL conditions.

Sublinear Time Low-Rank Approximation of Distance Matrices, by Ainesh Bakshi and David P. Woodruff (arXiv). Consider two sets of points \(P\) and \(Q\) in a metric space, with \(m\) and \(n\) points respectively. The \(m \times n\) distance matrix \(A\) contains all pairwise distances between them. This paper studies approximating \(A\) using a low-rank representation, without reading all the entries of \(A\). The main result is as follows. For rank parameter \(k\), let \(A_k\) be the closest (in Frobenius norm) rank-\(k\) approximation to \(A\). For any constant \(\gamma > 0\), there is an algorithm running in time \(O(m^{1+\gamma} + n^{1+\gamma}\,\mathrm{poly}(k,\epsilon^{-1}))\) that outputs a rank-\(k\) matrix \(B\) with the following property: \(\|A-B\|^2_F \leq \|A-A_k\|^2_F + \epsilon \|A\|^2_F\). Interestingly, there is a lower bound showing that an \(o(mn)\)-time algorithm cannot achieve a multiplicative approximation. One technical ingredient is a method to sample column norms of \(A\) under an approximate triangle inequality constraint. This allows one to compute smaller matrices that approximate \(A\), on which one can directly compute an approximate rank-\(k\) decomposition.
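
To illustrate the guarantee (not the sublinear algorithm itself, which never reads all of \(A\)), here is a small numpy sketch that builds a distance matrix from two made-up point sets, computes the exact best rank-\(k\) approximation \(A_k\) via SVD, and checks the additive-error bound that the output matrix \(B\) must satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(30, 3)), rng.normal(size=(40, 3))      # two point sets in R^3
A = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)      # m x n Euclidean distance matrix

def best_rank_k(A, k):
    """Exact best rank-k approximation in Frobenius norm (Eckart-Young), via a full SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

k, eps = 5, 0.1
A_k = best_rank_k(A, k)
B = A_k          # a sublinear algorithm outputs *some* rank-k B; here we just plug in the optimum
lhs = np.linalg.norm(A - B, 'fro')**2
rhs = np.linalg.norm(A - A_k, 'fro')**2 + eps * np.linalg.norm(A, 'fro')**2
print(lhs <= rhs)   # the guarantee: ||A - B||_F^2 <= ||A - A_k||_F^2 + eps * ||A||_F^2
```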

On Solving Linear Systems in Sublinear Time, by Alexandr Andoni, Robert Krauthgamer, Yosef Pogrow (arXiv). Solving Laplacian linear systems is an immensely deep area, with lots of exciting recent work. This paper studies sublinear algorithms for such problems. Consider a Laplacian matrix \(L\) (think of \(I - A/d\), for the adjacency matrix \(A\) of a \(d\)-regular graph). The aim is to solve \(Lx = b\), for \(x, b \in {\mathbb R}^n\). Let the solution be \(x^*\). The main result shows that one can approximate any entry in sublinear time. Specifically, for any coordinate \(i\), one can output an approximation \(\hat{x}_i\) such that \(|\hat{x}_i - x^*_i| \leq \epsilon\|x^*\|_\infty\). The running time is essentially \(d\epsilon^{-2}\kappa^3\), where \(\kappa\) is a bound on the condition number of \(L\). There are generalizations for Symmetric Diagonally Dominant (SDD) matrices, a generalization of Laplacians. There is an \(\Omega(n^{1/d^2})\) lower bound for solving general PSD systems, and a lower bound showing that \(\Omega(\kappa^2)\) queries into \(b\) are necessary.
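
The following toy sketch only illustrates the object being approximated, using a dense exact solve (exactly the global computation the paper's sublinear algorithm avoids): it sets up \(L = I - A/d\) for a small \(d\)-regular graph and reads off one coordinate of the solution.

```python
import numpy as np

# Cycle on n vertices: a 2-regular graph, so L = I - A/2.
n, d = 8, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1
L = np.eye(n) - A / d

b = np.zeros(n)
b[0], b[3] = 1.0, -1.0        # b must sum to 0, since the all-ones vector is in the kernel of L
x_star = np.linalg.pinv(L) @ b  # minimum-norm solution of L x = b
i = 5
print(x_star[i])   # the sublinear algorithm approximates this single entry, up to ~eps * ||x*||_inf
```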

News for June 2018

The summer gets off to a flying start, with three property testing papers, spanning differential privacy, distribution testing, and juntas in Gaussian space!

On closeness to \(k\)-wise uniformity, by Ryan O’Donnell and Yu Zhao (arXiv)
In this paper, the authors consider the following structural question about probability distributions over the Boolean hypercube \(\{-1,1\}^n\): “what is the relation between the total variation distance \(\delta\) to \(k\)-wise independence, and the bound \(\varepsilon\) on the Fourier coefficients of the distribution at degrees up to \(k\)?”

While this question might seem a bit esoteric at first glance, it has direct and natural applications to derandomization, and of course to distribution testing (namely, to test \(k\)-wise independence and its generalization, \((\varepsilon, k)\)-wise independence of distributions over the hypercube).

The main contribution here is to improve (by a \((\log n)^{O(k)}\) factor) the bounds on \(\delta(n,k,\varepsilon)\) over the previous work by Alon et al. [AAK+07], making them either tight (for \(k\) even) or near-tight. To do so, the authors introduce a new hammer to the game, using linear programming duality in the proof of both their upper and lower bounds.
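
To make the two quantities concrete: the relevant Fourier coefficients of a distribution \(p\) over \(\{-1,1\}^n\) are (up to normalization conventions) the biases \(\mathbb{E}_{x\sim p}\big[\prod_{i\in S} x_i\big]\) over nonempty sets \(S\) of size at most \(k\); they all vanish exactly when \(p\) is \(k\)-wise independent, and are all at most \(\varepsilon\) in magnitude when \(p\) is \((\varepsilon,k)\)-wise independent. A brute-force sketch (exponential in \(n\), for intuition only, with made-up example distributions):

```python
import itertools
import numpy as np

def max_low_degree_bias(p, n, k):
    """Largest |E_{x~p}[prod_{i in S} x_i]| over nonempty S with |S| <= k.
    p is a dict mapping each point of {-1,1}^n (as a tuple) to its probability."""
    worst = 0.0
    for size in range(1, k + 1):
        for S in itertools.combinations(range(n), size):
            bias = sum(prob * np.prod([x[i] for i in S]) for x, prob in p.items())
            worst = max(worst, abs(bias))
    return worst

n = 3
# The uniform distribution over {-1,1}^3 is k-wise independent for every k: all biases are 0...
uniform = {x: 1 / 2**n for x in itertools.product([-1, 1], repeat=n)}
print(max_low_degree_bias(uniform, n, k=2))   # 0.0
# ...while a distribution supported on points whose first two coordinates agree has maximal degree-2 bias.
corr = {x: (1 / 4 if x[0] == x[1] else 0.0) for x in itertools.product([-1, 1], repeat=n)}
print(max_low_degree_bias(corr, n, k=2))      # 1.0 (the coefficient on the first two coordinates)
```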

Property Testing for Differential Privacy, by Anna Gilbert and Audra McMillan (arXiv)
Differential privacy, as introduced by Dwork et al., needs no introduction. Property testing, especially on this website, needs even less. What about a combination of the two? Namely, given black-box access to an algorithm claiming to perform a differentially private computation, how to test whether this is indeed the case?

Introducing and considering this quite natural question for the first time [01/31/2019: see erratum below], this work shows, roughly speaking, that testing differential privacy is hard. Specifically, they show that for many notions of differential privacy (pure, approximate, and their distributional counterparts), testing is either impossible or possible but not with a sublinear number of queries (even when the tester is provided with side information about the black-box). In other terms, as the authors put it: trusting the privacy of an algorithm “requires compromise by either the verifier or algorithm owner” (and, in the latter case, even then it’s not a simple matter).

Is your data low-dimensional?, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv)
(Well, is it?) To state it upfront, I am biased here, as it is a problem I was very eager to see investigated to begin with. To recap, the question is as follows: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on “few” (i.e., \(k \ll n\)) variables?”

This is the continuous, Gaussian version of the (quite famous) junta testing problem, which has gathered significant attention over the past years (the Gaussian version has, to the best of my knowledge, never been investigated). Now, the above formulation has a major flaw: specifically, it is uninteresting. In Gaussian space*, who cares about the particular basis I expressed my input vector in? So a more relevant question, and the one that the authors tackle, is the more robust and natural one: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on a low-dimensional linear combination of the variables?” Or, put differently, does all the relevant information for \(f\) live in a low-dimensional subspace?

De, Mossel, and Neeman show how one can do this, non-adaptively, with a query complexity independent of the dimension \(n\) (hurray!), but instead polynomial in \(k\), \(1/\varepsilon\), and the surface area \(s\) of \(f\). And since this last parameter may seem quite arbitrary, they also proceed to show that a polynomial dependence on \(s\) is indeed required.

*”In Gaussian space, no one can hear you change basis?”


Erratum (01/31/2019): It was brought to our attention that our overview of “Property Testing for Differential Privacy” was overlooking a key part of the literature; specifically, a work of Dixit et al. (TCC 2013)  which introduces this very question. From the abstract:

How does one ensure that [those third-parties] have implemented their algorithms in a way which meet the specifications of the privacy requirements? […] In this work, we propose a new approach to the above problem which we call privacy testing. We do this by formulating the above problem in the well-studied framework of property testing.

News for March 2018

March has been a relatively slow month for property testing, with 3 works appearing online. (If you notice we missed some, please leave a comment pointing it out.)

Edge correlations in random regular hypergraphs and applications to subgraph testing, by Alberto Espuny Díaz, Felix Joos, Daniela Kühn, and Deryk Osthus (arXiv). While testing subgraph-freeness in the dense graph model is now well-understood, after a series of works culminating in a complete characterization of the testing problems which admit constant-query testers, the corresponding question for hypergraphs is far from resolved. In this paper, the authors develop new techniques for the study of random regular hypergraphs, which imply new results for subhypergraph-freeness testing, improving on the state of the art for some parameter regimes (e.g., when the input hypergraph satisfies some average-degree condition).

Back from hypergraphs to graphs, we also have:
The Subgraph Testing Model, by Oded Goldreich and Dana Ron (ECCC). Here, the authors introduce a new model for property testing of graphs, where the goal is to test if an unknown subgraph \(F\) of an explicitly given graph \(G=(V,E)\) satisfies the desired property. The testing algorithm is provided access to \(F\) via membership queries, i.e., through query access to the indicator function \(\mathbf{1}_F\colon E \to \{0,1\}\). (In some very hazy sense, this is reminiscent of the active learning or testing frameworks, where one gets more or less free access to unlabeled data but pays to see their label.) As a sample of the results obtained, the paper establishes that this new model and the bounded-degree graph model are incomparable: there exist properties easier to test in one model than the other, and vice-versa — and some properties equally easy to test in both.

And finally, to drive home the point that “models matter a lot,” we have our third paper:
Every set in P is strongly testable under a suitable encoding, by Irit Dinur, Oded Goldreich, and Tom Gur (ECCC). It is known that the choice of representation of the objects has a large impact in property testing: for instance, the complexity of testing a given property can change drastically between the dense and bounded-degree graph models. This work provides another example of such a strong dependence on the representation: while membership to some sets in \(P\) is known to be hard to test, the authors here prove that, for every set \(S\in P\), there exists a (polynomial-time, invertible) encoding \(E_S\colon \{0,1\}^\ast\to \{0,1\}^\ast\) such that testing membership to \(S\) under this encoding is easy. (They actually show an even stronger statement: namely, that under this encoding the set admits a “proximity-oblivious tester,” that is, a constant-query testing algorithm which rejects with probability that is a function of the distance to \(S\).)

Finally, on a non-property testing note: Edith Cohen, Vitaly Feldman, Omer Reingold, and Ronitt Rubinfeld recently wrote a pledge for inclusiveness in the TCS community, available here: https://www.gopetition.com/petitions/a-pledge-for-inclusiveness-in-toc.html
If you haven’t seen it already, we encourage you to read it.

Update: Fixed a mistake in the overview of the second paper; as pointed out by Oded in the comments, the main comparison was between the new model and the bounded-degree graph model, not the dense graph one.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up that trend! (And, of course, if we missed any, please point it out in the comments. The blame would be on our past-year selves.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of the following flavor: given two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell\) and \(k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell\) and \(k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, they imply the existence of a removal lemma with a tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain an algorithm with constant time and query complexity for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e., the eigenvalues of its Laplacian, given access to uniformly random nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.
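
As a point of reference for what is being approximated (the constant-query algorithm itself works from short random walks and never builds a matrix), here is a tiny sketch of the exact computation, assuming the normalized Laplacian convention \(L = I - A/d\) for a graph of degree bound \(d\) given as a dense adjacency matrix.

```python
import numpy as np

def normalized_laplacian_spectrum(adj, d):
    """Eigenvalues of L = I - A/d, sorted: the quantity the sublinear algorithm approximates in l1."""
    n = adj.shape[0]
    L = np.eye(n) - adj / d
    return np.sort(np.linalg.eigvalsh(L))

# Toy example: a 4-cycle, with degree bound d = 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(normalized_laplacian_spectrum(A, d=2))   # approximately [0, 1, 1, 2]
```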

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv). This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e., the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the observed sequence. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open Problem 84), and one which would have direct implications for tolerant testing and estimation of symmetric properties of distributions.
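
For concreteness, the “profile” of a sequence of samples (all that profile maximum likelihood depends on) is simply the multiset of symbol multiplicities once labels are discarded. A tiny sketch, with made-up samples:

```python
from collections import Counter

def profile(sample):
    """Multiset of symbol multiplicities, ignoring the labels themselves."""
    return sorted(Counter(sample).values(), reverse=True)

# The sequences "a a b c" and "x y y z" have the same profile [2, 1, 1]:
print(profile("aabc"))   # [2, 1, 1]
print(profile("xyyz"))   # [2, 1, 1]
# PML asks: which distribution maximizes the probability of observing this profile?
```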

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than a year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the holiday season, or for perusing on the beach if you live in the southern hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,\({}^{(\ast)}\) but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,\({}^{(\dagger)}\) on Saturday 14th (gird your loins — they last the full day!):

  • Linear Sketching as a Tool for Everything
  • Frontiers in Distribution Testing
  • Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers\({}^{(\ddagger)}\) (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

\({}^{(\ast)}\) Hint: look at Sessions 7A and 10B.
\({}^{(\dagger)}\) Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
\({}^{(\ddagger)}\) A lineup of stellar speakers.

News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graph patterns (such as the triangle, to name the most famous), that is, removal results whose bounds are not mind-blowingly huge, like a tower function of \(1/\varepsilon\), is a classic and longstanding problem. This work makes significant progress on the last remaining case, the pattern \(C_4\), providing bounds that are merely exponential in \(1/\varepsilon\).

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid \(\{0,1\}^n\) (over a field \(\mathbb{F}\supseteq \{0,1\}\)). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e., the property of a degree-\(d\) polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, namely, that they require sample complexity \(n^{\Omega(d)}\) — while learning the polynomial can be done with \(n^{O(d)}\) samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on \(\varepsilon\) for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability \(\delta\): if one can achieve \(1/3\), then simple repetition can bring it down to \(\delta\) at the mild price of a \(\log(1/\delta)\) factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative \(\sqrt{\log(1/\delta)}\). And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.
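
A toy sketch of the flavor of this algorithm (simply compare the empirical distribution to the reference in total variation distance); the actual test statistic and threshold in the paper are carefully calibrated, so the cutoff below is purely illustrative.

```python
import numpy as np

def identity_test_empirical(samples, p, threshold):
    """Accept iff the empirical distribution of the samples is within `threshold` of p in TV distance.
    p: length-n array of probabilities; samples: array of indices in range(n)."""
    emp = np.bincount(samples, minlength=len(p)) / len(samples)
    tv = 0.5 * np.abs(emp - p).sum()
    return tv <= threshold

rng = np.random.default_rng(1)
p = np.full(100, 1 / 100)                # reference distribution: uniform over [100]
same = rng.integers(0, 100, size=2000)   # samples actually drawn from p
far = rng.integers(0, 50, size=2000)     # samples from a far distribution (uniform on half the domain)
print(identity_test_empirical(same, p, threshold=0.2))   # True (with high probability)
print(identity_test_empirical(far, p, threshold=0.2))    # False (empirical TV is close to 0.5)
```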

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; so, for fear of botching the job, I decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic \(k\)-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic \(k\)-cycle removal lemma, which held for any \(k\geq 3\) and abelian group \(G\); however, the bounds in this lemma were quite large, i.e., tower-type. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group \(\mathbb{F}_p^n\) (where \(p\geq 2\) is any fixed prime, and \(k\geq 3\)).

Update (10/04): Finally, a paper we covered last summer, The Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, underwent significant changes. Now titled Testing Sparsity over Known and Unknown Bases, it includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix \(A \in \mathbb{R}^{d \times m}\) and an unknown input vector \(y \in \mathbb{R}^d\), does \(y\) equal \(Ax\) for some \(k\)-sparse vector \(x\), or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling under the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis, similar in goal and spirit to property testing.

News for June 2017

Happy Fourth of July to everyone! Last month was prolific for property testing, as we counted no fewer than nine papers making their way online!*

Parameterized Property Testing of Functions, by Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and Nithin Varma (ECCC). (A preliminary version was published in ITCS’17.) In property testing, here of functions, it is standard to parameterize the query complexity by the two “obvious” parameters — namely, the domain size \(n\) and the proximity parameter \(\varepsilon\). While this often leads to a good understanding of the landscape, sometimes a more fine-grained analysis may be useful, to capture the complexity of the question in terms of the setting-specific “right” parameters. This work initiates this line of inquiry for functions \(f\colon [n] \to \mathbb{R}\), showing examples where classic lower bounds can be circumvented by focusing on more relevant parameters of the problem.

A Lower Bound for Nonadaptive, One-Sided Error Testing of Unateness of Boolean Functions over the Hypercube, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, C. Seshadhri (arXiv). A Boolean function \(f\colon \{0,1\}^n\to \{0,1\}\) is said to be unate if it is monotone (either increasing or decreasing) in each coordinate; or, equivalently, if there exists \(\sigma\in\{0,1\}^n\) such that \(x\mapsto f(x\oplus\sigma)\) is monotone. As reported in the previous months, there has been a significant amount of work lately on testing unateness of functions (including real-valued). In this short paper, the authors improve a lower bound of Chen et al. (2017) on non-adaptive, one-sided unateness testing of Boolean functions, bringing it from \(\Omega(\frac{n}{\log^2 n})\) to \(\Omega(\frac{n}{\log n})\) — leaving only a gap of \(\log^2 n\) between known upper and lower bounds.
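
For intuition, here is a brute-force check of the definition above (exponential in \(n\), so nothing like a tester): try every orientation \(\sigma\) and check whether \(x \mapsto f(x\oplus\sigma)\) is monotone.

```python
import itertools

def is_monotone(f, n):
    """f is nondecreasing along every edge of the hypercube: flipping a 0 to a 1 never decreases f."""
    for x in itertools.product([0, 1], repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(x) > f(y):
                    return False
    return True

def is_unate(f, n):
    """f is unate iff f(x xor sigma) is monotone for some orientation sigma in {0,1}^n."""
    for sigma in itertools.product([0, 1], repeat=n):
        g = lambda x, s=sigma: f(tuple(xi ^ si for xi, si in zip(x, s)))
        if is_monotone(g, n):
            return True
    return False

print(is_unate(lambda x: x[0] & (1 - x[1]), n=2))   # True: increasing in the first coordinate, decreasing in the second
print(is_unate(lambda x: x[0] ^ x[1], n=2))         # False: XOR is not unate
```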

Adaptivity is exponentially powerful for testing monotonicity of halfspaces, by Xi Chen, Rocco A. Servedio, Li-Yang Tan, and Erik Waingarten (arXiv). While monotonicity testing of general Boolean functions has been widely studied these past years, testing monotonicity of functions promised to belong to a specific class (here, halfspaces — linear threshold functions) is much less understood. In this work, the authors show that testing monotonicity of halfspaces can be done adaptively with \(\mathrm{poly}(\log n, 1/\varepsilon)\) queries. Since the \(\Omega(n^{1/2})\) non-adaptive lower bound for general monotonicity testing relied on instances that were halfspaces, it applies here as well — showing an exponential gap between adaptive and non-adaptive testing in this case. “It’s as extreme as it gets!”

Testing Piecewise Functions, by Steve Hanneke and Liu Yang (arXiv). Generalizing the concept of “unions of intervals,” the authors consider here the following general question: fix a class of functions \(\mathcal{H}\subseteq \mathcal{Y}^{\mathcal{X}}\), where \(\mathcal{X},\mathcal{Y}\) are “nice” spaces (typically \(\mathcal{X}=\mathbb{R}\)), and parameters \(k\) and \(\varepsilon\). The goal is to test whether (i) one can partition \(\mathcal{X}\) into \(k\) pieces and find \(h_1,\dots,h_k\in\mathcal{H}\), such that the input function \(f\) equals \(h_i\) on the \(i\)-th piece; or (ii) \(f\) is \(\varepsilon\)-far (for a notion of distance depending on the measure on \(\mathcal{X}\)) from every such “\(k\)-piecewise function.”
They give upper bounds (as well as a lower bound) for the query complexity of this question in both the active testing and the sample-based testing settings (for the latter, considering the class \(\mathcal{H}\) of constant functions).

Sample-based high-dimensional convexity testing, by Xi Chen, Adam Freilich, Rocco A. Servedio, and Timothy Sun (arXiv). In the setting considered in this paper, there exists an unknown set \(S\subseteq \mathbb{R}^n\), and the algorithm is provided with samples \(\langle x, S(x)\rangle\), where \(x\) is distributed according to a standard Gaussian distribution and the label \(S(x)\) is \(1\) iff \(x\in S\). The goal is to test whether \(S\) is convex (or \(\varepsilon\)-far from convex under the standard Gaussian distribution). The authors then show near-matching upper and lower bounds: one-sided testing of convexity in this setting has sample complexity \(2^{\tilde{\Theta}(n)}\), while two-sided testing has sample complexity \(2^{\tilde{\Theta}(\sqrt{n})}\).
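
A small sketch of the access model only (labeled Gaussian samples for a fixed convex set, here a Euclidean ball of made-up radius); the testers themselves are, of course, far more involved.

```python
import numpy as np

def labeled_gaussian_samples(indicator, n, m, rng):
    """Sample-based access: m pairs <x, S(x)> with x ~ N(0, I_n) and S(x) in {0, 1}."""
    X = rng.standard_normal((m, n))
    return [(x, indicator(x)) for x in X]

rng = np.random.default_rng(2)
ball = lambda x: int(np.linalg.norm(x) <= np.sqrt(len(x)))   # a convex set: the ball of radius sqrt(n)
for x, label in labeled_gaussian_samples(ball, n=10, m=5, rng=rng):
    print(label, round(float(np.linalg.norm(x)), 2))
```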

Distributed Detection of Cycles, by Pierre Fraigniaud and Dennis Olivetti (arXiv); and Distributed Subgraph Detection, by Pierre Fraigniaud, Pedro Montealegre, Dennis Olivetti, Ivan Rapaport, and Ioan Todinca (arXiv). Improving the understanding of distributed property testing of graphs in the CONGEST model (already touched upon last month), these two works tackle the question of distributed testing of subgraph-freeness. The first considers testing cycle-freeness, showing that \(C_k\)-freeness can be tested in the CONGEST model in a constant number of rounds (and with one-sided error). The second considers the more general question of testing \(H\)-freeness of graphs, for a larger class of subgraphs that includes cycles. Specifically, they show that for any pattern \(H\) consisting of a couple of nodes, connected in an arbitrary manner to a tree, \(H\)-freeness can be tested (with one-sided error) in \(O(1)\) rounds.

Hypothesis Testing For Densities and High-Dimensional Multinomials: Sharp Local Minimax Rates, by Sivaraman Balakrishnan and Larry Wasserman (arXiv). In distribution testing, the question of testing identity to a known discrete distribution \(p\) is almost fully settled, with near-tight “instance-optimal” bounds. But what about the related question of testing identity to \(p\) (not necessarily discrete) under the additional promise that both \(p\) and the unknown distribution \(q\) belong to a specific class \(C\) of distributions, instead of being arbitrary? This is the problem this work tackles, for the specific case of \(C\) being (i) the class of multinomial distributions, and (ii) the class of (continuous) Lipschitz distributions. In both cases, the authors establish some near-tight bounds, of an “instance-optimal” flavor.

On Axis-Parallel Tests for Tensor Product Codes, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). Low-degree testing has its roots in the origins of property testing, and its connection to the PCP theorem; and it has been a fruitful and rich line of work since. Here, the authors analyze a type of low-degree test introduced in 2006 by Ben-Sasson and Sudan, which constrains the tester to only consider restrictions along axis-parallel hyperplanes; and establish new results on this class of (weaker but more general) testers.

* As usual, if you find a paper we forgot or misrepresented, please signal it in the comments below.

News for March 2017

March was a relatively good month for property testing, with 3 papers — including a foray in differential privacy.

Optimal Unateness Testers for Real-Valued Functions: Adaptivity Helps, by Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and C. Seshadhri (arXiv). Testing unateness (that is, whether a function becomes monotone after negating some subset of its coordinates) has recently received significant attention. Understanding the power of adaptivity in property testing, on the other hand, has been a long-standing question, which has sparked much interest over the past decade. In this paper, the authors obtain optimal results for testing unateness of real-valued functions \(f\colon \{0,1\}^d\to \mathbb{R}\) and \(f\colon [n]^d\to \mathbb{R}\) (up to the dependence on the proximity parameter \(\varepsilon\)), and show that — unlike what happens for monotonicity testing of real-valued functions — adaptivity helps. Namely, adaptive testing of unateness of functions \(f\colon \{0,1\}^d\to \mathbb{R}\) has query complexity \(O(d/\varepsilon)\), but non-adaptive testers require \(\Omega(d\log d)\) queries.
This work is a merged, and significantly improved version of two papers we covered a few months back, by the same authors.

Switching from function to distribution testing, last month saw two independent samples appear online:

Near-Optimal Closeness Testing of Discrete Histogram Distributions, by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). Closeness testing of discrete distributions is the question of testing, given sample access to two unknown distributions \(p,q\) over a known discrete domain \([n] \stackrel{\rm def}{=} \{1,\dots,n\}\), whether \(p=q\) or \(d_{\rm TV}(p,q)> \varepsilon\) (are the two distributions the same, or do they differ significantly in total variation distance?). This quite fundamental question is by now fully understood in the general case, where \(p,q\) are allowed to be arbitrary distributions over \([n]\); but what if one has some additional knowledge about them? Specifically, what if we were guaranteed that both distributions have some “nice structure” — e.g., that they are piecewise constant with \(k\) pieces (\(k\)-histograms)? Leveraging machinery developed in a series of their previous works (for the continuous case), the authors show that in this case, the sample complexity of “testing closeness under structural assumptions” is quite different from the general case; and that the resulting (near-tight) bound is a rather intricate tradeoff between the three parameters \(n,k,\varepsilon\).

Priv’IT: Private and Sample Efficient Identity Testing, by Bryan Cai, Constantinos Daskalakis, and Gautam Kamath (arXiv). Distribution testing is a vibrant field; differential privacy (DP) is an incredibly active and topical area. What about differentially private distribution testing — a.k.a. testing the data without learning too much about any single sample? In this paper, the authors address the task of performing identity testing of distributions (“given samples from an unknown arbitrary distribution, decide whether it matches a fixed known probability distribution/model”) in the DP setting, focusing on getting the best of both worlds: how to guarantee privacy without sacrificing efficiency. Not only do they obtain asymptotically near-optimal bounds in both the testing and differential privacy parameters — they also run some experiments to validate their approach empirically. (Plus, the title is rather cute.)

Usual disclaimer: if we forgot or misrepresented your paper, please signal it in the comments — we’ll address it as quickly as we can.