Author Archives: Clement Canonne

News for April 2019

After a quite slow month of March, things sped up quite significantly in April: six different papers, ranging from graph testing to function testing to quantum distribution testing!

Update (05/04): We missed one. Seven!

Junta correlation is testable, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv). Junta testing really has seen a lot of attention and progress over the pass couple years! In this new work, the focus is on the tolerant version of (Boolean) junta testing, where the goal is to decide whether a given function \(f\colon \{-1,1\}^n\to \{-1,1\}\) is close to some \(k\)-junta, or far from any such simple function. The current paper improves on previous work of Blais, Canonne, Eden, Levi, and Ron, which showed how to distinguish between \(\varepsilon\)-close to \(k\)-junta and \(16\varepsilon\)-far from \(k\)-junta in \(O_{k,\varepsilon}(1)\) queries, in two ways: (i) the gap-factor of 16 can now be made arbitrarily small, yielding the first fully tolerant non-trivial junta tester; and (ii) the new algorithm also identifies the underlying “core junta” (up to permutation of the coordinates). Besides the other results contained in the paper (such as results for a relaxation of the tolerant question akin to that of Blais et al.), a key aspect of this paper is to go beyond the use of (set) influence as a proxy for distance to junta-ness which underlied all previous work, thus potentially opening a new avenue towards get fully tolerant, \(\mathrm{poly}(k,1/\varepsilon)\)-query junta testing algorithms.

Testing Tensor Products, by Irit Dinur and Konstantin Golubev (arXiv). In this paper, the authors consider the following testing question: given query access to a Boolean function \(f\colon [n]^d \to \mathbb{F}_2\), test whether \(f\) is of the form \(f(x) = f_1(x_1)+\dots f_d(x_d)\) (equivalently, this is the task of testing whether a \(d\)-dimensional tensor with \(\pm 1\) is of rank \(1\)). They provide two different proximity-oblivious testers (POT) for this task: a \(4\)-query one, reminiscent and building upon the BLR linearity test; as well as a \((d+1)\)-query POT (which they dub “Shapka Test,” one can assume for their love of warm and comfortable hats), whose analysis is significantly simpler.

Changing direction, the next paper we saw is concerned with quantum testing of (regular, good ol’ classical) probability distributions:

Quantum Algorithms for Classical Probability Distributions, by Aleksandrs Belovs (arXiv). This work on quantum distribution testing considers 4 of the existing models of access to samples from an unknown probability distribution, analyzes their relationship, and argues their merits. Further, the autho establishes that for the simple hypothesis testing question of distinguishing between two (known, fixed) probability distributions \(p,q\), in all four models the optimal sample complexity is proportional to the inverse of the Hellinger distance between \(p\) and \(q\)—in contrast to the classical setting, where it is known to be proportional to the squared inverse of this distance.

Turning to graphs, the next work considers clustering of bounded-degree graphs, a topic we recently discussed as well:

Robust Clustering Oracle and Local Reconstructor of Cluster Structure of Graphs, by Pan Peng (arXiv). Consider the following noisy model: an unknown bounded-degree graph (of maximum degree \(d\)) \(G\) which is \((k, \phi_{\rm in},\phi_{\rm out})\)-clusterable (i.e., clusterable in at most \(k\) clusters of inner and outer conductances bounded by \(\phi_{\rm in},\phi_{\rm out}\)) is adversarially modified in at most \($\varepsilon d n\) edges between clusters. This noise model, introduced in this work, leads to the natural questions of “recovering the underlying graph \(G\)”: this is what the author tackles, by designing sublinear-time local clustering oracles and local reconstruction algorithms (local filters) in this setting. Further, in view of the noise model reminiscent of property testing in the bounded-degree graph setting, connections to testing clusterability are discussed; the implications of the results for testing clusterability are discussed in Section 1.4.

A Faster Local Algorithm for Detecting Bounded-Size Cuts with Applications to Higher-Connectivity Problems, by Sebastian Forster and Liu Yang (arXiv). The authors focus on the problem of finding an edge (or vertex) cut in a given graph from a local viewpoint, i.e., in the setting of local computation algorithms, and provide a slew of results in detecting bounded-size cuts. This in turn has direct implications for testing \(k\)-edge and \(k\)-vertex connectivity, directed and undirected, in both the bounded-degree and general (unbounded degree) models, improving on or subsuming the current state-of-the-art across the board (see Section 1.2.4 of the paper, and the very helpful Tables 3 and 4, for a summary of the improvements).

Random walks and forbidden minors II: A \(\mathrm{poly}(d\varepsilon^−1)\)-query tester for minor-closed properties of bounded degree graphs, by Akash Kumar, C. Seshadhri, and Andrew Stolman (ECCC). Following their breakthrough from last May, the authors are at it again! Leveraging a random-walk approach, they resolve the following open question in testing bounded-degree graph: is there a \(\mathrm{poly}(1/\varepsilon)\)-query tester for planarity (and, more generality, for minor-closed graph properties)? The previous state-of-the-art, due to Levi and Ron, had a quasipolynomial dependence on \(1/\varepsilon\).
Without spoiling too much: the answer to the open question is yes— and further the authors settle it by showing an even more general result on testing \(H\)-freeness (for any constant-size graph \(H\)).

Update (05/04):

Testing Unateness Nearly Optimally, by Xi Chen and Erik Waingarten (arXiv). Recall that a function \(f\colon \{0,1\}^n\to \{0,1\}\) is said to be unate if there exists some \(s\in\{0,1\}^n\) such that \(f(\cdot \oplus s)\) is monotone; i.e., if \(f\) is either non-increasing or non-decreasing in each coordinate. Testing unateness has seen a surge of interest over the past year or so; this work essentially settles the question, up to polylogarithmic factors in \(n\) (and the exact dependence on \(\varepsilon\)). Namely, the authors present and analyze an \(\tilde{O}(n^{2/3}/\varepsilon^2)\)-query adaptive tester for unateness, which nearly matches the \(\tilde{\Omega}(n^{2/3})\)-query lower bound for adaptive testers previously established by Chen, Waingarten, and Xie.

News for March 2019

Last month was very calm. Eerily calm in fact: no property testing or related sublinear-time algorithm in view!* Gird your loins for the April batch, I reckon?

\({}^\ast\) That we saw, at least. if we missed some, please mention it in the comments below!)

Update: As mentioned in the comments, we did indeed missed two (related) works on distribution estimation from a competitive viewpoint. Namely, for a large class of properties (entropy, distance to uniformity, support size…), Hao, Orlitsky, Suresh, and Wu provide in the first paper (arXiv 1904.00070) an “instance-optimal(ish)” estimator which does “as well” with \(m/\sqrt{\log m}\) samples than the natural and naive empirical estimator would do with \(m\) (thus, in some sense, amplifying the data size). In the following paper (arXiv 1903.01432), Hao and Orlitsky improve this and remove the “ish” to get the optimal amplification \(m/\log m\).

News for December 2018

Happy near year, and best wishes to those close and \(\varepsilon\)-far! December concluded the year with 4 new preprints, spanning quite a lot of the property testing landscape:

Testing Stability Properties in Graphical Hedonic Games, by Hendrik Fichtenberger and Anja Rey (arXiv). The authors of this paper consider the problem of deciding whether a given hedonic game possesses some “coalition stability” in a property testing framework. Namely, recall that a hedonic game is a game where players (nodes) form coalitions (subsets of nodes) based on their individual preferences and local information about the considered coalition, thus resulting in a partition of the original graph.
Several notions exist to evaluate how good such a partition is, based on how “stable” the given coalitions are. This work focuses on hedonic games corresponding to bounded-degree graphs, introducing and studying the property testing question of deciding (for several such notions of stability) whether a given game admits a stable coalition structure, or is far from admitting such a partition.

Spectral methods for testing cluster structure of graphs, by Sandeep Silwal and Jonathan Tidor (arXiv). Staying among bounded-degree graphs, we turn to testing clusterability of graphs, the focus of this paper. Given an \(n\)-node graph \(G\) of degree at most \(d\) and parameters \(k, \phi\), say that \(G\) is \((k, \phi)\)-clusterable if it can be partitioned in \(k\) parts of inner conductance at least \(\phi\).
Analyzing properties of a random walk on \(G\), this work gives a bicriterion guarantee (\((k, \phi)\)-clusterable vs. \(\varepsilon\)-far from \((k, \phi^\ast)\)-clusterable, where \(\phi^\ast \approx \varepsilon^2\phi^2\)) for the case \(k=2\), improving on previous work by Czumaj, Peng, and Sohler’15.

We then switch from graphs to probability distributions with our third paper:

Inference under Information Constraints I: Lower Bounds from Chi-Square Contraction, by Jayadev Acharya, Clément Canonne, and Himanshu Tyagi (arXiv). (Disclaimer: I’m one of the authors.) In this paper, the first of an announced series of three, the authors generalize the settings of two previous works we covered here and there to consider the general question of distribution testing and learning when the \(n\) i.i.d. samples are distributed among \(n\) players, which each can only communicate their sample to the central algorithm by respecting some pre-specified local information constraint (e.g., privacy, or noise, or communication budget). This paper develops a general lower bound framework to study such questions, with a systematic focus on the power of public vs. private randomness between the \(n\) parties, and instantiate it to obtain tight bounds in the aforementioned locally private and communication-limited settings. (Spoiler: public randomness strictly helps, but not always.)

Finally, after games, graphs, and distributions, our fourth paper of the month concerns testing of functions:

Partial Function Extension with Applications to Learning and Property Testing, by Umang Bhaskar and Gunjan Kumar (arXiv). This work focuses on a problem quite related to property testing, that of partial function extension: given as input \(n\) pairs point/value from a purported function on a domain \(X\) of size \(|X| > n\), one is tasked with deciding whether there does exist (resp., with finding) a function \(f\) on \(X\) consistent with these \(n\) values which further satisfies a specific property, such as linearity or convexity. This is indeed very reminiscent of property testing, where one gets to query these \(n\) points and must decide (approximate) consistency with such a well-behaved function. Here, the authors study the computational hardness of this partial function extension problem, specifically for properties such as subadditivity and XOS (a sub-property of subadditivity); and as corollaries obtain new property testers for the classes of subadditive and XOS functions.

As usual, if you know of some work we missed from last December, let us know in the comments!

News for September 2018

Last month was slower than usual, with only one property testing paper. We did miss a few papers, so hopefully this list is now up to date.

Property testing and expansion in cubical complexes, by David Garber, Uzi Vishne (arXiv). Consider the question of testing if an arbitrary function \(f\colon V\times V \to\{-1,1\}\) is of the form \(f(x,y) = h(x)h(y)\) for some \(h\colon V\to\{-1,1\}\). An intuitive one-sided test, shown to work by Lubotzky and Kaufman (2014), is to pick uniformly random \(x,y,z\in V\) and check that \(f(x,y)f(y,z)f(z,x)=1\). This paper considers the high-dimensional generalization of testing the property that a function\(f\colon V\times V \times V\times V \to\{-1,1\}\) is of the form \(f(w,x,y,z) = \alpha\cdot h(w,x)h(y,x)h(y,z) h(w,z)\), for some \(h\colon V\times V\to\{-1,1\}\) and sign \(\alpha\in\{-1,1\}\). The authors derive necessary and sufficient conditions for testability of this property, by formulating it in the language of incidence geometry and exploiting this connection.

Local Computation Algorithms for the Lovász Local Lemma, by Dimitris Achlioptas, Themis Gouleakis and Fotis Iliopoulos (arXiv). There has been significant work in the past decade on constructive versions of the Lovász Local Lemma (LLL), since the seminal work of Moser-Tardos. This paper designs news Local Computation Algorithms (LCAs) for the LLL. It’s best to consider the problem of \(k\)-SAT. Consider a \(k\)-CNF \(\phi\) with \(n\) variables, \(m\) clauses, where every variable is in at most \(d\) clauses. By the LLL, if \(d \leq 2^k/ke\), then \(\phi\) is satisfiable. An LCA would compute any bit of a satisfying assignment, by making sublinear queries into \(\phi\). This was first studied by Rubinfeld-Tamir-Vardi-Xie. Their LCA would make polylogarithmically many queries, but requires a stronger condition that what LLL achieves. This paper gives the first sublinear LCA with precisely the LLL conditions, though the number of queries is \(n^\beta\) (for \(\beta \lt 1\)). The main result is an LCA for an abstract LLL formulation, that also leads to LCAs for graph coloring. Roughly speaking, for a graph with maximum degree \(\Delta\) where all neighborhoods are sufficiently far from cliques, the LLL shows that the chromatic number bound of \(\Delta + 1\) can be beaten. This result gives an LCA for graph coloring under these LLL conditions.

Sublinear Time Low-Rank Approximation of Distance Matrices, by Ainesh Bakshi and David P. Woodruff (arXiv). Consider two sets of points \(P\) and \(Q\) in a metric space, with \(m\) and \(n\) points respectively. The \(m \times n\) distance matrix \(A\) contains all pairwise distances between them. This paper studies approximating \(A\) using a low rank representation, without reading all the entries in \(A\). The main result is as follows. For rank parameter \(k\), let \(A_k\) be the closest (by Frobenius norm) rank-\(k\)-approximation to \(A\). There is a \(O(m^{1+\gamma} + n^{1+\gamma}poly(k\epsilon^{-1}))\) (for arbitrary \(\gamma > 0\)) algorithm that outputs a rank \(k\)-matrix \(B\) with the following property: \(\|A-B\|^2_F \leq \|A-A_k\|^2_F + \epsilon \|A\|^2_F\). Interestingly, there is a lower bound showing that a \(o(mn)\) algorithm cannot get a multiplicative approximation. One technical ingredient is a method to sample column norms of \(A\), under an approximate triangle inequality constraint. This allows one to compute smaller matrices that approximate \(A\), on which one can directly compute an approximate rank-\(k\) decomposition.

On Solving Linear Systems in Sublinear Time, by Alexandr Andoni, Robert Krauthgamer, Yosef Pogrow (arXiv). Solving Laplacian linear systems is an immensely deep area, with lots of exciting recent work. This paper studies sublinear algorithms for such problems. Consider a Laplacian matrix \(L\) (think of \(I – A/d\), for adjacency matrix \(A\) of a \(d\)-regular graph). The aim is to solve \(Lx = b\), for \(x, b \in {\mathbb R}^n\). Let the solution be \(x^*\). The main result shows that one can approximate any entry in sublinear time. Specifically, for any coordinate \(i\), one can output an approximate \(\hat{x_i}\) such that \(|\hat{x_i} – x^*_i| \leq \|x^*\|_\infty\). The running time is essentially \(d\epsilon^{-2}\kappa^3\), where \(\kappa\) is a bound on the condition number of \(L\). There are generalizations for Symmetrically Diagonally Dominant (SDD) matrices, a generalization of Laplacians. There is an \(\Omega(n^{1/d^2})\) lower bound for solving general PSD systems, and a lower bound showing that \(\Omega(\kappa^2)\) queries into \(b\) are necessary.

News for June 2018

The summer gets off to a flying start, with three property testing papers, spanning differential privacy, distribution testing, and juntas in Gaussian space!

On closeness to \(k\)-wise uniformity, by Ryan O’Donnell and Yu Zhao (arXiv)
In this paper, the authors consider the following structural question about probability distributions over the Boolean hypercube \(\{-1,1\}^n\): ” what is the relation between total variation distance \(\delta\) to \(k\)-wise independence, and bound \(\varepsilon\) on the Fourier coefficients of the distribution on degrees up to \(k\)?”

While this question might seem a bit esoteric at first glance, it has direct and natural applications to derandomization, and of course to distribution testing (namely, to test \(k\)-wise independence and its generalization, \((\varepsilon, k)\)-wise independence of distributions over the hypercube).

The main contribution here is to improve (by a \((\log n)^{O(k)}\) factor) the bounds on \(\delta (n,k,\varepsilon)\) over the previous work by Alon et al. [AAK+07], making them either tight (for \(k\) even) or near-tight. To do so, the authors introduce a new hammer to the game, using linear programming duality in the proof of both their upper and lower bounds.

Property Testing for Differential Privacy, by Anna Gilbert and Audra McMillan (arXiv)
Differential privacy, as introduced by Dwork et al., needs no introduction. Property testing, especially on this website, needs even less. What about a combination of the two? Namely, given black-box access to an algorithm claiming to perform a differentially private computation, how to test whether this is indeed the case?

Introducing and considering this quite natural question for the first time [01/31/2019: see erratum below], this work shows, roughly speaking, that testing differential privacy is hard. Specifically, they show that for many notions of differential privacy (pure, approximate, and their distributional counterparts), testing is either impossible or possible but not with a sublinear number of queries (even when the tester is provided with side information about the black-box). In other terms, as the authors put it: trusting the privacy of an algorithm “requires compromise by either the verifier or algorithm owner” (and, in the latter case, even then it’s not a simple matter).

Is your data low-dimensional?, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv)
(Well, is it?) To state it upfront, I am biased here, as it is a problem I was very eager to see investigated to begin with. To recap, the question is as follows: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on “few” (i.e., \(k \ll n\)) variables?”

This is the continuous, Gaussian version of the (quite famous) junta testing problem, which has gathered significant attention over the past years (the Gaussian version has, to the best of my knowledge, never been investigated). Now, the above formulation has a major flaw: specifically, it is uninteresting. In Gaussian space*, who cares about the particular basis I expressed my input vector in? So a more relevant question, and that that the authors tackle, is the more robust and natural one: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on a low-dimensional linear combination of the variables?” Or, put differently, does all the relevant information for \(f\) live in a low-dimensional subspace?

De, Mossel, and Neeman show how can do this, non-adaptively, with a query complexity independent of the dimension \(n\) (hurray!), but instead polynomial in \(k\), the distance parameter \(\varepsilon\), and the surface area \(s\) of \(f\). And since this last parameter may seem quite arbitrary, they also proceed to show that a polynomial dependence in this \(s\) is indeed required.

*”In Gaussian space, no one can hear you change basis?”


Erratum (01/31/2019): It was brought to our attention that our overview of “Property Testing for Differential Privacy” was overlooking a key part of the literature; specifically, a work of Dixit et al. (TCC 2013)  which introduces this very question. From the abstract:

How does one ensure that [those third-parties] have implemented their algorithms in a way which meet the specifications of the privacy requirements? […] In this work, we propose a new approach to the above problem which we call privacy testing. We do this by formulating the above problem in the well-studied framework of property testing.

News for March 2018

March has been a relatively slow month for property testing, with 3 works appearing online. (If you notice we missed some, please leave a comment pointing it out)

Edge correlations in random regular hypergraphs and applications to subgraph testing, by Alberto Espuny Díaz, Felix Joos, Daniela Kühn, and Deryk Osthus (arXiv). While testing subgraph-freness in the dense graph model is now well-understood, after a series of works culminating in a complete characterization of the testing problems which admit constant-query testers, the corresponding question for hypergraphs is far from resolved. In this paper, the authors develop new techniques for the study of study of random regular hypergraphs, which imply new testing results for subhypergraph-freeness testing, improving on the state-of-the-art for some parameter regimes (e.g., when the input graph satisfies some average-degree condition).

Back from hypergraphs to graphs, we also have:
The Subgraph Testing Model, by Oded Goldreich and Dana Ron (ECCC). Here, the authors introduce a new model for property testing of graphs, where the goal is to test if an unknown subgraph \(F\) of an explicitly given graph \(G=(V,E)\) satisfies the desired property. The testing algorithm is provided access to \(F\) via membership queries, i.e., through query access to the indicator function \(\mathbf{1}_F\colon E \to \{0,1\}\). (In some very hazy sense, this is reminiscent of the active learning or testing frameworks, where one gets more or less free access to unlabeled data but pays to see their label.) As a sample of the results obtained, the paper establishes that this new model and the bounded-degree graph model are incomparable: there exist properties easier to test in one model than the other, and vice-versa — and some properties equally easy to test in both.

And finally, to drive home the point that “models matter a lot,” we have our third paper:
Every set in P is strongly testable under a suitable encoding, by Irit Dinur, Oded Goldreich, and Tom Gur (ECCC). It is known that the choice of representation of the objects has a large impact in property testing: for instance, the complexity of testing a given property can change drastically between the dense and bounded-degree graph models. This work provides another example of such a strong dependence on the representation: while membership to some sets in \(P\) is known to be hard to test, the authors here prove that, for every set \(S\in P\), there exists a (polynomial-time, invertible) encoding \(E_S\colon \{0,1\}^\ast\to \{0,1\}^\ast\) such that testing membership to \(S\) under this encoding is easy. (They actually show even stronger a statement: namely, that under this encoding the set admits a “proximity-oblivious tester,” that is a constant-query testing algorithm which rejects with probability function of the distance to \(S\).)

Finally, on a non-property testing note: Edith Cohen, Vitaly Feldman, Omer Reingold, and Ronitt Rubinfeld recently wrote a pledge for inclusiveness in the TCS community, available here: https://www.gopetition.com/petitions/a-pledge-for-inclusiveness-in-toc.html
If you haven’t seen it already, we encourage you to read it.

Update: Fixed a mistake in the overview of the second paper; as pointed out by Oded in the comments, the main comparison was between the new model and the bounded-degree graph model, not the dense graph one.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up on that trend! (And, of course, if we missed any, please point it out in the comments.The blame would be on our selves from the past year.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of a following flavor: two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell,k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell,k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, it implies the existence of a removal lemma with a
tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain constant-time and query algorithm for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e. the eigenvalues of its Laplacian, given random query access to the nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv) This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e. the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the sequence observed. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open problem 84), and one which would have direct implications to tolerant testing and estimation of symmetric properties of distributions.

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than one year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the Holiday season, or for perusing on the beach if you live in the south hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,\({}^{(\ast)}\) but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,\({}^{(\dagger)}\) on Saturday 14th (gird your loins — they last the full day!):

  • Linear Sketching as a Tool for Everything
  • Frontiers in Distribution Testing
  • Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers\({}^{(\ddagger)}\) (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

\({}^{(\ast)}\) Hint: look at Sessions 7A and 10B.
\({}^{(\dagger)}\) Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
\({}^{(\ddagger)}\) A lineup of stellar speakers.

News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graphs pattern (such as triangle, to name the most famous), that is removal results with bounds on the number of copies of the pattern that is not mind-blowingly huge like a tower of \(\varepsilon\), is a classic and longstanding problem. This work makes significant progress for the last remaining case, i.e. for the pattern \(C_4\): providing bounds that are merely exponential in \(\varepsilon\).

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid \(\{0,1\}^n\) (over a field \(\mathbb{F}\supseteq \{0,1\}\)). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e. the property of a degree-\(d\)-polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, that is that they require sample complexity \(n^{\Omega(d)}\) — while learning the polynomial can be done with \(n^{O(d)}\) samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on \(\varepsilon\) for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability \(\delta\): if one can achieve \(1/3\), then simple repetition can bring it down to \(\delta\) at the mild price of a \(\log(1/\delta)\) factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative \(\sqrt{\log(1/\delta)}\). And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). While I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; and, for fear of botching the job, decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic \(k\)-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic \(k\)-cycle removal lemma, which held for any \(k\geq 3\) and abelian group \(G\); however, the bounds in this lemma were quite large — i.e., tower-ype. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group \(\mathbb{F}_p^n\) (where \(p\geq 2\) is any fixed prime, and \(k\geq 3\)).

Update (10/04): Finally, a paper we covered last summerThe Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, went under signficant changes. Now titled Testing Sparsity over Known and Unknown Bases, it now includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix \(A \in \mathbb{R}^{d \times m}\) and unknown input vector \(y \in \mathbb{R}^d\), does \(y\) equal \(Ax\) for some \(k\)-sparse vector \(x\), or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling into the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis — similar in goal and spirit to property testing.