Category Archives: Surveys and books

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than one year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the Holiday season, or for perusing on the beach if you live in the south hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

Distribution Testing: a short non-survey

The past years have witnessed a great deal of activity in the field of distribution testing— so much, in fact, that it has become a bit of a challenge to even keep track of what’s happening and what has already happened. While this blog post is not going to be a comprehensive (comprehensible, maybe?) survey or summary, it hopefully will at least shed some light on what distribution testing is — and where to look at if interested in it.

Obligatory XKCD comic.

Recall that in the “vanilla” property testing setting, one is provided with access (typically query access) to an object \(\mathcal{O}\) from some universe \(\Omega\) (typically the space of Boolean functions over \(\{0,1\}^n\) or the set of \(n\)-vertex graphs), a distance measure \(\textrm{d}(\cdot,\cdot)\) between such objects (usually the Hamming distance), a property \(\mathcal{P}\subseteq \Omega\) (the subset of “good objects”) and a parameter \(\varepsilon \in (0,1]\). The goal is to design a randomized algorithm that (i) accepts with high probability if \(\mathcal{O}\in\mathcal{P}\); and (ii) rejects with high probability if \(d(\mathcal{O},\mathcal{O}’)>\varepsilon\) for every single object \(\mathcal{O}’\in\mathcal{P}\).

The distribution testing setting is very much like this one, with of course crucial twists. The object is now a probability distribution \(D\) over a (discrete) set, typically \([n]=\{1,\dots,n\}\); the access model is usually a sample oracle, providing independent samples from \(D\) on demand; the distance is the total variation distance between distributions (or, equivalently, the \(\ell_1\) norm between the probability mass functions).

\(\textrm{d}_{\rm TV}(D,D’) = \sup_{S\subseteq [n]} \left( D(S) – D'(S) \right) = \frac{1}{2} \sum_{i\in[n]} \left|D(i)-D'(i) \right|\)

This brings in some quite important distinctions: for instance, randomness is now inherent: even taking a number of samples beyond any reasonable bound will not bring the probability of failure of the algorithm to zero, this time.

A Brief History

The field made its debut in TCS with a work of Goldreich and Ron [5], who considered the question of testing if an expander graph had mixed — that is, if the distribution induced on its nodes was uniform. However, it is only a tad later, in a work of Batu, Fortnow, Rubinfeld, Smith, and White [6] that the setting was formally defined and studied, for the specific problem of testing whether two unknown random variables have the same distribution (closeness, or equivalence testing).

This started a line of work, which analyzed the sample complexity of testing uniformity (is my random variable uniformly distributed?), identity (is it distributed according to this nice distribution I fully know in advance?), independence (are my random variables independent of each other?), monotonicity (is the probability density function increasing?), and many more. Over the course of the past 16 years, many breakthroughs, new techniques, new questions, and new answers to old problems were made and found. With surprising twists (if your domain size is n, then surely \(\Theta(n)\) samples are enough and necessary to learn an arbitrary distribution; but who would have thought that \(\Theta(n/\log n)\) was the right answer to anything? [7]). And game changers (general theorems that apply to many questions or blackbox lemmas are really nice to have in one’s toolbox).

And that’s barely the tip of the iceberg! For more in-depth or better written prose, the interested Internet wanderer may want to consult one of the following (usual non-exhaustivity caveats apply):

  • Ronitt Rubinfeld’s introduction [1]
  • Dana Ron’s survey on Property testing [2] (Section 11.4)
  • Oded Goldreich’s upcoming book [3] (Chapter 11)
  • or my own survey [4] on distribution testing (but I cannot promise better written prose for this one).

For the more video-inclined, there are also quite a few survey talks available online, the most recent I know of being this talk by Ronitt Rubinfeld at COLT’16.

A Brief Comparison

Note: this section is taken verbatim from my survey [4] (Section 1.2).

It is natural to wonder how the above approach to distribution testing compares to classic methods and formulations, as studied in Statistics. While the following will not be a thorough comparison, it may help shed some light on the difference.

  • Null and alternative hypotheses. The standard take on hypothesis testing, simple hypothesis testing, relies on defining two classes of distributions, the null and alternative hypotheses \(\mathcal{H}_0\) and \(\mathcal{H}_1\). A test statistic is then tailored specifically to these two classes, in order to optimally distinguish between \(\mathcal{H}_0\) and \(\mathcal{H}_1\)— that is, under the underlying assumption that the unknown distribution \(D\in\mathcal{H}_0\cup\mathcal{H}_1\).

    Something like that.


    The test then rejects the null hypothesis \(\mathcal{H}_0\) if statistical evidence is obtained that \(D\notin\mathcal{H}_0\). In this view, the distribution testing formulation would be to set \(\mathcal{H}_0\) to be the property \(\mathcal{P}\) to be tested, and define the alternative hypothesis as “everything far from \(\mathcal{P}\).” In this sense, the latter captures a much more adversarial setting, where almost no structure is assumed on the alternative hypothesis— setting known in Statistics as composite hypothesis testing.
  • Small-sample regime. A second and fundamental difference resides in the emphasis given to the question. Traditionally, statisticians tend to focus on asymptotic analysis, characterizing— often exactly— the rate of convergence of the statistical tests under the alternative hypothesis, as the number of samples \(m\) grows to infinity. Specifically, the goal is to pinpoint the error exponent \(\rho\) such that the probability of error (failing to reject the null hypothesis) asymptotically decays as \(e^{-\rho m}\). However, this asymptotic behavior will generally only hold for values of \(m\) greater than the size of the domain (“alphabet”). In contrast, the computer science focus is on the small-sample regime, where the number of samples available is small with regard to the domain size, and one aims at achieving a fixed probability of error.
  • Algorithmic flavor. At a more practical level, a third point on which the two approaches deviate is the set of techniques used in order to tackle the question. Namely, the Statistics literature very often relies on relatively simple-looking and “natural” tests and estimators, which need not be computationally efficient. (This is for instance the case for the generalized likelihood ratio test that requires to compute the maximum likelihood of the sequence of samples obtained under the two hypotheses \(\mathcal{H}_0\) and \(\mathcal{H}_1\); which is not in general tractable.) On the other hand, works in distribution testing are predominantly algorithmic, with a computational emphasis on the testing algorithms thus obtained.

Exciting times!

And while the above has hinted at all that has been done so far in the area, there is still plenty ahead — many new questions to be posed, answered, revisited under another light, tackled with new insights and techniques, and connections to be made. As a small sample (sic), we list below a few of the recent developments or trends that herald these exciting times.

New tools

  •  Appeared in the 1800’s thanks to Pearson, testers based on (some tailored variant of) a \(\chi^2\)-based statistic are now aplenty. Yielding simple, concise, very often optimal testing algorithms… [8], [9], [10], [11]
  •  Information theory. Techniques and results from this related area have started to permeate distribution testing— mostly to prove lower bounds in an easier or conceptually simpler way. [12], [13,14]
  • They were there from the beginning. Now, the \(\ell_2\)-testing subroutines strike back. (For neat, elegant testing algorithms that use \(\ell_2\) as a proxy “in the right way.”) [5,6,15][14]

New models

What if modeling the access to the data as a source of i.i.d. samples was not enough? Because we can— the situation allows it— or because we want— what we are trying to achieve is best modeled a bit differently? If we get more ways to query it, can we do more, and by how much?

This recent line of work [20,21,22,23,…] offers many challenges, ranging from the right way to ask (what to model, and how?) to the right way to answer (intuition if often off the hook, in these uncharted waters?). New tools to design, intuition to build, and problems to consider…

New questions

And of course, even if many of the “classic” problems have by now been fully understood, there are many more that await… from class testing [16,11,17,18] to twists on old questions (uneven sample size for closeness?) to new questions entirely [19], there is no shortage. Not to forget there is a stronger connection with practice to build, and asking the “real-worlders” what makes sense to consider is also unlikely to stop producing challenges.


[1] Ronitt Rubinfeld. Taming Big Probability DistributionsXRDS, 19(1):24-28, 2012. http://xrds.acm.org/article.cfm?aid=2331052
[2] Dana Ron. Algorithmic and Analysis Techniques in Property Testing. Foundations and Trends in Theoretical Computer Science: Vol. 5: No. 2, pp 73-205. 2010. http://dx.doi.org/10.1561/0400000029
[3] Oded Goldreich. Introduction to Property Testing. (In preparation) http://www.wisdom.weizmann.ac.il/~oded/pt-intro.html
[4] Clément Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue? 2015. http://eccc.hpi-web.de/report/2015/063/
[5] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 7:20, 2000.
[6] Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of FOCS, pages 189-197, 2000.
[7] Gregory Valiant and Paul Valiant. The power of linear estimators. In Proceedings of FOCS, pages 403-412, October 2011.
[8] Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193-1203. Society for Industrial and Applied Mathematics (SIAM), 2014.
[9] Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In Proceedings of FOCS, 2014.
[10] Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Testing Identity of Structured Distributions. In Proceedings of SODA, pages 1841-1854. Society for Industrial and Applied Mathematics (SIAM), 2015.
[11] Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems (NIPS), 2015.
[12] Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. ArXiV, abs/1407.0381, 2014.
[13] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. ArXiV, abs/1504.01227, 2015.
[14] Ilias Diakonikolas and Daniel Kane. A New Approach for Testing Properties of Discrete Distributions. In Proceedings of FOCS, 2016. (To appear. Also available as abs/1601.05557)
[15] Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. InProceedings of FOCS, pages 442-451, 2001.
[16] Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381-390, New York, NY, USA, 2004. ACM.
[17] Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions, In Proceedings of STACS, pages 25:1–25:14, 2016.
[18] MIT Theory Blog. Distribution Testing: Do It With Class! https://mittheory.wordpress.com/2015/08/09/distribution-testing-do-it-with-class/
[19] Maryam Aliakbarpour, Eric Blais, Ronitt Rubinfeld. Learning and Testing Junta Distributions. In Proceedings of COLT, 2016. pp. 19–46, 2016
[20] Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295-347, 2013.
[21] Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah. On the power of conditional samples in distribution testing. In Proceedings of ITCS, pages 561-580, New York, NY, USA, 2013. ACM.
[22] Clément L. Canonne, Dana Ron, and Rocco A. Servedio. Testing probability distributions using conditional samples. SIAM Journal on Computing, 2015.
[23] Clément L. Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In Proceedings of ICALP, pages 283-295, 2014.

Call for feedback: upcoming “Introduction to Property Testing”

Forwarding an announcement by Oded Goldreich:

I would like to seek the help of this community regarding my forthcoming book Introduction to Property Testing (see http://www.wisdom.weizmann.ac.il/~oded/pt-intro.html). Specifically, I would be grateful for feedback regarding any aspect and part of the current text, although detailed comments are likely to be more helpful at this time than suggestions for drastic and/or structural changes. Please do not shy from indicating relevant works that may be mentioned (rather than described in detail), since it is quite possible that I was not aware of them or forgot to mention them (rather than chose not to include a detailed description due to various considerations).

Since I am likely to continue revising the text and posting the revisions on the website, please indicate the text-locations to which you refer in a manner that is most robust to revision (e.g., indicate relative location with respect to sectioning and/or theorem-environment numbers rather than with respect to page numbers).

Guest Post: Oded Goldreich, “Reducing testing affine spaces to testing linearity”

[We are delighted to have this month a blog post by Oded Goldreich, on a reduction from testing affinity of subspaces to testing linearity of Boolean functions. Oded also gave us open problems directly related to this post.]

We consider the task of testing whether a Boolean function \(f\colon \{0,1\}^\ell\to\{0,1\}\) is the indicator function of an \((\ell-k)\)-dimensional space. An optimal tester for this property was presented by Parnas, Ron, and Samorodnitsky (SIDMA, 2002), by mimicking the celebrated linearity tester (of Blum, Luby and Rubinfeld, JCSS, 1993) and its analysis. We show that this task can be reduced to testing the linearity of a related function \(g\colon\{0,1\}^\ell\to\{0,1\}^k\), yielding an almost optimal tester.

Continue reading

Lecture Notes on Property Testing

(Update: Krzysztof pointed out the resources at sublinear.info, which we forgot to mention. It makes for sense for all of this information to be there. We’ve added links there, and would urge our readers to post more information there. Beats having it in half-baked blogposts!)

Despite the relative maturity of property testing (more than a decade old!), we still lack a good textbook on the subject. At least, I definitely feel the need when I’m talking to interested students. I end up pointing to a bunch of papers, all with their own notation. We now have numerous courses being taught, so that certainly helps. And Oded Goldreich just pointed out his notes. So here’s a summary of stuff that I have found. Please let us know if you have other sources, including your own courses!

Oded Goldreich’s lecture notes
Ronitt Rubinfeld’s collection of surveys
Sofya Raskhodnikova’s course notes at Penn State
Eric Price’s course notes at UT Austin
Grigory Yaroslavtsev’s crash course at the University of Buenos Aires
Rocco Servedio’s course notes at Columbia

On Quantum Property Testing

(A guest post by Ashley Montanaro on quantum property testing, a blogpost on his survey. Thanks Ashley!)

Quantum property testing

As readers of this blog will be aware, the field of property testing is a vibrant and active area of research, with hundreds of papers having been written covering a vast range of topics. One direction which has perhaps been underexplored in recent years is the extent to which quantum computers could be useful in property testing. In this blog post, I will briefly discuss some examples of this kind. This is all based on a recent survey of quantum property testing, which is joint work with Ronald de Wolf. For technical details about everything I discuss here, and for references, please see this paper.

There are several ways in which one could generalise the idea of property testing to the quantum world. Perhaps the most obvious way, and the one which I will discuss in this post, is to consider quantum algorithms for testing properties of classical objects, with the goal being to find algorithms which are significantly faster than any possible classical property tester. We use the standard framework of property testing; the only difference is that our tester is allowed to be quantum.

A generalisation in a different direction which I will not discuss here is to testing whether quantum objects of some sort (such as quantum states or operations) have some property. This idea can be split into two cases, the first of which can be thought of as classical testers for quantum properties. While at first sight this idea might not seem to make much sense, in fact it is quite natural. For example, imagine we are given access to a quantum system which is claimed to be in a particular state, and we would like to verify that this is indeed the case. Further imagine that the only access to the system we are allowed is to perform measurements of some form, receiving classical measurement data. The problem of performing this kind of verification has been of both theoretical and practical interest to both physicists and computer scientists. Another, and very general, case we could consider is finding quantum testers for quantum properties. Here ideas and intuition from classical property testing can help develop algorithms for purely quantum tasks. See the survey and references therein for much more about the testing of quantum properties in each of these cases.

Quantum testers for classical properties

Quantum computers offer the prospect of dramatically faster algorithms for certain problems than are possible for classical computers. Property testing is a natural target for such algorithms, for at least a couple of reasons.

First, many of the most famous quantum algorithms known (such as Shor’s algorithm and Grover’s algorithm) operate in the query complexity model. Here the measure of complexity which we try to minimise is the number of queries to bits of the input used to compute some function of the input. Precisely the same model is usually used in property testing.

Second, it is known that quantum computers can only achieve super-polynomial speedups in the query complexity model for computing partial functions; that is, solving problems where we are promised that the algorithm will never receive certain inputs. In a property-testing problem we have such a promise: namely, the input either has some property, or is “far” from having that property.

This gives us hope that quantum algorithms could provide super-polynomial speedups for property-testing problems. However, it is not obvious that the (rather specific) nature of the promise in property testing lends itself to quantum speedup; indeed, it was only in 2002 that the first separation between quantum and classical property testers was demonstrated by Buhrman et al. In fact, this work gave two separations: \(O(1)\) vs. \(\Omega(\log n)\) for testing subsets of Hadamard codewords; and \(O(\log n \log \log n)\) vs. \(\Omega(n)\) for a property-testing variant of Simon’s problem.

Following these initial results, there has been a steady stream of other efficient quantum testers for classical properties. These are often based on applying in the property-testing context a quantum algorithm or primitive which has been useful elsewhere. One simple, and yet sometimes overlooked, way to obtain a more efficient quantum property tester from a classical property tester is via the important primitive of amplitude amplification, which is a more efficient quantum variant of classical probability amplification. Imagine we have a classical property tester for some property \(P\) such that the tester has perfect completeness (i.e. accepts objects with property \(P\) with certainty), uses \(q\) queries, and rejects objects \(\epsilon\)-far from \(P\) with probability \(p\). Then, using amplitude amplification, we obtain a quantum tester which rejects objects \(\epsilon\)-far from \(P\) with constant probability using \(O(q/\sqrt{p})\) queries. (Classical probability amplification would give \(O(q/p)\) queries.) For example, this can be applied to the standard classical tester for linearity of boolean functions to obtain a quantum property tester with constant success probability using \(O(1/\sqrt{\epsilon})\) queries, a quadratic improvement.

Some other examples where quantum algorithms which originated elsewhere have been used as part of efficient quantum property testers include:

  • An algorithm of Chakraborty et al. for a variant of testing periodicity of a sequence, where the quantum algorithm uses \(O(1)\) queries but any classical algorithm must make \(\Omega(n^{1/4})\) queries. The algorithm is based on Shor’s algorithm and the classical lower bound is based on generalising prior work of Lachish and Newman. Quantum testers for periodic functions on groups, and some other group-theoretic properties, had previously been given by Friedl et al..
  • A quantum algorithm of Atici and Servedio for testing \(k\)-juntas (boolean functions depending on at most \(k\) variables) using \(O(k/\sqrt{\epsilon})\) queries, which is a somewhat better complexity than the (later) best known classical bound of \(O(k \log k + k/\epsilon)\). The algorithm is based on the primitive of sampling from the Fourier spectrum of a boolean function, one of the earliest algorithmic ideas in quantum computing. The original algorithm of Atici and Servedio had a linear dependence on \(1/\epsilon\), but this can be improved to a square root using amplitude amplification.
  • The use of quantum counting by Bravyi et al. and Chakraborty et al. to give polynomial speedups for testing a number of properties of probability distributions: whether two distributions are equal, whether a distribution is equal to a fixed distribution, whether two distributions have disjoint support, etc.
  • The use of Ambainis’s efficient quantum algorithm for element distinctness to give polynomial speedups for testing bipartiteness and expansion of bounded-degree graphs. The basic idea here is to use the algorithm to efficiently find collisions in short random walks on the graph.

The last of these highlights an important gap in our knowledge: could there exist an exponential quantum speedup for testing a property of graphs? It is known that for problems with “too much” symmetry, quantum algorithms can achieve only a polynomial speedup over classical algorithms in the query complexity model. Graph properties (being invariant under permutations of the vertices) have an intermediate level of symmetry, and it is not clear whether a super-polynomial speedup is achievable in this setting.

Lower bounds

As well as the development of new quantum algorithms, an integral part of many of the above works demonstrating quantum-classical separations is the proof of new lower bounds on the number of classical queries required for property testing. It is also of interest to prove lower bounds on the quantum query complexity of property testing. This can be a challenging task. Indeed, one of the most successful techniques for proving quantum lower bounds, the adversary method originally introduced by Ambainis, suffers from a “property testing barrier” which prevents its application to property testing problems. This technique has subsequently been generalised to the negative-weight adversary method, which overcomes this barrier and indeed completely characterises quantum query complexity. However, this bound can be difficult to apply in practice and I am not aware of any examples of its use in property testing.

A different technique which has been successfully applied to lower-bound the quantum query complexity of various property testing problems is the method of polynomials. Here the idea is to exploit the fact that the acceptance probability of any quantum algorithm making a small number of queries is a low-degree multivariate polynomial. By proving a lower bound on the degree of any bounded polynomial which takes a large value on inputs with property \(P\), and a small value on inputs far from having property \(P\), we get a lower bound on the number of queries required by any quantum algorithm which tests \(P\).

One example where this idea can be applied is the testing of properties which correspond to linear codes. It turns out that for any code whose dual code has minimum distance \(d\), we get an \(\Omega(d)\) quantum lower bound on the number of queries required to test membership in the code. One such case is the Reed-Muller code of order \(d\), or equivalently the set of degree-\(d\) polynomials over \(F_2\); this argument can be used to show that any quantum algorithm that tests the property of being a degree-\(d\) polynomial must make \(\Omega(2^d)\) queries. Another, and much more complicated, example of the use of this method is a lower bound by Ambainis et al. on the quantum query complexity of testing expansion in graphs, which rules out an exponential quantum speedup for testing this property.

Open questions

The field of quantum property testing is just getting started and there are a number of interesting open questions. In particular, the quantum complexity of testing many properties which have been studied classically is wide open. Some examples include monotonicity, decision tree complexity, and almost all properties of graphs. In some cases, we might hope to find significant (e.g. exponential) speedups over any possible classical tester. The quantum property testers which have been found so far are largely based on applying quantum algorithms which originated elsewhere; but we could also hope to develop entirely new algorithmic techniques to attack these problems.

As well as considering specific problems, there are more wide-ranging questions which could be addressed. Can we achieve significant quantum speedups for testing properties with a high degree of symmetry – and more generally, what can we say about the structure of properties which admit efficient quantum testers? Can we find better (or simpler) lower-bound techniques for quantum property testers? Blais et al. have proven classical lower bounds based on an elegant technique using communication complexity – can their ideas be extended to prove quantum lower bounds based on quantum communication complexity?

Thinking more widely still, the question of just what it is about the structure of certain problems which makes them amenable to quantum speedup is of great interest and far from being resolved. Property testing provides a lens with which to study this question, whether by finding new examples of quantum speedup (or the lack thereof), or by uncovering deeper structural features of properties which determine whether they have efficient quantum testers. In both cases, there is the potential for new developments in quantum property testing to significantly influence quantum algorithmics more generally.

Survey on Testing Affine-Invariant Properties

Arnab has put together a wonderful survey on testing affine-invariant properties (ECCC). Consider functions \(f:\mathbb{F}^n_p \to R\) (where \(R\) is usually a finite range). A property is called affine-invariant if it closed under affine transformations of the domain. The classic example of such a property is that of being a low-degree polynomial. This is a very rich and beautiful area of research, with the initial inspiration being work in trying to understand what makes a property testable. I will also add that this is a deep (somewhat inaccessible to the non-expert?) area of research, so thanks Arnab for this excellent survey!