# News for November 2016

November was quite eventful for property testing, with six exciting new results for you to peruse.

Alice and Bob Show Distribution Testing Lower Bounds (They don’t talk to each other anymore.), by Eric Blais, Clément L. Canonne, and Tom Gur (ECCC). The authors examine distribution testing lower bounds through the lens of communication complexity, à la Blais, Brody, and Matulef, who previously showed such a connection for property testing lower bounds in the Boolean function setting. The main result of this work concerns testing identity to a specific distribution $$p$$. While Valiant and Valiant showed tight bounds involving the $$\ell_{2/3}$$-quasinorm of $$p$$, this paper gives tight bounds using a different quantity, namely Peetre’s $$K$$-functional. The techniques also give lower bounds for several other properties (some old and some new), including monotonicity, sparse symmetric support, and $$k$$-juntas in the PAIRCOND model.

Fast property testing and metrics for permutations, by Jacob Fox and Fan Wei (arXiv). This paper proves a general testing result for permutations. In particular, it shows that any hereditary property of permutations is two-sided testable with respect to the rectangular distance with a constant number of queries. While in many such testing results on combinatorial objects (such as graphs), a “constant number of queries” may be exorbitantly large (due to complexities arising from an application of the strong regularity lemma), surprisingly, the complexity obtained in this paper is polynomial in $$1/\varepsilon$$.

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation, by Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh (arXiv, ECCC). There has been a great deal of recent work on estimating several symmetric distribution properties, namely support size, support coverage, entropy, and distance to uniformity. One drawback of these results is that, despite the similarities between these properties, seemingly different techniques are required to obtain optimal rates for each. This paper shows that one concept, pattern maximum likelihood (PML), unifies them all. A PML distribution of a multiset of samples is any distribution that maximizes the likelihood of observing the multiplicities of the multiset, after discarding the labels on the elements. This can behave quite differently from the sequence maximum likelihood (SML), or empirical distribution. In particular, if a multiset of samples on support $$\{x, y\}$$ is $$\{x, y, x\}$$, then the SML is $$(2/3, 1/3)$$, while the PML is $$(1/2, 1/2)$$. The main result of this paper is that, if one can approximate the PML, then applying the plug-in estimator to it gives the optimal sample complexity for all of the aforementioned properties. The one catch is that efficient approximation of the PML is currently open. Consider the gauntlet thrown to all our readers!
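To make the SML/PML contrast concrete, here is a minimal Python sketch for the $$\{x, y, x\}$$ example. It is an illustration only and not the paper's algorithm: the helper names `pattern` and `pattern_probability` are hypothetical, the search is a brute-force grid over two-element distributions $$(p, 1-p)$$, and it maximizes the probability of the pattern (the sequence with labels replaced by order of first appearance). That quantity differs from the probability of the multiplicities only by a factor that does not depend on the distribution, so the maximizer is the same.

```python
from itertools import permutations


def pattern(seq):
    """Relabel symbols by order of first appearance, e.g. (x, y, x) -> (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in seq)


def pattern_probability(dist, pat):
    """Probability that i.i.d. draws from dist (a list of probabilities)
    produce the pattern pat, summing over every assignment of distinct
    support elements to the pattern's labels."""
    num_labels = max(pat) + 1
    total = 0.0
    for assignment in permutations(range(len(dist)), num_labels):
        prob = 1.0
        for label in pat:
            prob *= dist[assignment[label]]
        total += prob
    return total


samples = ("x", "y", "x")
grid = [i / 1000 for i in range(1001)]  # candidate values of p = Pr[x]

# SML: maximize the probability of the labeled sequence x, y, x.
sml_p = max(grid, key=lambda p: p * (1 - p) * p)

# PML: maximize the probability of the unlabeled pattern (0, 1, 0).
pml_p = max(grid, key=lambda p: pattern_probability([p, 1 - p], pattern(samples)))

print("SML estimate of Pr[x]:", sml_p)
print("PML estimate of Pr[x]:", pml_p)
```

The labeled likelihood $$p^2(1-p)$$ peaks near $$p = 2/3$$, while the pattern likelihood $$p^2(1-p) + (1-p)^2 p = p(1-p)$$ peaks at $$p = 1/2$$, matching the two answers quoted above. Brute force of this kind scales exponentially with the number of distinct symbols, which is one way to see why efficient approximation is the hard part.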

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures, by Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv, ECCC). While the main focus of this work is on lower bounds for distribution estimation in the statistical query (SQ) model, this paper also has some interesting lower bounds for multivariate testing problems. Namely, they show that it is impossible to achieve a sample complexity which is significantly sublinear in the dimension for either of the following two problems:

• Given samples from an $$n$$-dimensional distribution $$D$$, distinguish whether $$D = \mathcal{N}(0,I)$$ or $$D$$ is $$\varepsilon/100$$-close to any $$\mathcal{N}(\mu, I)$$ where $$\|\mu\|_2 \geq \varepsilon$$.
• Given samples from an $$n$$-dimensional distribution $$D$$, distinguish whether $$D = \mathcal{N}(0,I)$$ or $$D$$ is a mixture of $$k$$ Gaussians with almost non-overlapping components.

Collision-based Testers are Optimal for Uniformity and Closeness, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv, ECCC). In the TCS community, the seminal results in the field of distribution testing include the papers of Goldreich and Ron and Batu et al., which study uniformity testing and $$\ell_2$$-closeness testing (respectively) using collision-based testers. While the original analyses of these testers appeared to be lossy, subsequent results attained tight upper and lower bounds for these problems using other testers. As suggested by the title, this paper shows that collision-based testers actually achieve the optimal sample complexities for uniformity and $$\ell_2$$-closeness testing.
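For readers who have not seen a collision-based tester, the following Python sketch shows the idea for uniformity over a domain of size $$n$$. The uniform distribution has collision probability $$1/n$$, while by Cauchy-Schwarz any distribution at total variation distance at least $$\varepsilon$$ from uniform has collision probability at least $$(1+4\varepsilon^2)/n$$, so the tester compares the empirical collision rate to a threshold between the two. The function names, the midpoint threshold $$(1+2\varepsilon^2)/n$$, and the constant 20 in the sample size are assumptions made for this illustration and are not taken from the paper.

```python
import math
import random
from collections import Counter


def collision_rate(samples):
    """Fraction of the C(m, 2) unordered sample pairs that are equal."""
    m = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) / 2)


def accepts_as_uniform(samples, n, eps):
    """Accept iff the collision rate is at most the (assumed) midpoint
    threshold between 1/n and (1 + 4 eps^2)/n."""
    threshold = (1 + 2 * eps ** 2) / n
    return collision_rate(samples) <= threshold


rng = random.Random(0)  # fixed seed so the demo is reproducible
n, eps = 1000, 0.5
m = int(20 * math.sqrt(n) / eps ** 2)  # the constant 20 is an arbitrary choice

# Uniform over all n elements versus uniform over half of them
# (the latter is at total variation distance 1/2 from uniform).
uniform_samples = [rng.randrange(n) for _ in range(m)]
far_samples = [rng.randrange(n // 2) for _ in range(m)]

print(accepts_as_uniform(uniform_samples, n, eps))
print(accepts_as_uniform(far_samples, n, eps))
```

The sample size here scales as $$\sqrt{n}/\varepsilon^2$$, which is the optimal rate that the paper shows this style of tester attains; the sketch itself proves nothing about the constants.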

Testing submodularity and other properties of valuation functions, by Eric Blais and Abhinav Bommireddi (arXiv). This paper studies the query complexity of testing several properties that arise for valuation functions in algorithmic game theory. These are properties of real-valued functions over the Boolean hypercube, and include submodularity, additivity, unit-demand, and many more. The authors show that, for constant $$\varepsilon$$ and any $$p \geq 1$$, these properties are testable with a constant number of queries with respect to the $$\ell_p$$ distance. Their results are obtained via an extension of the testing by implicit learning method of Diakonikolas et al.
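As a reminder of what these properties say, a function $$f$$ on subsets of $$\{0, \dots, n-1\}$$ is submodular if marginal gains diminish: $$f(S \cup \{i\}) - f(S) \geq f(T \cup \{i\}) - f(T)$$ whenever $$S \subseteq T$$ and $$i \notin T$$. The Python sketch below checks this exhaustively for tiny $$n$$. It is a brute-force definition check, not a property tester and not the paper's method, and the function names and example weights are made up for illustration.

```python
def is_submodular(f, n, tol=1e-12):
    """Exhaustively check diminishing returns for f on bitmask-encoded
    subsets of range(n). Exponential in n, so only for tiny examples."""
    for s in range(1 << n):
        for t in range(1 << n):
            if s & ~t:
                continue  # S must be a subset of T
            for i in range(n):
                bit = 1 << i
                if t & bit:
                    continue  # i must lie outside T (and hence outside S)
                if f(s | bit) - f(s) < f(t | bit) - f(t) - tol:
                    return False
    return True


weights = [3, 1, 2]  # hypothetical item values
n = len(weights)


def additive(mask):
    """Value of a bundle is the sum of its item values."""
    return sum(w for i, w in enumerate(weights) if mask >> i & 1)


def unit_demand(mask):
    """Value of a bundle is its single most valuable item."""
    return max((w for i, w in enumerate(weights) if mask >> i & 1), default=0)


def squared_size(mask):
    """Marginal gains grow with bundle size, so this is not submodular."""
    return bin(mask).count("1") ** 2


print(is_submodular(additive, n))
print(is_submodular(unit_demand, n))
print(is_submodular(squared_size, n))
```

A tester, by contrast, must decide between "has the property" and "far from it" after querying $$f$$ on only a few points, which is what makes a constant query bound notable.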

## 1 thought on “News for November 2016”

1. Oded Goldreich

See my exposition regarding “Collision-based Testers are Optimal for Uniformity and Closeness” posted at http://www.wisdom.weizmann.ac.il/~oded/p_cpt.html

The collision probability tester, introduced by Goldreich and Ron (ECCC, TR00-020, 2000), distinguishes the uniform distribution over $$[n]$$ from any distribution that is $$\varepsilon$$-far from this distribution using $$\mathrm{poly}(1/\varepsilon)\cdot\sqrt{n}$$ samples. While the original analysis established only an upper bound of $$O(1/\varepsilon^4)\cdot\sqrt{n}$$ on the sample complexity, a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price (ECCC, TR16-178, 2016) established the optimal upper bound of $$O(1/\varepsilon^2)\cdot\sqrt{n}$$. In this note we survey their analysis, while highlighting the sources of improvement. Specifically:
* While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a $$1+\varepsilon^2$$ factor, the improved analysis capitalizes on the fact that the latter problem needs only be solved “at the extreme” (i.e., it suffices to distinguish the uniform distribution, which has collision probability $$1/n$$, from any distribution that has collision probability exceeding $$(1+4\varepsilon^2)/n$$).
* While the original analysis provides an almost optimal analysis of the variance of the estimator when $$\varepsilon=\Omega(1)$$, a more careful analysis yields a significantly better bound for the case of $$\varepsilon=o(1)$$, which is the case that is relevant here.