Happy new year! I just looked back at our archives and saw that PTReview has been running since July 2013. I’m happy (and maybe mildly surprised) that it’s still going strong. And so the chronicling of \(o(n)\) continues…
Much coolness we have to report: permutation testing, linearity testing, distribution testing, and monotonicity testing. Without further ado:
Large permutations and parameter testing by Roman Glebov, Carlos Hoppen, Tereza Klimosova, Yoshiharu Kohayakawa, Daniel Kral, and Hong Liu (arXiv). Just as typical dense graph testing involves checking properties on a random, constant-sized induced subgraph, we can consider permutation testers that check properties of permutations by sampling random subpermutations. The theory of dense graph testing is closely tied to the Szemeredi regularity lemma, the notion of graph limits, and the behavior of subgraph densities. (Check out this survey by Borgs et al.) An analogous theory for permutations has been built by a subset of the authors in a series of papers (survey). There is a notion of permutation properties/parameters that are forcible, meaning that two permutations with similar frequencies of (a fixed, finite set of) subpermutations have close values of the parameter. This seems naturally connected to testing. Indeed, a permutation parameter that is forcible can be approximated in constant time by simply approximating the frequencies of the subpermutations. This paper shows that the converse is false: there is a constant-time approximable parameter that is not forcible.
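To make the frequency-approximation step concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that estimates the density of every \(k\)-point subpermutation by sampling random \(k\)-subsets of positions. By Hoeffding bounds, the number of samples needed for a given additive error depends only on \(k\) and the error, not on the length of the permutation.

```python
import itertools
import random

def pattern(perm, positions):
    # The order pattern induced by perm on the given positions: entry j is
    # the relative rank (0-indexed) of perm's value at the j-th smallest
    # of the chosen positions.
    values = [perm[i] for i in sorted(positions)]
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0] * len(values)
    for r, j in enumerate(order):
        ranks[j] = r
    return tuple(ranks)

def estimate_pattern_frequencies(perm, k, num_samples, rng=random):
    # Estimate the density of each k-point subpermutation of perm by
    # sampling num_samples random k-subsets of positions.
    counts = {p: 0 for p in itertools.permutations(range(k))}
    for _ in range(num_samples):
        positions = rng.sample(range(len(perm)), k)
        counts[pattern(perm, positions)] += 1
    return {p: c / num_samples for p, c in counts.items()}
```

A forcible parameter would then be approximated by plugging these estimated frequencies into the (continuous) function of the frequencies that determines it.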
A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness by Sheela Devadas and Ronitt Rubinfeld (arXiv). The venerable property of linearity testing needs no introduction, and has enough history to fill up this month’s post (and the next, and the next). But what if the domain is n-bit integers and we care about running time in terms of bit operations? Suppose we wish to check whether a program correctly computes \(f(x) = b\cdot x\) (for a fixed, known \(b\)). Inputs like \(2^k\) are easy for the checker to evaluate, and can be used to get faster checkers. This paper gives testers for linearity, multivariate linear functions, and multilinear functions, where the computation time is always linear in the sample complexity.
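As a toy illustration of the setup (a sketch of my own, not the tester from the paper), here is how a checker might combine the cheap special inputs \(2^k\), where \(b \cdot 2^k\) is just a shift, with BLR-style additivity checks on random inputs:

```python
import random

def check_powers_of_two(program, b, bit_length=64):
    # Spot-check the easy inputs 2^k: the checker computes b * 2^k itself
    # with a cheap shift (b << k), so each check costs few bit operations.
    return all(program(1 << k) == (b << k) for k in range(bit_length))

def self_test_linear(program, b, num_trials=100, bit_length=64, rng=random):
    # BLR-flavored consistency check: any program computing a linear
    # function must satisfy f(x + y) = f(x) + f(y), so we test this
    # identity on random n-bit pairs.
    if not check_powers_of_two(program, b, bit_length):
        return False
    for _ in range(num_trials):
        x = rng.getrandbits(bit_length)
        y = rng.getrandbits(bit_length)
        if program(x + y) != program(x) + program(y):
            return False
    return True
```

The actual testers in the paper are more careful than this; the point is that they come with an elementary proof of soundness and a bit-operation cost linear in the sample complexity.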
\(\ell_p\) Testing and Learning of Discrete Distributions by Bo Waggoner (arXiv). Distribution testing needs little introduction, and we have seen much progress over our time at PTReview. Let’s start with the basics. The seminal work of Batu et al. and later Paninski showed that the sample complexity of testing uniformity of a distribution \(\mathcal{D}\) over the universe \([n]\) is \(\Theta(\sqrt{n}/\epsilon^2)\). Meaning, the sample complexity of distinguishing \(\mathcal{D} = \mathcal{U}\) from \(\|\mathcal{D} - \mathcal{U}\|_1 > \epsilon\) (where \(\mathcal{U}\) is uniform on \([n]\)) is \(\Theta(\sqrt{n}/\epsilon^2)\). But what if we had \(\ll \sqrt{n}/\epsilon^2\) samples? From prior work, nothing could be inferred. (Update: As Ilias Diakonikolas pointed out to me, the \(\ell_2\) results were previously known, both from the Batu et al. work and from a recent paper by Chan et al. that settles the \(\ell_2\) question.) This paper changes that, and shows that we can still infer something about other \(\ell_p\) norms, \(\|\mathcal{D} - \mathcal{U}\|_p\). What impressed me about this paper is the detailed understanding of the interplay between \(n\), \(\epsilon\), and \(p\). For example, the sample complexity of uniformity testing in the \(\ell_p\)-norm for any \(p > 1\) is independent of \(n\). There are many, many results in this paper, with an intriguing threshold phenomenon at \(p = 2\). For distribution testing in practice, I expect this result to be quite significant.
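For intuition, here is a minimal sketch of the classic collision-based uniformity tester in the \(\ell_2\) norm (in the spirit of Batu et al., not the algorithms of this paper). It rests on the identity \(\|\mathcal{D}\|_2^2 = 1/n + \|\mathcal{D} - \mathcal{U}\|_2^2\):

```python
from itertools import combinations

def collision_statistic(samples):
    # The fraction of sample pairs that collide is an unbiased estimator
    # of the collision probability sum_i p_i^2 = ||D||_2^2.
    m = len(samples)
    collisions = sum(a == b for a, b in combinations(samples, 2))
    return collisions / (m * (m - 1) // 2)

def test_uniformity_l2(samples, n, epsilon):
    # Under uniformity the collision probability is exactly 1/n; if
    # ||D - U||_2 > epsilon it exceeds 1/n + epsilon^2. Accept iff the
    # estimate falls on the uniform side of the threshold.
    return collision_statistic(samples) <= 1 / n + epsilon**2 / 2
```

The \(\ell_1\) guarantee then follows from Cauchy-Schwarz, \(\|\mathcal{D} - \mathcal{U}\|_1 \leq \sqrt{n}\,\|\mathcal{D} - \mathcal{U}\|_2\): run the \(\ell_2\) tester with parameter \(\epsilon/\sqrt{n}\), and a careful variance analysis gives the \(\Theta(\sqrt{n}/\epsilon^2)\) bound.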
New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco A. Servedio, and Li-Yang Tan (arXiv). Ah yes, Boolean monotonicity testing. Consider the standard coordinate-wise partial order on \(\{0,1\}^n\), denoted \(\prec\). A function \(f:\{0,1\}^n \rightarrow \{0,1\}\) is monotone if \(\forall x \prec y, f(x) \leq f(y)\). The complexity of property testing (Boolean) monotonicity is one of those tantalizing, simple-to-state questions that is still quite open. I’ll spare you the full story and the epsilons, but here’s the status. The best upper bound is a non-adaptive, one-sided \(O(n^{7/8})\) tester by Chakrabarty and Seshadhri. The best lower bound for one-sided, non-adaptive testers is \(\Omega(\sqrt{n})\), by Fischer et al.; for two-sided non-adaptive testers, only \(\Omega(\log n)\) was known (also by Fischer et al.), which implies an \(\Omega(\log\log n)\) lower bound for general (adaptive) testers, since a \(q\)-query adaptive tester can be simulated non-adaptively with \(2^q\) queries. This paper changes all of this. The authors prove an \(\Omega(n^{1/5})\) lower bound for two-sided non-adaptive testers, leading to an exponentially better \(\Omega(\log n)\) lower bound for general testers. The main insight is to focus on families of monotone versus far-from-monotone linear threshold functions, and to show that procedures making few (non-adaptive) queries cannot distinguish between such families. The main hammer is recent central limit theorems. As an added bonus, this paper improves the upper bound for monotonicity testing to \(O(n^{5/6})\), via a modified tester (and better analysis) of Chakrabarty and Seshadhri. But can the lower bound be improved?
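For contrast with the heavy machinery here, the simplest monotonicity tester is worth keeping in mind. Here is a sketch of the classic one-sided edge tester (due to Goldreich et al., and not the tester analyzed in this paper), which samples random hypercube edges and rejects on any violation:

```python
import random

def edge_tester(f, n, num_queries, rng=random):
    # One-sided edge tester: pick a uniformly random edge of {0,1}^n
    # (a pair of points differing in one coordinate) and reject if the
    # endpoint lower in the partial order has the larger function value.
    for _ in range(num_queries):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            return False  # explicit violation: lo precedes hi, f(lo) > f(hi)
    return True
```

With \(O(n/\epsilon)\) queries this rejects any function \(\epsilon\)-far from monotone with constant probability; the \(O(n^{7/8})\) and \(O(n^{5/6})\) testers above beat it by querying endpoints of random paths rather than single edges.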
Boolean function monotonicity testing requires (almost) \(n^{1/2}\) non-adaptive queries by Xi Chen, Anindya De, Rocco A. Servedio, and Li-Yang Tan (arXiv). Yes, the lower bound can be improved. This paper gets the non-adaptive, two-sided bound to (almost) \(\Omega(\sqrt{n})\), matching the one-sided Fischer et al. bound. The paper proves improved central limit theorems, tailored for this application. The authors, and I with them, believe that this is the true complexity. On that intriguing note, we end 2014!