News for November 2019

We hit the mother-lode of property testing papers this month. Stick with us, as we cover 10 (!) papers that appeared online in November.

EDIT: We actually have 11 papers, check out Optimal Adaptive Detection of Monotone Patterns at the bottom.

Testing noisy linear functions for sparsity, by Xue Chen, Anindya De, and Rocco A. Servedio (arXiv). Given samples from a noisy linear model $$y = w\cdot x + \mathrm{noise}$$, test whether $$w$$ is $$k$$-sparse, or far from being $$k$$-sparse. This is a property testing version of the celebrated sparse recovery problem, whose sample complexity is well-known to be $$O(k\log n)$$, where the data lies in $$\mathbb{R}^n$$. This paper shows that the testing version of the problem can be solved (tolerantly) with a number of samples independent of $$n$$, assuming technical conditions: the distribution of coordinates of $$x$$ are i.i.d. and non-Gaussian, and the noise distribution is known to the algorithm. Surprisingly, all these conditions are needed, otherwise the dependence on $$n$$ is $$\tilde \Omega(\log n)$$, essentially the same as the recovery problem.

Pan-Private Uniformity Testing, by Kareem Amin, Matthew Joseph, Jieming Mao (arXiv). Differentially private distribution testing has now seen significant study, in both the local and central models of privacy. This paper studies a distribution testing in the pan-private model, which is intermediate: the algorithm receives samples one by one in the clear, but it must maintain a differentially private internal state at all time steps. The sample complexity turns out to be qualitatively intermediate to the two other models: testing uniformity over $$[k]$$ requires $$\Theta(\sqrt{k})$$ samples in the central model, $$\Theta(k)$$ samples in the local model, and this paper shows that $$\Theta(k^{2/3})$$ samples are necessary and sufficient in the pan-private model.

Almost Optimal Testers for Concise Representations, by Nader Bshouty (ECCC). This work gives a unified approach for testing for a plethora of different classes which possess some sort of sparsity. These classes include $$k$$-juntas, $$k$$-linear functions, $$k$$-terms, various types of DNFs, decision lists, functions with bounded Fourier degree, and much more.

Unified Sample-Optimal Property Estimation in Near-Linear Time, by Yi Hao and Alon Orlitsky (arXiv). This paper presents a unified approach for estimating several distribution properties with both near-optimal time and sample complexity, based on piecewise-polynomial approximation. Some applications include estimators for Shannon entropy, power sums, distance to uniformity, normalized support size, and normalized support coverage. More generally, results hold for all Lipschitz properties, and consequences include high-confidence property estimation (outperforming the “median trick”) and differentially private property estimation.

Testing linear-invariant properties, by Jonathan Tidor and Yufei Zhao (arXiv). This paper studies property testing of functions which are in a formal sense, definable by restrictions to subspaces of bounded degree. This class of functions is a broad generalization of testing whether a function is linear, or a degree-$$d$$ polynomial (for constant $$d$$). The algorithm is the oblivious one, which simply repeatedly takes random restrictions and tests whether the property is satisfied or not (similar to the classic linearity test of BLR, along with many others).

Approximating the Distance to Monotonicity of Boolean Functions, by Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Erik Waingarten (ECCC). This paper studies the following fundamental question in tolerant testing: given a Boolean function on the hypercube, test whether it is $$\varepsilon’$$-close or $$\varepsilon$$-far from monotone. It is shown that there is a non-adaptive polynomial query algorithm which can solve this problem for $$\varepsilon’ = \varepsilon/\tilde \Theta(\sqrt{n})$$, implying an algorithm which can approximate distance to monotonicity up to a multiplicative $$\tilde O(\sqrt{n})$$ (addressing an open problem by Sesh). They also give a lower bound demonstrating that improving this approximating factor significantly would necessitate exponentially-many queries. Interestingly, this is proved for the (easier) erasure-resilient model, and also implies lower bounds for tolerant testing of unateness and juntas.

Testing Properties of Multiple Distributions with Few Samples, by Maryam Aliakbarpour and Sandeep Silwal (arXiv). This paper introduces a new model for distribution testing. Generally, we are given $$n$$ samples from a distribution which is either (say) uniform or far from uniform, and we wish to test which is the case. The authors here study the problem where we are given a single sample from $$n$$ different distributions which are either all uniform or far from uniform, and we wish to test which is the case. By additionally assuming a structural condition in the latter case (it is argued that some structural condition is necessary), they give sample-optimal algorithms for testing uniformity, identity, and closeness.

Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning, by Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, and Erik Waingarten (ECCC, arXiv). By now, it is well-known that testing uniformity over the $$n$$-dimensional hypercube requires $$\Omega(2^{n/2})$$ samples — the curse of dimensionality quickly makes this problem intractable. One option is to assume that the distribution is product, which causes the complexity to drop to $$O(\sqrt{n})$$. This paper instead assumes one has stronger access to the distribution — namely, one can receive samples conditioned on being from some subcube of the domain. With this, the paper shows that the complexity drops to the near-optimal $$\tilde O(\sqrt{n})$$ samples. The related problem of testing whether a distribution is either uniform or has large mean is also considered.

Property Testing of LP-Type Problems, by Rogers Epstein, Sandeep Silwal (arXiv). An LP-Type problem (also known as a generalized linear program) is an optimization problem sharing some properties with linear programs. More formally, they consist of a set of constraints $$S$$ and a function $$\varphi$$ which maps subsets of $$S$$ to some totally ordered set, such that $$\varphi$$ possesses monotonicity and locality properties. This paper considers the problem of testing whether $$\varphi(S) \leq k$$, or whether at least an $$\varepsilon$$-fraction of constraints in $$S$$ must be removed for $$\varphi(S) \leq k$$ to hold. This paper gives an algorithm with query complexity $$O(\delta/\varepsilon)$$, where $$\delta$$ is a dimension measure of the problem. This is applied to testing problems for linear separability, smallest enclosing ball, smallest intersecting ball, smallest volume annulus. The authors also provide lower bounds for some of these problems as well.

Near-Optimal Algorithm for Distribution-Free Junta Testing, by Xiaojin Zhang (arXiv). This paper presents an (adaptive) algorithm for testing juntas, in the distribution-free model with one-sided error. The query complexity is $$\tilde O(k/\varepsilon)$$, which is nearly optimal. Algorithms with this sample complexity were previously known under the uniform distribution, or with two-sided error, but this is the first paper to achieve it in the distribution-free model with one-sided error.

Optimal Adaptive Detection of Monotone Patterns, by Omri Ben-Eliezer, Shoham Letzter, Erik Waingarten (arXiv). Consider the problem of testing whether a function has no monotone increasing subsequences of length $$k$$, versus being $$\varepsilon$$-far from having this property. Note that this is a generalization of testing whether a function is monotone (decreasing), which corresponds to the case $$k = 2$$. This work shows that the adaptive sample complexity of this problem is $$O_{k,\varepsilon}(\log n)$$, matching the lower bound for monotonicity testing. This is in comparison to the non-adaptive sample complexity, which is $$O_{k,\varepsilon}((\log n)^{\lfloor \log_2 k\rfloor})$$. In fact, the main result provides a certificate of being far, in the form of a monotone increasing subsequence of length $$k$$.