News for June 2021

A quieter month after last month’s bonanza. One (applied!) paper on distribution testing, a paper on tolerant distribution testing, and a compendium of open problems. (Ed: alas, I missed the paper on tolerant distribution testing, authored by one of our editors. Sorry, Clément!)

Learning-based Support Estimation in Sublinear Time by Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, and Tal Wagner (arXiv). A classic problem in distribution testing is that of estimating the support size $$n$$ of an unknown distribution $$\mathcal{D}$$. (Assume that all elements in the support have probability at least $$1/n$$.) A fundamental result of Valiant-Valiant (2011) proves that the sample complexity of this problem is $$\Theta(n/\log n)$$. A line of work has since tried to reduce this complexity by using additional sources of information. Canonne-Rubinfeld (2014) showed that, if one can query the exact probabilities of elements, then the complexity can be made independent of $$n$$. This paper studies a robust version of this assumption: suppose we can only get constant-factor approximations to the probabilities. Then, the main result is that we can get a query complexity of $$n^{1-1/\log(\varepsilon^{-1})} \ll n/\log n$$ (where the constant $$\varepsilon$$ denotes the additive approximation to the support size). The paper also reports experiments showing that the new algorithm is indeed better in practice. Moreover, it shows that existing methods degrade rapidly with poorer probability estimates, while the new algorithm maintains its accuracy even with such estimates.
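
To see why exact probability queries help so much, note that for $$x$$ drawn from $$\mathcal{D}$$, the expectation of $$1/\mathcal{D}(x)$$ is exactly the support size, since each support element contributes $$\mathcal{D}(x) \cdot 1/\mathcal{D}(x) = 1$$. The sketch below illustrates this simple inverse-probability estimator on a toy distribution; it is not the algorithm of either paper, and the names `estimate_support_size`, `sample`, and `prob_oracle` are hypothetical. It also shows what goes wrong when the oracle is only a constant-factor approximation: here an oracle that always overestimates by a factor of 2 halves the estimate, which is the kind of degradation a robust algorithm has to avoid.

```python
import random

def estimate_support_size(sample, prob_oracle, num_samples):
    """Average 1/p(x) over draws x ~ D.

    Unbiased for the support size when prob_oracle returns exact
    probabilities. Illustrative only; variance control is ignored.
    """
    total = 0.0
    for _ in range(num_samples):
        x = sample()
        total += 1.0 / prob_oracle(x)
    return total / num_samples

# Toy distribution with support size 4 (an assumption for the demo).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
items = list(probs)
weights = [probs[x] for x in items]
rng = random.Random(0)

def sample():
    # Draw one element according to probs.
    return rng.choices(items, weights=weights)[0]

# Exact oracle: the estimate concentrates around 4.
exact = estimate_support_size(sample, lambda x: probs[x], 20000)

# Oracle that is always off by a factor of 2: the estimate is
# biased by that same factor and concentrates around 2 instead.
noisy = estimate_support_size(sample, lambda x: 2.0 * probs[x], 20000)

print(f"exact oracle: {exact:.2f}, factor-2 oracle: {noisy:.2f}")
```

The number of samples here is chosen for the toy example and does not depend on any domain size, which matches the flavor of the exact-oracle result; the approximate-oracle setting needs a genuinely different idea.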

The Price of Tolerance in Distribution Testing by Clément L. Canonne, Ayush Jain, Gautam Kamath, and Jerry Li (arXiv). While we have seen many results in distribution testing, the subject of tolerance is one that hasn’t received as much attention. Consider the problem of testing if an unknown distribution $$p$$ (over domain $$[n]$$) is the same as a known distribution $$q$$. We wish to distinguish $$\varepsilon_1$$-close from $$\varepsilon_2$$-far, under total variation distance. When $$\varepsilon_1$$ is zero, this is the standard property testing setting, and classic results yield $$\Theta(\sqrt{n})$$ sample complexity. If $$\varepsilon_1 = \varepsilon_2/2$$, then we are looking for a constant-factor approximation to the distance, and the complexity is $$\Theta(n/\log n)$$. Surprisingly, nothing was known in between. Until this paper, that is. The main result gives a complete characterization of the sample complexity (up to log factors), for all values of $$\varepsilon_1, \varepsilon_2$$. Remarkably, the sample complexity has an additive term $$(n/\log n) \cdot (\varepsilon_1/\varepsilon^2_2)$$. Thus, once $$\varepsilon_1 \geq \varepsilon_2^2$$, this term is already at least $$n/\log n$$. When $$\varepsilon_1$$ is smaller, the main result gives a smooth dependence of the sample complexity on $$\varepsilon_1$$. One of the main challenges is that existing results use two very different techniques for the property testing and constant-factor approximation regimes. The former uses simpler $$\ell_2$$-statistics (e.g. collision counting), while the latter is based on polynomial approximations (estimating moments). The upper bound in this paper shows that simpler statistics based on just the first two moments suffice to get results for all regimes of $$\varepsilon_1, \varepsilon_2$$.
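
For readers who have not seen collision counting before, here is a minimal sketch of the $$\ell_2$$-statistic used in the non-tolerant regime, specialized to the case where $$q$$ is uniform over $$[n]$$. The fraction of colliding pairs among the samples is an unbiased estimate of $$\sum_i p_i^2$$, which equals $$1/n$$ exactly when $$p$$ is uniform and exceeds it otherwise. This is only an illustration of that classic ingredient, not the tolerant tester from the paper; the function name `collision_statistic` and the parameters `n` and `m` are choices made for this example.

```python
from collections import Counter
import random

def collision_statistic(samples):
    """Fraction of sample pairs that collide.

    This is an unbiased estimate of sum_i p_i^2 for i.i.d. samples
    from p (illustrative helper, not from the paper).
    """
    m = len(samples)
    counts = Counter(samples)
    # An element seen c times contributes c-choose-2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = m * (m - 1) // 2
    return colliding_pairs / total_pairs

rng = random.Random(1)
n = 100    # domain size (assumed for the demo)
m = 5000   # number of samples (assumed for the demo)

# p uniform over [n]: sum_i p_i^2 = 1/n.
uniform_samples = [rng.randrange(n) for _ in range(m)]

# p uniform over half the domain: total variation distance 1/2
# from uniform, and sum_i p_i^2 = 2/n.
skewed_samples = [rng.randrange(n // 2) for _ in range(m)]

stat_uniform = collision_statistic(uniform_samples)
stat_skewed = collision_statistic(skewed_samples)
print(f"uniform: {stat_uniform:.4f} (1/n = {1 / n:.4f})")
print(f"skewed:  {stat_skewed:.4f} (2/n = {2 / n:.4f})")
```

Thresholding this statistic somewhere between $$1/n$$ and $$2/n$$ separates the two toy cases. The difficulty the paper addresses is what to do when the "close" case is allowed to be $$\varepsilon_1$$-close rather than identical.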

Open Problems in Property Testing of Graphs by Oded Goldreich (ECCC). As the title clearly states, this is a survey covering a number of open problems in graph property testing. The broad division is based on the query model: dense graphs, bounded degree graphs, and general graphs. A reader will see statements of various classic open problems, such as the complexity of testing triangle freeness for dense graphs and characterizing properties that can be tested in $$\mathrm{poly}(\varepsilon^{-1})$$ queries. Arguably, there are more open problems (and fewer results) for testing in bounded degree graphs, where we lack broad characterizations of testable properties. An important, though less famous (?), open problem is that of the complexity of testing isomorphism. It would appear that the setting of general graphs, where we know the least, may be the next frontier for graph property testing. A problem that really caught my eye: can we transform testers that work for bounded degree graphs into those that work for bounded arboricity graphs? The latter is a generalization of bounded degree that has appeared in a number of recent results on sublinear graph algorithms.