# News for July 2020

We hope you’re all staying safe and healthy! To bring you some news (and distraction?) during this… atypical summer, here are the recent papers on property testing and sublinear algorithms we saw appear this month. Graphs, probability distributions, functions… there is something for everyone.

On Testing Hamiltonicity in the Bounded Degree Graph Model, by Oded Goldreich (ECCC). The title sort of gives it away: this relatively short paper shows that, in the bounded-degree model, testing whether an unknown graph has a Hamiltonian path (or Hamiltonian cycle) requires a number of queries linear in $$n$$, the number of nodes. The results also hold for directed graphs (with respect to directed Hamiltonian path or cycle), and are shown via a local reduction from a promise problem of satisfiability of 3CNF formulae. Also included: a complete proof of the linear lower bound for another problem, Independent Set Size; and an open problem: what is the query complexity of testing graph isomorphism in the bounded-degree model?

Local Access to Sparse Connected Subgraphs Via Edge Sampling, by Rogers Epstein (arXiv). Given access to a connected graph $$G=(V,E)$$, can we efficiently provide access to some sparse connected subgraph $$G'=(V,E')\subseteq G$$ with $$|E'|\ll |E|$$? This question, well studied in particular when $$G$$ has bounded degree and the goal is to achieve $$|E'|\leq (1+\varepsilon)|V|$$, is the focus of this paper, which provides a trade-off between the query complexity of the oracle and $$|E'|$$. Specifically, for every parameter $$T$$, one can give oracle access to $$G'$$ with $$|E'|=O(|V|T)$$, with query complexity $$\tilde{O}(|E|/T)$$.

Switching gears, we move from graphs to probability distributions:

Tolerant Distribution Testing in the Conditional Sampling Model, by Shyam Narayanan (arXiv). In the conditional sampling model for distribution testing, which we have covered a few times on this blog, the algorithm at each step gets to specify a subset $$S$$ of the domain, and observes a sample from the distribution conditioned on $$S$$. As it turns out, this can speed things up a lot: as Canonne, Ron, and Servedio (2015) showed, even tolerant uniformity testing, which with i.i.d. samples requires a near-linear (in the domain size $$n$$) number of samples, can be done with a constant number of conditional queries. Well, sort of constant: no dependence on $$n$$, but the dependence on the distance parameter $$\varepsilon$$ was, in CRS15, quite bad: $$\tilde{O}(1/\varepsilon^{20})$$. This work gets rid of this badness, and shows the (nearly) optimal $$\tilde{O}(1/\varepsilon^{2})$$ query complexity! Among other results, it also extends the approach to tolerant identity testing (with $$\tilde{O}(1/\varepsilon^{4})$$ queries), for which previously no constant-query upper bound was known. Things have become truly sublinear.
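For readers new to the model, here is a minimal Python sketch of what a conditional sampling oracle looks like. It is only an illustration of the access model, not the algorithm from the paper: the names `cond_sample` and `estimate_ratio`, the dictionary representation of the distribution, and the convention of answering uniformly on $$S$$ when $$S$$ has zero mass are all assumptions made here for the example.

```python
import random

def cond_sample(p, S, rng=random):
    """Return one draw from p conditioned on the query set S.

    p   : dict mapping domain element -> probability (hypothetical representation)
    S   : non-empty iterable of domain elements chosen by the tester
    Assumed convention: if p(S) == 0, answer with a uniform element of S.
    """
    S = list(S)
    weights = [p.get(x, 0.0) for x in S]
    if sum(weights) == 0:
        return rng.choice(S)
    return rng.choices(S, weights=weights, k=1)[0]

def estimate_ratio(p, x, y, num_queries=2000, rng=random):
    """Estimate p(x) / (p(x) + p(y)) by repeatedly querying the pair S = {x, y}."""
    hits = sum(cond_sample(p, [x, y], rng) == x for _ in range(num_queries))
    return hits / num_queries
```

Pair queries like the one in `estimate_ratio` let a tester compare the masses of two elements directly, with a number of queries that does not depend on the domain size; this is the kind of power that plain i.i.d. samples do not give.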

Interactive Inference under Information Constraints, by Jayadev Acharya, Clément Canonne, Yuhan Liu, Ziteng Sun, and Himanshu Tyagi (arXiv). Say you want to do uniformity/identity testing (or learn, but let’s focus on testing) on a discrete distribution, but you can’t actually observe the i.i.d. samples: instead, you can only do some sort of limited, “local” measurement on each sample. How hard is the task, compared to what you’d do if you fully had the samples? This setting, which captures things like distributed testing with communication or local privacy constraints, erasure channels, etc., was well-understood from previous recent work in the non-adaptive setting. But what if the “measurements” could be made adaptively? This paper shows general lower bounds for identity testing and learning, as a function of the type of local measurement allowed: as a corollary, this gives tight bounds for communication constraints and local privacy, and shows the first separation between adaptive and non-adaptive uniformity testing, for a type of “leaky” membership query measurement.

Efficient Parameter Estimation of Truncated Boolean Product Distributions, by Dimitris Fotakis, Alkis Kalavasis, and Christos Tzamos (arXiv). Suppose there is a fixed and unknown subset $$S$$ of the hypercube, a “truncation” set, which you can only access via membership queries; and you receive i.i.d. samples from an unknown product distribution on the hypercube, truncated on that set $$S$$ (for instance, because your polling strategy or experimental measurements have limitations). Can you still learn that distribution efficiently? Can you test it for various properties, as you typically really would like to? (Or is it just me?) This paper identifies a natural sufficient condition on $$S$$, which the authors call fatness, under which the answer is a resounding yes. Specifically, if $$S$$ satisfies this condition, one can actually generate honest-to-goodness i.i.d. samples (non-truncated) from the true distribution, given truncated samples!
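To make the setup concrete, here is a small Python sketch that simulates truncated samples from a Boolean product distribution by rejection sampling. This only illustrates the data the learner receives, not the paper's algorithm; the function name `truncated_sample`, the predicate `in_S` standing in for the membership oracle, and the `max_tries` cutoff are hypothetical choices for this example.

```python
import random

def truncated_sample(probs, in_S, rng=random, max_tries=100000):
    """Draw x from the product of Bernoulli(probs[i]), conditioned on in_S(x).

    probs : list of per-coordinate probabilities of a 1
    in_S  : membership predicate for the truncation set S (assumed interface)
    """
    for _ in range(max_tries):
        x = tuple(int(rng.random() < q) for q in probs)
        if in_S(x):
            return x
    raise RuntimeError("truncation set has too little mass under this distribution")
```

For instance, with three fair coordinates and $$S$$ the set of nonzero points, the first coordinate of a truncated sample equals 1 with probability $$4/7$$ rather than $$1/2$$: naive empirical means of truncated samples are biased, which is why recovering the true parameters takes some work.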

Leaving distribution testing, our last paper is on testing functions in the distribution-free model:

Downsampling for Testing and Learning in Product Distributions, by Nathaniel Harms and Yuichi Yoshida (arXiv). Suppose you want to test (or learn) a class of Boolean functions $$\mathcal{C}$$ over some domain $$\Omega^d$$, with respect to some (unknown) product distribution (i.e., in the distribution-free testing model, or PAC-learning model). This paper develops a general technique, downsampling, which allows one to reduce such distribution-free testing of $$\mathcal{C}$$ under a product distribution to testing $$\mathcal{C}$$ over $$[r]^d$$ under the uniform distribution, for a suitable parameter $$r=r(d,\varepsilon,\mathcal{C})$$. This allows the authors, among many other testing and learning results, to easily re-establish (and, in the second case, improve upon) recent results on testing of monotonicity over $$[n]^d$$ (uniform distribution) and over $$\mathbb{R}^d$$ (distribution-free).
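As a rough illustration of the flavor of such a reduction (and not the authors' actual construction), one can cut each coordinate of $$\mathbb{R}^d$$ into $$r$$ intervals of roughly equal empirical mass, so that a product distribution looks approximately uniform over the resulting grid $$[r]^d$$. The Python sketch below does this with empirical quantiles; the function names `build_grid` and `downsample` and the choice of quantile cut points are assumptions made for this example.

```python
import bisect
import random

def build_grid(samples, r):
    """For each coordinate, pick r - 1 empirical quantile cut points."""
    d = len(samples[0])
    m = len(samples)
    cuts = []
    for i in range(d):
        col = sorted(s[i] for s in samples)
        cuts.append([col[(j * m) // r] for j in range(1, r)])
    return cuts

def downsample(x, cuts):
    """Map a point of R^d to its grid cell, a tuple in {0, ..., r-1}^d."""
    return tuple(bisect.bisect_right(c, xi) for c, xi in zip(cuts, x))

# Usage: coarsen samples from a product distribution on R^2 into a 4 x 4 grid.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.expovariate(1.0)) for _ in range(4000)]
grid = build_grid(points, 4)
cells = [downsample(pt, grid) for pt in points]
```

Because the coordinates are independent, balancing each axis separately is enough to make every cell of the grid carry roughly the same mass, whatever the marginals were to begin with.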