# News for April 2018

Three Four papers for April: a new take on linearity testing, results on sublinear algorithms with advice, histogram testing, and distributed inference problems. (Edit: Sorry Clément, for missing your paper on distributed inference!)

Testing Linearity against Non-Signaling Strategies, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). This paper gives a new model for property testing, through the notion of non-signaling strategies. The exact definitions are quite subtle, but here’s a condensed view. For $$S \subseteq \{0,1\}^n$$, let an $$S$$-partial function be one that is only defined on $$S$$. Fix a “consistency” parameter $$k$$. Think of the “input” as a collection of distributions, $$\{\mathcal{F}_S\}$$, where each $$|S| \leq k$$ and $$\mathcal{F}_S$$ is a distribution of $$S$$-partial functions. We have a local consistency requirement: $$\{\mathcal{F}_S\}$$ and $$\{\mathcal{F}_T\}$$ must agree (as distributions) on restrictions to $$S \cap T$$. In some sense, if we only view pairs of these distributions of partial functions, it appears as if they come from a single distributions of total functions. Let us focus on the classic linearity tester of Blum-Luby-Rubinfeld in this setting. We pick random $$x, y, x+y \in \{0,1\}^n$$ as before, and query these values on a function $$f \sim {\mathcal{F}_{x,y,x+y}}$$. The main question addressed is what one can say about $$\{\mathcal{F}_S\}$$, if this linearity test passes with high probability. Intuitively (but technically incorrect), the main result is that $$\{\mathcal{F}_S\}$$ is approximated by a “quasi-distribution” of linear functions.

An Exponential Separation Between MA and AM Proofs of Proximity, by Tom Gur, Yang P. Liu, and Ron D. Rothblum (ECCC). This result follows a line of work on understanding sublinear algorithms in proof systems. Think of the standard property testing setting. There is a property $$\mathcal{P}$$ of $$n$$-bit strings, an input $$x \in \{0,1\}^n$$, and a proximity parameter $$\epsilon > 0$$. We add a proof $$\Pi$$ that the tester (or the verifier) is allowed to use, and we define soundness and completeness in the usual sense of Arthur-Merlin protocols. For a $$\mathbb{MA}$$-proof of proximity, the proof $$\Pi$$ can only depend on the string $$x$$. In a $$\mathbb{AM}$$-proof of proximity, the proof can additionally depend on the random coins of the tester (which determine the indices of $$x$$ queried). Classic complexity results can be used to show that the latter subsume the former, and this paper gives a strong separation. Namely, there is a property $$\mathcal{P}$$ where any $$\mathbb{MA}$$-proof of proximity protocol (or tester) requires $$\Omega(n^{1/4})$$-queries of the input $$x$$, but there exists an $$\mathbb{AM}$$-proof of proximity protocol making $$O(\log n)$$ queries. Moreover, this property is quite natural; it is simply the encoding of permutations.

Testing Identity of Multidimensional Histograms, by Ilias Diakonikolas, Daniel M. Kane, and John Peebles (arXiv). A distribution over $$[0,1]^d$$ is a $$k$$-histogram if the domain can be partitioned into $$k$$ axis-aligned cuboids where the probability density function is constant. Recent results show that such histograms can be learned in $$k \log^{O(d)}k$$ samples (ignoring dependencies on accuracy/error parameters). Can we do any better for identity testing? This paper gives an affirmative answer. Given a known $$k$$-histogram $$p$$, one can test (in $$\ell_1$$-distance) whether an unknown $$k$$-histogram $$q$$ is equal to $$p$$ in (essentially) $$\sqrt{k} \log^{O(d)}(dk)$$ samples. There is a nearly matching lower bound, when $$k = \exp(d)$$.

Distributed Simulation and Distributed Inference, by Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi (arXiv   ECCC). This papers introduces a model of distributed simulation, which generalizes distribution testing and distributed density estimation. Consider some unknown distribution $$\mathcal{D}$$ with support $$[k]$$, and a “referee” who wishes to generate a single sample from $$\mathcal{D}$$ (alternately, she may wish to determine if $$\mathcal{D}$$ has some desired property). The referee can communicate with “players”, each of whom can generate a single independent sample from $$\mathcal{D}$$. The catch is that each player can communicate at most $$\ell$$ < $$log_2k$$ bits (otherwise, the player can simply communicate the sampled element). How many players are needed for the referee to generate a single sample? The paper first proves that this task is basically impossible with a (worst-case) finite number of players, but can be done with expected $$O(k/2^\ell)$$ players (and this is optimal). This can plugged into standard distribution testing results, to get inference results in this distributed, low-communication setting. For example, the paper shows that identity testing can be done with $$O(k^{3/2}/2^\ell)$$ players.