So, we had a pretty fantastic September. Besides the fact that September saw eight papers, the Computer Science community also bagged two Nobel Prizes(!) — reactions to which are kind of mixed from what I see around me. Anyhow without further delay, let us circle back to property testing. So, eight papers, yes. Without further ado, let us look at our spread.
Public Coin Interactive Proofs for Label-Invariant Distribution Properties by Tal Herman (ECCC) Suppose you have an unknown distribution \(\mathcal{D}\) supported over \([N]\). Suppose I claim that this distribution has entropy \(\texttt{blah}\). You have sample access to \(\mathcal{D}\) and you want to check my claim. To this end, you decide to engage me, a suspicious shady prover, in an Interactive Protocol where you, the verifier is restricted to use public coin tosses. The main result of this paper asserts that you can do this with a mere \(\widetilde{O}(N^{2/3})\) samples. You also get the same bound on the communication complexity. What’s more is that you get algorithms with running time of the same order. What’s even more, is that you get similar algorithms for all label invariant properties of \(\mathcal{D}\). You can contrast this with the seminal result of Valiant and Valiant from 2011 which asserts that you can approximate the distance between your input distribution \(\mathcal{D}\) and the label invariant property of interest (without any prover) using \(\Theta(N/\log N)\) samples. So, this result shows that the computation under the interactive model is more efficient than standalone computation even in the public coin toss model.
How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions by Tal Herman and Guy Rothblum (arXiv) Let us consider the same setup as above. Again, you have an unknown distribution (supported over \([N]\)) which you have sample access to and I assert that this distribution has a certain property. You want to verify whether my claim is correct or bogus. The main algorithmic result of this paper has a cryptographic ring to it: Assuming the existence of collision resistant hash functions, the authors show that for any reasonable distribution property, you have set up an interactive protocol where the verifier can decide whether or not the prover’s claim is bogus using \(\widetilde{O}(\sqrt N)\) samples. Also, the communication complexity is of the same order.
Random local access for sampling k-SAT solutions by Dingding Dong and Nitya Mani (arXiv) I find this paper pretty intriguing. So, here is the setup. I give you some \(k\)-SAT instance and I promise that no variable in this instance shows up in more than \(d\) clauses. Recall that for \(d \leq 2^k/(2e k)\), Lovasz Local Lemma assures you that there exists a satisfying assignment. What’s more is that the famous Moser-Tardos algorithm even allows you to find one satisfying assignment in polynomial time. In a beautiful work, in the regime where \(d \leq 2^{ck}\) (where \(0 < c < 1\) is a sufficiently small constant), Moitra gave samplers which return an almost uniformly random satisfying assignment. Note that this is not possible direct adaptation the Moser-Tardos algorithm. Anyhow, back to the setting considered in this work. So, consider the following sublinear challenge. Denote by \(\mu\) the uniform distribution supported over all satisfying assignments to the given \(k\)-SAT instance where each variable shows up in more than \(d \leq 2^{ck}\) clauses. I want you to sample the assignment \(\sigma(v)\) for a variable \(v\) from the marginal \(\mu_v\) induced by \(\mu\) on \(v\) fast (in \(poly(\log n)\) time) in expectation. Of course this has to be done in some consistent fashion overall which is very LCAish in flavor and I will not detail that here. Rest assured the featured paper rises up to the challenge.
New Direct Sum Tests by Alek Westover, Edward Yu, Kai Zheng (arXiv) We say that a \(\mathbb{F}_2\)-valued function \(f\) over the \(d\)-dimensional grid \(f \colon [n]^d \to \mathbb{F}_2\) is a direct sum if there are \(d\) one-dimensional functions \(L_i \colon [n] \to \mathbb{F}_2\) such that \(f(x) = \sum_{i=1}^d L_i(x_i)\). In a paper we covered in October 2019, Dinur and Golubev presented an algorithm for direct sum testing and left its analysis for future research. The featured paper analyzes this test and shows that it is indeed a bonafide direct sum tester. I omit the description of the tester. I will just leave with one note — this looks like an appetizing read for Boolean Fourier Analysis aficionados.
Directed Hypercube Routing, a Generalized Lehman-Ron Theorem, and Monotonicity Testing by Deeparnab Chakrabarty and C. Seshadhri (arXiv) This is a very refreshing read with (our very own) Seshadhri as one of the authors. Chakrabarty and Seshadhri have teamed up (often in a trio with Hadley Black) in their attempts to understand the deep dark secrets of testing boolean monotonicity over the boolean hypercube. The featured paper conjectures a bold generalization of Lehman-Ron theorem over the hypercube and suggests that the conjectured routing once established could pave the path forward towards a more transparent understanding of directed isoperimetric inequalities over the boolean hypercube. In my opinion, it is worth your while to check this one out.
Lempel-Ziv (LZ77) Factorization in Sublinear Time by Dominik Kempa and Tomasz Kociumaka (arXiv) A pretty unique topic. The main result of this paper is what you see in the title — after 50 years since its introduction, finally we can factorize LZ77 in sublinear time. If you are like me, you might be asking but what does that mean? Quoting from the abstract
Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string T from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the longest prefix of the unprocessed suffix that has another occurrence earlier in T.
Indeed, the abstract has more fascinating information about this problem. Quoting again.
Sublinear-time algorithms are known for nearly all other fundamental problems on
strings, but LZ77 seems resistant to all currently known techniques.
Looks like new year came early for PTReview readers. This paper promises to be fun by the contents I sampled so far.
Computing String Covers in Sublinear Time by Jakub Radoszewski and Wiktor Zuba (arXiv) A substring (actually prefix) \(C\) is called a cover of text \(T\) is \(T\) can be constructed by concatenations and superpositions of \(C\). Suppose the text string \(T\) contains \(n\) symbols from some alphabet of fixed size. The main theorem of the featured paper asserts that given a string \(T\), you can find a representation of all of its covers (at most \(n\) of them — they are all prefixes) in merely \(O(n/\log_{\sigma} n)\) time. This bound is also shown to be optimal — indeed the authors show that you cannot compute a representation (in a certain model formulated by Charalampopoulos et al in FOCS 2020) for the shortest cover in less than \(o(n/log n)\) time.
Quantum Channel Testing in Average-Case Distance by Gregory Rosenthal, Hugo Aaronson, Sathyawageeswar Subramanian, Animesh Datta and Tom Gur (arXiv) This paper considers a new diversion in quantum land. Namely, it considers the task of testing quantum channels. The setup is this: I have a \(d\)-dimensional quantum system which is just a \(\mathbb{C}^{d \times d}\) psd matrix over complex numbers with trace \(1\). A quantum channel is a just a linear transformation that maps \(d_{in}\) dimensional quantum systems to a \(d_{out}\)-dimensional quantum system. With repsect to some appropriate norm (the so-called diamond norm), one of the results in this paper proves a \(poly(d_{in})/\varepsilon\) lower bound for testing identity to any fixed channel in diamond distance. This lower bound is shown to hold in a very strong query model appropriate for the quantum setting.