Sublinear Algorithms Program at the Simons Institute in 2024

Exciting news!* Next year, the Simons Institute will host a summer program on Sublinear Algorithms, from May 20 to August 9, 2024. Organised by Artur Czumaj, Piotr Indyk, Jelani Nelson, Noga Ron-Zewi, Ronitt Rubinfeld, Asaf Shapira and myself, the summer program will feature 4 workshops:

This is, of course, in addition to the bulk of the program itself: research discussions, reading groups, talks, social activities… If you happen to be in the area, you’re more than welcome to come and take part in some of these!

While each workshop will have its own set of attendees (more details soon), there are also some slots for (1) long-term visitors and (2) Simons Research Fellows (within five years of the award of their PhD at the start of the academic year 2024-25; they may already hold faculty positions), which you can apply to! The deadline to apply is December 1, 2023:

Hope to see many of you next summer! Oh, and to conclude… have you seen our logo?

Sublinear Algorithms Wide Format Logo

* Full disclosure: I am biased, being an organizer, but I do find this very exciting.

News for October 2023

Last month was a little slower, with only (unless we missed some) three papers: two appearing this month, and one that was overlooked from the month before.

Stability of Homomorphisms, Coverings and Cocycles I: Equivalence, by Michael Chapman and Alexander Lubotzky (arXiv). This paper considers three “stability” problems in topological property testing: namely, problems of the form “are objects that are almost-X close to objects that are X”, where X (here) is one of homomorphisms, coverings of a cell complex, or 1-cocycles. The main result of the paper is that these three property testing problems are equivalent: namely, they admit proximity-oblivious testers (POTs) with similar rejection probability rates.

Testing Higher-order Clusterability on graphs, by Yifei Li, Donghua Yang, and Jianzhong Li (arXiv). The authors propose a new notion of graph clusterability, higher-order clusterability, meant to generalize the previously studied notions of clusterability; and proceed to provide testing algorithms for this notion.

Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine, by Clément Canonne and Yucheng Sun (arXiv). This distribution testing paper (carry-over from the previous month) focuses on the extensively studied problem of closeness testing: given samples from two unknown distributions \(p\) and \(q\), decide whether \(p=q\) or if they are far. Now, add (differential) privacy: what if the two sets of samples, from \(p\) and \(q\) respectively, were sensitive data? And now, the focus of the paper… what if the two sets of samples were not equally sensitive? What are the trade-offs between the number of samples from \(p\) and the number from \(q\), then?

News for September 2023

Sorry for the delay in getting this month’s post out. This time we had seven (EDIT) eight (LATER EDIT) nine papers. Thanks to our readers for pointing out a paper we missed. Do let us know if we missed any (EDIT) others. Alright, without further delay, let us look at this month’s spread.

Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas by Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, and Rocco A. Servedio (arXiv) Let us begin with a paper on property testing of boolean functions over the binary hypercube \(\{0,1\}^n\). Quoting from the abstract:

[This paper gives] the first superpolynomial (in fact, mildly exponential) lower bounds for tolerant testing of monotonicity, unateness, and juntas with a constant separation between yes and no cases.

Let us take a little dip to get a superficial feel for the techniques the paper uses for boolean monotonicity. The key lies in an adaptation of the lower bound techniques pioneered in the paper of Pallavoor-Raskhodnikova-Waingarten (PRW), which considers distributions over “yes functions” (which are \(\varepsilon/\sqrt{n}\)-close to monotone) and “far functions” (which are \(\varepsilon\)-far from being monotone). PRW split the \(n\) variables into \(n/2\) control variables and \(n/2\) action variables, and consider the subcubes indexed by bit-strings where the control variables are balanced. The function value is chosen carefully on these subcubes, which ensures that any tester reliably distinguishing yes and no functions needs to sample bit-strings from the same balanced subcube that differ in many action variables. As PRW argue, this event occurs with low probability if the allowed number of queries is small. The key insight of the featured paper is to modify the PRW construction by using random monotone DNF formulas due to Talagrand. If this whets your appetite, go read the paper!

Testing Junta Truncation by William He and Shivam Nadimpalli (arXiv)

Let \(f \colon \{0,1\}^n \to \{0,1\}\) be a \(k\)-junta. Consider the sets \(\mathcal{U}_{\text{yes}} = \{0,1\}^n\) and \(\mathcal{U}_{\text{no}} = \bigl\{x \in \{0,1\}^n \colon f(x) = 1 \bigr\}\), and let \(\mathcal{D}_{\text{yes}}\) and \(\mathcal{D}_{\text{no}}\) denote the uniform distributions supported on these two sets. Finally, let \(\mathcal{D}\) be \(\mathcal{D}_{\text{yes}}\) with probability \(1/2\) and \(\mathcal{D}_{\text{no}}\) with probability \(1/2\).

You are given \(t\) samples from the distribution \(\mathcal{D}\). The task is to decide whether \(\mathcal{D}\) is the yes distribution or the no distribution. The featured paper shows you can reliably solve this task with \(t \leq \min(2^k + \log{n \choose k }, 2^{k/2} \log^{1/2}{n \choose k})\) samples. The paper also supplements this result with a lower bound: fewer than \(t = \log{n \choose k}\) samples cannot reliably distinguish these two distributions. The results suggest that this “testing junta truncation” problem requires learning the set of relevant variables of the junta.
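To make the sampling setup concrete, here is a minimal Python sketch of a toy instance (not from the paper): \(f\) is taken, for illustration only, to be the AND of the first \(k\) coordinates, and all names and parameters are hypothetical.

```python
import random

def sample_truncation_instance(n, k, t, yes_case, seed=None):
    """Toy instance of the junta-truncation problem: f is the AND of the first k
    coordinates (a k-junta), D_yes is uniform over {0,1}^n, and D_no is uniform
    over f^{-1}(1).  Returns t i.i.d. samples from D_yes (if yes_case) or D_no."""
    rng = random.Random(seed)

    def f(x):                               # the hidden k-junta (a toy choice)
        return all(x[:k])

    samples = []
    while len(samples) < t:
        x = tuple(rng.getrandbits(1) for _ in range(n))
        if yes_case or f(x):                # D_no: rejection-sample onto f^{-1}(1)
            samples.append(x)               # (fine for small k; slow otherwise)
    return samples
```

A distinguisher then only sees the returned samples and must decide which of the two cases produced them.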

Longest Common Substring and Longest Palindromic Substring in \(\mathbf{\widetilde{O}(\sqrt n)}\) Time by Domenico Cantone, Simone Faro, Arianna Pavone, and Caterina Viola (arXiv) I paraphrase from the abstract of this paper. You know the longest common substring and longest palindromic substring as classic problems in computer science, both of which can be solved in linear time using suffix trees. Recently, quantum algorithms issuing only \(o(n)\) quantum queries were proposed for both of these problems in the query model. The featured paper notes that this query model has a shortcoming, namely when it comes to real-life implementation on actual hardware, and addresses it by presenting sublinear-time quantum algorithms in the circuit model of computation.

Testing properties of distributions in the streaming model by Sampriti Roy and Yadu Vasudev (arXiv) Alright, now let us consider a different twist on distribution testing. Suppose you have limited memory. You obtain a bunch of samples to solve some standard distribution testing task, but the twist is, of course, that you cannot store all the samples. What can you say about how sample complexity trades off against space complexity? The featured paper studies this trade-off in the standard access model and the conditional access model. One of the results of the paper asserts that in the conditional access model, you can do identity testing with \( \widetilde{O}\bigl(\frac{\log^4n}{\varepsilon^4}\bigr)\) samples while using only \(O\bigl(\frac{\log^2 n}{\varepsilon^2} \bigr)\) bits of memory.

Testing Spreading Behavior in Networks with Arbitrary Topologies by Augusto Modanese and Yuichi Yoshida (arXiv) We covered the problem of testing dynamic environments in this March 2014 post and that May 2021 post earlier. The goal here is to check whether a dynamically evolving system evolves according to some fixed rule, or whether the system is far from all systems that evolve according to that fixed rule. The May 2021 post covered a paper which shows you can test dynamically evolving systems that evolve according to what is called the threshold rule. The featured paper considers rules motivated by models of infection spreading. One of the results in the paper presents one-sided and two-sided testers (with \(O(1/\varepsilon)\) query complexity) for testing a single step of evolution (on bounded-degree graphs) with these rules.

A Tight Lower Bound of Ω(log n) for the Estimation of the Number of Defective Items by Nader Bshouty and Gergely Harcos (ECCC) The featured paper considers a problem in group testing. Let us quickly review the setup. You are given a ground set \(X\) of \(|X| = n\) items, and the items in some unknown set \(I \subseteq X\) are defective. Each test is specified by a set \(Q \subseteq X\), and the test comes back positive iff \(Q \cap I \neq \emptyset\). This paper presents lower bounds for non-adaptive group testing algorithms. And as the title says, if your algorithm wishes to estimate the number of defective items to within a constant factor, you better pay up \(\Omega(\log n)\) tests.
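For a feel of where the \(\log n\) sits, here is a hedged sketch of the folklore non-adaptive strategy such a lower bound is matched against: draw random pools at geometrically decreasing densities, and read off the first scale at which most pools miss the defective set. The function names and constants are illustrative, not taken from the paper.

```python
import math
import random

def group_test(defective, pool):
    """One group test: positive iff the pool contains at least one defective item."""
    return any(x in defective for x in pool)

def estimate_num_defectives(n, defective, reps=20):
    """Estimate |defective| up to a constant factor using O(reps * log n) tests.
    All pools can be drawn before seeing any outcome, so the scheme is non-adaptive:
    at scale j, each item joins a pool independently with probability 2**-j.
    A scale-j pool misses the defective set I with probability (1 - 2**-j)**|I|,
    so the first scale where most pools come back negative has 2**j = Theta(|I|)."""
    defective = set(defective)
    for j in range(int(math.log2(n)) + 1):
        p = 2.0 ** (-j)
        pools = [[x for x in range(n) if random.random() < p] for _ in range(reps)]
        positives = sum(group_test(defective, pool) for pool in pools)
        if positives <= reps // 2:          # majority of pools missed I entirely
            return 2 ** j
    return n                                # fell through: I is (nearly) everything
```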

A tight lower bound on non-adaptive group testing estimation by Tsun-Ming Cheung, Hamed Hatami, and Anthony Ostuni (arXiv) As our readers pointed out, this paper is concurrent with the paper above and achieves the same lower bound. Indeed, this holds for both the one-sided and the two-sided variants. Furthermore, as this paper shows, if one knows that the set \(I\) satisfies \(L \leq |I| \leq U\), then one can show both one-sided and two-sided lower bounds of \(\Omega(U/L)\) non-adaptive queries for obtaining a constant-factor approximation to \(|I|\).

On Testing Isomorphism to a Fixed Graph in the Bounded-Degree Graph Model by Oded Goldreich and Laliv Tauber (ECCC) This paper looks like a fantastic read for your students, especially given its engaging style and (only) 18 highly readable pages. As the title indicates, this paper considers the challenge of testing isomorphism to a fixed graph in the bounded-degree model. The main result asserts that for almost all \(d\)-regular graphs \(H\) on \(|V(H)| = n\) vertices, testing isomorphism to \(H\) can be done with roughly \(\sqrt n\) queries. The paper also presents an almost matching (query) lower bound, which again holds for almost all graphs \(H\).

Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning by Gunjan Kumar, Kuldeep S. Meel, and Yash Pote (arXiv) Let us consider the distribution testing setup with a twist: Suppose you are given some unknown distribution \(\mathcal{Q}\) supported over \(\{0,1\}^n\), and you want to sample from it conditioned on some predicate. The question is: can you efficiently check whether these are legit samples (satisfying the predicate) taken from the distribution \(\mathcal{Q}\)? Our October 2020 news covered a tolerant tester for this problem which involved some subset of the authors of the current paper. The featured paper considers what additional leverage you gain if you are given access to a sampling oracle which can sample from “conditioned subcubes.” In this model, you can query some subcube and, after issuing a query, you will receive an element \(x\) from this subcube with probability proportional to the original probability weight of \(x\). The paper provides a tolerant tester in this setup which makes at most \(\widetilde{O}(n^3/(\varepsilon_2 - \varepsilon_1)^5)\) queries to this sampling oracle.
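For readers less familiar with the oracle, here is a minimal sketch of what a single subcube-conditional sample looks like; the distribution is represented explicitly as a dictionary purely to keep the sketch self-contained (a real tester only has oracle access).

```python
import random

def subcube_conditional_sample(dist, restriction):
    """Draw one sample from `dist` (a dict mapping n-bit tuples to probabilities),
    conditioned on the subcube given by `restriction`: a dict {coordinate: bit}
    fixing some coordinates and leaving the others free."""
    in_cube = {x: p for x, p in dist.items()
               if all(x[i] == b for i, b in restriction.items())}
    total = sum(in_cube.values())
    if total == 0:
        raise ValueError("the queried subcube has zero probability mass")
    points, weights = zip(*in_cube.items())
    return random.choices(points, weights=weights, k=1)[0]
```

For instance, `subcube_conditional_sample(dist, {0: 1, 3: 0})` asks for a sample conditioned on \(x_0 = 1\) and \(x_3 = 0\).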

News for August 2023

The month of August was… wow, intense! We listed no fewer than 13 property testing papers, which is great. Keep them coming!

If you like high-dimensional expanders (HDX) and agreement testing, this is your lucky day! We start with two (2!) exciting papers on HDX, and agreement testing in the low soundness regime:

Characterizing Direct Product Testing via Coboundary Expansion, by Mitali Bafna and Dor Minzer (ECCC). This paper focuses on direct product testing: namely, given query access to a function \(F\) defined on a \(d\)-dimensional simplicial complex, we consider testers which sample two faces, and check if \(F\) is consistent on their intersection. Such testers will pass with probability one when \(F\) is a direct product, so the question is what “soundness” (probability of rejecting if \(F\) is far from being one) one can achieve. And, central to this paper: can we use high-dimensional expanders to get such testers?
While the high-soundness regime is reasonably well understood, the low-soundness one… not so much. This paper makes progress on that front: the main result states that, to allow for direct product testing in this low soundness regime, “a high-dimensional spectral expander must possess a property that may be seen as a generalization of coboundary expansion.”
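In code, one round of the two-query consistency test described above might look like the following sketch. The face-sampling distribution, which is where the simplicial complex (and all the difficulty) enters, is replaced here by a uniform choice, so this is only an illustration of the test itself.

```python
import random

def two_face_consistency_test(F, faces):
    """F maps each face (a frozenset of vertices) to a local assignment
    {vertex: value}.  Pick two faces and accept iff their assignments agree on
    every vertex of the intersection.  A genuine direct product passes with
    probability 1; bounding the soundness is the hard part."""
    A, B = random.sample(faces, 2)
    return all(F[A][v] == F[B][v] for v in A & B)
```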

Agreement theorems for high dimensional expanders in the small soundness regime: the role of covers, by Yotam Dikstein and Irit Dinur (ECCC). Concerned with the same type of question, this paper provides another characterization of the agreement test properties of high-dimensional expanders in the low-soundness regime: specifically, in terms of the topological covers of the expander.

Rejoice: we’re not done with HDX!

New Codes on High Dimensional Expanders, by Irit Dinur, Siqi Liu, and Rachel Zhang (ECCC). This paper constructs a new family of locally testable codes (LTCs), based on simplicial (bounded-degree) high dimensional expanders, which, while they do not achieve the \(c^3\) Holy Grail (constant rate, distance, and testability), do satisfy a host of potentially useful properties: symmetric, with low-density parity-check matrices (LDPC), and some additional structure (multiplying two codewords gives a related codeword).

Now, onto quantum…

Space-bounded quantum state testing via space-efficient quantum singular value transformation, by François Le Gall, Yupan Liu, and Qisheng Wang (ECCC). Suppose two “computationally limited” (read: space-bounded) quantum devices are used to prepare quantum states \(\rho_0\) and \(\rho_1\), claimed to be identical: your goal is to check whether these two states are close or very far, for some reasonable distance measure. How hard could that be, computationally? This is exactly a variant of quantum state testing (i.e., the “quantum generalization of tolerant closeness testing” in distribution testing jargon) which places restrictions on the “source” of the observations: the paper establishes several characterisations of the computational complexity of the task, for various distance measures: trace distance, Hilbert-Schmidt distance, quantum entropy difference, and quantum Jensen-Shannon divergence.

Quantum Lower Bounds by Sample-to-Query Lifting, by Qisheng Wang and Zhicheng Zhang (arXiv). There exist several methods to prove quantum query complexity lower bounds, such as the polynomial and the adversary methods. This paper brings a new kid to town: a “sample-to-query lifting” result, which lets one convert a query-based algorithm for a promise problem into a sample-based quantum algorithm for a corresponding testing problem, with a quadratic blowup from query to sample complexity. That is, any sample complexity lower bound for that problem now implies a (quadratically weaker) query complexity lower bound for the original task! Thus, the result allows one to “systematically derive quantum query lower bounds using corresponding sample lower bounds from quantum information theory.” The authors use this sample-to-query lifting to establish both new and previously known quantum query complexity lower bounds.

A little magic means a lot, by Andi Gu, Lorenzo Leone, Soumik Ghosh, Jens Eisert, Susanne Yelin, and Yihui Quek (arXiv). Alright, I have to say, I am completely out of my depth with this one. But the title is great, the abstract sounds interesting, and the paper contains a lower bound on the number of copies any tester for “non-stabilizerness” requires (Theorem 6).

… then interactive proofs…

Distribution-Free Proofs of Proximity, by Hugo Aaronson, Tom Gur, Ninad Rajgopal, and Ron Rothblum (ECCC). In distribution-free property testing, to model “real practical settings” the underlying distance measure is not Hamming, but one induced by an unknown probability distribution, PAC-learning style. In interactive proofs of proximity (IPP), the property testing algorithm can interact with an all-powerful, but very-much-untrusted, prover, IP-style. Well, you probably have guessed by now: this paper introduces a new type of property testing, combining the two! It shows that any problem in NC admits a distribution-free interactive proof of proximity with strongly sublinear communication (\(\tilde{O}(\sqrt{n})\)), which (under some reasonable assumption) is near-optimal.

… and then, distribution testing!

New Lower Bounds for Testing Monotonicity and Log Concavity of Distributions, by Yuqian Cheng, Daniel Kane, and Zhicheng Zheng (arXiv). Monotonicity is a pretty clean and natural property of probability distributions over the line: the probability mass function must be non-increasing. This definition straightforwardly generalizes to higher dimensions, and, let’s be honest: given how simple this property is, surely, by now we must have tight bounds on the sample complexity of testing monotonicity, right? Turns out, yes, but no. The known upper and lower bounds are tight… as long as the distance parameter \(\varepsilon\) is not too small. But if it is, then… the bounds are loose. And same thing for a related property, log-concavity! This state of affairs was really annoying to some, and I am one of these “some.” Fortunately, this paper makes significant progress towards scratching that itch, by proving new lower bounds for both monotonicity and log-concavity (via a new lower bound technique, relying on moment matching but preserving the type of constraints monotonicity and log-concavity impose on the individual probabilities). Almost there!

Tight Lower Bound on Equivalence Testing in Conditional Sampling Model, by Diptarka Chakraborty, Sourav Chakraborty, and Gunjan Kumar (arXiv). In the usual sampling model for distribution testing, the algorithm gets i.i.d. samples from the unknown distribution(s). This is very natural, but often leads to very high sample complexities… a recent line of work, starting with Chakraborty, Fischer, Goldhirsh, and Matsliah (2011) and Canonne, Ron, and Servedio (2012), introduced the “conditional sampling model”, a more powerful oracle setting where the algorithm gets to condition on subsets of the domain of its choosing. And, lo and behold, with great (additional) power comes greatly lower sample complexity! But again, some itch to scratch: for one of the “flagship” distribution testing questions, closeness testing, there remained a quadratic gap between the upper bound (\(\tilde{O}(\log\log n)\), Falahatgar, Jafarpour, Orlitsky, Pichapati, and Suresh ’15) and the lower bound (\(\Omega(\sqrt{\log\log n})\), Acharya, Canonne, and Kamath ’15). Not anymore! For this question (and some related ones), this paper essentially closes the gap, proving an \(\tilde{\Omega}(\log\log n)\) lower bound, bypassing the limitations of previous lower bound techniques!

Support Testing in the Huge Object Model, by Tomer Adar, Eldar Fischer, and Amit Levi (arXiv). Another paper, another distribution testing model! In the Huge Object Model recently introduced by Goldreich and Ron, one gets i.i.d. samples from an unknown probability distribution over \(n\)-bit strings, but does not actually get the samples themselves, which are too big: instead, the algorithm gets query access to the individual samples (each being an \(n\)-bit string). This is a rather appealing model, and the authors focus on the task of testing the support size of the distribution. As their results show, this turns out to be quite interesting: namely, the paper shows a separation between adaptive and non-adaptive algorithms, as well as a separation between the number of samples and the number of queries (to these samples).

We also have some Boolean function testing on the menu:

Linear isomorphism testing of Boolean functions with small approximate spectral norm, by Arijit Ghosh, Chandrima Kayal, Manaswi Paraashar, and Manmatha Roy (arXiv). Given two Boolean functions \(f,g\colon\mathbb{F}_2^n\to\{-1,1\}\), a standard property testing question is to test whether \(f,g\) are isomorphic. Now, one can instead ask whether they are linearly isomorphic: i.e., whether there exists some invertible matrix \(A\) with entries in \(\mathbb{F}_2\) such that \(f\circ A = g\). It is this question that the authors study, through two different lenses: first, focusing on the communication complexity of the question when Alice has \(f\) and Bob has \(g\); and second, on the (usual) query complexity. In both cases, their results are expressed in terms of the (approximate) spectral norm of the functions.

Adversarial Low Degree Testing, by Dor Minzer and Kai Zhe Zheng (arXiv). Consider an adversarial setting of property testing where, every time the tester makes a query, an adversary gets to “erase” a small number of values of the function, say \(t\), to try to fool the tester. This model was recently introduced by Kalemaj, Raskhodnikova, and Varma: in this paper, the authors continue this line of study, and show that one can test in an erasure-resilient fashion whether a function has low degree (at most \(d\)) with \(\tilde{O}(\log^{O(d)}(t)/\varepsilon)\) queries.
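To illustrate the access model (and only the model; the erasure pattern of a real adversary is adversarial, not random), here is a hypothetical sketch of an oracle that erases up to \(t\) values after answering each query.

```python
import random

class ErasureOracle:
    """Toy model of the erasure adversary: after answering each query, up to t
    further function values are erased (returned as None on later queries).
    Here the erased points are chosen at random; a real adversary would pick
    them to confound the tester."""
    def __init__(self, f, domain, t):
        self.values = {x: f(x) for x in domain}
        self.budget = t

    def query(self, x):
        answer = self.values[x]
        alive = [y for y, v in self.values.items() if v is not None and y != x]
        for y in random.sample(alive, min(self.budget, len(alive))):
            self.values[y] = None
        return answer
```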

And, to conclude, graph testing:

Testing Graph Properties with the Container Method, by Eric Blais and Cameron Seth (arXiv). Consider the \(\rho\)-CLIQUE property: does a graph on \(n\) vertices contain a clique of size at least \(\rho n\), or is it \(\varepsilon\)-far from it? This property has been extensively studied in the dense graph testing model, but the lower and upper bounds still exhibited a factor-\(\rho\) gap, particularly striking in the “small clique” regime (when \(\rho \ll 1\)). This paper, using the graph container method, nearly closes that gap, up to polylogarithmic factors, showing an \(\tilde{O}(\rho/\varepsilon)\)-query upper bound. The authors also establish an improved upper bound for another flagship testing question, \(k\)-COLORABILITY.


Oh, my. We forgot a paper last month (appeared end of June), which means… fourteen. Fourteen! (And, of course, apologies for overlooking this paper!)

Refining the Adaptivity Notion in the Huge Object Model, by Tomer Adar and Eldar Fischer (arXiv). For the huge object model in property testing, see one of the papers above. In this one, the authors focus on the notion of adaptivity in this model, in a fine-grained fashion: i.e., which queries depend on what the tester previously saw, with interesting subtleties about what that means in the huge object model. Of course, the key question is whether adaptivity allows for more efficient property testing… As they show, the answer is yes: the authors establish a hierarchy of separations.

News for July 2023

We’re now back to our regular cadence of 4+ monthly papers on property testing and sublinear time algorithms. July brings us a new sublinear PageRank computation and numerous results on our trending topics of distribution and monotonicity testing. Without further delay, let’s see the July spread.

Estimating Single-Node PageRank in \(\widetilde{O}(\min(d_t,\sqrt{m}))\) Time by Hanzhi Wang and Zhewei Wei (arXiv). PageRank is one of the most important quantities computed on graphs, especially in today’s networked world. Consider an undirected graph \(G = (V,E)\). The PageRank value of vertex \(t\) is the probability that a short random walk, starting from the uniform distribution, reaches the vertex \(t\). (Ok ok, I’m fudging details, but this is close enough to the truth.) The aim is to estimate this probability to within additive error \(O(1/n)\), where \(n\) is the number of vertices. A standard simulation would give an \(O(n)\) time algorithm; can we do sublinear in graph size? Previous work (which actually has implementations!) has led to \(O(\sqrt{n\cdot d_t})\) for undirected graphs [LBGS15] and (roughly) \(O(n^{2/3} d^{1/3}_{max})\) for all graphs [BPP18]. Here, \(d_t\) is the degree of vertex \(t\) and \(d_{max}\) is the maximum degree. This paper gets a bound of \(\widetilde{O}(\min(d_t,\sqrt{m}))\). A lower bound is still open for the fundamental problem! (A nice problem for any students reading…?)
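As a point of reference, here is a hedged sketch of the “standard simulation” baseline mentioned above: estimate the PageRank of \(t\) by running independent random walks with restart probability \(\alpha\) and counting how often they terminate at \(t\). Resolving probabilities of order \(1/n\) to within \(O(1/n)\) this way takes on the order of \(n\) walks, which is exactly what the sublinear algorithms improve upon; names and defaults below are illustrative.

```python
import random

def estimate_pagerank(adj, t, alpha=0.15, num_walks=200000):
    """Monte Carlo baseline for single-node PageRank: start at a uniformly random
    vertex, terminate each step with probability alpha, otherwise move to a random
    neighbour; return the fraction of walks that end at t."""
    n = len(adj)
    hits = 0
    for _ in range(num_walks):
        v = random.randrange(n)
        while random.random() > alpha:
            if adj[v]:
                v = random.choice(adj[v])
            else:                # dangling vertex: restart uniformly (one convention)
                v = random.randrange(n)
        hits += (v == t)
    return hits / num_walks
```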

Directed Poincare Inequalities and \(L_1\) Monotonicity Testing of Lipschitz Functions by Renato Ferreira Pinto Jr. (arXiv, ECCC). We all (or at least I) love monotonicity testing. The paper studies the continuous version, where \(f:[0,1]^n \to \mathbb{R}\). To have a reasonable notion of distance and testers, we require functions to be differentiable and \(L\)-Lipschitz. We measure distance using \(\ell_1\) distance, so the distance between \(f,g\) is \(\int_{[0,1]^n} |f-g| d\nu\) (where we integrate over the uniform measure) [BRY14]. To access \(f\), we are also provided a “derivative oracle”: given a point \(x\) and a direction \(\vec{v}\), we can query the directional derivative along \(\vec{v}\) at \(x\). One of the key insights in (discrete, Boolean) monotonicity testing is the connection to directed isoperimetric inequalities. These inequalities are directed versions of classic inequalities that relate boundaries (or gradients) to volumes. For \(L\)-Lipschitz functions, the classic Poincare inequality states that \(E[\|\nabla f\|_1] \geq \textrm{var}(f)\), where \(\nabla f\) is the gradient and \(\textrm{var}(f)\) is (up to constant factors) the distance to being constant. This paper proves the directed version \(E[\|\nabla^- f\|_1] \geq \varepsilon(f)\). Roughly speaking, \(\nabla^-\) is the “negative gradient” (\(\min(\nabla f, 0)\)) and \(\varepsilon(f)\) is the \(L_1\)-distance to monotonicity. This result directly yields an \(O(n/\varepsilon)\)-query tester for monotonicity. The paper asks the tantalizing question: can we do better?

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers by Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan (arXiv). It is best to go backwards in discussing this paper, starting from the implications and going to the core results. The problem at hand is the classic one of junta testing. Given \(f:\{0,1\}^n \to \{0,1\}\), distinguish whether \(f\) only depends on \(r\) variables (an \(r\)-junta) or is \(\varepsilon\)-far from having that property. This problem is well studied, has optimal (efficient) testers, and even has results in the distribution-free setting. In the latter setting, there exists some (unknown) distribution \(\mathcal{D}\) on the domain according to which distance is defined, and the tester gets query access to \(f\) along with samples from \(\mathcal{D}\). The clarity of junta testing disappears when we consider tolerant testing: can we distinguish whether \(f\) is (say) \(1/4\)-close to an \(r\)-junta or \(1/3\)-far (where distances are measured according to \(\mathcal{D}\))? A remarkable consequence of this paper is that this tolerant testing version is unlikely to have a \(poly(n)\) time algorithm. (Note that the sample complexity might still be small.) The main tool is a composition theorem that gives reductions between low-\(\varepsilon\) testers and constant-\(\varepsilon\) testers. The details are intricate, but here’s the gist. Suppose we can construct composing functions \(g\) such that the distance to “junta-ness” of \(g \circ f\) is much larger than the distance of \(f\). Then a tester (that can only deal with large \(\varepsilon\)) on \(g \circ f\) can effectively property test \(f\). (Obviously, I’m greatly oversimplifying and mischaracterizing. So go read the paper!)

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination by Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, and Shyam Narayanan (arXiv). Consider the fundamental hypothesis testing problem of Gaussian testing. We wish to distinguish the \(d\)-dimensional, zero-mean, unit covariance Gaussian \(\mathcal{N}(0,I)\), from the “shifted” version \(\mathcal{N}(\mu,I)\), where \(\mu\) is a vector of length at least \(\alpha\). Existing results give a simple algorithm using \(\Theta(\sqrt{d}/\alpha^2)\) samples. Suppose there is an adversary who can corrupt samples. The adaptive adversary can look at the samples generated, and arbitrarily change an \(\varepsilon\) fraction of entries. The weaker, oblivious adversary can also arbitrarily change an \(\varepsilon\) fraction of entries, but cannot look at the samples generated. Can we perform Gaussian testing in this setting? A previous result gave an optimal bound for adaptive adversaries [N22]. This paper gives the optimal sample complexity bound for oblivious adversaries. The expression is somewhat complicated. But the punchline is that for many regimes of parameters \(d, \alpha, \varepsilon\), the oblivious adversary is strictly weaker. Meaning, there is a testing algorithm (against an oblivious adversary) whose sample complexity is strictly less than the optimal bound against adaptive adversaries. This comes as a surprise, because in typical distribution testing settings, these adversaries are basically equivalent.

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). A problem about generalizing from the hypercube to hypergrids, an issue I have much obsessed over. Consider the problem of uniformity testing, where the domain is the hypergrid \([m]^n\). We wish to test if a distribution \(\mathcal{D}\) over the hypergrid is uniform. Unlike the standard distribution testing setting, the domain size is exponentially large. To aid us, we can perform conditional sampling: we can select any sub-hypergrid, and get samples from \(\mathcal{D}\) restricted to this sub-hypergrid. Previous work solved this problem for the hypercube domain (when \(m=2\)), by providing a tester with sample complexity \(O(\sqrt{n}/\varepsilon^2)\) [CCKLW22]. The previous work did not apply to any other hypergrid (even \(m=3\)). This paper gives the first solution for uniformity testing on hypergrids, with a tester of sample complexity \(O(\mathrm{poly}(m)\sqrt{n}/\varepsilon^2)\). The dependence on \(m\) has to be at least \(\sqrt{m}\), from existing results. One of the important tools is getting robust versions of classic isoperimetric inequalities over the hypergrid. An important open question is to determine the optimal dependence on \(m\).

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, and Anthimos Vardis Kandiros (arXiv). Depending on your standpoint, one may or may not consider this a “sublinear time” paper (it does not show that testing \(\ll\) learning). But given the interest in distribution testing, I think it’s germane to our blog. We have a high-dimensional distribution \(\mathcal{D}\) over \((x_1, x_2, \ldots, x_n)\). Without any assumption, learning (or even identity testing) requires exponentially many (in \(n\)) samples. This paper assumes that \((x_1, x_2, \ldots, x_n)\) is generated from a latent-tree Ising model. There is a tree where each node is associated with a value (think \(x_i\)), and the joint distribution satisfies some dependencies according to the edges. We only observe the values of the leaves; hence, the term “latent” model. The main result shows that identity testing can be done with polynomially many (in \(n\)) samples. The authors also prove that one can learn the distribution with (albeit more) \(poly(n)\) samples.

News for June 2023

June was a somewhat quiet month for sublinear-time algorithms with only one paper coming to our attention.

Relaxed Local Correctability from Local Testing by Vinayak M. Kumar and Geoffrey Mon (arXiv). As the name indicates, the paper proposes a scheme to construct good relaxed locally correctable codes (RLCC) from good locally testable codes (LTC). A relaxed local decoder for an RLCC is a sublinear algorithm that, given query access to a corrupted word \(w\) of length \(n\) and an index \(i \in [n]\) as input, either returns \(c[i]\), where \(c\) is the closest codeword to \(w\), or returns a failure symbol \(\perp\). In the regime of constant distance and constant rate, the prior best known lower and upper bounds on the query complexity of RLCCs were \(\tilde{\Omega}(\sqrt{\log n})\) (Dall’Agnol, Gur, and Lachish; 2021) and \((\log n)^{O(\log \log \log n)}\) (Cohen and Yankovitz; 2022), respectively. Using their scheme to convert LTCs to RLCCs, this paper improves the state of the art by providing constant-distance, constant-rate RLCCs with query complexity \(\log^{O(1)} n\).

News for May 2023

Apologies, dear readers for the delay in getting out this month’s post. This month we had three papers — all on testing properties of graphs! (EDIT: Updated later) four papers: three on property testing problems on graphs and the last one on testing convexity. One of the featured papers this month revisits the problem of testing the properties of directed graphs and comes back with a happy progress report. Alright, let’s dig in.

A Distributed Conductance Tester Without Global Information Collection by Tugkan Batu, Chhaya Trehan (arXiv) One of the classic questions in property testing considers the task of testing expansion. Here, you are interested in knowing whether the input graph has conductance at least \(\alpha\) or is far from every graph with conductance \(\Omega(\alpha^2)\). On a parallel track, recall that thanks to the classic work of Parnas and Ron, we know there are connections between distributed algorithms and graph property testing. Meditating on these connections led to the emergence of distributed graph property testing as an active area of research. The featured paper considers the task of testing expansion in the distributed framework. The algorithms presented give a distributed implementation of multiple random walks from all vertices, and control the congestion of the implementation. In particular, this leads to an \(O(\log n/\alpha^2)\)-round expansion tester. In a first attempt at such an implementation, you might note that you need to track how well short random walks mix when started from a bunch of randomly chosen vertices. This seems to require maintaining some global state/global aggregate information. One of the important features of their algorithm (as mentioned in the title) is that it does away with the need to maintain such global state. As a closing remark, I note that the algorithm presented in this paper does not require the graph to have bounded degree.
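For intuition about “tracking how well short random walks mix”, here is a sketch of the classical, centralized collision-count primitive in the style of earlier expansion testers (the featured paper’s contribution is implementing this kind of primitive distributively, without global aggregation). For simplicity the sketch assumes a regular graph, so that the stationary distribution is uniform; everything here is illustrative rather than the paper’s algorithm.

```python
import random
from itertools import combinations

def endpoint_collisions(adj, source, num_walks, walk_len):
    """Run num_walks lazy random walks of length walk_len from `source` and count
    pairwise collisions among their endpoints.  If the walks have mixed, endpoints
    are near-uniform and collisions are rare (about num_walks**2 / (2n)); an excess
    of collisions is evidence that `source` sits in a low-conductance piece."""
    endpoints = []
    for _ in range(num_walks):
        v = source
        for _ in range(walk_len):
            if random.random() < 0.5:       # lazy step: stay put with probability 1/2
                continue
            v = random.choice(adj[v])
        endpoints.append(v)
    return sum(u == w for u, w in combinations(endpoints, 2))
```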

Testing versus estimation of graph properties, revisited by Lior Gishboliner, Nick Kushnir, Asaf Shapira (arXiv) This paper considers the task of additively estimating the distance to a property \(\mathcal{P}\) of a dense graph. Let me set up some context to view the results in the featured paper by summarizing what was known before. One of the early important results in this area is the original result of Fischer and Newman, which shows that any testable graph property admits a \(\pm \varepsilon\) distance approximation algorithm; this follows from the regularity lemma. However, the query complexity of the resulting algorithm is a Wowzer-style bound. Later, Hoppen et al., building upon tools from the classic work of Conlon and Fox, demonstrated that a bound of \(twr(poly(1/\varepsilon))\) holds for testable hereditary properties. Fiat and Ron introduced a decomposition machinery that allowed them to decompose a “complex” property into a collection of simpler properties. They used this decomposition to estimate distances to a vast collection of graph properties. They also asked whether it is possible to find a transformation with which one can bypass the blowup in query complexity incurred by Fischer and Newman for some rich class of graph properties. The featured paper proves that for a hereditary graph property, you can in fact get algorithms where the query complexity for distance estimation only grows as \(\exp(1/\varepsilon)\). Additionally, for every testable graph property, you can get distance estimators whose query complexity only grows doubly exponentially in \(1/\varepsilon\) (as opposed to the earlier tower bound).

An Optimal Separation between Two Property Testing Models for Bounded Degree Directed Graphs by Pan Peng, Yuyang Wang (arXiv) Unlike undirected graphs, directed graph properties have not received as much attention in the property testing community. In a classic work, Bender and Ron considered two natural models for studying property testing on directed graphs. The first model is one where you can only follow the “out” edges, the so-called unidirectional model. In the other model, the bidirectional model, you are allowed to follow both the “out” edges and the “in” edges incident on a vertex. The featured paper considers directed graphs where the in-degrees and out-degrees are both bounded, in both of the models mentioned above. The graph is presented to you in the adjacency list format (tailored to the model you consider). The paper shows that even for the fundamental task of subgraph-freeness, the directed world has some interesting behavior with respect to the two models above. Let me showcase one of the catchy results from the paper which illustrates this separation nicely. Take a connected directed graph \(H\) with \(k\) source components. The paper shows that for sufficiently small \(\varepsilon\), testing whether a directed graph \(G\) is \(H\)-free or \(\varepsilon\)-far from being \(H\)-free requires \(\Omega(n^{1-1/k})\) unidirectional queries. On the other hand, in the bidirectional model, this can be done with a mere \(O_{\varepsilon, d, k}(1)\) queries.

(EDIT: Added later)

Testing Convexity of Discrete Sets in High Dimensions by Hadley Black, Eric Blais, Nathaniel Harms (arXiv) As the title suggests, this paper explores the problem of testing convexity. To understand the notion of convexity explored in the paper, consider the following setup: a set \(S \subseteq \{-1, 0, 1\}^n\) is convex if there exists a convex set \(S’ \subseteq \mathbb{R}^n\) such that \(S = S’ \cap \{-1,0,1\}^n\), and it is far from being convex if for every convex \(T \subseteq \{-1,0,1\}^n\), you have \(|S \oplus T| \geq \varepsilon 3^n\). The paper shows that when you are allowed membership queries, you can test convexity non-adaptively with one-sided error using \(3^{O(\sqrt{n \log(1/\varepsilon)})}\) queries, and it proves an almost matching lower bound. Finally, the authors prove a lower bound of \(3^{\Omega(n)}\) for sample-based testers, demonstrating that membership queries indeed buy you some undeniable power for testing convexity in high dimensions.

News for April 2023

After an empty month, the engines of PTReview are roaring back to life with a fresh batch of papers for this month’s edition. In total, we have four papers that are sure to pique your interest. It’s been an action-packed month with a diverse range of topics covered in the featured papers. The first paper delves into new variations in distribution testing, while the second paper discusses optimal testers for Bayes Nets. The third paper focuses on optimal tolerant junta-testers, and the final paper presents a cool monotonicity tester over hypergrids.

Distribution Testing Under the Parity Trace by Renato Ferreira Pinto Jr and Nathaniel Harms (arXiv) The featured paper considers the classic setup in distribution testing with a twist. To explain the results, let me introduce the framework considered in this work. Consider distributions supported over \([n]\). Suppose you want to design distribution testers where, instead of receiving the samples themselves, the samples are first arranged into a sorted sequence \(S\), and only the least significant bit of each element of \(S\) is made available to your testing algorithm. This is called the parity trace. Note that with this access model, a bunch of standard tasks suddenly become non-trivial. To take an example from the paper, you can no longer distinguish between the uniform distribution supported on \(\{1,2, \ldots, n\}\) and the uniform distribution supported on \(\{n+1, n+2, \ldots, 2n\}\) in this access model. Nevertheless, the paper shows, you can still obtain testers which require only a sublinear number of samples for testing uniformity in this model.
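Concretely, here is a small sketch of what the tester gets to see; the distribution is represented explicitly only to keep the sketch self-contained.

```python
import random

def parity_trace(dist, t, seed=None):
    """Draw t i.i.d. samples from `dist` (a dict value -> probability, over [n]),
    sort them, and reveal only the least significant bit of each element: this
    bit string is the parity trace handed to the tester."""
    rng = random.Random(seed)
    values, probs = zip(*sorted(dist.items()))
    samples = sorted(rng.choices(values, weights=probs, k=t))
    return [x & 1 for x in samples]
```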

Another cool feature of this big paper is an unexpected foray into the trace reconstruction literature from a property testing viewpoint. I wish I understood the formal connection better to describe a bit more about it. But for now, let me leave that as an appetizer which (hopefully) encourages you to take a look at the paper.

New Lower Bounds for Adaptive Tolerant Junta Testing by Xi Chen and Shyamal Patel (arXiv) If you are a regular here on the PTReview corner, you are probably no stranger to the tolerant junta testing problem. As a corollary to the main result, the paper in question proves a lower bound of \(k^{\Omega(\log k)}\) queries on any adaptive algorithm that wishes to test whether the input function \(f\) is \(\varepsilon_1\)-close to being a \(k\)-junta or whether it is \(\varepsilon_2\)-far \(\left(\text{where } \varepsilon_2 \geq \varepsilon_1 + \displaystyle\frac{1}{poly(k)}\right)\). Indeed, another remarkable achievement of the paper is that it gives a superpolynomial separation between the query complexities of non-tolerant and tolerant testing of a natural property of boolean functions in the adaptive setting.

Near-Optimal Degree Testing for Bayes Nets by Vipul Arora, Arnab Bhattacharyya, Clément L. Canonne (our own!) and Joy Qiping Yang (arXiv) This paper continues a line of investigation which a subset of the authors were a part of (and which we covered in our News for April 2022). Let us remind ourselves of the setup. You are given a probability distribution \(\mathcal{P}\) supported over the Boolean hypercube. Suppose \(\mathcal{P}\) can be generated by a Bayesian network. You may think of a Bayesian network as a DAG where each vertex tosses a coin whose bias may depend on the values of its parents. The question seeks to test whether \(\mathcal{P}\) admits a sparse Bayesian net (in the sense of each vertex having small in-degree). The main result of the paper gives an algorithm for this task which requires \(O(2^{n/2}/\varepsilon^2)\) samples. The paper also proves an almost matching lower bound.

A \(d^{1/2+o(1)}\) Monotonicity Tester for Boolean Functions on \(d\)-Dimensional Hypergrids by Hadley Black, Deeparnab Chakrabarty and C. Seshadhri (again, our own!) (arXiv) Again, the problem of monotonicity testing of boolean functions hardly requires any detailing to the regular readers of PTReview. As you can see in our News from November 2022 there were two concurrent papers mulling over this problem over the hypergrid domain. The featured paper is the result of a dedicated pursuit by the authors and the key result is what the title says. Namely, you can test monotonicity with a number of (non-adaptive, one-sided) queries that has no dependence on \(n\).

News for February 2023

Despite being a short month, February 2023 has witnessed a significant amount of activity under the sublinear “regime”. Let us know if we have missed anything!

Dynamic \((1 + \epsilon)\)-Approximate Matching Size in Truly Sublinear Update Time by Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak (arXiv). This work sheds light on connections between the dynamic and query models of computation and uses them to make advances on approximating the size of a maximum cardinality matching (MCM) in a general graph. In particular, as the main technical ingredient in obtaining an improved dynamic algorithm for maintaining an approximation to the size of the MCM, the authors provide a \(\pm \epsilon n\) approximation algorithm for estimating the size of the MCM in a general \(n\)-vertex graph by making \(n^{2 - \Omega_{\epsilon}(1)}\) adjacency queries. Prior to this result, the state of the art (Behnezhad, Roghani & Rubinstein; STOC’23) was an \(n^{2 - \Omega(1)}\)-query algorithm for the same problem with a multiplicative approximation guarantee of \(1.5\) and an additive guarantee of \(o(n)\).

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). As the name indicates, the paper studies the fundamental problem of testing uniformity of distributions supported over hypergrids \([m]^n\). The tester that they present makes \(O(\text{poly}(m)\sqrt{n}/\epsilon^2)\) queries to a conditional subcube sampling oracle, which, when given a subcube of \([m]^n\), returns a point sampled from the distribution conditioned on the point belonging to the subcube. The result is a generalization of the uniformity tester for distributions supported over the \(n\)-dimensional hypercube (Canonne, Chen, Kamath, Levi and Waingarten; SODA ’21).

Easy Testability for Posets by Panna Timea Fekete and Gabor Kun (arXiv). This paper deals with testing properties of directed graphs in the adjacency matrix model. The main characters of the story are posets, or directed acyclic graphs (DAGs) that are transitively closed. Given a family \(\mathcal{F}\) of finite posets, let \(\mathcal{P}_\mathcal{F}\) denote the set of all finite posets that do not contain any element of \(\mathcal{F}\) as a subposet. The main result of the paper is an \(\epsilon\)-tester with query complexity \(\text{poly}(1/\epsilon)\) for \(\mathcal{P}_\mathcal{F}\). The authors obtain this result by proving a removal lemma for posets. The result is placed in the larger context of understanding what properties of graphs can be tested with query complexity that has a polynomial dependence on \(1/\epsilon\) in the adjacency matrix model.

Compressibility-Aware Quantum Algorithms on Strings by Daniel Gibney and Sharma V. Thankachan (arXiv). Lastly, we have a paper on quantum string algorithms that run in sublinear time. In short, the authors present quantum algorithms with optimal running times for computing the Lempel-Ziv (LZ) encoding and Burrows Wheeler Transform (BWT) of highly compressible strings. A main consequence of these results is a faster quantum algorithm for computing the longest common subsequence (LCS) of two strings when the concatenation of the strings is highly compressible. It is to be noted that sublinear-time algorithms do not exist for these problems in the classical model of computation. More details follow.

Factoring a string into disjoint substrings (factors) in a specific manner is the main step in the LZ compression algorithm. The smaller the number of factors, the more compressible the string is. This paper gives a quantum algorithm for the problem of computing the LZ factorization of a string in time \(\tilde{O}(\sqrt{nz})\), where \(z\) is the number of factors in the string. They also show that their algorithm is optimal. Using this algorithm, they obtain a fast algorithm for computing the BWT of an input string, as well as an algorithm running in time \(\tilde{O}(\sqrt{nz})\) to compute the LCS of two strings, where \(n\) is the length and \(z\) the number of factors of the concatenation of the two strings. When \(z\) is \(o(n^{1/3})\), this algorithm gives an improvement over the previous best quantum algorithm running in time \(\tilde{O}(n^{2/3})\).
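To unpack what a “factor” is, here is a naive, quadratic-time, classical sketch of the Lempel-Ziv factorization that the quantum algorithm computes in \(\tilde{O}(\sqrt{nz})\) time; it is only meant to illustrate the definition and the role of \(z\).

```python
def lz_factorize(s):
    """Naive Lempel-Ziv factorization: each factor is the longest prefix of the
    remaining suffix that already occurs starting at an earlier position
    (occurrences may overlap the current factor), or a single fresh character.
    The number of factors z is small exactly when the string is highly repetitive."""
    factors, i = [], 0
    while i < len(s):
        length = 0
        # extend while s[i:i+length+1] occurs starting strictly before position i
        while i + length < len(s) and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        factors.append(s[i:i + length] if length else s[i])
        i += max(length, 1)
    return factors
```

For example, `lz_factorize("abababab")` returns `['a', 'b', 'ababab']`, so \(z = 3\) even though the string has length 8.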