Author Archives: Clement Canonne

News for October 2024

Four* papers on property testing last month! Lower bounds, upper bounds, distribution testing, quantum, and a new testing model!

* at least four. If we missed any, please let us know in the comments!

Lower Bounds for Convexity Testing, by Xi Chen, Anindya De, Shivam Nadimpalli, Rocco Servedio, and Erik Waingarten (arXiv). You’re given a membership oracle to a set \(S\) in \(\mathbb{R}^n\) (that is, query access to its indicator function \(f_S\colon \mathbb{R}^n\to \{0,1\}\)), and asked to decide if this set is convex, or “far from it”. This is a very natural and seemingly basic question— of course, we need to define what “far” means here, and the natural (normal, one may say) choice of underlying measure in \(\mathbb{R}^n\) is the standard Gaussian measure: \(d(S,T) = \Pr_{\mathcal{N}(0,I_n)}[ x \in S\Delta T]\).
Previous algorithms for this convexity testing question (and its tolerant testing analogue) are non-adaptive, and have \(2^{\tilde{O}(\sqrt{n})}\) query complexity. This paper shows that this is not just unfortunate, but also necessary: every non-adaptive tolerant tester for this question must make \(2^{\Omega(\sqrt[4]{n})}\) queries, and every (possibly adaptive) one-sided tester must make polynomially many queries.
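To make the distance notion above concrete, here is a small Python sketch (purely illustrative, not from the paper) estimating \(d(S,T)=\Pr_{x\sim\mathcal{N}(0,I_n)}[x\in S\Delta T]\) by Monte Carlo, given membership oracles for two sets; the example sets (a ball and a halfspace) are arbitrary placeholders.

```python
# Illustrative only: Monte Carlo estimate of the Gaussian distance
# d(S, T) = Pr_{x ~ N(0, I_n)}[x in S Δ T], given membership oracles.
# The example sets below (a ball and a halfspace) are placeholder convex sets.
import numpy as np

def gaussian_distance(in_S, in_T, n, num_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, n))   # samples from N(0, I_n)
    return np.mean(in_S(x) != in_T(x))          # fraction landing in S Δ T

if __name__ == "__main__":
    n = 5
    in_ball = lambda x: np.linalg.norm(x, axis=1) <= np.sqrt(n)
    in_halfspace = lambda x: x[:, 0] <= 0.0
    print(gaussian_distance(in_ball, in_halfspace, n))
```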

Replicable Uniformity Testing, by Sihan Liu and Christopher Ye (arXiv). In property testing, the algorithm must say YES with high probability on inputs which have the property, and NO with high probability on those which are far. On anything else, the algorithm is off the hook and can output either. This is typically considered to be fine, and, in any case, necessary to be able to obtain ultra-efficient algorithms. But what if, in this third case, we wanted to put the algorithm partially back on that hook, and required it to be consistent? The algorithm can answer either YES or NO, sure, but if I run it again on that same input, it should give the same answer with high probability. This fits within a recent line of work on replicable algorithms, and is non-trivial to achieve in (the standard model of) distribution testing, where a distribution testing algorithm only gets to see random samples from the distribution, and thus needs to behave replicably over that sample randomness. This paper introduces the question of replicable distribution testing, and provides both upper and lower bounds (essentially matching, with an asterisk) for the flagship task of uniformity testing.
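As a rough illustration (and emphatically not the algorithm or guarantees from the paper), here is a Python sketch of the generic “random threshold” trick commonly used for replicability: shared randomness fixes a cutoff inside the gap between the uniform case (collision probability \(1/n\)) and the far case (at least \((1+4\varepsilon^2)/n\)); as long as the collision statistic concentrates, two runs on fresh samples from the same distribution land on the same side of the cutoff with high probability.

```python
# Sketch of the generic "random threshold" idea for replicability, applied to a
# collision-based uniformity statistic. This is an illustration of the concept,
# not the paper's tester; sample sizes must be large enough for the statistic
# to concentrate well within the (1/n, (1 + 4*eps^2)/n) gap.
import random
from itertools import combinations

def collision_statistic(samples):
    # Fraction of colliding pairs; its expectation is 1/n for the uniform
    # distribution over [n], and at least (1 + 4*eps^2)/n if eps-far from it.
    pairs = list(combinations(samples, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def replicable_uniformity_test(samples, n, eps, shared_seed):
    lo, hi = 1.0 / n, (1.0 + 4 * eps**2) / n
    threshold = random.Random(shared_seed).uniform(lo, hi)   # shared randomness
    return "YES" if collision_statistic(samples) <= threshold else "NO"
```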

Quantum property testing in sparse directed graphs, by Simon Apers, Frédéric Magniez, Sayantan Sen, and Dániel Szabó (arXiv). Graph property testing has a long and rich history in the classical setting, spanning more than two decades. There are several testing models, depending on whether the graph is dense or sparse, and directed or undirected: and even in the sparse, directed case, it is important to sometimes only allow outgoing edge queries. All these variants capture different meaningful scenarios, and relations and separations between them are known. This paper opens the direction of quantum testing for sparse graphs, either directed or not. The authors investigate what advantage quantum computers can bring for graph testing in this setting, and show one natural property for which a quadratic speedup exists: \(o(\sqrt{n})\) quantum queries in the outgoing-edge-query-only (unidirectional) sparse model, while classically \(\Omega(n)\) are necessary. They also show that this is not always the case: quantum testing of 3-colorability, as in the classical case, does not admit a \(o(n)\)-query tester.

Relative-error monotonicity testing, by Xi Chen, Anindya De, Yizhi Huang, Yuhao Li, Shivam Nadimpalli, Rocco Servedio, and Tianqi Yang (arXiv). Property testing of Boolean functions is defined “absolutely”: the distance between two functions is the fraction of the domain on which they differ, i.e., \(\displaystyle\frac{|f^{-1}(\{1\})\Delta g^{-1}(\{1\})|}{2^n}\)
This makes sense when the functions have a reasonable number of satisfying assignments: but may be much less meaningful for sparse functions, which are non-zero only on an \(o(1)\) fraction of the inputs—for instance, functions where “all the action” is concentrated in a tiny subcube of the hypercube. All these functions are vanishingly close to each other! To address this, the authors introduce a new distance notion, relative-error, where the distance from \(g\) to \(f\) is scaled by the sparsity of \(f\):
\(\displaystyle\frac{|f^{-1}(\{1\})\Delta g^{-1}(\{1\})|}{|f^{-1}(\{1\})|}\)
This requires a slightly different access model to avoid trivial impossibility results, so the tester is augmented with sampling access to satisfying assignments of \(f\), on top of query access to \(f\) (as otherwise it may just never even find one satisfying assignment). After introducing and motivating this testing model, the paper initiates its study in the specific case of testing monotonicity of Boolean functions.
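A tiny brute-force example (with \(n\) small enough to enumerate; the two functions below are arbitrary) showing how different the two distance notions can be for sparse functions concentrated on a small subcube:

```python
# Absolute vs. relative-error distance between two sparse Boolean functions on
# {0,1}^n, computed by brute force for small n. Illustrative example only.
from itertools import product

def distances(f, g, n):
    cube = list(product([0, 1], repeat=n))
    f_ones = {x for x in cube if f(x)}
    g_ones = {x for x in cube if g(x)}
    sym_diff = len(f_ones ^ g_ones)
    return sym_diff / 2**n, sym_diff / len(f_ones)   # (absolute, relative to f)

n = 10
# Both functions live on the subcube fixing x_3, ..., x_n to 1.
f = lambda x: all(x[2:]) and x[0] == 1
g = lambda x: all(x[2:]) and x[1] == 1
print(distances(f, g, n))   # absolute ~ 0.002, relative = 1.0
```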

News for May 2024

May came with 3 new papers on property testing algorithms — or inspired by them.

Interactive Proofs for General Distribution Properties, by Tal Herman and Guy Rothblum (ECCC). Following a fruitful line of work (including by the authors themselves: see, e.g., this previous monthly post), this paper considers interactive proofs for distribution testing: Merlin and Arthur have data over a universe of size \(n\), Arthur wants to test properties of that data (probability distribution), but he has much less data (samples) than Merlin.
As it turns out, as long as the property he’s interested in can be checked efficiently (computationally: via a small-depth circuit), then Arthur can do it with strongly sublinear sample complexity: he needs only \(n^{1-\Omega(1)}\) samples, even for tolerant testing! And all that’s needed is a small number of rounds of interaction with Merlin. And even more, all (honest) parties can do that via a computationally efficient protocol…

Oracle-Checker Scheme for Evaluating a Generative Large Language Model, by Yueling Jenny Zeng, Li-C. Wang, and Thomas Ibbetson (arXiv). This paper draws inspiration from property testing and program checking (à la Blum, Luby, and Rubinfeld) to check the output of large language models (LLMs): specifically, for the task of entity extraction: the authors formalize how to view entity extraction as a homomorphism, and then assess empirically what using a property tester for linearity leads to. Overall, it sounds like an interesting (and somewhat unexpected?) use of property testing for LLM trustworthiness assessment!
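For reference, here is a sketch of the classical Blum–Luby–Rubinfeld (BLR) linearity test over \(\mathbb{F}_2^n\), the kind of tester being leveraged; how the paper encodes entity extraction as a homomorphism to feed into such a test is its contribution, and is not modeled here.

```python
# Sketch of the BLR linearity test over F_2^n: always accepts a linear function,
# and rejects with high probability any function far from linear. Illustration
# of the underlying tester only, not of the paper's oracle-checker scheme.
import random

def blr_linearity_test(f, n, num_trials=100, rng=random):
    for _ in range(num_trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        x_plus_y = [a ^ b for a, b in zip(x, y)]     # addition in F_2^n
        if f(x_plus_y) != f(x) ^ f(y):               # homomorphism check
            return False
    return True

# Example: a parity function is linear, so it always passes.
parity = lambda x: x[0] ^ x[2]
print(blr_linearity_test(parity, n=5))
```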

Property testing in graphical models: testing small separation numbers, by Luc Devroye, Gábor Lugosi, and Piotr Zwiernik (arXiv). Here too, ideas from property testing are used, this time in the context of high-dimensional (Gaussian) graphical models. This paper focuses on testing properties of the structure of the graphical model: given query access to the covariance matrix \(\Sigma\) consistent with some underlying graph structure \(G\), can we test whether this structure is a tree? Whether it has a small separation number?
The focus differs a little from the classical setting of property testing, in that there is no distance parameter and the goal is to get an exact decision algorithm (adaptive, but with unbounded query complexity: rejecting graphs that are far from the property as a function of the unknown distance parameter, and always accepting graphs with the property). But besides this small variation, great to see more uses of property testing in the wild!

WoLA’24: Dates, Registration, and call for contributed talks and posters

The 8th edition of WoLA, the Workshop on Local Algorithms, will be taking place on August 5-7, 2024 at the Simons Institute, as part of the Simons Institute's summer program on Sublinear Algorithms.

Now, what is WoLA, some of you may ask?
“Local algorithms, that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input, have been studied in a number of areas in theoretical computer science and mathematics. Some of these areas include sublinear-time algorithms, distributed algorithms, inference in large networks and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities.”

Save the date — the workshop has consistently been a great event for the local algorithms community to meet and discuss, and everyone is welcome to attend! (Registration is free)

The schedule is still being finalized, but here are some salient points:

  • on the first day, a celebration of Dana Ron's work and far-reaching influence, on the occasion of her 60th birthday
  • an open problem session for people to propose and discuss open questions and directions in local algorithms
  • Senior-junior lunches, and lightning talks (“graduating bits”) for graduating or soon-to-be-graduating students and postdocs
  • contributed short talks and a poster session

Importantly, there will also be another event (independent of WoLA, but very much related) on August 8 at the Simons Institute: a one-day birthday celebration for Ronitt Rubinfeld, “RR@60”, organized by Arnab Bhattacharyya, Funda Ergun, Krzysztof Onak, and Ravi Kumar. So plan on attending both!

To register your interest in attending WoLA (optional: for planning purposes), please fill in the Simons Institute form on the right side here (or, alternatively, the form below). If you’d like to present your work, as a short contributed talk and/or a poster, or would like to take part in the “graduating bits” session, please express your interest by filling in this form by ⏰ May 3, 5pm ET. Notifications will be sent by May 15.

If you have any questions about WoLA, or need an invitation letter to attend (for visa reasons), please indicate it in the form.

Update (25/04): updated the post to include the link to the Simons Institute registration form.

Clément Canonne, on behalf of the WoLA PC (Talya Eden, Manuela Fischer, Michael Kapralov, Robi Krauthgamer, Reut Levi, Rotem Oshman, Michal Parnas, Ron Rothblum, and Jara Uitto)

News for February 2024

February this year was slightly superlinear, with 29 days instead of the usual 28. As a result… 5 property testing papers! (Including one overlooked from January.)

Testing Calibration in Subquadratic Time, by Lunjia Hu, Kevin Tian, and Chutong Yang (arXiv). The authors consider the question of model calibration, where a binary predictor is said to be calibrated if \(\mathbb{E}[ y\mid v=t ] = t\) for all \(t\), where \(y\) is the observed outcome and \(v\) is the prediction. This notion, central to algorithmic fairness, comes with a host of challenges: one of them being to assess whether a given predictor is indeed calibrated, and to quantify by how much it deviates from being so. Following work by Błasiok, Gopalan, Hu, and Nakkiran which introduced a notion of distance to calibration, the paper defines the (property testing) task of calibration testing, with connections to distribution testing, and provides subquadratic-time algorithms (in the sample complexity) for the task. The authors also obtain analogous results for tolerant calibration testing, which they also introduce.
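To make the definition concrete, here is a naive plug-in estimate of the usual binned calibration error from (prediction, outcome) pairs; note this is the classical “expected calibration error,” not the distance to calibration of Błasiok et al. nor the paper's subquadratic algorithms.

```python
# Naive binned calibration error: bucket predictions, then compare the average
# outcome to the average prediction within each bucket. Purely to illustrate
# the condition E[y | v = t] = t; not the paper's distance or algorithm.
import numpy as np

def binned_calibration_error(v, y, num_bins=10):
    v, y = np.asarray(v, dtype=float), np.asarray(y, dtype=float)
    bins = np.clip((v * num_bins).astype(int), 0, num_bins - 1)
    err = 0.0
    for b in range(num_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(y[mask].mean() - v[mask].mean())
    return err

print(binned_calibration_error([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1]))   # perfectly calibrated: 0.0
```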

The role of shared randomness in quantum state certification with unentangled measurements, by Jayadev Acharya and Yuhan Liu (arXiv). In this paper (from January), the authors consider the following question, the quantum analogue of identity testing from the classical distribution testing world: what is the copy complexity (≈sample complexity) of certifying (≈testing) whether an unknown quantum state (≈quantum analogue of a probability distribution) is equal to a known, reference quantum state? And, crucially, what about doing this when our quantum hands are tied, i.e., without using entanglement — but possibly with adaptive measurements? This is not a new question, and we previously covered a couple papers on this in April 2020 and Feb 2021. What is new here is that the authors show it’s not about adaptivity! Mirroring what happens in the classical (distributed) case, the key here turns out to be shared randomness: that is, whether the measurements are made independently (in which case \(\Theta(d^2)\) copies are necessary and sufficient), or chosen randomly but jointly (in which case the copy complexity is \(\Theta(d^{3/2})\)).

Low Acceptance Agreement Tests via Bounded-Degree Symplectic HDXs, by Yotam Dikstein, Irit Dinur, and Alexander Lubotzky (ECCC) and Constant Degree Direct Product Testers with Small Soundness, by Mitali Bafna, Noam Lifshitz, Dor Minzer (ECCC). [Two independent works]

Let \(X\) be a (small) set of \(k\)-element subsets of \([n]\), and \(\{f_S\colon S\to \Sigma\}_{S\in X}\) a family of partial functions. Is there a way to “stitch together” all the functions \(f_S\) into a global one \(G\colon [n] \to \Sigma\)? A testing algorithm for this is called an agreement test, and the most natural one goes as follows: pick \(S,T\in X\) at random (say, with fixed, small intersection), and accept if, and only if, \(f_{S}\) and \(f_T\) agree on \(S\cap T\). Does this work? In which parameter regime (i.e., how does the acceptance probability \(\varepsilon\) relate to the closeness to a global function \(G\))? How large does \(X\) need to be? The two papers both show that the above agreement test works in the small soundness regime (small \(\varepsilon\)), for \(|X| = O(n)\). Or, as the authors of the first of the two papers put it: “In words, we show that ε agreement implies global structure.”
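Here is a toy Python rendering of the natural agreement test just described; the family X, the local functions, and the uniform choice of pairs are all simplifications for illustration (the papers analyze structured distributions over pairs with prescribed intersection size).

```python
# Toy version of the natural agreement test: sample two sets S, T from X and
# accept iff the local functions agree on S ∩ T. Uniformly random pairs are a
# simplification of the structured tests analyzed in the two papers.
import random

def agreement_test(X, local_fns, num_trials=1000, rng=random):
    # X: list of frozensets over [n]; local_fns[S]: dict mapping each i in S to a value.
    accept = 0
    for _ in range(num_trials):
        S, T = rng.sample(X, 2)
        common = S & T
        accept += all(local_fns[S][i] == local_fns[T][i] for i in common)
    return accept / num_trials    # empirical acceptance probability

# Tiny example where all local functions come from one global function (always accepts).
X = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
G = {1: "a", 2: "b", 3: "c"}
local_fns = {S: {i: G[i] for i in S} for S in X}
print(agreement_test(X, local_fns))   # 1.0
```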

Efficient learning of quantum states prepared with few fermionic non-Gaussian gates, by Antonio Anna Mele and Yaroslav Herasymenko (arXiv). While most of the paper’s focus is on tomography (learning) of a specific class of quantum states, the authors also provide in Appendix A an algorithm for a property testing question: namely, testing the Gaussian dimension of a quantum state: specifically, tolerant testing of \(t\)-compressible \(n\)-qubit Gaussian states in trace distance (Theorem 48). I do not fully grasp what all this means, to be honest, so I’ll stop here.

Sublinear Algorithms Program at the Simons Institute in 2024

Exciting news!* Next year, the Simons Institute will host a summer program on Sublinear Algorithms, from May 20 to August 9, 2024. Organised by Artur Czumaj, Piotr Indyk, Jelani Nelson, Noga Ron-Zewi, Ronitt Rubinfeld, Asaf Shapira and myself, the summer program will feature 4 workshops:

This is, of course, in addition to the bulk of the program itself: research discussions, reading groups, talks, social activities… If you happen to be in the area, you’re more than welcome to come and take part in some of these!

While each workshop will have its own set of attendees (more details soon), there are also some slots for (1) long-term visitors and (2) Simons Research Fellows (within five years of the award of their PhD at the start of academic year 2024-25; may already hold faculty positions), which you can apply for! The deadline to apply is December 1, 2023:

Hope to see many of you next summer! Oh, and to conclude… have you seen our logo?

[Image: Sublinear Algorithms wide-format logo]

* Full disclosure: I am biased, being an organizer, but do find that very exciting.

News for October 2023

Last month was a little slower, with only (unless we missed some) three papers: two papers appearing, and one that was overlooked from the month before.

Stability of Homomorphisms, Coverings and Cocycles I: Equivalence, by Michael Chapman and Alexander Lubotzky (arXiv). This paper considers three “stability” problems in topological property testing: namely, problems of the form “are objects almost-X close to X”, where X (here) is one of homomorphisms, coverings of a cell complex, or 1-cocycles. The main result of the paper is that the three property testing problems are equivalent: namely, they admit proximity-oblivious testers (POTs) with similar rejection probability rates.

Testing Higher-order Clusterability on graphs, by Yifei Li, Donghua Yang, and Jianzhong Li (arXiv). The authors propose a new notion of graph clusterability, higher-order clusterability, meant to generalize the previously studied notions of clusterability; and proceed to provide testing algorithms for this notion.

Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine, by Clément Canonne and Yucheng Sun (arXiv). This distribution testing paper (carry-over from the previous month) focuses on the extensively studied problem of closeness testing: given samples from two unknown distributions \(p\) and \(q\), decide whether \(p=q\) or if they are far. Now, add (differential) privacy: what if the two sets of samples, from \(p\) and \(q\) respectively, were sensitive data? And now, the focus of the paper… what if the two sets of samples were not equally sensitive? What are the trade-offs between the number of samples from \(p\) and the number from \(q\), then?

News for August 2023

The month of August was… wow, intense! We listed no fewer than 13 property testing papers, which is great. Keep them coming!

If you like high-dimensional expanders (HDX) and agreement testing, this is your lucky day! We start with two (2!) exciting papers on HDX, and agreement testing in the low soundness regime:

Characterizing Direct Product Testing via Coboundary Expansion, by Mitali Bafna and Dor Minzer (ECCC). This paper focuses on direct product testing: namely, given query access to a function \(F\) defined on a \(d\)-dimensional simplicial complex, we consider testers which sample two faces, and check if \(F\) is consistent on their intersection. Such testers will pass with probability one when \(F\) is a direct product, so the question is what “soundness” (probability of rejecting if \(F\) is far from being one) one can achieve. And, central to this paper: can we use high-dimensional expanders to get such testers?
While the high-soundness regime is reasonably well understood, the low-soundness one… not so much. This paper makes progress on that front: the main result states that, to allow for direct product testing in this low soundness regime, “a high-dimensional spectral expander must possess a property that may be seen as a generalization of coboundary expansion.”

Agreement theorems for high dimensional expanders in the small soundness regime: the role of covers, by Yotam Dikstein and Irit Dinur (ECCC). Concerned with the same type of question, this paper provides another characterization of the agreement test properties of high-dimensional expanders in the low-soundness regime: specifically, in terms of the topological covers of the expander.

Rejoice: we’re not done with HDX!

New Codes on High Dimensional Expanders, by Irit Dinur, Siqi Liu, and Rachel Zhang (ECCC). This paper constructs a new family of locally testable codes (LTCs), based on simplicial (bounded-degree) high dimensional expanders, which, while they do not achieve the \(c^3\) Holy Grail (constant rate, distance, and testability), do satisfy a host of potentially useful properties: symmetric, with low-density parity-check matrices (LDPC), and some additional structure (multiplying two codewords gives a related codeword).

Now, onto quantum…

Space-bounded quantum state testing via space-efficient quantum singular value transformation, by François Le Gall, Yupan Liu, and Qisheng Wang (ECCC). Suppose two “computationally limited” (read: space-bounded) quantum devices are used to prepare quantum states \(\rho_0\) and \(\rho_1\), and claim they are identical: your goal is to check whether these two states are close, or very far, for some reasonable distance measure. How hard could that be, computationally? This is exactly a variant of quantum state testing (i.e., the “quantum generalization of tolerant closeness testing” in distribution testing jargon) which places restrictions on the “source” of the observations: this paper establishes several characterisations of the computational complexity of the task, for various distance measures: trace distance, Hilbert-Schmidt distance, quantum entropy difference, and quantum Jensen-Shannon divergence.

Quantum Lower Bounds by Sample-to-Query Lifting, by Qisheng Wang and Zhicheng Zhang (arXiv). There exist several methods to prove quantum query complexity lower bounds, such as the polynomial and the adversary methods. This paper brings a new kid to town: a “sample-to-query lifting” result, which lets one convert a query-based algorithm for a promise problem to obtain a sample-based quantum algorithm for a corresponding testing problem, with a quadratic blowup from query to sample complexity. That is, any sample complexity lower bound for that problem now implies a (quadratically weaker) query complexity lower bound for the original task! Thus, the result allows one to “systematically derive quantum query lower bounds using corresponding sample lower bounds from quantum information theory.” The authors use this sample-to-query lifting to establish both new and previously known quantum query complexity lower bounds.

A little magic means a lot, by Andi Gu, Lorenzo Leone, Soumik Ghosh, Jens Eisert, Susanne Yelin, and Yihui Quek (arXiv). Alright, I have to say, I am completely out of my depth with this one. But the title is great, the abstract sounds interesting, and the paper contains a lower bound on the number of copies any tester for “non-stabilizerness” requires (Theorem 6).

… then interactive proofs…

Distribution-Free Proofs of Proximity, by Hugo Aaronson, Tom Gur, Ninad Rajgopal, and Ron Rothblum (ECCC). In distribution-free property testing, to model “real practical settings,” the underlying distance measure is not Hamming, but one induced by an unknown probability distribution, PAC-learning style. In interactive proofs of proximity (IPP), the property testing algorithm can interact with an all-powerful, but all-very-much-untrusted prover, IP-style. Well, you probably have guessed by now: this paper introduces a new type of property testing, combining the two! And shows that any problem in NC admits a distribution-free interactive proof of proximity with strongly sublinear communication (\(\tilde{O}(\sqrt{n})\)), which (under some reasonable assumption) is near-optimal.

… and then, distribution testing!

New Lower Bounds for Testing Monotonicity and Log Concavity of Distributions, Yuqian Cheng, Daniel Kane, and Zhicheng Zheng (arXiv). Monotonicity is a pretty clean and natural property of probability distributions over the line: the probability mass function must be non-increasing. This definition straightforwardly generalizes to higher dimensions, and, let’s be honest: given how simple this property is, surely, by now we must have tight bounds on the sample complexity of testing monotonicity, right? Turns out, yes, but no. The known upper and lower bounds are tight… as long as the distance parameter \(\varepsilon\) is not too small. But if it is, then… the bounds are loose. And same thing for a related property, log-concavity! This state of affairs was really annoying to some, and I am one of those “some.” Fortunately, this paper makes significant progress towards scratching that itch, by proving new lower bounds for both monotonicity and log-concavity (via a new lower bound technique, relying on moment matching but preserving the type of constraints monotonicity and log-concavity impose on the individual probabilities). Almost there!

Tight Lower Bound on Equivalence Testing in Conditional Sampling Model, by Diptarka Chakraborty, Sourav Chakraborty, and Gunjan Kumar (arXiv). In the usual sampling model for distribution testing, the algorithm gets i.i.d. samples from the unknown distribution(s). This is very natural, but often leads to very high sample complexities… a recent line of work, starting with Chakraborty, Fischer, Goldhirsh, and Matsliah (2011) and Canonne, Ron, and Servedio (2012), introduced the “conditional sampling model”, a more powerful oracle setting where the algorithm gets to condition on subsets of the domain of its choosing. And, lo and behold, with great (additional) power comes greatly lower sample complexity! But again, some itch to scratch: for one of the “flagship” distribution testing questions, closeness testing, there remained a quadratic gap between the upper bound (\(\tilde{O}(\log\log n)\), Falahatgar, Jafarpour, Orlitsky, Pichapati, and Suresh ’15) and the lower bound (\(\Omega(\sqrt{\log\log n})\), Acharya, Canonne, and Kamath ’15). Not anymore! For this question (and some related ones), this paper essentially closes the gap, proving an \(\tilde{\Omega}(\log\log n)\) lower bound, bypassing the limitations of previous lower bound techniques!
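For readers unfamiliar with the model, here is a minimal sketch of the conditional sampling (COND) oracle interface itself; this only illustrates the access model, not any of the testing algorithms or the lower bound construction.

```python
# Minimal sketch of the COND oracle: the tester specifies a subset of the domain
# and receives one sample from the distribution conditioned on that subset.
# This only illustrates the access model discussed above.
import random

class CondOracle:
    def __init__(self, probs, seed=None):
        self.probs = probs                    # dict: element -> probability
        self.rng = random.Random(seed)

    def cond_sample(self, subset):
        support = [x for x in subset if self.probs.get(x, 0.0) > 0.0]
        weights = [self.probs[x] for x in support]
        return self.rng.choices(support, weights, k=1)[0]

oracle = CondOracle({1: 0.5, 2: 0.3, 3: 0.2})
print(oracle.cond_sample({1, 2, 3}))   # a plain SAMP query: condition on everything
print(oracle.cond_sample({2, 3}))      # conditioned on {2, 3}
```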

Support Testing in the Huge Object Model, by Tomer Adar, Eldar Fischer, and Amit Levi (arXiv). Another paper, another distribution testing model! In the Huge Object Model recently introduced by Goldreich and Ron, one gets i.i.d. samples from an unknown probability distribution over \(n\)-bit strings, but does not actually get the samples themselves, which are too big: instead, the algorithm gets query access to the individual samples (each being an \(n\)-bit string). This is a rather appealing model, and the authors focus on the task of testing the support size of the distribution. As their results show, this turns out to be quite interesting: namely, the paper shows a separation between adaptive and non-adaptive algorithms, as well as a separation between number of samples and number of queries (to these samples).
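Similarly, here is a minimal sketch of the Huge Object Model access itself (samples are only handles; individual bits are then queried), just to fix ideas; none of the paper's support-size testers appear here.

```python
# Sketch of the Huge Object Model interface: draw an i.i.d. sample (receiving
# only a handle to it), then query individual bits of previously drawn samples.
# Access model only; not the support-size testers from the paper.
import random

class HugeObjectOracle:
    def __init__(self, strings, weights, seed=None):
        self.strings, self.weights = strings, weights    # distribution over n-bit strings
        self.rng = random.Random(seed)
        self.drawn = []

    def new_sample(self):
        self.drawn.append(self.rng.choices(self.strings, self.weights, k=1)[0])
        return len(self.drawn) - 1        # handle to the fresh sample

    def query(self, handle, i):
        return self.drawn[handle][i]      # i-th bit of that sample

oracle = HugeObjectOracle(["0101", "1111"], [0.5, 0.5])
h = oracle.new_sample()
print(oracle.query(h, 0), oracle.query(h, 3))
```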

We also have some Boolean function testing on the menu:

Linear isomorphism testing of Boolean functions with small approximate spectral norm, by Arijit Ghosh, Chandrima Kayal, Manaswi Paraashar, and Manmatha Roy (arXiv). Given two Boolean functions \(f,g\colon\mathbb{F}_2^n\to\{-1,1\}\), a standard property testing question is to test whether \(f,g\) are isomorphic. Now, one can instead ask whether they are linearly isomorphic: i.e., whether there exists some invertible matrix \(A\) with entries in \(\mathbb{F}_2\) such that \(f\circ A = g\). It is this question that the authors study, through two different lenses: first, focusing on the communication complexity of the question when Alice has \(f\) and Bob has \(g\); and second, on the (usual) query complexity. In both cases, their results are expressed in terms of the (approximate) spectral norm of the functions.

Adversarial Low Degree Testing, by Dor Minzer and Kai Zhe Zheng (arXiv). Consider an adversarial setting of property testing where, every time the tester makes a query, an adversary gets to “erase” a small number of values of the function, say \(t\), to try to fool the tester. This model was recently introduced by Kalemaj, Raskhodnikova, and Varma: in this paper, the authors continue this line of study, and show that one can test in an erasure-resilient fashion whether a function has low degree (at most \(d\)) with \(\tilde{O}(\log^{O(d)}(t)/\varepsilon)\) queries.

And, to conclude, graph testing:

Testing Graph Properties with the Container Method, by Eric Blais and Cameron Seth (arXiv). Consider the \(\rho\)-CLIQUE property: does a graph on \(n\) vertices contain a clique of size at least \(\rho n\), or is it \(\varepsilon\)-far from it? This property has been extensively studied in the dense graph testing model, but the lower and upper bounds still exhibited a factor-\(\rho\) gap, particularly striking in the “small clique” regime (when \(\rho \ll 1\)). This paper, using the graph container method, nearly closes that gap, up to polylogarithmic factors: showing an \(\tilde{O}(\rho/\varepsilon)\)-query upper bound. The authors also establish an improved upper bound for another flagship testing question, \(k\)-COLORABILITY.


Oh, my. We forgot a paper last month (appeared end of June), which means… fourteen. Fourteen! (And, of course, apologies for overlooking this paper!)

Refining the Adaptivity Notion in the Huge Object Model, by Tomer Adar and Eldar Fischer (arXiv). For the huge object model in property testing, see one of the papers above. In this one, the authors focus on the notion of adaptivity in this model, in a fine-grained fashion: i.e., which queries depend on what the tester previously saw, with interesting subtleties about what that means in the huge object model. Of course, the key question is whether adaptivity allows for more efficient property testing… As they show, the answer is yes: the authors establish a hierarchy of separations.

News for January 2023

Welcome to the first batch of 2023. Looks like it’s going to be a good year, with 5 property testing or related papers (that I could find) already:

An efficient asymmetric removal lemma and its limitations, by Lior Gishboliner, Asaf Shapira, and Yuval Wigderson (arXiv). One of the jewels of graph property testing is the triangle removal lemma (and its many generalizations and variants), which relates the number of triangles in a dense graph to its distance from being triangle-free: namely, any graph \(\varepsilon\)-far from being triangle-free must have \(\delta(\varepsilon)n^3\) triangles, where the density \(\delta(\varepsilon)\) only depends on the distance (and not the size of the graph!). This immediately leads to constant-query testers (and even “proximity-oblivious” testers) for triangle-freeness (and, more generally, pattern-freeness). Unfortunately, the dependence on \(\varepsilon\) is quite bad, essentially a tower-type function (and it is known no polynomial bound is possible). This work attempts to bypass this impossibility result by proving an asymmetric removal lemma, of the form “any graph \(\varepsilon\)-far from being triangle-free must have \(\mathrm{poly}(\varepsilon)n^5\) 5-cycles” (and generalizations beyond triangles). This seems like a very interesting direction, with potential applications to property testing, and (who knows!) efficient testers for many properties hitherto only known to be (practically) testable for constant \(\varepsilon\).

Related (more removal lemmata!), a different work on this topic:

The Minimum Degree Removal Lemma Thresholds, by Lior Gishboliner, Zhihan Jin, and Benny Sudakov (arXiv). As mentioned above, removal lemmata relate the distance \(\varepsilon\) from being \(H\)-free (for a given subgraph \(H\)) to the density \(\delta(\varepsilon)\) of occurrences of \(H\) in the graph. Sadly, it is known that this density will be superpolynomial (in the distance) unless \(H\) is bipartite… which, while technically still yielding testing algorithms (query complexity independent of the size of the graph!), yields very inefficient testers (very bad dependence on \(\varepsilon\)!). This paper studies one direction to bypass this sad state of affairs: under which additional assumption on the underlying graph (specifically, bounds on its minimum degree) can we obtain a polynomial bound on \(\delta(\varepsilon)\)? And a linear bound? The authors give a tight degree condition for \(\delta(\varepsilon)\) to be polynomial when \(H\) is an odd cycle, and their results for the linear-dependence case establish a separation between the two. Put differently: obtaining polynomial-query testers via removal lemmas is possible for a strictly larger class of graphs than linear-query ones!

And now, for something completely different: testing binary matrices!

A Note on Property Testing of the Binary Rank, by Nader H. Bshouty (arXiv). The binary rank of a matrix \(M\in\{0,1\}^{n\times m}\) is the smallest \(d\) such that there exist \(A\in\{0,1\}^{n\times d}\) and \(B\in\{0,1\}^{d\times m}\) with \(M=AB\); this can also be seen as the minimal number of bipartite cliques needed to partition the edges of a bipartite graph represented by \(M\). One can also define the relaxed notion of \(s\)-binary rank, if one enforces that each edge of the bipartite graph is covered by at most \(s\) bipartite cliques. The property testing question is then to decide, given inputs \(s,d,\varepsilon\), if \(M\) has \(s\)-binary rank at most \(d\), or is \(\varepsilon\)-far from it. The main result of this note is to give one-sided testers (one adaptive, and one non-adaptive) for \(s\)-binary rank with query complexity \(\tilde{O}(2^d)\) (for constant \(s\)), improving on the previous algorithms by a factor \(2^d\).
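To make the definition concrete, here is an exponential-time brute-force check (workable only for tiny matrices, and unrelated to the testers in the note) of whether a 0/1 matrix has \(s\)-binary rank at most \(d\).

```python
# Brute force: does M admit 0/1 matrices A (n x d) and B (d x m) such that every
# 1-entry of M is covered by between 1 and s of the d bipartite cliques, and
# every 0-entry by none? Definition-checker only, exponential in everything.
import itertools
import numpy as np

def has_s_binary_rank_at_most(M, d, s):
    M = np.asarray(M)
    n, m = M.shape
    cols_A = list(itertools.product([0, 1], repeat=n))   # candidate columns of A
    rows_B = list(itertools.product([0, 1], repeat=m))   # candidate rows of B
    for cliques in itertools.product(itertools.product(cols_A, rows_B), repeat=d):
        cover = sum(np.outer(a, b) for a, b in cliques)  # how many cliques cover each entry
        if np.array_equal(cover >= 1, M == 1) and (cover[M == 1] <= s).all():
            return True
    return False

print(has_s_binary_rank_at_most([[1, 1], [1, 0]], d=2, s=1))   # True: two disjoint cliques suffice
```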

Into the quantum realm!

Testing quantum satisfiability, by Ashley Montanaro, Changpeng Shao, and Dominic Verdon (arXiv). Classically, one can study the property testing version of \(k\)-SAT, which asks to decide whether a given instance is satisfiable or far from being so. And people (namely, Alon and Shapira, in 2003) did! Quantumly, one can define an analogue of \(k\)-SAT, “quantum \(k\)-SAT”: and people (namely, Bravyi, in 2011) did! But what about property testing of quantum \(k\)-SAT? Well, now, people (namely, the authors of this paper) just did! They show (Corollary 1.10) that one can efficiently distinguish between (1) the quantum \(k\)-SAT instance being satisfiable, and (2) it being far from satisfiable by a product state. This, effectively, extends the result of Alon–Shapira’03 to the quantum realm.

And to conclude, a foray into reinforcement learning via distribution testing…

Lower Bounds for Learning in Revealing POMDPs, by Fan Chen, Huan Wang, Caiming Xiong, Song Mei, and Yu Bai (arXiv). Alright, I’m even more out of my depth than usual here, so I’ll just copy (part of) the abstract, for fear I don’t do justice to the authors’ work: “This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the revealing condition — A natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. […] Technically, our hard instance construction adapts techniques in distribution testing, which is new to the RL literature and may be of independent interest.” (Emphasis mine)

News for September 2022

Apologies for the delay! Another quiet month in the property testing world, with only one paper (that we found). If we missed any, let us know in the comments!

On Interactive Proofs of Proximity with Proof-Oblivious Queries, by Oded Goldreich, Guy Rothblum, and Tal Skverer (ECCC). Interactive Proofs of Proximity (IPPs) are the “interactive” version of property testers, where the algorithm can both query the input and interact with an all-knowing (but untrusted) prover. In this work, the authors study the power of a specific and natural type of “adaptivity” for IPPs, asking what happens when the choice of queries and the interaction with the prover are independent, or restricted. That is, what happens when these two aspects of the IPP algorithm are in separate “modules”? Can we still test various properties as efficiently? The paper proves various results under several models (=restrictions between the two “modules”), focusing on the intermediate restriction where the two modules (queries to the input and interaction with the prover) are separate (no interaction between them), but have access to shared randomness.

News for May 2022

The crazy numbers from last month are not quite gone: we have five papers this month, not bad at all!

Codes! Distributed computing! Probability distributions!

Improved local testing for multiplicity codes, by Dan Karliner and Amnon Ta-Shma (ECCC). Take the Reed–Muller code with parameters \(m, d\), whose codewords are the evaluation tables of all degree-\(d\) polynomials over \(\mathbb{F}^m\). RM codes are great, they are everywhere, and they are locally testable: one can test whether a given input \(x\) is a valid codeword (or far from every codeword) with only very few queries to \(x\). Now, take the multiplicity code: instead of just the evaluation tables of the polynomials themselves, a codeword includes the evaluations of all its derivatives, up to order \(s\). These beasts generalize RM codes: are they also locally testable? Yes they are! And this work improves on our understanding of this aspect, by providing better bounds on the locality (how few queries are necessary to test), and simplifies the argument from previous work by Karliner, Salama, and Ta-Shma (2022).

Overcoming Congestion in Distributed Coloring, by Magnús M. Halldórsson, Alexandre Nolin, and Tigran Tonoyan (arXiv). Two of the main distributed computing models, LOCAL and CONGEST, differ in how they model the bandwidth constraints. In the former, nodes can send messages of arbitrary size, and the limiting quantity is the number of rounds of communication; while in the latter, each node can only send a logarithmic number of bits at each round. This paper introduces a new technique that allows for communication-efficient distributed (coordinated) sampling, which, as a direct application, enables porting several LOCAL algorithms to the CONGEST model at a small cost: for instance, \((\Delta+1)\)-List Coloring. This new technique also has applications beyond these distributed models, to graph property testing – in a slightly non-standard setting where we define farness from the property in a “local” sense (detect vertices or edges which contribute to many violations, i.e., are “locally far” from the property considered).

Robust Testing in High-Dimensional Sparse Models, by Anand Jerry George and Clément L. Canonne (arXiv). In the Gaussian mean testing problem, you are given samples from a high-dimensional Gaussian \(N(\mu, I_d)\), where \(\mu\) is either zero or has \(\ell_2\) norm greater than \(\varepsilon\), and you want to decide which of the two holds. This “mean testing” is equivalent (due to, erm, “standard facts”) to testing in total variation distance, and captures the setting where one wants to figure out whether an underlying signal \(\mu\), subject to white noise, is null or significant. Now, what if this \(\mu\) was promised to be \(s\)-sparse? Can we test more efficiently? But what if a small fraction of the samples were arbitrarily corrupted — how much harder does the testing task become? For some related tasks, it is known that being robust against adversarial corruptions makes testing as hard as learning… This paper addresses this “robust sparse mean testing” question, providing matching upper and lower bounds; as well as addressing the related question of (robust, sparse) linear regression.
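As a point of reference (assuming identity covariance, no sparsity, and no corruptions, so none of the difficulties the paper actually tackles), the textbook statistic for plain Gaussian mean testing is the recentered squared norm of the empirical mean:

```python
# Vanilla Gaussian mean testing statistic: ||empirical mean||^2 - d/m has
# expectation ||mu||^2, so it separates mu = 0 from ||mu||_2 >= eps once enough
# samples are taken. Baseline illustration only; the paper's robust sparse
# testers are far more involved.
import numpy as np

def mean_test_statistic(X):
    m, d = X.shape                                      # m samples in dimension d
    return np.linalg.norm(X.mean(axis=0))**2 - d / m

rng = np.random.default_rng(0)
null = rng.standard_normal((2000, 50))                  # mu = 0
alt = rng.standard_normal((2000, 50)) + 0.1             # each coordinate shifted by 0.1
print(mean_test_statistic(null), mean_test_statistic(alt))   # ~0 vs ~0.5
```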

Sequential algorithms for testing identity and closeness of distributions, by Omar Fawzi, Nicolas Flammarion, Aurélien Garivier, and Aadil Oufkir (arXiv). Consider the two “usual suspects” of distribution testing, identity and closeness testing, where we must test if an unknown distribution is equal to some reference one or \(\varepsilon\)-far (in total variation distance) from it; or, the same thing, but with two unknown distributions (no reference one). These are, by now, quite well understood… but the algorithms for them take a worst-case number of samples, a function of the distance parameter \(\varepsilon\). But if the two distributions are much further apart than \(\varepsilon\), fewer samples should be required! This is the focus of this paper, which shows that with a sequential test one can achieve this type of guarantee: a number of samples which, in the “far” case, depends on the actual distance, not on its worst-case lower bound \(\varepsilon\). One could achieve this by combining known algorithms with a “doubling search”; however, this would still lose some constant factors in the sample complexity. The authors provide sequential tests which improve on this “doubling search technique” by constant factors, and back this up with empirical evaluations of their algorithms.
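For concreteness, here is a sketch of the naive “doubling search” baseline mentioned above (not the authors' improved sequential tests): run an off-the-shelf identity tester, here a placeholder `base_tester`, with a geometrically shrinking distance parameter, stopping as soon as it rejects; each call is assumed to use that tester's sample complexity at the current \(\varepsilon\), with confidence amplified enough for a union bound over the \(O(\log(1/\varepsilon))\) stages.

```python
# Doubling-search wrapper around a generic identity tester. `base_tester(eps)`
# is a hypothetical black box returning "ACCEPT"/"REJECT" with the usual
# guarantees at distance parameter eps. In the far case the loop stops early,
# so the number of samples scales with the true distance rather than target_eps.
def sequential_identity_test(base_tester, target_eps):
    eps = 0.5
    while eps > target_eps:
        if base_tester(eps) == "REJECT":
            return "REJECT"
        eps /= 2
    return base_tester(target_eps)     # final run at the target distance

# Dummy base tester for illustration: the "true" distance is 0.2, so it rejects
# whenever run with eps at most 0.2.
print(sequential_identity_test(lambda eps: "REJECT" if eps <= 0.2 else "ACCEPT", 0.01))
```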

Estimation of Entropy in Constant Space with Improved Sample Complexity, by Maryam Aliakbarpour, Andrew McGregor, Jelani Nelson, and Erik Waingarten (arXiv). Suppose that, given samples from an unknown distribution \(p\) over \(n\) elements, your task is to estimate its (Shannon) entropy \(H(p)\) up to \(\pm\Delta\). You’re in luck! We know that \(\Theta(n/(\Delta\log n)+ (\log^2 n)/\Delta^2)\) samples are necessary and sufficient. But what if you had to do that under strict memory constraints? Say, using only a constant number of words of memory? Previous work by Acharya, Bhadane, Indyk, and Sun (2019) shows that it is still possible, but the number of samples required shoots up, with their algorithm requiring (up to polylog factors) \(n/\Delta^3\) samples. This work improves upon the dependence on \(\Delta\), providing a constant-memory algorithm with sample complexity \(O(n/\Delta^2 \cdot \log^4(1/\Delta))\); the authors further conjecture this to be optimal, up to the polylog factors.