Property Testing Review

News for June 2026

Akash — Sat, 11 Jul 2026 15:38:57 +0000

Our press release this month features five papers: four of them are squarely property testing papers, and a fifth which I could not, in good conscience, bring myself to omit. Let us take a look: a paper that carries the submodularity testing story from two labels up to $k$ of them; a striking super-polynomial quantum advantage for tolerant junta testing; a sublinear tester that decides whether a mystery multiplication table is an abelian group; and a rather pretty reworking of the Goldreich-Ron bipartiteness tester through the Max-Cut SDP. And then, saved for last, a rare treat that settles a question which has stood open since the 1980s, namely that bipartite matching is in NC. Without further ado, let us examine our spread.

Testing k-submodularity by Themistoklis Haris and Diptaksho Palit (arXiv) Let us begin where this story usually begins, with the question posed by Seshadhri and Vondrák in Is submodularity testable?: given oracle access to $f \colon \{0,1\}^n \to \mathbb{R}$, can we distinguish submodular functions from those that are $\varepsilon$-far from every submodular function? Building on a reduction to testing monotonicity over unbounded ranges, the authors exhibited a lower bound of $\Omega(n)$ queries for testing submodularity. Blais and Bommireddi moved the question into the $\ell_p$-testing model in Testing submodularity and other properties of valuation functions, where they obtained constant-query-complexity testers. The featured paper considers the following variation: take a partial partition of the ground set into $k+1$ parts, e.g., a string in $[k+1]^n$. Think of the last part as the elements unassigned so far (the “partial” in the “partial partition”). A function on these partial partitions is $k$-submodular if the marginal gains diminish no matter which part an element is assigned to. The main result of the paper, following in the tracks laid out by Blais-Bommireddi, presents constant-query-complexity testers for $k$-submodularity in $\ell_p$ distance.
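To make the base case concrete: a function on $\{0,1\}^n$ is submodular when $f(x \vee y) + f(x \wedge y) \leq f(x) + f(y)$ for all $x, y$. Below is a brute-force Python sketch of that definition. It reads the whole truth table, so it is the opposite of a sublinear tester, and the names `is_submodular`, `coverage` and `squared_size` are mine, chosen purely for illustration.

```python
from itertools import product


def is_submodular(f, n, tol=1e-12):
    """Exhaustively check f(x|y) + f(x&y) <= f(x) + f(y) over {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            join = tuple(a | b for a, b in zip(x, y))  # coordinatewise OR
            meet = tuple(a & b for a, b in zip(x, y))  # coordinatewise AND
            if f(join) + f(meet) > f(x) + f(y) + tol:
                return False
    return True


def coverage(x):
    """Number of ground elements covered by the chosen sets (submodular)."""
    sets = [{1, 2}, {2, 3}, {3, 4}]
    covered = set()
    for bit, s in zip(x, sets):
        if bit:
            covered |= s
    return len(covered)


def squared_size(x):
    """|x|^2 has increasing marginal gains, so it is not submodular."""
    return sum(x) ** 2
```

On three coordinates, `is_submodular(coverage, 3)` holds while `is_submodular(squared_size, 3)` fails, witnessed by two distinct singletons.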

Quantum Advantage in Tolerant Junta Testing by Avishay Tal and Weiqiang Yuan (arXiv) Recall the tolerant junta testing problem: given parameters $(k, \varepsilon_1, \varepsilon_2)$ with $0\leq \varepsilon_1 < \varepsilon_2 \leq 1/2$ and black-box access to a Boolean $f$ on $n$ variables, decide whether $f$ is $\varepsilon_1$-close to some $k$-junta or $\varepsilon_2$-far from every $k$-junta. Our July 2016 News reported a result which presented tolerant testers with query complexity exponential in $k$. Additionally, despite our research efforts, a good understanding of the query complexity of adaptive testers for tolerant junta testing has remained out of reach.

The featured paper picks up this thread and establishes the first super-polynomial quantum advantage for this problem in the adaptive setting. The main result is a non-adaptive quantum tolerant tester with query complexity growing as $\mathrm{poly}(k, 1/\varepsilon)$. On the other hand, the paper also proves that any classical adaptive tolerant tester must cough up a number of queries that grows like $k^{\Omega(\log(1/\varepsilon))}$.

Sublinear Time Algorithms for Abelian Group Property Testing by Nader H. Bshouty (arXiv) You are given a finite set $G$ and oracle access to a binary operation $\ast \colon G^2 \to G$, and you want to decide whether $(G,\ast)$ is an abelian group or is $\varepsilon$-far from every abelian group over $G$. The paper considers two access models: in the partially specified model the algorithm does not know $|G|$ and only sees randomly sampled elements together with the Cayley table restricted to those elements, and in the fully specified model it knows $|G|$ and has access to the full table. The main result is a tester in the weaker PS-model (and hence in the FS-model) which runs in time $\widetilde{O}(\sqrt{|G|}+1/\varepsilon)$, improving on testers of Goldreich and Tauber which run in time $O(|G|/\varepsilon)$.
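For contrast with the sublinear testers, here is what the exact (non-testing) decision looks like when the whole Cayley table is read. This is a naive Python sketch of my own, cubic in $|G|$ because of the associativity check, and not an algorithm from the paper; elements are assumed to be labeled $0, \dots, |G|-1$ and `table[a][b]` stands for $a \ast b$.

```python
def is_abelian_group(table):
    """Exhaustively decide whether table[a][b] = a*b defines an abelian group."""
    n = len(table)
    elems = range(n)
    # Closure: the table is n x n with entries in {0, ..., n-1}.
    if any(len(row) != n or any(not (0 <= v < n) for v in row) for row in table):
        return False
    # Commutativity.
    if any(table[a][b] != table[b][a] for a in elems for b in elems):
        return False
    # Associativity (the cubic-time part).
    if any(table[table[a][b]][c] != table[a][table[b][c]]
           for a in elems for b in elems for c in elems):
        return False
    # Identity element.
    identities = [e for e in elems
                  if all(table[e][a] == a and table[a][e] == a for a in elems)]
    if not identities:
        return False
    e = identities[0]
    # Every element has an inverse.
    return all(any(table[a][b] == e for b in elems) for a in elems)
```

Addition modulo 5 passes, while subtraction modulo 3 (not commutative) and `max` on $\{0,1,2\}$ (no inverses) are rejected. The point of the featured result is that a tester can get away with looking at far less of the table than this.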

Testing Bipartiteness in Logarithmic Rounds by Yumou Fei and Ronitt Rubinfeld (arXiv) Recall the seminal Goldreich-Ron tester for bipartiteness of bounded-degree graphs, which runs $\widetilde{O}(\sqrt n)$ random walks of length $O(\log^6 n)$ each and rejects when it discovers an odd cycle. The featured paper shows that $O(\sqrt n)$ walks of length $O(\log n)$ already suffice. The proof departs from the Goldreich-Ron analysis and instead routes the argument through the Goemans-Williamson SDP relaxation for Max-Cut. As a corollary, the paper obtains an $O(\log n)$-pass, $O(\sqrt n \cdot \log n)$-space streaming algorithm for testing bipartiteness, and the pass complexity is optimal thanks to a recent lower bound of Fei, Minzer and Wang. Looks like a great read before the new term rolls in.
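The "random walks until an odd cycle shows up" template can be sketched in a few lines of Python. This is only an illustration of the one-sided rejection rule: it uses plain (non-lazy) walks, and the parameters `num_starts`, `num_walks` and `walk_len` are placeholders of mine rather than the settings from Goldreich-Ron or from the featured paper. The graph is assumed to be an undirected adjacency dictionary.

```python
import random


def finds_odd_cycle(adj, start, num_walks, walk_len, rng):
    """Walk from `start`; report True if some vertex is reached after both an
    even and an odd number of steps. Two such walks glue into an odd closed
    walk, which certifies an odd cycle."""
    parity_seen = {start: {0}}
    for _ in range(num_walks):
        v = start
        for step in range(1, walk_len + 1):
            if not adj[v]:
                break  # isolated vertex: nothing to walk on
            v = rng.choice(adj[v])
            seen = parity_seen.setdefault(v, set())
            seen.add(step % 2)
            if len(seen) == 2:
                return True
    return False


def bipartiteness_test(adj, num_starts, num_walks, walk_len, seed=0):
    """Accept (True) unless an odd cycle is found from some random start."""
    rng = random.Random(seed)
    vertices = list(adj)
    for _ in range(num_starts):
        s = rng.choice(vertices)
        if finds_odd_cycle(adj, s, num_walks, walk_len, rng):
            return False
    return True
```

Since a rejection always comes with an odd closed walk, a bipartite graph is never rejected; all of the difficulty (and the content of the featured paper) lies in showing that few short walks suffice to catch graphs that are far from bipartite.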

And now, as promised, a result which is not property testing at all, but which I could not skip over.

Bipartite Matching is in NC by Abhranil Chatterjee, Sumanta Ghosh, Rohit Gurjar, Roshan Raj and Thomas Thierauf (ECCC) Whether the randomness in the Mulmuley-Vazirani-Vazirani RNC algorithm for matching can be removed is a question that has stood since the 1980s, with the state of the art being the quasi-NC bounds of Fenner-Gurjar-Thierauf for bipartite graphs and Svensson-Tarnawski for general graphs. The featured paper settles the bipartite case: bipartite matching is in NC. The techniques are based on the polynomial method, inspired by the subspace design construction of Guruswami and Kopparty. The result extends to weighted bipartite matching and to computing the noncommutative rank of a symbolic matrix, and as a consequence the decision version of linear matroid intersection lands in NC as well. As a curious aside, in a talk by Rohit Gurjar (one of the authors), I learnt a fact that I found rather remarkable.
Take two univariate polynomials $\boldsymbol{p}, \boldsymbol{q} \in \mathbb{R}[X]_{\leq d}$, each of degree $d$. Suppose these polynomials are linearly independent, so that $span(\boldsymbol{p}, \boldsymbol{q})$ is a two-dimensional space of polynomials of degree at most $d$. Consider the following set of real numbers: $S_1 = \{ \alpha \in \mathbb{R} : \boldsymbol{f}(\alpha) = 0 \text{ for some } \boldsymbol{f} \in span(\boldsymbol{p}, \boldsymbol{q}) \}. $ That is, $S_1$ collects those reals that happen to be a root of some polynomial hiding in the span; call these the roots of the span. One notes that $S_1$ contains infinitely many elements. So far so good. Next, consider the set $S_2 = \{ \alpha \in \mathbb{R} : \alpha \text{ is a root of some } \boldsymbol{f} \in span(\boldsymbol{p}, \boldsymbol{q}) \text{ with multiplicity } 2 \}. $ Somewhat surprisingly (to me), $S_2$ is a finite set, and in fact $|S_2| \leq 2d$.
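One way to see why $S_2$ is finite (my own sketch, not necessarily the argument from the talk, and reading "multiplicity 2" as "multiplicity at least 2"): a real $\alpha$ is a double root of some nonzero $a\boldsymbol{p} + b\boldsymbol{q}$ exactly when the following $2 \times 2$ system has a nonzero solution $(a, b)$, that is, when the Wronskian of the pair vanishes at $\alpha$.

$$\begin{pmatrix} \boldsymbol{p}(\alpha) & \boldsymbol{q}(\alpha) \\ \boldsymbol{p}'(\alpha) & \boldsymbol{q}'(\alpha) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \Longleftrightarrow \quad W(\alpha) := \boldsymbol{p}(\alpha)\,\boldsymbol{q}'(\alpha) - \boldsymbol{p}'(\alpha)\,\boldsymbol{q}(\alpha) = 0.$$

Because $\boldsymbol{p}$ and $\boldsymbol{q}$ are linearly independent, $W$ is not the zero polynomial, and its degree is at most $2d-1$ (the $X^{2d-1}$ terms in fact cancel, leaving degree at most $2d-2$). Hence $|S_2| \leq \deg W \leq 2d$.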

Announcing WoLA 2026 in Boston

Clement Canonne — Sat, 13 Jun 2026 10:14:51 +0000

The 10th edition of WoLA, the Workshop on Local Algorithms, will be taking place on August 16-18, 2026 at Boston University, Boston, MA, just before APPROX/RANDOM!

For those unfamiliar with WoLA:

Local algorithms, that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input, have been studied in a number of areas in theoretical computer science and mathematics. Some of these areas include sublinear-time algorithms, distributed algorithms, inference in large networks and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities.

You are all invited! For more on (free) registration, how to submit a poster or propose a talk, list of invited speakers, and local arrangements, see the website.

Don’t miss it! Not only is WoLA a truly enjoyable and interesting event, showcasing exciting new results and directions of research, it’s also one centered around an incredibly welcoming and collegial (and fun!) community of researchers. But that’s not all! Expect a particularly amazing edition this year, as this is the 10th anniversary edition of WoLA, and it is being held on the top floor of the new Computer and Data Sciences building at BU, with breathtaking views of Boston.

Program Committee: Maryam Aliakbarpour (Rice University), Sepehr Assadi (University of Waterloo; Chair), Eric Blais (University of Waterloo), Yi-Jun Chang (National University of Singapore), Talya Eden (Bar Ilan University), Nathan Harms (University of British Columbia), Akash Kumar (IIT Bombay), Yannic Maus (TU Graz), Sofya Raskhodnikova (Boston University; Local Chair), Chen Wang (Rensselaer Polytechnic Institute).

News for May 2026

Clement Canonne — Sun, 07 Jun 2026 08:33:32 +0000

12 papers in May! With a mix of testing for logic, quantum, codes, and distribution testing, this was a very prolific and well-rounded month. Let’s get to it, in no particular order!

Constant time testability of first-order logic with modulo counting on finitary graphs, by Isolde Adler, Jenny Stimpson (arXiv). This paper is concerned with testing, in the bounded-degree graph model, graph properties which can be expressed in first-order (resp. second-order monadic) logic with additional modulo quantifiers. The main result is a “meta-theorem” which states that every such property concerning graphs with a uniform (constant) bound on the size of connected components can be tested with a constant number of queries.

Tolerant Testing for Unique Games, by Yuichi Yoshida (arXiv). The property testing version of Unique Games sees the constraint satisfaction problem over $n$ variables and $m$ constraints as a constraint graph $G$ with $n$ vertices and $m$ edges, where the edges are labeled by permutations of the alphabet $[Q]$: the goal is then, given query access to the graph, to distinguish between the minimum fraction of constraints violated by any labeling being at most $\varepsilon$ or at least $\rho$. This paper considers the question in the adjacency list access model, where the algorithm has uniform-vertex, degree, and neighbor query access to the graph. The main result is a tester which (in contrast to previous work) makes no structural assumption on the graph, and, for $\varepsilon = O(\rho^4/\log n)$, solves this tolerant testing question with query complexity $$\tilde{O}((\sqrt{m} + n/\sqrt{m})\text{poly}(1/\rho))$$ for constant alphabet size $Q$.

Optimal Testing of Reed-Muller Codes with an Online Adversary, by Esty Kelman, Uri Meir, Kai Zhe Zheng (arXiv). In the online erasure model of property testing, after each query made to the input $f$, an adversary gets to erase up to $t$ (adaptively chosen) values of $f$. This makes the testing task much more challenging, requiring testers to (intuitively) behave somewhat unpredictably. In this paper, the authors define a new type of testers, in-between sample-based (only make uniformly random queries to the input) and query-based: semi-sample-based testers, which first choose an arbitrary subset $S$ of potential queries, and then get uniformly random queries from $S$. The main results are (1) a semi-sample-based tester for Reed-Muller codes in the “usual” testing model, which (2) can be made to work in the online erasure model (complemented by a lower bound showing it is then near-optimal).

Mean Testing under Truncation beyond Gaussian, by Yuhao Wang, Roberto Imbuzeiro Oliveira, Themis Gouleakis (arXiv). In the Mean Testing problem, given i.i.d. samples from a high-dimensional distribution with unknown mean vector $\mu$, the goal is to distinguish between $\|\mu\|_2=0$ and $\|\mu\|_2 > \alpha$. While the fundamental cases of $d$-dimensional Gaussian distributions and product Bernoulli distributions are fully understood, a natural generalization is what happens when the samples are corrupted or censored: for instance, if samples are truncated (i.e., only samples within an (unknown) set $S$ can be observed). Work of Canonne, Gouleakis, Wang, and Yang (2025) addresses the case of mean testing under truncation for identity-covariance Gaussians: this paper significantly generalizes these results, dropping the Gaussianity to allow for any class of high-dimensional distributions satisfying a bounded moment assumption.

Distributed Gaussian Mean Testing under Communication Constraints: messages, samples, and coins, by Clément Canonne and Nimitt (arXiv). Going back to Gaussian mean testing, what happens when the i.i.d. samples are distributed among $n$ users each holding $m$ samples, subject to strict (but possibly different) one-way communication constraints, and sharing a small random seed? The authors provide algorithms for Gaussian mean testing in this very general setting, which capture as special cases a number of previous works on distributed mean testing, and allow for smooth trade-offs between the parameters at play.

And now, onto the quantum realm!

On Clifford hierarchy testing and near-extremizers of noncommutative uniformity norms, by Zongbo (Bob) Bao, Jop Briët, Davi Castro-Silva, Philippe van Dordrecht, and Jonas Helsen (arXiv). The Clifford hierarchy is a sequence of quantum circuit models, allowing for increasingly general gates (the first level of the Clifford hierarchy being the set of Pauli gates, and the second the Clifford group). In this paper, the authors consider the question of testing, given access to a unitary $U$ (and its inverse), whether it belongs to a given level of the Clifford hierarchy, or is far from it. Their main result is an efficient testing algorithm for the 3rd level of the Clifford hierarchy, the first one still open: to do so, they establish a “robust inverse theorem” for the fourth Pauli uniformity norm $P^4$, relating the closeness of a unitary to the 3rd level of the Clifford hierarchy to its $P^4$ norm.

Practical Tests and Witnesses of Fermionic non-Gaussianity, by Tobias Haug, Xhek Turkeshi, and Piotr Sierant (arXiv). In this paper, the authors are concerned with quantifying and detecting the distance of a (pure) $n$-qubit quantum state to the set of Fermionic Gaussian States (FGS). Among other results, they provide two testing algorithms which distinguish between (1) being a (pure) Fermionic Gaussian state and (2) being at trace distance at least $\varepsilon$ from every FGS: the first, using $O(n^2/\varepsilon^2)$ two-copy Bell measurements, and the second, using $O(n^3/\varepsilon^4)$ single-copy measurements.

Quantum Multi-Level Estimation of Functionals of Discrete Distributions, by Kean Chen, Minbo Gao, Tongyang Li, Qisheng Wang, Xinzhao Wang (arXiv). Given purified quantum query access to a classical probability distribution $p$ over $n$ elements, i.e., access to a unitary which prepares the state $$\sum_{i=1}^n \sqrt{p_i}|i\rangle |\textsf{garbage}_i\rangle$$, the goal is to estimate a functional $f(p)$ of the distribution: distance to uniformity, entropy, support size… This paper provides a general framework to do so, for any function $f$ for which $f^2$ admits a low-degree polynomial approximation. As a direct application, the authors obtain improved quantum algorithms for estimating the $q$-Tsallis entropy of a discrete distribution, for all ranges of parameter $q$ (and near-optimal for $q > 1$).

Sticking with distributions, but back to classical…

Entropy Equivalence Testing, by Clément Canonne, Yash Pote, Jonathan Scarlett, and Joy Qiping Yang (arXiv). In the standard task of closeness testing for probability distributions, one gets i.i.d. samples from two unknown probability distributions $p,q$, and must distinguish between (1) $p=q$ and (2) $\text{TV}(p,q) > \varepsilon$, where TV denotes the total variation distance. This is now well-understood — but what happens when one changes (2) to something else? Some previous work considered different notions of distance than TV distance: in this work, (2) is relaxed to a much weaker version, only asking to reject when the entropies of $p,q$ are significantly different. As shown in the paper, not only does this variant lead to a more sample-efficient tester, it can be very useful: used as a blackbox, this yields a better learning algorithm for a well-studied class of high-dimensional distributions, that of Bayesian networks.

Testing properties of trees in graphical models with covariance queries, by Sofiya Burova, Francisco Calvillo, Gábor Lugosi, Piotr Zwiernik (arXiv). The goal of this paper is to test properties about the structure of high-dimensional distributions (specifically, of Gaussian graphical models), but not from samples: instead, given queries to their covariance matrix, or, equivalently, to the graph encoding dependencies between variables. Here, the assumption is that this underlying graph is a tree, and the goal is to test its diameter: the main result is a bicriteria testing algorithm which decides whether the tree has diameter at least $D$ or at most $(1-\delta)D$, making $\tilde{O}(\frac{n^2}{D^2\delta^2})$ covariance queries (where $n$ is the dimension).

Edit: I had forgotten one!

Reducing the Randomness in Partition Oracles for Bounded Degree Minor-Free Graphs, by Akash Kumar, Abhiruk Lahiri, and C. Seshadhri (arXiv). Influential results in graph theory and graph property testing are separator theorems, which state that some classes of graphs can be “partitioned” at little cost into small components. In particular, known separator theorems imply that bounded-degree minor-free graphs can be partitioned into connected components of constant size! This led to the notion of partition oracle, by Hassidim-Kelner-Nguyen-Onak, which, given access to a minor-free graph, asks to implement fast and consistent “oracle access” to such a partition: given a vertex $v$, which connected component is it in? These have led to numerous advances in sublinear algorithms and property testing, but one aspect remained ill-understood: how large of an input random seed does a partition oracle need to achieve this consistent oracle access? This paper answers the question by showing that the constant-query partition oracle of Kumar, Seshadhri, and Stolman (2021) can be implemented with a constant-size random seed (for constant bounded degree and $\varepsilon$), being efficient in all respects.

Edit: I had forgotten two!

A digest of the work of Rothblum, Vadhan, and Wigderson (2013), by Oded Goldreich (ECCC). Interactive Proofs of Proximity (IPPs) are the property testing analogue of Interactive Proof systems, where the testing algorithm is allowed to interact with an all-powerful but untrusted prover. IPPs are fascinating objects, which have received significant interest since their introduction, a decade or so ago. In this expository work, the author provides a detailed overview, complemented with proofs and discussion, of a result of Rothblum, Vadhan, and Wigderson, who showed that any property decidable by log-space uniform circuits of small depth $D$ and size $S$ admits public-coin IPPs with perfect completeness, low query complexity, and low communication complexity, and round complexity $O(D\log S)$. (The theorem further provides a smooth tradeoff between query complexity $q$ and communication complexity $c$: roughly, $q\cdot c \approx n\cdot D^{O(1)}$). This positive result is (somewhat) complemented by a negative one, showing the existence of some necessary trade-off between query and communication complexity.

News for April 2026

Seshadhri — Tue, 19 May 2026 04:30:16 +0000

Apologies for the major delay! We have a nice bounty of 8 papers, with a big haul of graph property testing papers. Let us not delay even further and get right to it.

Sublinear-query relative-error testing of halfspaces by Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, and Tianqi Yang (arXiv). This paper studies the relative error model for property testing, which is designed for studying “sparse” inputs. Consider functions $f, g: \mathbb{R}^n \to \{0,1\}$ over the Gaussian measure in $\mathbb{R}^n$. Let $p = \Pr[f(x) = 1]$, which one can think of as the “volume” of $f$. The relative distance of $g$ from $f$ is defined as $\Pr_{x}[f(x) \neq g(x)]/p$. In situations where the volume is tiny, this definition makes more sense than the standard definition, where $f$ would simply be close to the all zero function. To get non-trivial results, one needs another oracle called SAMP that gives a random sample of a point in $f^{-1}(1)$. This paper studies the classic problem of halfspace testing. The main results are property testers with query complexity sublinear in $n$ and polylogarithmic in $1/p$. Some of these testers are given the value $p$, and some only use the SAMP oracle (so no querying of arbitrary points). In the full model of standard queries and the SAMP oracle, one can get a $poly(\log(1/p), \varepsilon)$ query tester.

Classes Testable with $O(1/\varepsilon)$ Queries for Small ε Independent of the Number of Variables by Nader H. Bshouty and George Haddad (arXiv). Consider property testing in the most classic setting, functions $f:\{0,1\}^n \to \{0,1\}$. An obvious lower bound for testing any non-trivial property is $\Omega(1/\varepsilon)$; in fewer queries, one cannot even detect the existence of an $\varepsilon$-fraction of the domain that may certify that input $f$ is not in the property. When can this trivial lower bound be matched? Specifically, this paper studies parameterized properties where the tester query complexity is $O(\psi + 1/\varepsilon)$, where $\psi$ depends on the property parameters, and the big-Oh only hides an “absolute” constant. Take the classic problem of $k$-junta testing. A $k$-junta is a function that depends only on $k$ variables. An old result of Blais gives a tester with query complexity $O(k\log k + k/\varepsilon)$. But note that this does not match the trivial lower bound as $k$ becomes large. This paper gives a tester with complexity $O(k2^k + 1/\varepsilon)$, thereby matching the trivial lower bound even when $k$ has some (small) dependence on $\varepsilon$. This paper gives a number of such results for well studied properties like low Fourier degree and being a sparse polynomial.

Distributed Quantum Property Testing with Communication Constraints by Mina Doosti, Ryan Sweke, and Chirag Wadhwa (arXiv). This paper solves a quantum analogue of a distributed distribution testing problem, introduced by Acharya-Canonne-Tyagi. Suppose $m$ machines have access to a distribution $\rho$. An algorithm wishes to test if this distribution is close to a known distribution $\sigma$ (the identity testing problem), and can communicate independently with each machine. This problem is studied under constraints on the amount of communication and the amount of shared randomness between the machines. The quantum equivalent of a distribution is a quantum state and communication can be either classical bits or qubits. So $\rho, \sigma$ denote quantum states, and the aim is to test whether they are the same. This is called the quantum “certification” problem (the analogue of identity testing), here in the distributed setting. The main results are in the setting where there is no classical communication, and the machines only send qubits.

A characterization of one-sided error testable graph properties in bounded degeneracy graphs by Oded Lachish, Amit Levi, Ilan Newman, and Felix Reidl (arXiv). For the setting of general graphs, the standard model is the Goldreich-Ron adjacency list model, as clarified by Czumaj-Sohler. Given a vertex, one can fetch a uniform random neighbor. In this model, Czumaj-Sohler completely characterized one-sided (constant query) testable properties as those described by subgraph freeness. The main result there was to show that $H$-freeness is constant time testable when the input graph $G$ is minor-free. A natural super-class of minor-free graphs is that of bounded degeneracy graphs. A graph has bounded degeneracy if all subgraphs have bounded average degree; this is a rich class of graphs containing all minor-free classes, and often occurring in practice. It has been shown that testing four cycle-freeness in bounded degeneracy graphs requires $\Omega(n^{1/4})$ queries. So it raises the natural question of characterizing the patterns $H$ for which $H$-subgraph freeness is testable in constant queries. And indeed, this paper achieves exactly that, with a neat characterization.
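For readers who have not met degeneracy before, a short Python sketch may help. It uses the standard peeling definition (repeatedly delete a minimum-degree vertex and record the largest degree seen at deletion time), which agrees with the "all subgraphs have bounded average degree" description up to a factor of 2: if the peeling value is $\kappa$, the maximum average degree over subgraphs lies between $\kappa$ and $2\kappa$. The quadratic-time implementation and the name `degeneracy` are mine and have nothing to do with the paper's testers.

```python
def degeneracy(adj):
    """Degeneracy of an undirected graph given as {vertex: [neighbors]}.

    Repeatedly removes a vertex of minimum remaining degree and returns the
    largest degree observed at the moment of removal.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    best = 0
    for _ in range(len(adj)):
        # Pick a minimum-degree vertex among those still present.
        v = min((u for u in adj if u not in removed), key=degree.get)
        best = max(best, degree[v])
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
    return best
```

Trees have degeneracy 1, cycles have degeneracy 2, and the complete graph on 4 vertices has degeneracy 3.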

Lower Bounds for Testing Directed Acyclicity in the Unidirectional Bounded-Degree Model by Yuichi Yoshida (arXiv). While property testing of sparse graphs has received some attention, sublinear algorithms for directed graphs are relatively unexplored. Problems become surprisingly hard, and analysis is even harder. Consider property testing of bounded outdegree graphs with no bound on indegree. The model only gives access to the out-neighborhoods, not the in-neighborhood. This asymmetry lies at the root of hardness. This paper studies the classic property of being acyclic. The main results are new lower bounds that are significant improvements over past results. Let $n$ be the number of vertices. Ignoring log factors, one-sided testers for acyclicity require $\Omega(n^{2/3})$ queries and two-sided testers require $\Omega(\sqrt{n})$ queries. And tolerant testers essentially require reading the whole graph, with a lower bound of $\Omega(n)$ queries.

Quantum Property Testing for Bounded-Degree Directed Graphs by Pan Peng and Jingyu Wu (arXiv). Same setting as above, but assume that both outdegree and indegree are bounded. We consider the bidirectional model, which gives access to both in and out neighborhoods, as opposed to the (“standard”) unidirectional model. Peng-Wang proved that there is a property that requires $O(1)$ queries in the bidirectional model, but (almost) $\Omega(n)$ queries in the unidirectional model. (Ignoring dependencies on $\varepsilon$ and the degree $d$ for clarity.) Remarkably, this paper shows that such a lower bound does not hold in the quantum model. The main result is that any property that is classically testable with $O(1)$ queries in the bidirectional model can be tested with $o(\sqrt{n})$ queries in the unidirectional model with a quantum computer. Moreover, there exists a hard property with a nearly matching lower bound. One of the consequences of this result is a quantum algorithm for testing $H$-freeness in the unidirectional model.

Sublinear Spectral Clustering Oracle with Little Memory by Ranran Shen, Xiaoyi Zhu, Pan Peng, and Zengfeng Huang (arXiv). The focus of this paper, as the title clearly says, is sublinear spectral clustering oracles. Consider a bounded-degree input graph $G$ that is $k$-clusterable, which means that (say) one can remove a small fraction of edges, to get at most $k$ connected components, each of which has high internal conductance. A line of work has shown that one can answer cluster membership queries in sublinear time, without any preprocessing of $G$. So this is a Local Computation Algorithm (LCA) for the clusters. Any LCA has a storage budget where it keeps a data structure to ensure that all query answers are consistent with a single solution. Previous results by Peng showed that there is a clustering oracle with $O(\sqrt{n})$ space, such that each query can be answered in $O(\sqrt{n})$ time. One can obviously use linear space and answer queries in constant time. This paper shows a smooth tradeoff between these solutions: given $M$ storage, there is an oracle that answers questions in $O(n/M)$ time.

Locally Computable High Independence Hashing by Yevgeniy Dodis, Shachar Lovett, and Daniel Wichs (ECCC). Not exactly a sublinear algorithms result, but one can interpret it as an LCA result. Consider the classic problem of constructing a family $\mathcal{F}$ of $k$-wise independent hash functions. Let each function be of the form $f:\{0,1\}^n \to \{0,1\}^n$. These constructions need a seed of $O(kn)$ bits. In many settings, $k$ is quite large, say $poly(n)$ (so going beyond the typical constant $k$ setting). How many queries, denoted $t$, to the seed does it take to compute $f(x)$, for an input $x \in \{0,1\}^n$? Thinking of $n$ as small and $k$ as large, this is exactly an LCA for the function $f$. An old lower bound of Siegel proves that computing $f(x)$ requires $\Omega(t)$ queries to the seed. The main result of this paper is a construction that matches this bound; unfortunately, the result is non-constructive and depends on the existence of certain expanders. If we relax the independence condition to “almost” independent (up to some precision parameter $\varepsilon$), then the paper gives an explicit construction.
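To see why the number of seed queries is the interesting quantity, recall the textbook $k$-wise independent family: a uniformly random polynomial of degree at most $k-1$ over a finite field. The Python sketch below is a baseline of my own choosing, not the construction from the paper, and it works over the prime field of size $2^{61}-1$ instead of $\{0,1\}^n$ to keep the arithmetic simple. The second return value counts how many seed words one evaluation touches.

```python
import random

P = 2**61 - 1  # Mersenne prime; assumed field size for this illustration


def sample_seed(k, rng):
    """Seed = k uniformly random coefficients in the field."""
    return [rng.randrange(P) for _ in range(k)]


def evaluate(seed, x):
    """Evaluate sum_i seed[i] * x^i mod P by Horner's rule.

    Returns (hash value, number of seed words read).
    """
    acc = 0
    reads = 0
    for coeff in reversed(seed):
        acc = (acc * x + coeff) % P
        reads += 1
    return acc, reads
```

With seed `[3, 2, 1]` the function is $3 + 2x + x^2$, so `evaluate([3, 2, 1], 5)` gives value 38 after reading all 3 coefficients. In general this baseline has $t = k$: every evaluation reads the entire seed, which is exactly what a locally computable family tries to avoid.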

News for March 2026

Seshadhri — Fri, 03 Apr 2026 19:35:36 +0000

After either really busy months or really quiet months, we’re now on a moderately busy month (which is the way it should be). A collection of 7 papers, now showing an increased interest in sublinear graph algorithms. Of course, we still have some distribution testing, and a paper on sparse functional representations.

A note on approximating the average degree of bounded arboricity graphs by Talya Eden, C. Seshadhri (arXiv). Let us begin with this simple note, since it will set the stage for the other papers. Consider the basic problem of estimating the average degree of a (simple) graph $G = (V,E)$, under the standard adjacency list model. We assume that $V = [n]$, and an algorithm can get the degree of a vertex and query for uniformly random (uar) neighbors. Classic results prove that there are $\widetilde{O}(\varepsilon^{-2} n/\sqrt{m})$ query (and time) algorithms. Moreover, when the graph arboricity is $\alpha$, one can get $\widetilde{O}(\varepsilon^{-2} \alpha)$ query algorithms. These algorithms and results are unfortunately buried deep inside papers, or have extraneous $\varepsilon$ and $\log n$ factors. This note gives a simple, clean presentation of these results; the main result just takes two pages.
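To give a flavor of what such algorithms look like, here is a Python sketch of one unbiased estimator known from this line of work: sample a uniform vertex $v$, then a uniform neighbor $u$, and output $2d(v)$ if $v$ precedes $u$ in the ordering by (degree, id), and $0$ otherwise. Each edge is charged to its lower-ranked endpoint, so the expectation is exactly $2m/n$. This is my own illustration and may differ from the note's presentation; in particular, how many samples suffice is the part that the actual analysis handles.

```python
import random


def estimate_average_degree(adj, num_samples, rng):
    """Average of the one-sample estimator over num_samples independent draws."""
    vertices = list(adj)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice(vertices)       # uniform vertex query
        if not adj[v]:
            continue                   # isolated vertex contributes 0
        u = rng.choice(adj[v])         # uniform neighbor query
        if (len(adj[v]), v) < (len(adj[u]), u):
            total += 2 * len(adj[v])
    return total / num_samples


def exact_expectation(adj):
    """Expectation of a single sample, computed by enumerating all outcomes."""
    n = len(adj)
    total = 0.0
    for v, nbrs in adj.items():
        for u in nbrs:
            if (len(adj[v]), v) < (len(adj[u]), u):
                total += 2 * len(adj[v]) / (n * len(nbrs))
    return total
```

On a star with 5 leaves, only the leaves ever contribute, and the expectation comes out to $10/6$, the true average degree.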

Almost-Uniform Edge Sampling: Leveraging Independent-Set and Local Graph Queries by Tomer Adar and Amit Levi (arXiv). Same setting as the previous blurb. A result of Eden-Rosenbaum (among others) showed that the problem of sampling a uniform random edge can be done in the same query complexity as estimating the number of edges. This is quite interesting, since these problems (at least from a sublinear perspective) are not similar, and the underlying algorithms are quite different. This paper studies this phenomenon in more general models. Very recent results (from Jan 2026!) by Adar-Hotam-Levi showed that average degree estimation can be done much faster with Independent Set (IS) queries together with standard adjacency list queries. Essentially, for sparse graphs, one can get $O(n^{1/4})$ queries, as opposed to the usual $\sqrt{n}$ bound. This paper shows that edge sampling can be done in the same time as edge estimation (with nearly matching lower bounds). This “complexity equivalence” is also proven for the model with only IS queries.

Testing Properties of Edge Distributions by Yumou Fei (arXiv). Let us bring in some distributions to the (dense) graph setting. Consider an input graph $G$ with $n$ vertices, represented as an adjacency matrix. Suppose there is an unknown distribution $\mu$ over $[n]^2$ that we can sample from. Our aim is to test if (say) the graph is bipartite, where distance is defined according to $\mu$. If $\mu$ is uniform over all edges, then this is standard property testing. With arbitrary $\mu$, we are looking at bipartiteness testing in the distribution-free setting. The classic Goldreich-Goldwasser-Ron result shows that bipartiteness testing (in the standard, uniform setting) can be done in $poly(\varepsilon^{-1})$ queries. They also prove a $\Theta(n)$ bound when the distribution $\mu$ is uniform over an arbitrary subset of edges. This paper gives a bipartiteness tester for any $\mu$, with the same asymptotic complexity. Moreover, there are results for testing triangle-freeness and square-freeness in this setting; the complexities are $\Theta(n^{4/3})$ and $\Theta(n^{9/8})$ respectively. (Note that in the standard setting, we get complexities independent of $n$.)

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions by Maryam Aliakbarpour, Alireza Azizi, Ria Stevens (arXiv). Now onto distribution testing. Consider the problem of determining if a distribution $p$ with support $[n_1] \times [n_2] \times \ldots \times [n_d]$ is a product distribution. This is a fundamental problem that goes back to the birth of statistics and Pearson’s $\chi^2$ test. This paper studies the problem in the learning-augmented setting. Suppose we are also given a complete description of a predicted approximate distribution $\widehat{p}$, and a tolerance $\alpha$. If $\|\widehat{p} - p\|_{TV} \geq \alpha$, then the algorithm can output “Inaccurate Information”. So, if the predictor $\widehat{p}$ is accurate, then the algorithm needs to give a correct output. The main result shows that one can improve on the optimal sample complexity by $\alpha^{1/3}$ factors, so the approximate prediction can help avoid the lower bounds for independence testing.
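To make "product distribution" concrete in the two-dimensional case, here is a minimal Python sketch that computes the total variation distance between a fully known joint distribution and the product of its own marginals. This quantity upper bounds the distance to the set of product distributions, since the product of marginals is itself a product distribution. The function name and the dictionary representation are illustrative assumptions; this is a definition check on an explicit distribution, not the sample-based tester from the paper.

```python
def tv_to_product_of_marginals(joint):
    """joint: dict mapping a pair (i, j) to its probability mass.

    Returns the total variation distance between `joint` and the product
    of its two marginals (hypothetical helper, for illustration only).
    """
    row, col = {}, {}
    for (i, j), mass in joint.items():
        row[i] = row.get(i, 0.0) + mass
        col[j] = col.get(j, 0.0) + mass
    return 0.5 * sum(
        abs(joint.get((i, j), 0.0) - row[i] * col[j])
        for i in row
        for j in col
    )


# Two fair independent bits: already a product distribution.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Two perfectly correlated fair bits: far from its product of marginals.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(tv_to_product_of_marginals(independent))
print(tv_to_product_of_marginals(correlated))
```

The independent example has distance zero, while the correlated one is at distance one half from the product of its marginals.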

Improved Local Computation Algorithms for Greedy Set Cover via Retroactive Updates by Slobodan Mitrović, Srikkanth Ramachandran, Ronitt Rubinfeld, Mihir Singhal (arXiv). Consider the classic set cover problem: given a collection of sets $\mathcal{S}$ over a universe $\mathcal{E} = \bigcup_{S \in \mathcal{S}} S$, find the smallest subcollection that covers $\mathcal{E}$. For the sublinear perspective, assume that the input is represented as a bipartite incidence graph between $\mathcal{S}$ and $\mathcal{E}$, with adjacency list access. The aim is to get a Local Computation Algorithm (LCA) that computes an $O(\log \Delta)$-approximate set cover (where $\Delta$ is the largest set size in $\mathcal{S}$). An LCA is an algorithm that, given some $S \in \mathcal{S}$, outputs (in sublinear time) whether this set is in the output subcollection. The main challenge is to bound the query complexity of this operation. The requirement is that the answers are consistent: taken together, the sets reported as “in” must form a cover with the desired approximation guarantee. There are distributed protocols that can output an $O(\log \Delta)$-approximate set cover in $r = O(\log \Delta \cdot \log f)$ rounds ($f$ is the largest number of sets that an element participates in). Parnas-Ron showed that one can get an LCA using $\Delta^r$ queries. This paper gives an LCA that makes $2^r$ queries, which avoids numerous roadblocks in previous work.
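For reference, the global greedy algorithm that such LCAs aim to simulate locally can be sketched in a few lines of Python. This is the textbook greedy (repeatedly pick the set covering the most uncovered elements), which reads the whole input; it is not the paper's LCA. The function name, the dictionary input format, and the tie-breaking rule are assumptions made for illustration.

```python
def greedy_set_cover(sets):
    """sets: dict mapping a set name to a Python set of elements.

    Textbook greedy: repeatedly pick the set that covers the most
    still-uncovered elements. Ties go to the smallest name so that the
    output is deterministic (an arbitrary choice for this sketch).
    """
    uncovered = set().union(*sets.values())
    chosen = []
    while uncovered:
        best = max(sorted(sets), key=lambda name: len(sets[name] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
cover = greedy_set_cover(example)
print(cover)  # ['A', 'C']
```

An LCA has to answer "is $S$ in the cover?" for a single $S$ without running this loop over the whole instance, which is where the query complexity question comes from.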

Approximate Butterfly Counting in Sublinear Time by Chi Luo, Jiaxin Song, Yuhao Zhang, Kai Wang, Zhixing He, Kuan Yang (arXiv). This is an interesting applied algorithms paper that, at its core, uses a sublinear algorithm. A significant amount of graph data is really a bipartite graph (like actor-movie, author-paper, etc.). A butterfly is a $K_{2,2}$, which is also a 4-cycle. Counting butterflies is a common data mining problem, since a high density of $K_{2,2}$’s is usually indicative of some structure or clustering in the bipartite graph. There have been a number of practical algorithms to compute (or estimate) butterfly counts in bipartite graphs. This paper gives a theoretical sublinear algorithm that runs in time $\widetilde{O}(w\sqrt{m}/b + m/b^{1/4})$, where $w$ is the wedge (2-path) count and $b$ is the butterfly count. This algorithm is implemented and compared with previous results. Interestingly, the comparison is done on queries and raw running time, and even the latter shows orders of magnitude improvement. This kind of work bridges the theory/practice divide, and shows the power of pure theoretical ideas (like heavy-light vertex partitioning, various edge sampling strategies) for practical algorithmics. Hopefully this inspires even more work along these lines!
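To pin down the two quantities, here is a small exact (not sublinear) Python counter. It relies on the standard observation that two left vertices with $c$ common neighbors span $\binom{c}{2}$ butterflies. The function name and edge-list format are assumptions, and for simplicity it only counts wedges centered on the right side, which may differ from the paper's exact convention for $w$.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_wedges_and_butterflies(edges):
    """edges: iterable of (left, right) pairs of a simple bipartite graph.

    Returns (wedges centered on the right side, butterflies).
    Exact baseline for illustration, not the sublinear algorithm.
    """
    left_neighbors = defaultdict(set)
    for left, right in edges:
        left_neighbors[right].add(left)

    # common[(u, v)] = number of right vertices adjacent to both u and v,
    # i.e. the number of wedges u - r - v.
    common = defaultdict(int)
    for lefts in left_neighbors.values():
        for u, v in combinations(sorted(lefts), 2):
            common[(u, v)] += 1

    wedges = sum(common.values())
    butterflies = sum(comb(c, 2) for c in common.values())
    return wedges, butterflies


k22 = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(count_wedges_and_butterflies(k22))  # (2, 1)
```

A single $K_{2,2}$ contains two such wedges and one butterfly, as expected.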

Testing Sparse Functions over the Reals by Vipul Arora, Arnab Bhattacharyya, Philips George John, Sayantan Sen (arXiv). The problems of linearity/low degree testing are well-known to our readers. There is a recent line of work generalizing such results to the real-valued setting. Consider $f: \mathbb{R}^n \to \mathbb{R}$. Distance is measured in $l_1$ over the Gaussian measure. This paper studies a number of properties parameterized by sparsity. Consider linear functions, where $f(x)$ is linear iff $f(x) = \sum_i c_i x_i$, for some constants $c_i$. A function is $k$-linear if only $k$ of the $c_i$ coefficients are non-zero. So the function has a small representation, independent of the dimension $n$. This paper gives a tester that makes $\widetilde{O}(k\log k + 1/\varepsilon)$ queries. Similar results are given for $k$-sparse low-degree polynomials and the general problem of $k$-juntas. There is a significant subtlety in querying a real-valued function. One can imagine that the value $f(x)$ is only obtained with some precision $\eta$. There are results for $k$-linearity for the standard Boolean setting, but the precision and $l_1$-distance introduce challenges for generalizing to the real-valued setting.

News for February 2026

Clement Canonne — Sat, 14 Mar 2026 01:07:14 +0000

Apologies for the very late post! Last month was a bit calmer on the property testing front, with “merely” 3 papers we found. (Of course, if we missed any… let us know in the comments!)

Testing Monotonicity of Real-Valued Functions on DAGs, by Yuichi Yoshida (arXiv). Monotonicity of functions is a fundamental and well-studied property in the literature, and testing monotonicity on the line, the reals, the Boolean hypercube, and the hypergrid (among others) has been studied at great length (and yet, is still not fully understood!). This paper considers a new twist on the question, where the object of study is a real-valued function defined on an $n$-vertex directed acyclic graph (DAG) provided to the algorithm. The key contribution of this work is showing that, on this type of structured poset, testing monotonicity requires $\Omega(n^{1/2-\delta}/\sqrt{\varepsilon})$ non-adaptive queries for any constant $\delta>0$, nearly matching the general-poset non-adaptive upper bound of Fischer, Lehman, Newman, Raskhodnikova, Rubinfeld, and Samorodnitsky (2002). The paper also provides a similar adaptive lower bound, for one-sided testers. The author also establishes more fine-grained results (both upper and lower bounds), leveraging assumptions on either the range of the function, or the sparsity of the DAG.

The Power of Two Bases: Robust and copy-optimal certification of nearly all quantum states with few-qubit measurements, by Andrea Coladangelo, Jerry Li, Joseph Slote, and Ellen Wu (arXiv). Following recent works by Huang, Preskill, and Soleimanifar, and then Gupta, He, and O’Donnell, this paper considers the task of state certification (“is this unknown quantum state, which I am given copies of, equal, or very different from, the reference quantum state I want?”), which can be seen as the quantum analogue of identity testing in the classical distribution testing case, for pure reference target states. The key aspect of these works is that one requires the testing algorithm to make very “simple” measurements (ideally single-qubit ones) on the copies of the unknown $n$-qubit state: the underlying idea being that certifying a state given to you should be, in a very quantifiable sense, much “simpler” than preparing the reference state from scratch, otherwise the whole endeavor is sort of useless. Long story short, in this paper, the authors obtain both a very long title and a much more robust algorithm to perform this task, allowing for tolerant state certification with constant tolerance parameter. Only slight wrinkles: the algorithm requires one final measurement on logarithmically many qubits (not a single qubit, which would be the Holy Qugrail), and only works for “nearly all” reference states.

Instance-optimal estimation of $L_2$-norm, by Tomer Adar (arXiv). Given i.i.d. samples from a probability distribution $p$ over an arbitrary discrete domain, estimate its collision probability $\|p\|^2_2$ (equivalently, its $\ell_2$-norm) to a multiplicative $1\pm \varepsilon$ factor. How hard can this be? Quite surprisingly, this question was not, in fact, fully solved, and shows a much more complex landscape than expected, in that the right answer is not the obvious guess. An algorithm matching a (known, yet unpublished) lower bound of Tugkan Batu and myself was posed as an open problem by Tugkan at WoLA 2025: in this work, the author solves the problem, showing that the lower bound is indeed tight, by providing an algorithm achieving the right sample complexity, $O\left(\frac{1}{\varepsilon \|p\|_2}+\frac{\|p\|_3^3-\|p\|_2^4}{ \varepsilon^2\|p\|_2^4}\right)$. It feels good to see this very simple-looking (but not simple, it turns out!), fundamental question solved.
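For readers who want to see the quantity in action, here is a minimal Python sketch of the standard collision statistic: the fraction of pairs of samples that coincide, which is an unbiased estimate of $\|p\|_2^2$. This is the obvious baseline estimator, not the instance-optimal algorithm of the paper; the function name and the choice of a uniform test distribution are assumptions for illustration.

```python
from collections import Counter
import random


def collision_estimate(samples):
    """Unbiased estimate of ||p||_2^2: fraction of sample pairs that collide."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) // 2)


# Uniform distribution over 10 elements, whose collision probability is 0.1.
rng = random.Random(0)
samples = [rng.randrange(10) for _ in range(20000)]
estimate = collision_estimate(samples)
print(estimate)
```

The interesting question addressed by the paper is how few samples suffice for a $1\pm\varepsilon$ multiplicative guarantee as a function of $p$ itself, which this sketch does not attempt to answer.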

News for January 2026

Seshadhri — Fri, 06 Feb 2026 05:49:59 +0000

After some insanely busy months, we have a merely busy month. A couple of neat results on sublinear graph algorithms (a subject dear to me!), DNF testing, and two expositions that should be of interest to our readers.

When Local and Non-Local Meet: Quadratic Improvement for Edge Estimation with Independent Set Queries by Tomer Adar, Yahel Hotam, Amit Levi (arXiv). We’ve seen a lot of recent activity on the fundamental problem of edge estimation in graphs. Given a simple, connected, undirected graph $G = (V,E)$ with $n$ vertices, we want to get a $(1+\varepsilon)$-approximation to the number of edges $m$. In the standard adjacency list access model, a classic result of Goldreich-Ron proves that the query complexity of this problem is $\Theta(n/\sqrt{m})$ (ignoring $poly(\varepsilon^{-1} \log n)$ factors). A completely different access model is the Independent Set (IS) model. For any set $S$, an oracle outputs a single bit indicating whether an edge is present in $S$. Again, in this model, the optimal bound is $\Theta(n/\sqrt{m})$. This paper shows that with access to both oracles (standard adjacency list and IS queries), one gets a quadratic improvement and the complexity is $\Theta(\sqrt{n/\sqrt{m}})$.

Spectral Clustering in Birthday Paradox Time by Michael Kapralov, Ekaterina Kochetkova, Weronika Wrzos-Kaminska (arXiv). Consider a (bounded degree) graph $G$ that can be partitioned into $k$ roughly equal “expanding clusters”. This means that one can remove an $\varepsilon$-fraction of the edges to get $k$ connected components of size around $n/k$, each of which has expansion at least $\phi$. The aim is to get a sublinear oracle that determines the cluster that a vertex belongs to. The origins of this problem go back to problems in expansion testing and local expansion reconstruction. From a spectral perspective, the ideal “cluster classifier” is to simply take the bottom $k$ eigenvectors of the Laplacian, and project every vertex onto this vector space. The corresponding embeddings of vertices in the same cluster will be close. This paper effectively implements this procedure in sublinear time; specifically $\widetilde{O}(\sqrt{n})$ time, using a birthday paradox type collision argument to estimate the embedding similarities.

DNF formulas are efficiently testable with relative error by Xi Chen, William Pires, Toniann Pitassi, Rocco A. Servedio (arXiv). The relative error model for Boolean functions was recently proposed as an analogy to sparse graph testing. The standard notion of distance between two functions $f, g: \{0,1\}^n \to \{0,1\}$ is just $\| f - g\|_0/2^n$. But this is not meaningful when $|f^{-1}(1)| \ll 2^n$. A natural notion of distance is to consider the sets $f^{-1}(1)$ and $g^{-1}(1)$ and measure their symmetric difference. One can now define distances to properties analogously. There have been numerous property testing results with this notion of distance, called the “relative error” setting. This paper considers the property of being a (small) DNF. Learning an $s$-term DNF requires $n^{O(\log(s/\varepsilon))}$ queries. This paper shows that the (relative error) testing of $s$-term DNFs can be done in $poly(s/\varepsilon)$ queries.
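The contrast between the two distances is easy to see on a small brute-force example. The Python sketch below compares the standard distance with a relative one, under the assumption (mine, for illustration) that the symmetric difference is normalized by $|f^{-1}(1)|$; the function names are hypothetical, and the enumeration over all $2^n$ points is only meant for tiny $n$.

```python
from itertools import product


def satisfying_points(f, n):
    """All points of {0,1}^n on which f evaluates to 1 (brute force)."""
    return {x for x in product((0, 1), repeat=n) if f(x)}


def standard_distance(f, g, n):
    """Fraction of the whole cube on which f and g disagree."""
    return len(satisfying_points(f, n) ^ satisfying_points(g, n)) / 2 ** n


def relative_distance(f, g, n):
    """Symmetric difference of the 1-sets, normalized by |f^{-1}(1)|.

    The normalization is an assumption made for this sketch.
    """
    ones_f = satisfying_points(f, n)
    if not ones_f:
        raise ValueError("f has no satisfying assignments")
    return len(ones_f ^ satisfying_points(g, n)) / len(ones_f)


n = 10
f = lambda x: all(x[:8])  # a sparse function: only 4 satisfying points
g = lambda x: all(x[:9])  # an even sparser one: only 2 satisfying points

print(standard_distance(f, g, n))  # 2 / 1024
print(relative_distance(f, g, n))  # 2 / 4
```

Under the standard distance these two functions look almost identical, while the relative distance registers that they disagree on half of the satisfying assignments of $f$.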

A short note on (distribution) testing lower bounds via polynomials by Clément Canonne (ECCC). As the title says, this is a short note on a fundamental question of proving lower bounds for distribution testing. For your favorite distribution testing problem, to prove a lower bound, you start with a “Yes” distribution $\mathcal{Y}$ and a “No” distribution $\mathcal{N}$. So $\mathcal{Y}$ would satisfy the desired property (say, uniformity) and $\mathcal{N}$ would not. Then, you need to argue some sort of “statistical indistinguishability” of samples. So the distribution of a set of $s$ samples from $\mathcal{Y}$ is almost the same as that from $\mathcal{N}$. How does one prove the latter? A convenient lemma shows that if the first $\ell$ moments of the distributions are the same, then we get an $\Omega(k^{1-1/\ell})$ sample lower bound ($k$ is the support size). The key insight is that such “matching moment” distributions/random variables can be generated by looking at specific univariate polynomials. It’s a really clever trick that looks at the powers of roots scaled by the derivative at those points. A really nice read overall!
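One classical identity in this spirit (offered as an illustration, and not necessarily the exact construction used in the note) is the following: if $p$ is a monic polynomial of degree $d$ with distinct roots $r_1, \dots, r_d$, then $\sum_i r_i^j / p'(r_i) = 0$ for every $0 \le j \le d-2$. Splitting the weights $1/p'(r_i)$ by sign then gives two nonnegative measures on disjoint supports with the same total mass and the same first $d-2$ moments. The Python sketch below checks the identity in exact arithmetic; the function names and the choice of roots are assumptions.

```python
from fractions import Fraction


def derivative_weights(roots):
    """Return 1 / p'(r) for each root r of the monic p(x) = prod (x - r)."""
    weights = []
    for i, r in enumerate(roots):
        slope = Fraction(1)
        for j, s in enumerate(roots):
            if j != i:
                slope *= Fraction(r - s)  # p'(r_i) = prod_{j != i} (r_i - r_j)
        weights.append(1 / slope)
    return weights


def weighted_power_sums(roots, max_power):
    """Sums of r^j / p'(r) over the roots, for j = 0, ..., max_power."""
    weights = derivative_weights(roots)
    return [
        sum(w * Fraction(r) ** j for w, r in zip(weights, roots))
        for j in range(max_power + 1)
    ]


roots = [0, 1, 3, 4, 7]  # distinct roots, so d = 5
sums = weighted_power_sums(roots, len(roots) - 1)
print(sums)  # zero for j = 0..3, then 1 for j = d - 1 = 4
```

The sums vanish up to power $d-2$ and only the power $d-1$ breaks the pattern, which is exactly the "matching moments up to a point" behavior one wants from a hard pair of instances.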

Mathematical and computational perspectives on the Boolean and binary rank and their relation to the real rank by Michal Parnas (arXiv). This is a long survey on the notions of Boolean and binary rank, with an overview of their use in TCS as well as methods to bound these ranks. Section 7.3 gives an excellent discussion of property testing problems on Boolean rank. It also gives some of the main open questions regarding property testing low Boolean rank.

News for December 2025

Clement Canonne — Tue, 06 Jan 2026 02:11:34 +0000

Happy new year, everyone! Last year ended with a bang, featuring no fewer than 11 papers on property testing! As usual, if we forgot any, please let us know in the comments.

We’ll start with graphs, then move to locally testable and decodable codes, turn to quantum computing, make a stop around neural networks, and finally head towards probability distribution testing. Just as 2025 was, it’s a journey!

Improved Bounds with a Simple Algorithm for Edge Estimation for Graphs of Unknown Size, by Debarshi Chanda (arXiv). (Technically from November, but we missed it at the time.) Given access to an unknown undirected graph, efficiently estimating its number of edges $m$ is arguably a very fundamental (and well-studied) question. Recent work has provided sublinear-query algorithms which only require access to “degree”, “neighbour”, and “random edge” queries. However, previous algorithms did require a priori knowledge of some of the graph parameters, such as its number of vertices $n$. This paper’s main contribution is an algorithm which only makes “degree” and “random edge” queries, does not require knowledge of any of the graph parameters, and outputs a $1\pm \varepsilon$ multiplicative estimate of its number of edges with an expected $\tilde{O}(\alpha n/(\varepsilon^2 m))$ queries, where $\alpha$ is the (unknown) arboricity of the graph. Further, the author shows that allowing a wider range of query types cannot lead to significantly more efficient algorithms.

Optimal non-adaptive algorithm for edge estimation, by Arijit Bishnu, Debarshi Chanda, Buddha Dev Das, Arijit Ghosh, and Gopinath Mishra (arXiv). Very related to the previous paper, this one does assume knowledge of the number of vertices $n$, and parameterizes the complexity of the algorithm as a function of $n,m$ only (not the arboricity $\alpha$). The key focus and difference, however, is that the algorithm is non-adaptive, in contrast to all previous algorithms for the task, achieving an expected $\tilde{O}(\sqrt{n}/\varepsilon^{2.5})$ query complexity with only “degree” and “random edge” queries (to be compared with the adaptive query complexity in this query model, recently shown to be $\tilde{O}(n^{1/3})$ by Beretta, Chakrabarty, and Seshadhri).

Graph Limits via Quotients, by Eitan Levin and Venkat Chandrasekaran (arXiv). To quote the abstract, this paper introduces “a new notion of limits of weighted directed graphs of growing size based on convergence of their random quotients.” These limits, which the authors name grapheurs, are meant to better capture global structures and “hub” behavior, in contrast to other types of graph limits such as graphons. Along with defining and analyzing their new notion, the authors derive an “edge-based analogue” of Szemerédi’s regularity lemma for grapheurs, and leverage it in Section 4 to obtain an edge-based sampling property tester for hubs (Example 4.11).

Good Locally Testable Codes with Small Alphabet and Small Query Size, by Uriya First and Stav Lazarovici (arXiv). A good code is an error-correcting code with both (relative) distance and rate $\Omega(1)$. A $q$-query locally testable code (LTC) is a code which admits a tester (here assumed to be non-adaptive) making $q$ queries to strings to test whether they are valid codewords. The existence of good LTCs was established independently in 2022 (for large values of $q$) by Dinur–Evra–Livne–Lubotzky–Mozes and Panteleev–Kalachev; First and Kaufman (2024) then showed good 2-query LTCs (but on large alphabets). Several impossibility results, under conditions on $q$ and alphabet size, were known, but a characterization was unclear: until now, as this paper settles which values of $q$ and alphabet size admit good LTCs.

3-Query RLDCs are Strictly Stronger than 3-Query LDCs, by Tom Gur, Dor Minzer, Guy Weissenberg, and Kai Zhe Zheng (arXiv). A $q$-query locally decodable code (LDC) is an error-correcting code such that any given bit of a message can be decoded by only querying $q$ bits of the (possibly corrupted) codeword. A relaxed LDC (RLDC) is nearly the same, except for the fact that the decoder can give up (output $\bot$) on a small fraction of corrupted codewords (i.e., it only needs to be able to decode most of the bits of most of the slightly-corrupted codewords). While the existence of 3-query LDCs with rate $o(1/n^3)$ had recently been shown to be impossible, this paper shows that 3-query RLDCs with rate $\tilde{\Omega}(1/n^2)$ do exist, establishing a strict separation between relaxed and usual (stressed out?) LDCs.

A List of Complexity Bounds for Property Testing by Quantum Sample-to-Query Lifting, by Kean Chen, Qisheng Wang, and Zhicheng Zhang (arXiv). The quantum sample-to-query lifting framework, introduced by Wang and Zhang in 2025 and strengthened by Tang, Wright, and Zhandry the same year, enables one to (as the name suggests) lift sample complexity lower bounds to query complexity, and as a result to, for instance, prove lower bounds for property testing in the quantum setting. This paper applies this framework to a wide range of tasks, obtaining improved bounds (or re-deriving known bounds) for what can only be called a pretty big number of quantum testing problems, both for classical properties and quantum ones.

A random purification channel for arbitrary symmetries with applications to fermions and bosons, by Michael Walter and Freek Witteveen (arXiv). In this paper, the authors introduce a generalization of the random purification channel (whereby an arbitrary mixed state is (randomly) cast as a pure state in a larger space, often simplifying the task at hand) to general symmetry groups or algebras. While this is a little over my head, an application of the above is in learning (tomography) and testing quantum states: namely, the paper provides improved upper bounds for testing whether an arbitrary pure state is a fermionic Gaussian state (Corollary 1.6).

Optimal certification of constant-local Hamiltonians, by Junseo Lee and Myeongjin Shin (arXiv). Last May, we covered two papers on Hamiltonian testing, one about tolerant certification (the quantum analogue of classical (distribution) identity testing), and the other about locality testing (i.e., whether a Hamiltonian can be decomposed as the sum of simple operators). In this paper, the authors consider the (non-tolerant) version of certification, under the assumption that the Hamiltonian is local, and obtain optimal query complexity for the case where it is $O(1)$-local. Now, the twist, and in contrast to the first paper from May mentioned above: here the algorithm is only allowed to query the evolution operator $e^{-iH t}$ of the Hamiltonian $H$ in the forward time direction (no “rewinding”!), a more stringent query access model.

Going back to classical computing, the testing continues!

Property Testing of Computational Networks, by Artur Czumaj and Christian Sohler (arXiv). This paper initiates the study of computational networks (that is, neural networks) from the testing point of view. While one could think this is just standard property testing (after all, a neural network computes a function, and testing functions has been considered before!), the authors make the case that the implementation of the function as a network is the key aspect here. The framework they define and introduce for this (with, as leading example, the case of ReLU networks) thus focuses on the device computing the function, and the notion of distance used is very much reliant on this: that is, the measure of distance is the fraction $\varepsilon$ of weights of the network that need to be modified in order to make the function it computes $\delta$-close to the property.

A Distribution Testing Approach to Clustering Distributions, by Gunjan Kumar, Yash Pote, and Jonathan Scarlett (arXiv). Here, the authors consider an (arguably very natural) task: given $k$ discrete distributions available through sample access, and a parameter $\varepsilon$, how to recover a hidden partition into two sets such that (1) all distributions within a given set are identical, and (2) the distributions of the two sets are $\varepsilon$-far in total variation distance? While previous work on this focuses on asymptotic guarantees, this paper builds on techniques from distribution testing to obtain finite-sample upper and nearly-matching lower bounds, considering both the setting where one of the two distributions is known, and that where both distributions are unknown. The paper concludes with three open problems and future directions, so… worth having a look!

Instance Dependent Testing of Samplers using Interval Conditioning, by Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, and Sayantan Sen (arXiv). To conclude this month’s post, a distribution testing paper focusing on infinite domains, and instance-optimal guarantees. Motivating their work from the viewpoint of verifying GenAI/probabilistic AI output distributions, the authors consider the question of identity testing over discrete but infinite domains (such as $\mathbb{Z}$), parameterized by a relevant “smoothness” parameter, the “tilt”, involving both the known and unknown distributions. To enable more efficient testing, the algorithms are allowed a stronger type of sampling than i.i.d. sampling, namely, the interval conditional sampling access of Canonne-Ron-Servedio (2015), which lets one sample from a distribution conditioned on any interval of one’s choosing. Finally, to complement their theoretical analysis, the authors provide an empirical evaluation of their algorithm.

Edit: two more papers, that we had missed from last month!

Unbounded-width CSPs are Untestable in a Sublinear Number of Queries, by Yumou Fei (arXiv). This paper shows strong lower bounds in the bounded-degree query model, for graph testing: namely, that testing satisfiability requires $\Omega(n)$ queries for the entire class of unbounded-width CSPs. Put differently, all CSPs known to be NP-hard are also maximally hard to test in the bounded-degree model!

A Dichotomy Theorem for Multi-Pass Streaming CSPs, by Yumou Fei, Dor Minzer and Shuo Wang (arXiv). In this paper, the authors show a dichotomy theorem on the space needed by multipass streaming algorithms to approximate CSPs. While focusing on streaming algorithms, this paper discusses connections to property testing (Section 1.2.2), specifically to the bounded-degree model for graph testing.

News for November 2025

Akash — Mon, 15 Dec 2025 15:40:26 +0000

Last month the community had a lot to say, as a result of which we have a rare treat featuring eighteen papers. The results range from (almost) tight lower bounds for testing boolean monotonicity to low rank approximation of Hankel matrices, to compressed sensing, to PCPs, to results about testing and learning convex functions, to a trio of papers with Oded Goldreich as (co)-author; there are many many many more papers covered in this month’s potpourri. Also featured in this spectacular collection is a paper which seeks to understand, in this age of LLMs, how Mathematical Explorations and Discoveries work at scale. Without further ado, let us examine our spread.

Boolean function monotonicity testing requires (almost) $n^{1/2}$ queries by Mark Chen, Xi Chen, Hao Cui, William Pires, Jonah Stockwell (arXiv) The problem of testing boolean monotonicity over the hypercube, $\{0,1\}^n$, is no stranger to regular PTReview readers. Finally, we have a result which puts this venerable problem to rest by proving an adaptive two-sided lower bound of $\Omega(n^{1/2-c})$ queries for testing boolean monotonicity over the boolean hypercube. The authors introduce what they call multilevel Talagrand functions. Before I detail this paper a little more, let me just say that right off the bat, the paper also uses these functions to obtain a nearly linear in $n$ two-sided lower bound for testing unateness. Alright, let me say a little about multilevel Talagrand functions. Our starting point is a story we covered in News for November 2015: a result by Belovs-Blais proving a polynomial in $n$ two-sided lower bound for the same problem. As mentioned in that post, the key insight of Belovs and Blais was to work with Talagrand’s random DNF. Perturbations to this DNF are sufficiently non-monotone and were shown to require $\Omega(n^{1/4})$ queries to test. Our News for February 2017 post covered the paper by Chen-Waingarten-Xie which showed that one could adaptively test the Belovs-Blais instance with a budget of $O(n^{1/4})$ queries and then went on to present an improved lower bound of $\Omega(n^{1/3})$ queries by defining what they called two-level Talagrand functions. The featured paper presents an intricate way to define the multilevel extensions of these Talagrand functions which are then used to squeeze out an almost $\Omega(\sqrt n)$ lower bound.

Low-soundness direct-product testers and PCPs from Kaufman–Oppenheim complexes by Ryan O’Donnell and Noah Singer (arXiv) This paper shows an agreement test for direct product testing with two provers in the low soundness regime. Such results were previously known using high dimensional complexes, constructed using deep facts from number theory. This paper proves that a known simpler and strongly explicit complex (KO complex) can be used instead to design the test. Although the methods are elementary, the proof is long and somewhat involved. The featured paper shows a new expansion property of the KO complex in the regime required for building the test with low soundness. Additionally, this paper also confirms that this complex can be used to build an alternate efficient PCP with only $poly(\log n)$ blow up. Since the methods are elementary, the construction is more accessible. And, perhaps, we will have an easier time optimizing the parameters to build more efficient PCPs.

A sufficient condition for characterizing the one-sided testable properties of families of graphs in the Random Neighbour Oracle Model by Christine Awofeso, Patrick Greaves, Oded Lachish, Amit Levi, Felix Reidl (arXiv) The story of this paper begins from a 2019 paper by Czumaj and Sohler which we covered in our September 2019 news. In that paper, Czumaj-Sohler gave efficient property testers for testing $H$-freeness when the input graph is known to be $K_r$-minor-free in the random neighbor query model. The algorithm presented in that result recursively explores a few random neighbors of the current vertex (in particular, you can think of the algorithm as a truncated BFS with bounded recursion depth). The featured paper considers graph families parameterized by a certain parameter (in particular, it considers the class of graphs with universally bounded “$r$-admissibility”). The key result of the paper is a theorem which asserts that, restricted to the class of graphs with bounded $r$-admissibility, for every fixed finite graph $H$, $H$-freeness is testable with one-sided error (assuming access to a random neighbor oracle).

Testing H-freeness on sparse graphs, the case of bounded expansion by Samuel Humeau, Mamadou Moustapha Kanté, Daniel Mock, Timothé Picavet, Alexandre Vigny (arXiv) This paper continues the thread picked up by the previous result. The algorithm presented in the Czumaj-Sohler result recursively explores a few random neighbors of the current vertex (in particular, you can think of the algorithm as a truncated BFS with bounded recursion depth). The proof presented in Czumaj-Sohler was fairly intricate. The featured paper uses ideas rooted in the sparsity literature and shows how these ideas can be used to efficiently test $H$-freeness even when the input graph is only promised to come from a polynomially bounded-expansion graph class.

Sublinear Time Low-Rank Approximation of Hankel Matrices by Michael Kapralov, Cameron Musco, Kshiteej Sheth (arXiv) Our November 2022 News featured a paper which obtained low-rank approximation of Toeplitz matrices (which remarkably, were still Toeplitz!) in sublinear time. In this paper, a subset of the authors carries the exploration forward and considers the following task: Suppose you are given as input a matrix obtained by noising a PSD Hankel matrix of order $n$ and a parameter $\varepsilon > 0$. The featured paper presents an algorithm which runs in time $poly(\log n, 1/\varepsilon)$ and returns a bona fide Hankel matrix $\widehat{H}$ with $rank(\widehat{H}) \approx poly(\log n)$ which is fairly close to the unknown Hankel matrix which we started with (which was noised to produce the original input).

Support Recovery in One-bit Compressed Sensing with Near-Optimal Measurements and Sublinear Time by Xiaxin Li, Arya Mazumdar (arXiv) Let us begin by reviewing the setup for the One-Bit Compressed Sensing Problem. You want to recover an unknown signal which is modeled as a vector $\boldsymbol{x} \in \mathbb{R}^n$ which we further assume is $k$-sparse. To help you out, I give you a Sign of Dot-Product machine which on input your favorite vector $\boldsymbol{a} \in \mathbb{R}^n$ tells you $sign(\boldsymbol{a}^T\boldsymbol{x})$. So, in all you can look at the quantity $\boldsymbol{y} = sign(\boldsymbol{A}\boldsymbol{x})$ where $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. I want to keep $m$ as small as possible. Another related goal seeks to recover just the support of the unknown $k$-sparse signal. The featured paper presents efficient algorithms that return a sparse vector $\widehat{\boldsymbol{x}}$ whose support is a great proxy for the support of our unknown signal (in the sense they have very small symmetric difference).

Homomorphism Testing with Resilience to Online Manipulations by Esty Kelman, Uri Meir, Debanuj Nayak, Sofya Raskhodnikova (arXiv) Let us consider the task of testing properties of functions $f \colon G \to H$ where $(G, +)$ and $(H, \oplus)$ are groups. A lot of classic results in this area (like the linearity test of BLR) assume that we have access to a reliable oracle which when given any $x \in G$ returns $f(x) \in H$. This assumption is no longer valid if the oracle is unreliable or, worse, adversarial. The featured paper considers the situation where the oracle is adversarial and can in fact manipulate the responses (like replacing $f(x)$ by some value $y \in H \cup \{\bot\}$). Moreover, the power of the adversary is parameterized by a number $t$. One of the results in this paper shows the following: Take $f \colon G \to H$ and fix some $\varepsilon > 0$. Then one can test whether $f$ is a group homomorphism using a budget of $O(1/\varepsilon + \log_2 t)$ queries, even when the adversary can manipulate $t$ values after every query is answered.
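As a reminder of the baseline being hardened here, the classic BLR test over $G = (\{0,1\}^n, \oplus)$ and $H = (\{0,1\}, \oplus)$ fits in a few lines of Python. This sketch assumes a reliable oracle, so it is the textbook test and not the manipulation-resilient tester of the paper; the function names, the encoding of inputs as $n$-bit integers, and the number of trials are all assumptions.

```python
import random


def blr_test(f, n, trials, rng):
    """Classic BLR check: accept iff f(x) ^ f(y) == f(x ^ y) on random pairs.

    Inputs are n-bit integers standing for vectors in {0,1}^n, and f is
    assumed to be answered by a reliable (non-adversarial) oracle.
    """
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True


def parity_of_mask(mask):
    """A homomorphism: parity of the bits of x selected by `mask`."""
    return lambda x: bin(x & mask).count("1") % 2


def and_of_two_bits(x):
    """Not a homomorphism: AND of the two lowest bits of x."""
    return (x & 1) & ((x >> 1) & 1)


rng = random.Random(1)
print(blr_test(parity_of_mask(0b1011), 8, 200, rng))  # True: never rejects
print(blr_test(and_of_two_bits, 8, 200, rng))
```

A true homomorphism passes every trial, whereas the AND function fails each trial with probability $3/8$, so it is rejected except with negligible probability over 200 trials. The difficulty in the online-manipulation model is that the adversary can erase or corrupt exactly the points $x \oplus y$ the tester is about to query.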

Computational Complexity in Property Testing by Renato Ferreira Pinto Jr., Diptaksho Palit, Sofya Raskhodnikova (arXiv) This paper studies the tradeoff between the query complexity and the time complexity aspects of property testing tasks. Most of the research in property testing has focused on obtaining testers with better and better query bounds. The present work explores the query/time interplay in property testing and develops tools to prove computational hardness of property testers. The paper proves an unconditional and a conditional hierarchy theorem. The first theorem, for instance, asserts that there are property testing problems which witness a strict separation between the query complexity of the task and the running time required to solve the task. The second theorem asserts that, assuming SETH, the separation between the query bounds and the running time bounds can be improved.

Interactive proof systems for FARNESS by Oded Goldreich, Tal Herman, Guy N. Rothblum (ECCC) This paper continues a line of work on interactive proofs of proximity for distribution testing problems, building on earlier work of Chiesa and Gur (see our October 2017 news). The central question here is whether interaction can reduce the sample complexity barriers that appear unavoidable in standard (non-interactive) distribution testing. The paper answers this in the affirmative for natural promise problems such as deciding whether two distributions are identical (the NO case in the interactive setup) or $\varepsilon$-far (the YES case in the interactive setup), as well as testing whether a distribution is far from uniform (the far case being the YES case in the interactive setup yet again). The authors construct explicit interactive proof systems in which the verifier uses asymptotically fewer samples than required by the best testers, while simultaneously the honest prover uses asymptotically fewer samples than would be needed to learn the distribution. The protocols rely on collision-based ideas and admit clean tradeoffs between the verifier’s and prover’s sample complexities. Conceptually, the result shows that interaction can beat both testing and learning bounds at the same time for concrete, well-studied distribution testing problems.

On doubly-sublinear interactive proofs for distributions by Oded Goldreich, Guy N. Rothblum (ECCC) As I understand it, this paper takes a step back from specific testing problems and asks how general the phenomenon exhibited above really is. So let’s try to place the context for this paper with the preceding paper in mind. Motivated by the interactive protocols presented for FARNESS, the authors introduce the notion of doubly-sublinear interactive proofs of proximity (ds-IPPs), where the verifier’s sample complexity is asymptotically smaller than the testing complexity and the honest prover’s sample complexity is asymptotically smaller than the learning complexity. The main contribution is conceptual rather than algorithmic: the paper shows that ds-IPPs do exist, by constructing distribution properties (algebraically defined and somewhat artificial) that admit such protocols. Together with the above paper, this work shows that interaction changes the game, in the sense that the situation with sample complexity looks different with interaction than without.

Proving the PCP Theorem with 1.5 proof compositions (or yet another PCP construction) by Oded Goldreich (ECCC) This paper revisits the classic PCP Theorem and offers a new PCP construction of constant query complexity and sublinear randomness complexity $n^{\alpha}$ for any constant $\alpha > 0$. The traditional Arora–Lund–Motwani–Sudan–Szegedy proof uses two generic proof compositions (Reed–Muller with itself, then with a Hadamard-based PCP). Inspired by a recent work of Amireddy et al., which constructs a PCP (without composition) with constant query complexity and randomness complexity $n^{\alpha}$, this work approaches the same challenge. It combines a Reed–Muller-based PCP with Hadamard encoding in a novel way. By working with a constant number of dimensions and a large alphabet, then encoding field elements via Hadamard codes and emulating key tests (e.g., low-degree and sum-check), the paper achieves a PCP system that can serve as an inner verifier without full generic composition. In all, this paper presents another route to the PCP Theorem with constant queries and $n^{\alpha}$ randomness.

Interactive Proofs For Distribution Testing With Conditional Oracles by Ari Biswas, Mark Bun, (Our Own) Clement Canonne, Satchit Sivakumar (arXiv) Another paper whose roots lie in the seminal work of Chiesa and Gur. This paper poses the following twist in distribution testing (on top of the Chiesa-Gur twist):

Can label-invariant properties be verified in a (query, sample and communication)-efficient way when the tester has access to a PCOND oracle?

The paper proves that there exist label-invariant properties of distributions for which the sample complexity of testing is not reduced by access to the PCOND oracle (that is, it remains the same as the sample complexity of the task without one). However, when interaction is allowed, a different picture emerges. In particular, the paper presents a round protocol which can be used to interactively test a label-invariant distribution property (over a universe of size $N$) in the PCOND model with a query complexity that grows as $poly(\log N)$.

Verification of Statistical Properties: Redefining the Possible by (Again, our own) Clement Canonne, Sam Polgar, Aditya Singh, Aravind Thyagarajan, Qiping Yang (ECCC) Let us again start our story from the Chiesa-Gur paper. So, we have an unreliable prover and a verifier, where the verifier is interested in testing some label-invariant property of a distribution and is allowed to chit-chat with the prover. One problematic aspect which prevents this construct from being applicable to some practical scenarios is that there are cursed properties (e.g., things as basic as UNIFORMITY testing) whose testing requires the verifier to hold $\Omega(\sqrt n)$ items, where $n$ denotes the support size of the distribution whose properties we want to test. Undeterred by this news, the featured paper contemplates how to evade this curse. The eye-opening realization is to consider these tasks with the roles of completeness and soundness flipped: akin to the “Interactive proofs of farness” paper above, the goal becomes testing FARNESS from UNIFORMITY. The paper shows that this task indeed admits a protocol with sample complexity $O(1)$. The paper also considers the multiplayer and the zero-knowledge variations of these tasks and looks like a nice conceptual read for the PTReview readers.

Learning and Testing Convex Functions by Renato Ferreira Pinto Jr., Cassandra Marcussen, Elchanan Mossel, Shivam Nadimpalli (arXiv) The paper studies functions $f \colon \mathbb{R}^n \to \mathbb{R}$ under the Gaussian measure and formalizes convexity testing as distinguishing convex functions from those that are $\varepsilon$-far in $L^2$ equipped with the Gaussian measure. A central theorem shows that if $f$ is Lipschitz (or satisfies comparable regularity), then convexity can be tested with sample complexity $n^{poly(1/\varepsilon^2)}$. The paper supplements these results with a sample complexity lower bound in a natural query model.

Halfspaces Are Hard to Test with Relative Error by Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, Tianqi Yang (arXiv) This paper considers property testing of Boolean halfspaces in the relative-error model (which we covered earlier, e.g., see here), where the distance between a pair of functions is defined as the fractional symmetric difference between the supports of the two functions. We assume that algorithms have access to a uniformly random element in the support of the input function. Alright, with these details behind us, let us state the main result of this paper. It asserts that for some constant $\varepsilon_0 > 0$, any relative-error $\varepsilon_0$-tester for halfspaces over $\{0,1\}^n$ must take $\Omega(\log n)$ samples, showing a super-constant lower bound in this model. This result sharply contrasts with classic absolute-error testing results, where halfspaces admit constant-query testers, and thus demonstrates that the relative-error model differs significantly from the classical model.

Testing Noisy Low-Degree Polynomials for Sparsity by Yiqiao Bao, Anindya De, Shivam Nadimpalli, Rocco A. Servedio, Nathan White (arXiv) Let us start from the setup considered in the Chen-De-Servedio paper we covered in our November 2019 post. In their setup, you are given samples from a noisy linear model $\boldsymbol{y} = \mathbf{w \cdot x} + \texttt{Noise}$. This paper, on the other hand, considers a generalized situation where you are given $\boldsymbol{y} = \mathbf{p(x)}+\texttt{Noise}$ for a low-degree polynomial $\mathbf{p}$. You are interested in knowing whether $\mathbf{p}$ is $s$-sparse or far from $s$-sparse. The main result gives explicit conditions under which sparsity can be tested with a constant number of samples, depending on the degree, noise rate, and sparsity parameter $s$. Outside this regime, the authors prove logarithmic lower bounds, showing that once noise overwhelms certain structural signals, sparsity testing becomes nearly as hard as learning the polynomial itself.

Efficient Testing Implies Structured Symmetry by Cynthia Dwork, Pranay Tankala (arXiv) This paper addresses a structural question in property testing: which Boolean properties admit testers with small sample complexity and efficient computation? The paper shows that if a property $P \subseteq \{0,1\}^n$ is testable using (i) few samples and (ii) a circuit of small size, then $P$ must be close to a property $\mathcal{Q}$ which enjoys some kind of structured symmetry. In other words, efficient testability forces the property to exhibit a form of low-complexity symmetry. Conversely, the paper shows that properties with such structured symmetry admit efficient testers.

Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner (arXiv) This paper asks, fairly openly (see Section 1.2), what it might even mean for modern AI systems to participate in mathematical discovery. The authors place their work in the long lineage of computer-assisted mathematics, from early symbolic systems to contemporary automated reasoning, but are careful to frame large language models not as theorem-provers or replacements for mathematicians, but as exploratory engines: components inside a closed-loop pipeline where proposal, evaluation, and refinement happen repeatedly and at scale. Concretely, the paper studies a setup in which LLM-based code generation is coupled with automated evaluators, allowing the system to search large spaces of mathematical constructions and algorithms with minimal human intervention. Rather than optimizing for a single headline result, the authors emphasize breadth, reporting on 44 separate “adventures” spanning combinatorics, number theory, analysis, and even Olympiad-style geometry. In many cases, the system rediscovers known constructions or matches best-known bounds; in others, it produces plausible variants or refinements that look less like finished theorems and more like the kind of halfway-formed ideas one might jot down while exploring a problem for the first time. Quoting from the paper:

“One illustrative example comes from experiments related to the Erdős discrepancy problem. First, when the system was run with no human guidance, it found a sequence of length 200 before progress started to slow down. Next, when the prompt was augmented with the advice to try a function which is multiplicative, or approximately multiplicative, the system performed significantly better and found constructions of length 380 in the same amount of time. These constructions were still far from the optimal value of 1160, and the authors explicitly note that other hints (for example, suggesting the use of SAT solvers) might have improved the score further, but were not explored due to time limitations.”

The broader point of the paper is less about getting optimal bounds and more about scale and exploration. The results are empirical and depend on carefully chosen experimental settings, but the diversity of examples makes this an engaging and unusually readable account of what happens when machines are allowed to roam around mathematical landscapes with a long leash.

News for October 2025

Seshadhri — Mon, 03 Nov 2025 23:02:27 +0000

Another busy month in property testing. We have nine papers, covering a range of topics: distribution testing, pattern-freeness testing, and even LLMs! Let’s start the journey.

Proving Natural Distribution Properties is Harder than Testing Them by Tal Herman and Guy Rothblum (ECCC). This paper is on interactive proof systems for distribution testing. Suppose an untrusted prover claims to have discovered some property of a distribution $\mathcal{D}$. The verifier can sample from $\mathcal{D}$ and communicate with the prover, in the hope of verifying this property more efficiently than running a property tester directly. There is a rich line of work showing that many non-trivial properties can be verified much faster than testing directly. But in all these results, the honest prover learns the entire distribution and thus has sample complexity $\Omega(N)$ (where $N$ is the domain size). This paper asks whether doubly-sublinear non-trivial protocols exist, wherein the honest prover has sample complexity $o(N)$ and the verifier is more efficient than directly testing. The main result is hardness: for many natural properties, if the honest prover has $o(N)$ sample complexity, then the verifier complexity is the same as the testing complexity. This says that proving is (provably!) more difficult than directly testing. The main technical work is in showing such a result for estimating collision statistics of the distribution. Collision statistics form the basis of most label-invariant distribution testing algorithms, leading to the various results in the paper.
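As a reminder of what a collision statistic is, here is a minimal sketch of the simplest one: the fraction of sample pairs that collide. Its expectation is $\sum_i p_i^2$, which equals $1/N$ for the uniform distribution on $N$ elements and is strictly larger for any other distribution on that domain. The function name `collision_rate` is made up for this illustration, and the sketch is not taken from the paper.

```python
from collections import Counter

def collision_rate(samples):
    """Fraction of unordered sample pairs (i, j), i < j, with samples[i] == samples[j].

    This is an unbiased estimator of sum_i p_i^2 for i.i.d. samples from p.
    Requires at least two samples.
    """
    m = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = m * (m - 1) // 2
    return colliding_pairs / total_pairs

rate = collision_rate([1, 1, 2, 3])  # one colliding pair out of six
```

The statistic depends only on the multiset of counts, not on which domain elements were seen, which is why it is a natural building block for label-invariant properties.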

Testing forbidden order-pattern properties on hypergrids by Harish Chandramouleeswaran, Ilan Newman, Tomer Pelleg, and Nithin Varma (arXiv). Monotonicity is arguably one of the most fundamental properties studied in property testing. It is a special case of “forbidden order” properties. Consider a function $f:[n]^d \to \mathbb{R}$, and a permutation $\pi$ of $[k]$, for some constant $k$. The function is “$\pi$-free” if there do not exist $x_1 \prec x_2 \prec \ldots \prec x_k$ where $\pi$ is the permutation of the sorting of $f(x_1), f(x_2), \ldots, f(x_k)$. (We use $\prec$ for the standard coordinate-wise partial order.) So monotonicity is equivalent to $(2,1)$-freeness (or $(1,2)$-freeness depending on ascending/descending order), since it suffices to consider the order of pairs of domain points. Even over the line ($d=1$), forbidden order properties exhibit a rich theory. This paper initiates the study of forbidden order freeness for the $d=2$ case. For $\pi = (1,2,3)$-freeness, the paper gives a $poly(\log n)$-time property tester. But curiously, any one-sided tester for $\pi=(1,3,2)$-freeness requires $\Omega(\sqrt{n})$ queries. There is an interesting adaptivity gap: there is an adaptive $n^{4/5}$-query tester for any $\pi$ for $k=3$, but a corresponding non-adaptive (one-sided) tester requires $\Omega(n)$ queries.
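To unpack the definition, here is a brute-force checker for $\pi$-freeness on the line ($d=1$), where $f$ is given as a list of values. It reads all of $f$ and runs in time $O(n^k)$, so it is a definition check rather than a property tester. Treating tuples with tied values as non-appearances is an assumption of this sketch, as are the names `pattern_of` and `is_pi_free`.

```python
from itertools import combinations

def pattern_of(values):
    """Order pattern of distinct values, e.g. (5, 9, 7) -> (1, 3, 2)."""
    ranks = {v: r for r, v in enumerate(sorted(values), start=1)}
    return tuple(ranks[v] for v in values)

def is_pi_free(f, pi):
    """True iff no indices i_1 < ... < i_k have f-values order-isomorphic to pi.

    Assumption of this sketch: tuples containing equal values never count
    as an appearance of pi.
    """
    k = len(pi)
    for idx in combinations(range(len(f)), k):
        vals = [f[i] for i in idx]
        if len(set(vals)) == k and pattern_of(vals) == tuple(pi):
            return False
    return True

# A non-decreasing array is exactly one with no (2, 1)-appearance.
sorted_is_free = is_pi_free([1, 2, 2, 5], (2, 1))
unsorted_is_free = is_pi_free([1, 3, 2], (2, 1))
```

With $\pi = (2,1)$ the check reduces to looking for a single decreasing pair, which matches the remark above that monotonicity only needs pairs of domain points.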

Relative-error unateness testing by Xi Chen, Diptaksho Palit, Kabir Peshawaria, William Pires, Rocco A. Servedio, Yiding Zhang (arXiv). Another variant of monotonicity testing. A function $f:\{0,1\}^n \to \{0,1\}$ is unate if it is either non-decreasing or non-increasing along every dimension. (A monotone function must be non-decreasing in every dimension.) Interestingly, depending on the setting, unateness can be harder than or have the same complexity as monotonicity testing. The paper investigates the recent “relative-error model”. In the standard setting, the distance between two functions $f, g$ is the (normalized) Hamming distance $\| f -g \|_0/2^n$. In the relative-error setting, we measure the distance as the symmetric difference between the “ones”, so it is $|f^{-1}(1) \Delta g^{-1}(1)|/|f^{-1}(1)|$. This distinction is analogous to dense vs sparse graph property testing. This paper shows that the non-adaptive complexity of unateness testing (ignoring $\varepsilon$ factors) is $\widetilde{\Theta}(\log N)$, where $N = |f^{-1}(1)|$. There is also an adaptive $\Omega((\log N)^{2/3})$ bound. (Thanks for the post by M.C. on the correction. -Ed) The upper bound is the same as recently obtained for monotonicity in this model.
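The two distance measures can be compared directly on a small example. The sketch below enumerates $\{0,1\}^n$ by brute force, so it is only meant for tiny $n$; the function names are made up for this illustration. It takes $f$ to be the AND of all $n$ bits (a single satisfying input) and $g$ to be the all-zeros function.

```python
from itertools import product

def hamming_distance(f, g, n):
    """Normalized Hamming distance: fraction of points in {0,1}^n where f != g."""
    points = list(product([0, 1], repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def relative_distance(f, g, n):
    """Relative-error distance |f^{-1}(1) symdiff g^{-1}(1)| / |f^{-1}(1)|.

    Assumes f has at least one satisfying input.
    """
    points = list(product([0, 1], repeat=n))
    ones_f = sum(f(x) for x in points)
    sym_diff = sum(f(x) != g(x) for x in points)
    return sym_diff / ones_f

n = 10
f = lambda x: int(all(x))  # AND of all n bits: exactly one satisfying input
g = lambda x: 0            # constant zero

d_hamming = hamming_distance(f, g, n)    # 2^{-10}
d_relative = relative_distance(f, g, n)  # 1.0
```

In the standard model $f$ is essentially indistinguishable from the zero function, while in the relative-error model the two are at distance $1$; this is the sparse-versus-dense contrast alluded to above.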

Uniformity Testing under User-Level Local Privacy by Clément L. Canonne, Abigail Gentle, Vikrant Singhal (arXiv). Let us revisit the standard uniformity testing problem with user-level privacy. There are $n$ users, each of whom holds $m$ samples of a distribution $\mathcal{D}$. The domain size is denoted as $k$. These users communicate with a central server, whose aim is to test $\mathcal{D}$ for uniformity (or equality, or whatever property you care for). This is a practically motivated problem where each user is a phone that does not want to share its private data, but the server wishes to perform some learning on all the data. If each user holds one sample, the setting is called “local differential privacy”: we want the algorithm/tester output to be nearly identical if a single user changes their data. It is known that, for public coin protocols, such a tester requires $\Theta(k)$ samples in total (in contrast with the $\Theta(\sqrt{k})$ samples for the standard tester). In the paper’s setting, the protocol has to be differentially private with respect to changes in any of the user’s data. This is much stronger, since an individual change is a much smaller fraction of the user’s data. If each user held more than $\sqrt{k}$ samples, then each user could simply run a distribution tester and communicate a single private bit with the server. The interesting case is when $m \ll \sqrt{k}$. This paper gives a natural tradeoff between $m$ and $n$ that generalizes the local DP guarantee. Basically, the paper shows that when $mn$ is at least $k$, one can get uniformity testing with user-level privacy. Different results hold for private coin protocols, where local DP is harder.

Non-iid hypothesis testing: from classical to quantum by Giacomo De Palma, Marco Fanizza, Connor Mowry, Ryan O’Donnell (arXiv). This paper studies the “non-iid” distribution hypothesis/equality testing problem. Suppose there are $T$ distributions $p_1, p_2, \ldots, p_T$, over the domain $[d]$. The aim is to test if the average distribution $\sum_i p_i/T$ is a known distribution $q$. We are only allowed a few samples from each distribution. A recent result proved that if we get just $2$ samples from each distribution, then equality testing is possible when $T \gg \sqrt{d}$ (ignoring $\varepsilon$ factors). On the other hand, this is not possible with just a single sample from each distribution. This paper studies the quantum analogue of this problem. But first, it actually improves the $\varepsilon$ dependence of the classical bound, matching the optimal $\sqrt{d}/\varepsilon^2$ bound. In the quantum setting, each “distribution” is a quantum state, which is represented as a $d \times d$ matrix. Samples of the state are basically projections/eigenvectors. It is known that $O(d)$ samples suffice to test if a quantum state is “maximally mixed” (the equivalent of the identity distribution). This paper shows that, in the non-iid setting, even getting one sample from each state suffices for equality testing. This is in contrast with the classical setting, where $2$ samples are required per distribution.

Near-Optimal Property Testers for Pattern Matching by Ce Jin, Tomasz Kociumaka (arXiv). This is a result on sublinear algorithms for string matching. Consider a pattern string $P$ of length $m$ and a text $T$ of length $n$, where both are considered part of the input. Our aim is to determine if $P$ is a substring of $T$, or if $P$ has Hamming distance $k = \varepsilon m$ from every substring of $T$. Conventionally for this literature, the parameter $k$ (and not $\varepsilon$) is used. There is a simple sampling algorithm that makes $\widetilde{O}(\sqrt{nm/k} + n/k)$ queries, but has linear running time. The main question is to get an algorithm whose running time also matches this bound. Previous work gave a $\widetilde{O}((n^2m/k)^{1/3} + n/k)$ time procedure. The main result gives a non-adaptive algorithm whose running time matches the $\widetilde{O}(\sqrt{nm/k} + n/k)$ query bound. Curiously, one can get an adaptive algorithm that improves the dependence on $n$ to $n-m$, so it is faster when the pattern and text are roughly of the same length. These upper bounds are matched with lower bounds, so it proves an adaptivity gap for this regime. The paper gets the complete picture of the time complexity for all ranges of $n-m$.

Computational Complexity in Property Testing by Renato Ferreira Pinto Jr., Diptaksho Palit, Sofya Raskhodnikova (arXiv). A generalization of the spirit of the previous paper. The paper asks the fundamental question of how query complexity and time complexity are related for property testing. A property tester is a $(q(n), t(n))$-tester if it makes $q(n)$ queries and has running time $t(n)$. The paper provides a precise formalization of running time in the RAM model for a sublinear algorithm. The main result is a hierarchy theorem akin to the classic time hierarchy theorem. Assuming SETH, for any (reasonable) $(q(n), t(n))$, there is a property with a $(q(n), t(n))$-tester that has no $(q'(n), t'(n))$-tester, where $q'(n) \ll q(n)$ and $t'(n) \ll t(n)$. (There is an unconditional hierarchy theorem, with much stronger conditions on $t'(n)$.) The gaps between query complexity and running time are explored for the classic problem of halfspace testing. For the distribution-free distance approximation problem, there are $O(d/\varepsilon^2)$-query testers, but they run in time exponential in $d$. Assuming the fine-grained hardness of $k$-SUM, this paper proves a lower bound showing that the running time of any property tester must have such an exponential dependence.

Sublinear Algorithms for Estimating Single-Linkage Clustering Costs by Pan Peng, Christian Sohler, Yi Xu (arXiv). Single-linkage clustering (SLC) is a common practical algorithm. The input is a weighted graph, where the weight of an edge denotes the distance between the objects/vertices it joins. Given a collection of clusters, the distance between two clusters is the distance between the closest pair of vertices. SLC just repeatedly merges the two clusters that are closest, to get a hierarchical clustering. This paper investigates SLC through the sublinear lens. One can define an objective function corresponding to SLC; for a given clustering, the cost is the sum (over clusters) of the weights of the minimum spanning trees of the clusters. For any $k$, one can consider the optimal clustering with $k$ clusters. The aim is to approximate these costs. The “SLC profile vector” is the list of costs over all $k$. The main result gives a sublinear time algorithm that approximates this profile vector. The running time is $O(\sqrt{W} d)$ (ignoring $poly(\varepsilon^{-1} \log n)$ factors), where $W$ is the maximum weight and $d$ is the maximum degree of the graph. Naturally, these results are obtained by building on the classic sublinear time MST approximation algorithm. The algorithms in the paper are quite interesting, and one wonders if there is a potential to implement them.
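For intuition, the exact profile vector can be computed in near-linear time with Kruskal's algorithm, because the merges performed by SLC are exactly the edges Kruskal adds to the minimum spanning tree, in the same order. The sketch below is this exact baseline, not the sublinear algorithm from the paper. It assumes a connected graph given as a list of `(weight, u, v)` triples, and it assumes the $k$-clustering in question is the one SLC reaches after $n-k$ merges; the function name `slc_profile` is made up for this illustration.

```python
def slc_profile(n, edges):
    """Map k -> total MST weight inside the k clusters produced by single linkage.

    n: number of vertices, labelled 0..n-1.
    edges: list of (weight, u, v) triples of a connected graph (assumed).
    """
    parent = list(range(n))

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    costs = {n: 0}  # n singleton clusters cost nothing
    total, clusters = 0, n
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            # Kruskal adds this edge; single linkage merges the same two clusters.
            parent[ru] = rv
            total += w
            clusters -= 1
            costs[clusters] = total
    return costs

# A 4-cycle with weights 1, 4, 2, 10.
profile = slc_profile(4, [(1, 0, 1), (4, 1, 2), (2, 2, 3), (10, 0, 3)])
```

On this example the profile is `{4: 0, 3: 1, 2: 3, 1: 7}`: the entry for $k=1$ is the full MST weight, and each smaller entry drops the heaviest remaining MST edge.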

Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods by Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless (arXiv). This paper is a fascinating take on Retrieval Augmented Generation (RAG) for Large Language Models (LLMs). While existing LLMs can surprise us with their synthesis abilities, they often struggle with learning new context or giving factual responses. RAG is a technique wherein the LLM is given more context through an existing, curated knowledge base to improve its accuracy. The paper models this problem in terms of a knowledge graph. Suppose there is an unknown ground truth graph $G^*$, which one can think of as a representation of all “facts” in an area. Most question/answer or factual retrievals can be thought of as paths in this graph. (Factual data is sometimes stored in graph format, where edges represent relationships between entities.) So our aim is to find a short (even constant length) $s$-$t$ path in $G^*$. There is a subgraph $G \subseteq G^*$ that is known as “prior knowledge”. (One could think of this as what the LLM has been trained on.) There is a retrieval mechanism that generates edges from $G^*$; in property testing language, this is exactly the query model, and it provides the connection to sublinear graph algorithms. With a few queries to the retrieval mechanism, we wish to find the $s$-$t$ path. The main result shows that the prior graph $G$ has to be quite dense to have constant query path algorithms. Otherwise, there is a lower bound of $\sqrt{n}$ queries, where $n$ is the number of vertices in $G^*$. There are many related results on hallucinations, where spurious edges may be in the prior.