News for November 2025

Last month the community had a lot to say, and as a result we have a rare treat featuring eighteen papers. The results range from (almost) tight lower bounds for testing boolean monotonicity, to low-rank approximation of Hankel matrices, to compressed sensing, to PCPs, to testing and learning convex functions, to a trio of papers with Oded Goldreich as (co)author, with many more covered in this month’s potpourri. This spectacular collection also features a paper which seeks to understand, in this age of LLMs, how mathematical exploration and discovery works at scale. Without further ado, let us examine our spread.

Boolean function monotonicity testing requires (almost) n^1/2 queries by Mark Chen, Xi Chen, Hao Cui, William Pires, Jonah Stockwell (arXiv) The problem of testing boolean monotonicity over the hypercube, \(\{0,1\}^n\), is no stranger to regular PTReview readers. Finally, we have a result which puts this venerable problem to rest by proving an adaptive two-sided lower bound of \(\Omega(n^{1/2-c})\) queries (for any constant \(c > 0\)) for testing boolean monotonicity over the boolean hypercube. The authors introduce what they call multilevel Talagrand functions. Before I detail this paper a little more, let me just say that right off the bat, the paper also uses these functions to obtain a nearly linear (in \(n\)) two-sided lower bound for testing unateness. Alright, let me say a little about multilevel Talagrand functions. Our starting point is a story we covered in News for November 2015, on a result by Belovs-Blais where they proved a polynomial-in-\(n\) two-sided lower bound for the same problem. As mentioned in that post, the key insight of Belovs and Blais was to work with Talagrand’s random DNF. Perturbations of this DNF are sufficiently non-monotone and were shown to require \(\Omega(n^{1/4})\) queries to test. Our News for February 2017 post covered the paper by Chen-Waingarten-Xie which showed that one could adaptively test the Belovs-Blais instance with a budget of \(O(n^{1/4})\) queries, and which then went on to present an improved lower bound of \(\Omega(n^{1/3})\) queries by defining what they called two-level Talagrand functions. The featured paper presents an intricate way to define the multilevel extensions of these Talagrand functions, which are then used to squeeze out an almost \(\Omega(\sqrt n)\) lower bound.
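
For readers who want something concrete to play with, here is a minimal (and entirely unofficial) sketch of the one-level object underlying this line of work: Talagrand's random DNF, with \(2^{\sqrt n}\) terms, each an AND of \(\sqrt n\) randomly chosen variables. The multilevel construction of the featured paper is far more intricate; the parameters below are purely illustrative.

```python
import random

def random_talagrand_dnf(n, seed=0):
    """Sample a one-level Talagrand random DNF on n variables:
    2^sqrt(n) terms, each the AND of sqrt(n) randomly chosen variables."""
    rng = random.Random(seed)
    s = int(n ** 0.5)
    terms = [rng.sample(range(n), s) for _ in range(2 ** s)]
    return lambda x: int(any(all(x[i] for i in term) for term in terms))

f = random_talagrand_dnf(16)       # 2^4 = 16 terms, each of width 4
print(f((1,) * 16), f((0,) * 16))  # 1 0
```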

Low-soundness direct-product testers and PCPs from Kaufman–Oppenheim complexes by Ryan O’Donnell and Noah Singer (arXiv) This paper gives an agreement test for direct product testing, with two provers, in the low-soundness regime. Such results were previously known using high-dimensional complexes constructed from deep facts of number theory. This paper proves that a known, simpler, and strongly explicit complex (the Kaufman–Oppenheim (KO) complex) can be used instead to design the test. Although the methods are elementary, the proof is long and somewhat involved. The featured paper establishes a new expansion property of the KO complex in the regime required for building a test with low soundness. Additionally, the paper confirms that the KO complex can be used to build an alternative efficient PCP with only \(poly(\log n)\) blow-up. Since the methods are elementary, the construction is more accessible, and perhaps we will have an easier time optimizing the parameters to build more efficient PCPs.

A sufficient condition for characterizing the one-sided testable properties of families of graphs in the Random Neighbour Oracle Model by Christine Awofeso, Patrick Greaves, Oded Lachish, Amit Levi, Felix Reidl (arXiv) The story of this paper begins with a 2019 paper by Czumaj and Sohler which we covered in our September 2019 news. In that paper, Czumaj-Sohler gave efficient property testers for \(H\)-freeness when the input graph is known to be \(K_r\)-minor-free, in the random neighbor query model. The algorithm presented in that result recursively explores a few random neighbors of the current vertex (in particular, you can think of the algorithm as a truncated BFS with bounded recursion depth); see the sketch below. The featured paper considers graph families parameterized by a certain parameter (in particular, the class of graphs with universally bounded “\(r\)-admissibility”). The key result of the paper is a theorem which asserts that, restricted to the class of graphs with bounded \(r\)-admissibility, for every fixed finite graph \(H\), \(H\)-freeness is testable with one-sided error (assuming access to a random neighbor oracle).
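
To make the exploration primitive concrete, here is a minimal sketch of a truncated BFS driven by a random-neighbor oracle. Everything here (the names, the depth/width budgets, the contains_H predicate) is hypothetical scaffolding for intuition, not the actual algorithm of either paper.

```python
import random

def explore(v, rand_nbr, depth, width):
    """Truncated randomized BFS: issue `width` random-neighbor queries per
    frontier vertex, for `depth` levels; return the set of vertices seen."""
    seen, frontier = {v}, [v]
    for _ in range(depth):
        nxt = []
        for u in frontier:
            for _ in range(width):
                w = rand_nbr(u)  # oracle: a uniformly random neighbor of u
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return seen

def test_H_freeness(n, rand_nbr, contains_H, samples=10, depth=3, width=3):
    """Hypothetical one-sided tester skeleton: reject only upon finding
    concrete evidence, i.e., a copy of H in some explored neighborhood."""
    rng = random.Random(0)
    for _ in range(samples):
        if contains_H(explore(rng.randrange(n), rand_nbr, depth, width)):
            return "reject"
    return "accept"
```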

Testing H-freeness on sparse graphs, the case of bounded expansion by Samuel Humeau, Mamadou Moustapha Kanté, Daniel Mock, Timothé Picavet, Alexandre Vigny (arXiv) This paper continues the thread picked up by the previous result. The proof presented in Czumaj-Sohler was fairly intricate. The featured paper uses ideas rooted in the sparsity literature and shows how these ideas can be used to efficiently test \(H\)-freeness even when the input graph is only promised to come from a graph class of polynomially bounded expansion.

Sublinear Time Low-Rank Approximation of Hankel Matrices by Michael Kapralov, Cameron Musco, Kshiteej Sheth (arXiv) Our November 2022 news featured a paper which obtained low-rank approximations of Toeplitz matrices (which, remarkably, were still Toeplitz!) in sublinear time. In this paper, a subset of the authors carries the exploration forward and considers the following task: suppose you are given as input a matrix obtained by noising a PSD Hankel matrix of order \(n\), together with a parameter \(\varepsilon > 0\). The featured paper presents algorithms which run in time \(poly(\log n, 1/\varepsilon)\) and return a bona fide Hankel matrix \(\widehat{H}\) with \(rank(\widehat{H}) \approx poly(\log n)\) which is fairly close to the unknown Hankel matrix we started with (the one that was noised to produce the input).
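
As a quick illustration of why returning a bona fide Hankel matrix is the delicate part, here is a toy numpy experiment (mine, not the paper's): the best rank-\(r\) approximation computed by truncated SVD is optimal in Frobenius norm but generally destroys the Hankel structure.

```python
import numpy as np

def hankel(h):
    """n x n Hankel matrix, constant along anti-diagonals: H[i, j] = h[i + j]."""
    n = (len(h) + 1) // 2
    return np.array([[h[i + j] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
H = hankel(rng.standard_normal(15))     # an 8 x 8 Hankel matrix

U, s, Vt = np.linalg.svd(H)             # best rank-3 approximation
H3 = U[:, :3] @ np.diag(s[:3]) @ Vt[:3, :]

# The truncation is no longer Hankel: its anti-diagonals stop being constant.
print(np.allclose(H3[0, 2], H3[1, 1]))  # typically False
```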

Support Recovery in One-bit Compressed Sensing with Near-Optimal Measurements and Sublinear Time by Xiaxin Li, Arya Mazumdar (arXiv) Let us begin by reviewing the setup for the one-bit compressed sensing problem. You want to recover an unknown signal, modeled as a vector \(\boldsymbol{x} \in \mathbb{R}^n\), which we further assume is \(k\)-sparse. To help you out, I give you a sign-of-dot-product machine which, on input your favorite vector \(\boldsymbol{a} \in \mathbb{R}^n\), tells you \(sign(\boldsymbol{a}^T\boldsymbol{x})\). So, in all, you get to look at the quantity \(\boldsymbol{y} = sign(\boldsymbol{A}\boldsymbol{x})\) where \(\boldsymbol{A} \in \mathbb{R}^{m \times n}\), and the goal is to keep the number of measurements \(m\) as small as possible. A related goal seeks to recover just the support of the unknown \(k\)-sparse signal. The featured paper presents efficient algorithms that return a sparse vector \(\widehat{\boldsymbol{x}}\) whose support is a great proxy for the support of the unknown signal (in the sense that the two supports have very small symmetric difference).
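
Here is a tiny simulation of the measurement model, with a deliberately naive correlation-based support guess standing in for the paper's far more refined (and sublinear-time) algorithms; all parameter choices below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 1000, 5, 200                     # dimension, sparsity, measurements

x = np.zeros(n)                            # unknown k-sparse signal
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n))            # measurement matrix
y = np.sign(A @ x)                         # only the signs are observed

# Naive guess: the k coordinates most correlated with the sign pattern.
guess = set(np.argsort(np.abs(A.T @ y))[-k:])
print(len(guess & set(support)), "of", k, "support coordinates recovered")
```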

Homomorphism Testing with Resilience to Online Manipulations by Esty Kelman, Uri Meir, Debanuj Nayak, Sofya Raskhodnikova (arXiv) Let us consider the task of testing properties of functions \(f \colon G \to H\) where \((G, +)\) and \((H, \oplus)\) are groups. A lot of classic results in this area (like the linearity test of BLR) assume that we have access to a reliable oracle which, when given any \(x \in G\), returns \(f(x) \in H\). This assumption is no longer valid if the oracle is unreliable or, worse, adversarial. The featured paper considers the situation where the oracle is adversarial and can in fact manipulate the responses (say, by replacing \(f(x)\) with some value \(y \in H \cup \{\bot\}\)). Moreover, the power of the adversary is parameterized by a number \(t\). One of the results in this paper shows the following: take \(f \colon G \to H\) and fix some \(\varepsilon > 0\). Then one can test whether \(f\) is a group homomorphism, where the adversary can manipulate \(t\) values after every query is answered, using a budget of \(O(1/\varepsilon + \log_2t)\) queries.
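
For reference, here is the classic reliable-oracle version of the test, specialized to \(G = \mathbb{Z}_n\) and \(H = \mathbb{Z}_m\); a sketch of the textbook BLR test, not the paper's manipulation-resilient protocol.

```python
import random

def blr_test(f, n, m, trials=100, seed=0):
    """BLR homomorphism test for f: Z_n -> Z_m with a reliable oracle:
    sample random x, y and check f(x + y) = f(x) + f(y) (mod m)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.randrange(n), rng.randrange(n)
        if f((x + y) % n) != (f(x) + f(y)) % m:
            return "reject"
    return "accept"

print(blr_test(lambda z: (3 * z) % 7, 21, 7))  # a genuine homomorphism: accept
```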

Computational Complexity in Property Testing by Renato Ferreira Pinto Jr., Diptaksho Palit, Sofya Raskhodnikova (arXiv) This paper studies the tradeoff between the query complexity and the time complexity of property testing tasks. Most research in property testing has focused on obtaining testers with better and better query bounds. The present work explores the query/time interplay in property testing and develops tools to prove computational hardness of property testers. The paper proves an unconditional and a conditional hierarchy theorem. The first theorem asserts that there are property testing problems which witness a strict separation between the query complexity of the task and the running time required to solve it. The second theorem asserts that, assuming SETH, the separation between the query bounds and the running time bounds can be strengthened.


Interactive proof systems for FARNESS by Oded Goldreich, Tal Herman, Guy N. Rothblum (ECCC) This paper continues a line of work on interactive proofs of proximity for distribution testing problems, building on earlier work of Chiesa and Gur (see our October 2017 news). The central question here is whether interaction can reduce the sample complexity barriers that appear unavoidable in standard (non-interactive) distribution testing. The paper answers this in the affirmative for natural promise problems such as deciding whether two distributions are identical (the NO case in the interactive setup) or \(\varepsilon\)-far (the YES case in the interactive setup), as well as testing whether a distribution is far from uniform (the far case again being the YES case in the interactive setup). The authors construct explicit interactive proof systems in which the verifier uses asymptotically fewer samples than required by the best testers, while simultaneously the honest prover uses asymptotically fewer samples than would be needed to learn the distribution. The protocols rely on collision-based ideas and admit clean tradeoffs between the verifier’s and prover’s sample complexities. Conceptually, the result shows that interaction can beat both testing and learning bounds at the same time for concrete, well-studied distribution testing problems.

On doubly-sublinear interactive proofs for distributions by Oded Goldreich, Guy N. Rothblum (ECCC) As I understand it, this paper takes a step back from specific testing problems and asks how general the phenomenon exhibited above really is. So let’s place this paper in context, with the preceding paper in mind. Motivated by the interactive protocols presented for FARNESS, the authors introduce the notion of doubly-sublinear interactive proofs of proximity (ds-IPPs), where the verifier’s sample complexity is asymptotically smaller than the testing complexity and the honest prover’s sample complexity is asymptotically smaller than the learning complexity. The main contribution is conceptual rather than algorithmic: the paper shows that ds-IPPs do exist, by constructing distribution properties (algebraically defined and somewhat artificial) that admit such protocols. Together with the above paper, this work shows that interaction changes the game, in the sense that the sample complexity landscape looks different with interaction than without.

Proving the PCP Theorem with 1.5 proof compositions (or yet another PCP construction) by Oded Goldreich (ECCC) This paper revisits the classic PCP Theorem and offers a new PCP construction of constant query complexity and sublinear randomness complexity \(n^{\alpha}\) for any constant \(\alpha > 0\). The traditional Arora–Lund–Motwani–Sudan–Szegedy proof uses two generic proof compositions (Reed–Muller with itself, then with a Hadamard-based PCP). Inspired by a recent work of Amireddy et al., which constructs a PCP (without composition) with constant query complexity and randomness complexity \(n^{\alpha}\), this work approaches the same challenge. It combines a Reed–Muller-based PCP with Hadamard encoding in a novel way. By working with a constant number of dimensions and a large alphabet, then encoding field elements via Hadamard codes and emulating key tests (e.g., low-degree and sum-check), the paper achieves a PCP system that can serve as an inner verifier without full generic composition. In all, this paper presents another route to the PCP Theorem with constant queries and \(n^{\alpha}\) randomness.

Interactive Proofs For Distribution Testing With Conditional Oracles by Ari Biswas, Mark Bun, (Our Own) Clement Canonne, Satchit Sivakumar (arXiv) Another paper whose roots lie in the seminal work of Chiesa and Gur. This paper poses the following twist in distribution testing (on top of the Chiesa-Gur twist):

Can label-invariant properties be verified in a (query-, sample-, and communication-)efficient way when the tester has access to a PCOND oracle?

The paper proves that there exist label-invariant properties of distributions for which the sample complexity of testing is not reduced by the PCOND oracle (that is, it remains the same as the sample complexity of the task without one). However, when interaction is allowed, a different picture emerges. In particular, the paper presents a protocol which can be used to interactively test a label-invariant distribution property (over a universe of size \(N\)) in the PCOND model with a query complexity that grows as \(poly(\log N)\).
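
For readers unfamiliar with the model: a PCOND (pairwise conditional) oracle, on a query \(\{x, y\}\), returns a sample from the distribution conditioned on the two-element set \(\{x, y\}\). A minimal simulation (assuming, unrealistically, that the pmf is known; the tie-breaking convention for zero-mass pairs is my own):

```python
import random

def pcond(p, x, y, rng):
    """Sample from the distribution p conditioned on the set {x, y}."""
    px, py = p.get(x, 0.0), p.get(y, 0.0)
    if px + py == 0.0:                        # convention: unbiased coin
        return x if rng.random() < 0.5 else y
    return x if rng.random() < px / (px + py) else y

rng = random.Random(0)
p = {"a": 0.7, "b": 0.2, "c": 0.1}
print(sum(pcond(p, "a", "b", rng) == "a" for _ in range(10000)))  # about 7778
```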

Verification of Statistical Properties: Redefining the Possible by (Again, our own) Clement Canonne, Sam Polgar, Aditya Singh, Aravind Thyagarajan, Qiping Yang (ECCC) Let us again start our story from the Chiesa-Gur paper. So, we have an unreliable prover and a verifier, where the verifier is interested in testing some label-invariant property of a distribution and is allowed to chit-chat with the prover. One problematic aspect which prevents this construct from being applicable in some practical scenarios is that there are cursed properties (things as basic as UNIFORMITY testing) for which verification requires the verifier to hold \(\Omega(\sqrt n)\) items, where \(n\) denotes the support size of the distribution whose properties we want to test. Undeterred by this news, the featured paper contemplates how to evade this curse. The eye-opening realization is to consider these tasks with the roles of completeness and soundness flipped: akin to the “Interactive proof systems for FARNESS” paper above, you think of your goal as testing FARNESS from UNIFORMITY. The paper shows this task indeed admits a protocol with \(O(1)\) sample complexity. The paper also considers multiplayer and zero-knowledge variations of these tasks, and looks like a nice conceptual read for PTReview readers.

Learning and Testing Convex Functions by Renato Ferreira Pinto Jr., Cassandra Marcussen, Elchanan Mossel, Shivam Nadimpalli (arXiv) The paper studies functions \(f \colon \mathbb{R}^d \to \mathbb{R}\) under the Gaussian measure and formalizes convexity testing as distinguishing convex functions from those that are \(\varepsilon\)-far in \(L^2\) with respect to the Gaussian measure. A central theorem shows that if \(f\) is Lipschitz (or satisfies comparable regularity), then convexity can be tested with sample complexity \(d^{poly(1/\varepsilon^2)}\). The paper supplements these results with a sample-complexity lower bound in a natural query model.

Halfspaces Are Hard to Test with Relative Error by Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, Tianqi Yang (arXiv) This paper considers property testing of Boolean halfspaces in the relative-error model (which we covered earlier; see here), where the distance between a pair of functions is defined as the fractional symmetric difference between the supports of the two functions. We assume that algorithms have access to uniformly random elements in the support of the input function. Alright, with these details behind us, let us state the main result of this paper. It asserts that for some constant \(\varepsilon_0 > 0\), any relative-error \(\varepsilon_0\)-tester for halfspaces over \(\{0,1\}^n\) must use \(\Omega(\log n)\) samples, a super-constant lower bound in this model. This result sharply contrasts with classic absolute-error testing results, where halfspaces admit constant-query testers, and thus demonstrates that the relative-error model differs significantly from the classical model.

Testing Noisy Low-Degree Polynomials for Sparsity by Yiqiao Bao, Anindya De, Shivam Nadimpalli, Rocco A. Servedio, Nathan White (arXiv) Let us start from the setup considered in the Chen-De-Servedio paper we covered in our November 2019 post. In their setup, you are given samples from a noisy linear model \(\boldsymbol{y} = \mathbf{w \cdot x} + \texttt{Noise}\). This paper considers the generalized situation where you are given \(\boldsymbol{y} = \mathbf{p(x)}+\texttt{Noise}\), and you are interested in knowing whether \(\mathbf{p}\) is \(s\)-sparse or far from \(s\)-sparse. The main result gives explicit conditions under which sparsity can be tested with a constant number of samples, depending on the degree, the noise rate, and the sparsity parameter \(s\). Outside this regime, the authors prove logarithmic lower bounds, showing that once noise overwhelms certain structural signals, sparsity testing becomes nearly as hard as learning the polynomial itself.

Efficient Testing Implies Structured Symmetry by Cynthia Dwork, Pranay Tankala (arXiv) This paper addresses a structural question in property testing: which Boolean properties admit testers with small sample complexity and efficient computation? The paper shows that if a property \(P \subseteq \{0,1\}^n\) is testable using (i) few samples and (ii) a circuit of small size, then \(P\) must be close to a property \(\mathcal{Q}\) which enjoys some kind of structured symmetry. In other words, efficient testability forces the property to exhibit a form of low-complexity symmetry. Conversely, the paper shows that properties with such structured symmetry admit efficient testers.

Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner (arXiv) This paper asks, fairly openly (see Section 1.2), what it might even mean for modern AI systems to participate in mathematical discovery. The authors place their work in the long lineage of computer-assisted mathematics, from early symbolic systems to contemporary automated reasoning, but are careful to frame large language models not as theorem-provers or replacements for mathematicians, but as exploratory engines: components inside a closed-loop pipeline where proposal, evaluation, and refinement happen repeatedly and at scale. Concretely, the paper studies a setup in which LLM-based code generation is coupled with automated evaluators, allowing the system to search large spaces of mathematical constructions and algorithms with minimal human intervention. Rather than optimizing for a single headline result, the authors emphasize breadth, reporting on 44 separate “adventures” spanning combinatorics, number theory, analysis, and even Olympiad-style geometry. In many cases, the system rediscovers known constructions or matches best-known bounds; in others, it produces plausible variants or refinements that look less like finished theorems and more like the kind of halfway-formed ideas one might jot down while exploring a problem for the first time. Quoting from the paper:

“One illustrative example comes from experiments related to the Erdős discrepancy problem. First, when the system was run with no human guidance, it found a sequence of length 200 before progress started to slow down. Next, when the prompt was augmented with the advice to try a function which is multiplicative, or approximately multiplicative, the system performed significantly better and found constructions of length 380 in the same amount of time. These constructions were still far from the optimal value of 1160, and the authors explicitly note that other hints (for example, suggesting the use of SAT solvers) might have improved the score further, but were not explored due to time limitations.”

The broader point of the paper is less about getting optimal bounds and more about scale and exploration. The results are empirical and depend on carefully chosen experimental settings, but the diversity of examples makes this an engaging and unusually readable account of what happens when machines are allowed to roam around mathematical landscapes with a long leash.

News for October 2025

Another busy month in property testing. We have nine papers, covering topics ranging from distribution testing and pattern-freeness testing to LLMs! Let’s start the journey.

Proving Natural Distribution Properties is Harder than Testing Them by Tal Herman and Guy Rothblum (ECCC). This paper is on interactive proof systems for distribution testing. Suppose an untrusted prover claims to have discovered some property of a distribution \(\mathcal{D}\). The verifier can sample from \(\mathcal{D}\) and communicate with the prover, in the hope of verifying this property more efficiently than running a property tester directly. There is a rich line of work showing that many non-trivial properties can be verified much faster than testing them directly. But in all these results, the honest prover learns the entire distribution and thus has sample complexity \(\Omega(N)\) (where \(N\) is the domain size). This paper asks whether doubly-sublinear non-trivial protocols exist, wherein the honest prover has sample complexity \(o(N)\) and the verifier is more efficient than directly testing. The main result is hardness: for many natural properties, if the honest prover has \(o(N)\) sample complexity, then the verifier complexity is the same as the testing complexity. This says that proving is (provably!) more difficult than directly testing. The main technical work is in showing such a result for estimating collision statistics of the distribution. Collision statistics form the basis of most label-invariant distribution testing algorithms, leading to the various results in the paper.

Testing forbidden order-pattern properties on hypergrids by Harish Chandramouleeswaran, Ilan Newman, Tomer Pelleg, and Nithin Varma (arXiv). Monotonicity is arguably one of the most fundamental properties studied in property testing. It is a special case of “forbidden order” properties. Consider a function \(f:[n]^d \to \mathbb{R}\), and a permutation \(\pi\) of \([k]\), for some constant \(k\). The function is “\(\pi\)-free” if there do not exist \(x_1 \prec x_2 \prec \ldots \prec x_k\) such that the values \(f(x_1), f(x_2), \ldots, f(x_k)\) are ordered according to \(\pi\). (We use \(\prec\) for the standard coordinate-wise partial order.) So monotonicity is equivalent to \((2,1)\)-freeness (or \((1,2)\)-freeness, depending on ascending/descending order), since it suffices to consider the order of pairs of domain points. Even over the line (\(d=1\)), forbidden order properties exhibit a rich theory. This paper initiates the study of forbidden-order freeness for the \(d=2\) case; a brute-force version of the pattern check appears in the sketch below. For \(\pi = (1,2,3)\)-freeness, the paper gives a \(poly(\log n)\) time property tester. But curiously, any one-sided tester for \(\pi=(1,3,2)\)-freeness requires \(\Omega(\sqrt{n})\) queries. There is an interesting adaptivity gap: there is an adaptive \(n^{4/5}\)-query tester for any \(\pi\) with \(k=3\), but a corresponding non-adaptive (one-sided) tester requires \(\Omega(n)\) queries.
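
To fix intuition, here is the brute-force (decidedly non-sublinear) pattern check over the line; the testers in the paper are, of course, much cleverer.

```python
from itertools import combinations

def contains_pattern(f, pi):
    """Does the sequence f contain indices i_1 < ... < i_k whose values
    compare pairwise the way pi does? Brute force, O(n^k)."""
    k = len(pi)
    for idx in combinations(range(len(f)), k):
        vals = [f[i] for i in idx]
        if all((vals[a] < vals[b]) == (pi[a] < pi[b])
               for a in range(k) for b in range(k) if a != b):
            return True
    return False

print(contains_pattern([1, 3, 2, 4], (2, 1)))     # True: the descent 3, 2
print(contains_pattern([1, 2, 3, 4], (2, 1)))     # False: sorted, so (2,1)-free
print(contains_pattern([1, 4, 2, 3], (1, 3, 2)))  # True: 1, 4, 2 matches
```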

Relative-error unateness testing by Xi Chen, Diptaksho Palit, Kabir Peshawaria, William Pires, Rocco A. Servedio, Yiding Zhang (arXiv). Another variant of monotonicity testing. A function \(f:\{0,1\}^n \to \{0,1\}\) is unate if it is either non-decreasing or non-increasing along every dimension. (A monotone function must be non-decreasing in every dimension.) Interestingly, depending on the setting, unateness can be harder than or have the same complexity as monotonicity testing. The paper investigates the recent “relative-error model”. In the standard setting, the distance between two functions \(f, g\) is the (normalized) Hamming distance \(\| f -g \|_0/2^n\). In the relative-error setting, we measure the distance as the symmetric difference between the “ones”, so it is \(|f^{-1}(1) \Delta g^{-1}(1)|/|f^{-1}(1)|\). This distinction is analogous to dense vs sparse graph property testing. This paper shows that the non-adaptive complexity of unateness testing (ignoring \(\varepsilon\) factors) is \(\widetilde{\Theta}(\log N)\), where \(N = |f^{-1}(1)|\). There is also an adaptive \(\Omega((\log N)^{2/3})\) lower bound. (Thanks to M.C. for the correction. -Ed) The upper bound is the same as recently obtained for monotonicity in this model.

Uniformity Testing under User-Level Local Privacy by Clément L. Canonne, Abigail Gentle, Vikrant Singhal (arXiv). Let us revisit the standard uniformity testing problem with user-level privacy. There are \(n\) users, each of whom holds \(m\) samples of a distribution \(\mathcal{D}\). The domain size is denoted \(k\). These users communicate with a central server, whose aim is to test \(\mathcal{D}\) for uniformity (or equality, or whatever property you care for). This is a practically motivated problem where each user is a phone that does not want to share its private data, but the server wishes to perform some learning on all the data. If each user held one sample, the setting would be called “local differential privacy”: we want the algorithm/tester output to be nearly identical if a single user changes their data. It is known that, for public-coin protocols, such a tester requires \(\Theta(k)\) samples in total (in contrast with the \(\Theta(\sqrt{k})\) samples for the standard tester). In the paper’s setting, the protocol has to be differentially private with respect to changes in any single user’s data. This is much stronger, since each user now holds many samples, all of which may change at once. If each user held more than \(\sqrt{k}\) samples, then each user could simply run a distribution tester and communicate a single private bit to the server. The interesting case is when \(m \ll \sqrt{k}\). This paper gives a natural tradeoff between \(m\) and \(n\) that generalizes the local DP guarantee. Basically, the paper shows that when \(mn\) is at least \(k\), one can get uniformity testing with user-level privacy. Different results hold for private-coin protocols, where local DP is harder.

Non-iid hypothesis testing: from classical to quantum by Giacomo De Palma, Marco Fanizza, Connor Mowry, Ryan O’Donnell (arXiv). This paper studies the “non-iid” distribution hypothesis/equality testing problem. Suppose there are \(T\) distributions \(p_1, p_2, \ldots, p_T\) over the domain \([d]\). The aim is to test whether the average distribution \(\sum_i p_i/T\) is equal to a known distribution \(q\). We are only allowed a few samples from each distribution. A recent result proved that if we get just \(2\) samples from each distribution, then equality testing is possible when \(T \gg \sqrt{d}\) (ignoring \(\varepsilon\) factors). On the other hand, this is not possible with just a single sample from each distribution. This paper studies the quantum analogue of this problem. But first, it actually improves the \(\varepsilon\) dependence of the classical bound, matching the optimal \(\sqrt{d}/\varepsilon^2\) bound. In the quantum setting, each “distribution” is a quantum state, which is represented as a \(d \times d\) matrix. Samples of the state are basically projections/eigenvectors. It is known that \(O(d)\) samples suffice to test whether a quantum state is “maximally mixed” (the analogue of the uniform distribution). This paper shows that, in the non-iid setting, even one sample from each state suffices for equality testing. This is in contrast with the classical setting, where \(2\) samples per distribution are required.

Near-Optimal Property Testers for Pattern Matching by Ce Jin, Tomasz Kociumaka (arXiv). This is a result on sublinear algorithms for string matching. Consider a pattern string \(P\) of length \(m\) and a text \(T\) of length \(n\), where both are part of the input. Our aim is to determine whether \(P\) is a substring of \(T\), or whether \(P\) has Hamming distance at least \(k = \varepsilon m\) from every substring of \(T\). Conventionally in this literature, the parameter \(k\) (and not \(\varepsilon\)) is used. There is a simple sampling algorithm that makes \(\widetilde{O}(\sqrt{nm/k} + n/k)\) queries, but has linear running time. The main question is to get an algorithm whose running time also matches this bound. Previous work gave a \(\widetilde{O}((n^2m/k)^{1/3} + n/k)\)-time procedure. The main result gives a non-adaptive algorithm whose running time matches the \(\widetilde{O}(\sqrt{nm/k} + n/k)\) query bound. Curiously, one can get an adaptive algorithm that improves the dependence on \(n\) to \(n-m\), so it is faster when the pattern and text are roughly of the same length. These upper bounds are matched with lower bounds, proving an adaptivity gap in this regime. The paper thus gets the complete picture of the time complexity for all ranges of \(n-m\).

Computational Complexity in Property Testing by Renato Ferreira Pinto Jr., Diptaksho Palit, Sofya Raskhodnikova (arXiv). A generalization of the spirit of the previous paper. The paper asks the fundamental question of how query complexity and time complexity are related in property testing. A property tester is a \((q(n), t(n))\)-tester if it makes \(q(n)\) queries and has running time \(t(n)\). The paper provides a precise formalization of running time in the RAM model for a sublinear algorithm. The main result is a hierarchy theorem akin to the classic time hierarchy theorem. Assuming SETH, for any (reasonable) \((q(n), t(n))\), there is a property with a \((q(n), t(n))\)-tester that has no \((q'(n), t'(n))\)-tester, where \(q'(n) \ll q(n)\) and \(t'(n) \ll t(n)\). (There is also an unconditional hierarchy theorem, with much stronger conditions on \(t'(n)\).) The gaps between query complexity and running time are explored for the classic problem of halfspace testing. For the distribution-free distance approximation problem, there are \(O(d/\varepsilon^2)\)-query testers, but they run in time exponential in \(d\). Assuming the fine-grained hardness of \(k\)-SUM, this paper proves a lower bound showing that the running time of any property tester must have such an exponential dependence.

Sublinear Algorithms for Estimating Single-Linkage Clustering Costs by Pan Peng, Christian Sohler, Yi Xu (arXiv). Single-linkage clustering (SLC) is a common practical algorithm. The input is a weighted graph, where weights denote distances between objects/vertices. Given a collection of clusters, the distance between two clusters is the distance between the closest pair of vertices across them. SLC just repeatedly merges the two closest clusters, to get a hierarchical clustering. This paper investigates SLC through the sublinear lens. One can define an objective function corresponding to SLC: for a given clustering, the cost is the sum, over clusters, of the minimum spanning tree weight of each cluster. For any \(k\), one can consider the optimal clustering with \(k\) clusters. The aim is to approximate these costs. The “SLC profile vector” is the list of these costs over all \(k\); see the sketch below for the exact (non-sublinear) computation. The main result gives a sublinear-time algorithm that approximates this profile vector. The running time is \(O(\sqrt{W} d)\) (ignoring \(poly(\varepsilon^{-1} \log n)\) factors), where \(W\) is the maximum weight and \(d\) is the maximum degree of the graph. Naturally, these results are obtained by building on the classic sublinear-time MST approximation algorithm. The algorithms in the paper are quite interesting, and one wonders whether there is potential to implement them.
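
For intuition, here is the exact, decidedly non-sublinear computation (my own toy code): single-linkage merging is just Kruskal's algorithm in disguise, so the SLC \(k\)-clustering is the MST with its \(k-1\) heaviest edges removed, and its cost is the MST weight minus those edges.

```python
def mst_edge_weights(n, edges):
    """Kruskal's algorithm with union-find; edges given as (w, u, v).
    Returns the MST edge weights in ascending order."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    weights = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            weights.append(w)
    return weights

def slc_profile(n, edges):
    """profile[k-1] = cost of the single-linkage k-clustering: total MST
    weight minus its k-1 heaviest edges."""
    w = mst_edge_weights(n, edges)
    total, profile = sum(w), []
    profile.append(total)
    for heavy in reversed(w):  # remove the heaviest remaining edge
        total -= heavy
        profile.append(total)
    return profile

print(slc_profile(5, [(1, 0, 1), (2, 1, 2), (9, 2, 3), (3, 3, 4)]))
# [15, 6, 3, 1, 0]
```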

Prior Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods by Avrim Blum, Daniel Hsu, Cyrus Rashtchian, Donya Saless (arXiv). This paper is a fascinating take on Retrieval Augmented Generation (RAG) for Large Language Models (LLMs). While existing LLMs can surprise us with their synthesis abilities, they often struggle with new context or factual responses. RAG is a technique wherein the LLM is given more context through an existing, curated knowledge base to improve its accuracy. The paper models this problem in terms of a knowledge graph. Suppose there is an unknown ground-truth graph \(G^*\), which one can think of as a representation of all “facts” in an area. Most question/answer or factual retrievals can be thought of as paths in this graph. (Factual data is sometimes stored in graph format, where edges represent relationships between entities.) So our aim is to find a short (even constant-length) \(s\)-\(t\) path in \(G^*\). There is a subgraph \(G \subseteq G^*\) that is known as “prior knowledge”. (One could think of this as what the LLM has been trained on.) There is a retrieval mechanism that generates edges of \(G^*\); in property testing language, this is exactly the query model, and it provides the connection to sublinear graph algorithms. With a few queries to the retrieval mechanism, we wish to find the \(s\)-\(t\) path. The main result shows that the prior graph \(G\) has to be quite dense for constant-query path-finding algorithms to exist. Otherwise, there is a lower bound of \(\sqrt{n}\) queries, where \(n\) is the number of vertices in \(G^*\). There are many related results on hallucinations, where spurious edges may be present in the prior.


News for September 2025

With October already here (!), it is time to look back at September and its seven property testing papers: a healthy mix of error-correcting codes, graph testing, distribution testing, and computational geometry.

Robust Local Testability of Tensor Products of Constant-Rate Algebraic Geometry Codes, by Sumegha Garg and Akash Sengupta (ECCC). Given two error-correcting codes \(\mathcal{C}_1,\mathcal{C}_2\), their tensor product code \(\mathcal{C}_1\otimes\mathcal{C}_2\) can be represented as the set of matrices \(M\) whose rows are codewords of \(\mathcal{C}_1\) and columns codewords of \(\mathcal{C}_2\). How to (very quickly) test if an arbitrary matrix \(M\) corresponds to a codeword of \(\mathcal{C}_1\otimes\mathcal{C}_2\)? That is, when is such a tensor code a Locally Testable Code (LTC)?
A natural approach is to pick either a row or column of \(M\) uniformly at random, and check whether it is a codeword of \(\mathcal{C}_1\) or \(\mathcal{C}_2\), respectively. If this natural test works, then \(\mathcal{C}_1\otimes\mathcal{C}_2\) is said to be robustly locally testable. Which brings us to this paper: the main result shows that tensor codes obtained from Algebraic-Geometry (AG) codes (a generalization of Reed-Solomon codes) are, indeed, robustly locally testable.
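
In code, the natural test is a one-liner once parity-check matrices for \(\mathcal{C}_1, \mathcal{C}_2\) are in hand; a toy sketch over GF(2), with repetition codes standing in for the AG codes of the paper.

```python
import numpy as np

def natural_tensor_test(M, H1, H2, rng):
    """One round of the natural test for C1 (x) C2 over GF(2): rows of M
    should satisfy the checks H1, columns the checks H2. Pick a uniformly
    random row or column and verify its parity checks."""
    if rng.integers(2) == 0:
        return not np.any(H1 @ M[rng.integers(M.shape[0]), :] % 2)
    return not np.any(H2 @ M[:, rng.integers(M.shape[1])] % 2)

# Toy: C1 = C2 = length-3 repetition code, checks x0 + x1 = x1 + x2 = 0.
H = np.array([[1, 1, 0], [0, 1, 1]])
rng = np.random.default_rng(0)
M = np.ones((3, 3), dtype=int)  # the all-ones codeword of the tensor code
print(all(natural_tensor_test(M, H, H, rng) for _ in range(20)))  # True
```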

Expansion without Connectivity: A Property Testing Perspective, by Irit Dinur and Oded Goldreich (ECCC). Testing whether a (bounded-degree) \(n\)-vertex graph is an expander is among the earliest tasks considered in property testing, going back to the influential work of Goldreich and Ron (2000) which connected it to the distribution testing question of uniformity testing. This paper looks at an important twist on the question: what if, instead of testing whether a graph is an expander, one wanted to assess whether it is expanding? This sounds like a rephrasing of the same question, but with a crucial difference: here, a graph doesn’t need to be connected to be expanding, as long as each of its connected components is (by itself) an expander. After motivating this variant of the question, the authors provide a lower bound of \(\Omega(\sqrt{n})\) queries for testing expansion, and complement it with an \(O(n^{2/3+0.0001})\)-query algorithm solving (a bicriterion version of) the testing task. Interestingly, the upper bound involves another connection to distribution testing, this time to the task of generalized uniformity testing introduced by Batu and Canonne (2017).

Plane vs. Plane Low Degree Test, by Amey Bhangale and Silas Richelson (ECCC). Low-degree tests, one of the key ingredients of the original PCP theorem and a central object of study in related areas of TCS, must, given query access to a function \(f\colon \mathbb{F}_q^m \to \mathbb{F}_q\), test whether \(f\) is a degree-\(d\) polynomial, or far from every such low-degree function. The “plane vs. plane” test is a natural approach to this task: given a description of the function as a truth table listing its degree-\(d\) restriction \(f_P\) to every plane \(P\), pick two random planes \(P, P'\), and check whether the restrictions agree on the intersection \(P\cap P'\). The question can then be rephrased as characterizing the soundness of the test: if this “local test” accepts with probability \(\varepsilon\), must \(f\) be \(\textrm{poly}(\varepsilon)\)-close to a “global” low-degree polynomial? This work significantly improves the state of the art on this question in the low-agreement regime (i.e., when the soundness parameter \(\varepsilon\) can be very small: only \(\Omega(d/q)\)).

Distribution Testing in the Presence of Arbitrarily Dominant Noise with Verification Queries, by Hadley Black and Christopher Ye (arXiv). The authors introduce a new setting for distribution testing, where the (unknown) distribution \(p\) to be tested is noisy (the “Huber contamination model”): that is, instead of getting i.i.d. samples from \(p\), the algorithm gets i.i.d. samples from some \(\tilde{p} = \alpha p + (1-\alpha)q\), where \(q\) is an arbitrary “noise distribution.” But all isn’t lost, as the algorithm, for any sample, can also query whether it comes from the “good” part of the mixture, \(p\), or the “bad” part, \(q\) (there is no “ugly” part). The obvious baseline then is to query all samples, discard the bad ones, and run an out-of-the-box testing algorithm on what remains: this only costs an extra \(1/\alpha\) factor in the sample complexity. The authors show that one can do (much) better than this baseline! In the case of uniformity/identity and closeness testing, their results show that one can trade additional noisy samples for fewer verification queries.

On the Structure of Replicable Hypothesis Testers, by Anders Aamand, Maryam Aliakbarpour, Justin Y. Chen, Shyam Narayanan, Sandeep Silwal (arXiv). A paper from July, which we missed then. Say you want to perform your favorite distribution testing task, but in a replicable fashion (as defined by Impagliazzo, Lei, Pitassi, and Sorrell (2022)): that is, run on a new set of samples, your algorithm should produce the same output with high probability. “Easy”! After all, your algorithm already accepts things it should accept (in the property) with high probability, and rejects those it should reject (far from the property) with high probability… Yet not so easy: the replicability requirement also asks the algorithm to be consistent on the in-between distributions, too, those that are close to but not in the property! This question was introduced by Liu and Ye in 2024: this new paper significantly improves both the results themselves and the understanding of replicable testing algorithms. In particular, it provides a characterization of what any such algorithm must satisfy, and uses this to design a “canonical tester” in the replicable setting.

Testing Depth First Search Numbering, by Artur Czumaj, Christian Sohler, Stefan Walzer (arXiv). In the bounded-degree graph model for testing, the algorithm is given query access to the adjacency function (given a vertex \(v\), one can ask for the \(i\)-th neighbor of \(v\)). In this paper, the authors augment this model by adding labels to the vertices, along with the ability to query the label of any vertex; but, crucially, not the reverse access, which would be to get the vertex corresponding to a chosen label. The main motivation and focus of the paper is then to test whether the (numerical) labels of the input graph correspond to a valid DFS numbering, as one would obtain by running a DFS on the graph, or if the graph is \(\varepsilon\)-far from any graph for which these labels are a DFS numbering. The authors provide both an upper and a lower bound for testing DFS numbering, showing that \(\Theta(n^{1/3})\) queries are necessary and sufficient (for constant \(\varepsilon\)).

And finally, another paper we originally missed earlier this year, now updated on the arXiv:

Property Testing of Curve Similarity, by Peyman Afshani, Maike Buchin, Anne Driemel, Marena Richter, and Sampson Wong (arXiv). The authors initiate the study of property testing for Fréchet distance (continuous or discrete): given two \(n\)-vertex curves \(P,Q\) and parameters \(\delta, \varepsilon\), the algorithm must accept if \(P,Q\) have Fréchet distance at most \(\delta\), and reject if they are “\(\varepsilon\)-far from having Fréchet distance \(\delta\)” (where this “farness” is formally defined as saying the best coupling path between \(P,Q\) in their \(\delta\)-free space matrix has cost at least \(\varepsilon n\)).
The authors provide two different (incomparable) algorithms, analyzed in terms of a locality parameter \(t\): the first shows that when this parameter is known, only \(\tilde{O}(t/\varepsilon)\) queries are sufficient, while the second achieves query complexity \(\tilde{O}(t^2 \max(t,\log n)/\varepsilon)\) when \(t\) is unknown. The authors conclude with a few open questions, which readers of this blog keen to explore property testing in computational geometry may find interesting!
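
For readers new to the Fréchet distance, here is the classic quadratic-time dynamic program for the discrete variant, i.e., the exact baseline whose cost the sublinear testers are trying to avoid (a standard textbook implementation, not from the paper).

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """O(nm) dynamic program for the discrete Frechet distance between
    polygonal curves P and Q, given as lists of 2D points."""
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return math.dist(P[0], Q[0])
        if i == 0:
            return max(c(0, j - 1), math.dist(P[0], Q[j]))
        if j == 0:
            return max(c(i - 1, 0), math.dist(P[i], Q[0]))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)),
                   math.dist(P[i], Q[j]))
    return c(len(P) - 1, len(Q) - 1)

print(discrete_frechet([(0, 0), (1, 0), (2, 0)],
                       [(0, 1), (1, 1), (2, 1)]))  # 1.0
```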

Update (Oct 8): another paper on replicable distribution testing we missed in July! With a very fitting title, too:
Replicable Distribution Testing, by Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, and Christopher Ye (arXiv). Concurrent to the work mentioned above, this paper establishes new bounds for uniformity, closeness and independence testing in the replicable setting; and complements it with a new general-purpose lower bound technique, enabling the authors to establish optimality (up to logarithmic factors) of their results on uniformity and closeness testing.

News for August 2025

Our press release this month features seven papers. Let’s take a look: a survey on efficiently testable graph properties; two papers on new directions in graph property testing; one that, after some meditations on distribution testing, introduces a novel direction; another on testing properties of strings; and two papers exploring property testing with a quantum twist. Without further delay, let us analyze our spread.

Polynomial Property Testing by Lior Gishboliner and Asaf Shapira (arXiv) This survey traces the development of property testers for dense graph properties through the ages. The motivating question echoes throughout the survey: what is it that makes a dense graph property efficiently testable? Noting that any tester for your favorite property can be simulated by a canonical tester with at most a quadratic overhead in the number of queries, the survey focuses on properties which admit canonical testers making at most \(poly(1/\varepsilon)\) queries. This part of the story is narrated in Section 3 of the paper. The section begins by reviewing the seminal result of Alon from 2002 which kicked off research in this area: it asserts that \(H\)-freeness can be tested efficiently \(\textit{iff}\) \(H\) is bipartite. Earlier this year, Gishboliner and Shapira extended this result to testing \(H\)-freeness in \(k\)-uniform hypergraphs: such a property is efficiently testable \(\textit{iff}\) \(H\) is \(k\)-partite. The survey then presents a result extending this to the setting of induced subgraphs; in particular, it characterizes the graphs \(H\) for which testing induced-\(H\)-freeness is efficient. The section concludes with a discussion of testing graph properties of bounded \(\textit{VC}\)-dimension.

Quality control in sublinear time: a case study via random graphs by Cassandra Marcussen, Ronitt Rubinfeld, and Madhu Sudan (arXiv) This paper introduces a general framework in which you can situate the following kind of question: suppose you are given a probability parameter \(p\) and a value \(k \in \mathbb{N}\). Additionally, suppose I give you a graph \(G = (V,E)\) and ask: is it true that the quantity

\(\rho_k(G) := C_k(G)/\mathbf{E}_{G' \sim \mathcal{G}_{n,p}}[C_k(G')] \approx 1\)

or is it the case that

\(|\rho_k(G) - 1| > \varepsilon\)

(where the function \(C_k\) counts the number of \(k\)-sized cliques in \(G\)). The paper presents algorithms for this problem which issue at most \(\Big(\frac{1}{p}\Big)^{O(k)}\) queries.
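
To make the quality ratio concrete, here is the \(k=3\) case computed exactly (and very much non-sublinearly) for a genuine \(\mathcal{G}_{n,p}\) sample, using \(\mathbf{E}[C_3] = \binom{n}{3}p^3\); the paper's point is to estimate such ratios with \((1/p)^{O(k)}\) queries instead.

```python
import numpy as np
from math import comb

def rho_3(A, p):
    """Triangle-count quality ratio: C_3(G) over its G(n, p) expectation."""
    n = A.shape[0]
    triangles = np.trace(np.linalg.matrix_power(A, 3)) / 6
    return triangles / (comb(n, 3) * p ** 3)

rng = np.random.default_rng(0)
n, p = 300, 0.2
A = (rng.random((n, n)) < p).astype(int)
A = np.triu(A, 1); A = A + A.T        # symmetric adjacency, no self-loops
print(rho_3(A, p))                    # close to 1 for a genuine G(n, p) sample
```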
Now, let me explain the broader framework this paper introduces. You want to think of \(\rho_k(G)\) as measuring the quality of your graph \(G\). You want to understand whether \(G\) is distributed as a genuine sample (that is, whether \(G\) is a “high-quality graph”) according to \(\mathcal{G}_{n,p}\) as measured by the function \(C_k\). In general, as the authors explain via an analogy, this is supposed to capture the following sentiment: you go to a bookstore and you want to buy a book with quality close to \(1\). The goal thus, is to choose a high-quality book. Note that this goal is weaker than both of the following goals: testing whether

\(\text{quality(book)} \approx 1\) for an adversarial book,

OR whether

\(\text{book} \sim \texttt{Some Distribution of your choice over books in store}\).

The authors situate quite a few other problems in this overarching framework and present algorithms for these.

Sublinear-Time Approximation for Graph Frequency Vectors in Hyperfinite Graphs by Gregory Moroie (arXiv) A useful first step for several algorithms on planar graphs of bounded degree is to exploit the fact that such graphs admit a hyperfinite decomposition. Using a primitive called the partition oracle, one can estimate in sublinear time various important graph parameters (like the max-cut value, independent set size, and so on) to within a multiplicative \((1 \pm \varepsilon)\) factor. Often, a helpful step in these algorithms is a graph simplification step, where one approximates the input graph with a crude sketch tracking frequency counts of various subgraphs on the pieces produced by the partition oracle, each of which has size at most \(k = O(1/\varepsilon^2)\). The featured paper gives algorithms which find, in time \(poly(1/\varepsilon, d^k)\), a graph \(H\) which can be used to approximate the corresponding frequency vector of \(G\).

Instance-Optimal Uniformity Testing and Tracking by Guy Blanc, Clément L. Canonne, and Erik Waingarten (arXiv) After some meditations on the task of testing uniformity of distributions with support \([n]\), the featured paper suggests that despite the impressive algorithmic success on this task, there is a lot left to be desired. In particular, one fundamental reason for dissatisfaction with the sample-optimal results for the above task is that in real-life situations, often the only thing we are interested in is any evidence of non-uniformity, without any distance notion in the picture. So the paper formalizes the following question: suppose at each time step \(t \in \mathbb{N}\) we receive as input a sample \(x_t \sim \boldsymbol{p}\), where \(\boldsymbol{p}\) is an unknown probability distribution supported on \([n]\). We want our algorithm to run continuously and to reject if at any point in time it discovers \(\boldsymbol{p} \neq \texttt{UNIF}[n]\), using as few samples as possible. To rule out trivial algorithms which straight-up reject \(\boldsymbol{p}\), we require that the probability that the algorithm EVER rejects \(\texttt{UNIF}[n]\) is at most some sufficiently small \(\delta > 0\). Denote by \(OPT(\boldsymbol{p})\) the sample complexity of the best possible algorithm for this task which knows the entire distribution profile \(\boldsymbol{p}\) in hindsight. The featured paper devises an algorithm for this problem which is \(poly(\log (OPT))\)-competitive against this best algorithm.

Counting Distinct Square Substrings in Sublinear Time by Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba (arXiv) A square in a string is a substring of the form \(XX\), where \(X\) is a nonempty sequence of characters. The featured paper studies how to count the number of distinct square substrings in a string of length \(n\) over an alphabet of size \(\sigma\), given in packed form. The authors design a sublinear-time algorithm running in \(O(n/\log_\sigma n)\) time in the word-RAM model by exploiting the structure of runs (maximal periodic fragments). To handle these runs systematically, the paper uses sparse Lyndon roots (a Lyndon root is essentially the lexicographically smallest rotation of a repeated block), so that each repetition has a unique representative. The paper further groups certain overlapping runs into pyramidal structures to make counting efficient. This avoids scanning every position of the string and shows that distinct squares can be counted strictly faster than reading the entire string symbol by symbol, giving the first sublinear-time algorithm for this problem.
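
As a baseline, the quadratic-time brute force below makes the counted object concrete (deduplicated substrings of the form \(XX\)); the featured algorithm computes the same count in \(O(n/\log_\sigma n)\) time without ever looking at most positions individually.

```python
def count_distinct_squares(s):
    """Quadratic-time baseline: collect every substring of the form XX
    (with X nonempty), deduplicated as strings."""
    n, squares = len(s), set()
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                squares.add(s[i:i + 2 * half])
    return len(squares)

print(count_distinct_squares("aabaab"))  # 2: "aa" and "aabaab"
```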

Resource-efficient algorithm for estimating the trace of quantum state powers by Myeongjin Shin, Junseo Lee, Seungwoo Lee, and Kabgyun Jeong (arXiv) The problem of estimating the trace of quantum state powers refers to computing \(\text{Tr}(\rho^k)\), where \(\rho\) is a quantum state (density matrix) and \(k\) is an integer; this quantity is key for tasks such as evaluating nonlinear functions of quantum states, Rényi entropies, and detecting entanglement. The featured paper presents a resource-efficient algorithm that reduces the quantum hardware requirements by using only \(O(\widetilde r)\) qubits and \(O(\widetilde r)\) multi-qubit gates, where \(\widetilde{r} = \min\big(\text{rank}(\rho), \lceil \ln(2k / \varepsilon)\rceil\big)\). The featured paper builds on tools from the mathematical foundations of entanglement spectroscopy (in particular, the Newton-Girard method): it estimates the traces of low powers \(\text{Tr}(\rho^i)\) for \(i = 1, \dots, \widetilde r\) using shallow quantum circuits, and then uses classical recursion to reconstruct or approximate \(\text{Tr}(\rho^k)\) even when \(k\) is large, achieving rank-dependent complexity that works well for low-rank states.
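
The classical half of the pipeline is pleasantly elementary. Assuming the low-power traces \(p_i = \text{Tr}(\rho^i)\) are already in hand for \(i = 1, \dots, r\) with \(r \geq \text{rank}(\rho)\) (so the recursion below is exact; the paper also handles the approximate, truncated-rank case), Newton's identities recover the elementary symmetric polynomials of the spectrum and extend the power sums to any \(k\). A sketch of my own, following the Newton-Girard recursions:

```python
def trace_power(p, k):
    """Given p[i-1] = Tr(rho^i) for i = 1..r (with r at least the rank of
    rho), extend classically to Tr(rho^k) via Newton's identities."""
    r = len(p)
    e = [1.0]                  # elementary symmetric polynomials, e[0] = 1
    for j in range(1, r + 1):  # j * e_j = sum_i (-1)^(i-1) e_{j-i} p_i
        e.append(sum((-1) ** (i - 1) * e[j - i] * p[i - 1]
                     for i in range(1, j + 1)) / j)
    ps = list(p)
    while len(ps) < k:         # p_m = sum_i (-1)^(i-1) e_i p_{m-i} for m > r
        m = len(ps) + 1
        ps.append(sum((-1) ** (i - 1) * e[i] * ps[m - i - 1]
                      for i in range(1, r + 1)))
    return ps[k - 1]

# rho with eigenvalues (0.7, 0.3): Tr(rho) = 1, Tr(rho^2) = 0.58.
print(trace_power([1.0, 0.58], 5))  # 0.7^5 + 0.3^5 = 0.1705
```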

Adversarially robust quantum state learning and testing by Maryam Aliakbarpour, Vladimir Braverman, Nai-Hui Chia, and Yuhan Liu (arXiv) The featured paper studies quantum state learning under a strong corruption model called \(\gamma\)-adversarial corruption, where an adversary can arbitrarily modify a \(\gamma\)-fraction of measurement outcomes, going beyond the usual SPAM noise assumptions. The authors give a non-adaptive measurement algorithm that learns a rank-\(r\) state up to trace-distance error \(\widetilde O(\gamma \sqrt{r})\), and they prove a matching lower bound of \(\Omega(\gamma \sqrt{r})\), showing the guarantee is optimal. As the abstract notes (and piques your curiosity from the get-go), the results are at once optimistic and pessimistic: for constant-rank states, the error bound is dimension-independent, so adversarial corruption can be tolerated without dependence on the Hilbert space dimension; but for general full-rank states, the error necessarily scales as \(\gamma \sqrt{d}\), which means even a vanishingly small corruption rate, around \(1/\sqrt{d}\), is enough to defeat any non-adaptive learning algorithm.

The paper underscores that this dimension-dependent error is inevitable in the adversarial setting because single-copy measurements carry very limited information, so any algorithm must amplify small statistical differences to extract the state. This sensitivity leaves the estimation process inherently vulnerable to adversarially introduced errors, unless the state has low rank, in which case the amount of information required to recover the state is substantially smaller.

News for July 2025

Finally, a post on the first! This month we have four papers, on understanding Nash Equilibria, deeper meditations on distribution testing, sublinear average degree estimation, and a paper on degree distributions that I’m quite excited to see.

Protocols for Verifying Smooth Strategies in Bandits and Games by Miranda Christ, Daniel Reichman, and Jonathan Shafer (arXiv). Consider a \(k\)-player game where each player has \(n\) options. Given a (mixed) strategy for each player, we would like to know whether this is an approximate Nash equilibrium. But think of \(n\) as extremely large, so algorithms need to be sublinear in \(n\). This is a natural requirement, since in many real-world games the options come from an extremely large space. This paper studies this setting from the verification perspective. Assume the algorithm/verifier can communicate with an all-powerful prover, and has access to the game matrix/tensor. We want the verifier to make sublinearly many queries to the game and to have sublinear communication (in bits) with the prover. The main result shows that, under some “smoothness” condition on the near-equilibrium strategies, one can actually get such sublinear verifiers for Nash equilibria. The starting results in the paper are for the simpler multi-armed bandit setting, and are then generalized to more complex games.

Faster Estimation of the Average Degree of a Graph Using Random Edges and Structural Queries by Lorenzo Beretta, Deeparnab Chakrabarty, and C. Seshadhri (arXiv). The problem of estimating the average degree of a graph in sublinear time is an old and classic problem. In a seminal work, Goldreich-Ron gave \((1+\varepsilon)\)-approximations with \(\widetilde{O}(\sqrt{n})\) queries, where \(n\) is the number of vertices. (Up to \(\log n\) and \(\varepsilon\) factors, the bound is optimal.) This works in the classic model where an algorithm can sample uar vertices, query degrees, and get neighbors. Suppose uar edges are also available. In that case, Motwani-Panigrahy-Xu give an algorithm that makes \(\widetilde{O}(n^{1/3})\) queries; the sketch below illustrates one identity that makes random edges useful. What if one can also query pairs to check whether they are edges? This is a standard operation in sublinear graph algorithms. Interestingly, this paper gives an \(\widetilde{O}(n^{1/4})\)-query algorithm for this setting. And if one can get the entire adjacency list in a single query (common in the study of web networks), then there is an \(\widetilde{O}(n^{1/5})\)-query algorithm. This paper, following Beretta-Tetek, also studies the case where the number of vertices \(n\) is unknown, in which case the complexities change. Overall, a nuanced picture of the complexity of estimating the average degree emerges.
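
Here is one cute identity hinting at why random edges help (a toy illustration, not the featured algorithm): for a uniformly random edge \((u,v)\), \(\mathbf{E}[1/deg(u) + 1/deg(v)] = n/m\), so the average degree \(2m/n\) equals \(2/\mathbf{E}[1/deg(u) + 1/deg(v)]\).

```python
import random

def avg_degree_estimate(sample_edge, degree, trials=10000, seed=0):
    """Estimate the average degree 2m/n from uniformly random edges and
    degree queries, via E[1/deg(u) + 1/deg(v)] = n/m over a random edge."""
    rng = random.Random(seed)
    acc = sum(1 / degree(u) + 1 / degree(v)
              for u, v in (sample_edge(rng) for _ in range(trials)))
    return 2 * trials / acc

# Toy graph: a star on 4 vertices (center 0); average degree = 2*3/4 = 1.5.
edges = [(0, 1), (0, 2), (0, 3)]
deg = {0: 3, 1: 1, 2: 1, 3: 1}
print(avg_degree_estimate(lambda r: r.choice(edges), deg.__getitem__))  # 1.5
```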

Towards Tight Bounds for Estimating Degree Distribution in Streaming and Query Models by Arijit Bishnu, Debarshi Chanda, Gopinath Mishra (arXiv). In network science, which is the study of large real-world graphs, probably one of the most important graph parameters studied is the degree distribution. Formally, let \(N(d)\) be the number of vertices of degree \(d\). The set \(\{N(d)\}\) is sometimes called the “degree distribution”, though to be precise, it is the “complementary cumulative degree histogram” (ccdh). There is a rich applied history of estimating the ccdh, but the only truly sublinear bound was given by Eden-Jain-Pinar-Ron-Seshadhri. The actual complexity was a little involved, and yours truly had posed (at WOLA 2019) the question of determining the optimal bound. Which, I’m happy to say, this paper has nailed down. Ignoring log factors, the complexity is \(\Theta(m/h)\), where \(m\) is the number of edges and \(h\) is the, well, \(h\)-index of the degree sequence (the largest number \(k\) such that there are at least \(k\) vertices of degree at least \(k\)); see the snippet below. The algorithm, which is similar to the previous result, is eminently practical and uses a combination of vertex and edge sampling to estimate different parts of the distribution. Moreover, the algorithm can be implemented in the streaming setting.
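
The \(h\)-index in question is computed from the degree sequence exactly as for citation counts; a few lines, for concreteness:

```python
def h_index(degrees):
    """Largest k such that at least k vertices have degree at least k."""
    degrees = sorted(degrees, reverse=True)
    h = 0
    while h < len(degrees) and degrees[h] >= h + 1:
        h += 1
    return h

print(h_index([5, 4, 4, 2, 1, 1]))  # 3: three vertices of degree >= 3
```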

Location-Invariant Properties of Functions versus Properties of Distributions: United in Testing but Separated in Verification by Oded Goldreich and Guy Rothblum (ECCC). Let us build up the distribution testing context. Consider a distribution \(\mathcal{D}\) over universe \([n]\). In the standard distribution testing setup, the tester gets samples from this distribution and has to (approximately) decide some property. One could imagine that there is an underlying space/function that actually generates this distribution. So think of \(f:[m] \to [n]\), where \(\mathcal{D}(i)\) is the fraction of the domain values \(j\) where \(f(j) = i\). A distribution property can be translated to the property/set of functions that induce the desired distributions. In the latter setting, there is more power, since a tester can explicitly query \(j \in [m]\) to get \(f(j)\). (While, in the distribution setting, the tester merely gets \(f(j)\) for a uar \(j \in [m]\).) It turns out that property testing in both settings is basically equivalent. This paper shows, in contrast, that there is a gap between these settings for the verification problem. Consider “doubly sublinear interactive protocols”, where the verifier must make a number of queries sublinear in the vanilla testing complexity, and the honest prover must make a number of queries sublinear in the learning complexity. For the classic uniformity testing problem, there is no such interactive protocol in the distribution testing setting. But, interestingly, there is a “doubly sublinear interactive protocol” in the functional setting, where the verifier can use \(o(\sqrt{n})\) queries to verify uniformity.
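
A toy sketch of the function-to-distribution translation, just unrolling the definition above (the function names are ours):

```python
from collections import Counter

def induced_distribution(f, m):
    # D(i) = fraction of domain points j in [m] with f(j) = i.
    counts = Counter(f(j) for j in range(m))
    return {i: c / m for i, c in counts.items()}

# Example: f(j) = j % 2 induces the uniform distribution over {0, 1},
# so testing uniformity of D becomes a property of the function f.
print(induced_distribution(lambda j: j % 2, 10))  # {0: 0.5, 1: 0.5}
```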

News for June 2025

Another (sigh) delayed post. A moderately busy month, with 4 papers on sublinear graph algorithms and, of course, distribution testing.

A 0.51-Approximation of Maximum Matching in Sublinear \(n^{1.5}\) Time by Sepideh Mahabadi, Mohammad Roghani, and Jakub Tarnawski (arXiv). The title says it all. This paper gives a 0.51-approximation for the maximum matching in, well, \(O(n^{1.5})\) time. Here’s the back story. There are two kinds of sublinear maximum matching algorithms. The first kind (achieved after a long line of impressive results) gets \((1-\varepsilon)\)-approximations in time \(O(n^{2-\Omega_{\varepsilon}(1)})\). This means that in sublinear \(o(n^2)\) time, one can get arbitrarily good approximations. The second kind beats the simple \(0.5\)-approximation (obtained from maximal matchings), but aims for closer to \(O(n)\) time. Behnezhad-Roghani-Rubinstein-Saberi gave a \((0.5+\varepsilon)\)-approximation algorithm running in \(n^{1+\varepsilon}\) time. But the largest value of \(\varepsilon\) possible in these results is tiny. Can one get a “significant” improvement over 0.5-approximations in time “significantly less” than \(n^2\)? The answer is yes, as the title explains.

Testing (Conditional) Mutual Information by Jan Seyfried, Sayantan Sen, and Marco Tomamichel (arXiv, ECCC). Consider a distribution over pairs \((A, B)\). The mutual information \(I(A,B)\) of \(A\) and \(B\) is the minimum KL-divergence between the original distribution and any product distribution over the support of \(A\) and \(B\). So the mutual information is zero iff \((A,B)\) is a product distribution, or equivalently, \(A\) and \(B\) are independent. A natural distribution testing question is to distinguish \(A\) and \(B\) being independent from \(I(A,B) > \varepsilon\). One can take this problem a step further. Suppose the distribution is over triples \((A,B,C)\). Then can one distinguish conditional independence (\(I(A, B | C) = 0\)) from sufficient conditional dependence, i.e. \(I(A, B | C) > \varepsilon\)? This problem was studied by Canonne-Diakonikolas-Kane-Stewart (including our very own Clément), and it is known that \(O(d^{7/4})\) samples suffice. Here, \(d\) denotes the maximum alphabet size for each coordinate, so the overall domain size is \(d^3\). This paper gives a much more nuanced complexity that depends separately on the support sizes of each coordinate. It is interesting that each coordinate plays a different role in the final complexity.
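
Since the minimizing product distribution is simply the product of the marginals, \(I(A,B)\) is easy to compute when the joint distribution is explicitly given; a small sketch (in nats, names ours):

```python
import math

def mutual_information(joint):
    # joint[(a, b)] = Pr[A = a, B = b]. Returns I(A,B): the KL-divergence
    # between the joint and the product of its marginals, which is the
    # closest product distribution.
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Independent fair bits give 0; perfectly correlated bits give log 2.
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))
```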

Tight simulation of a distribution using conditional samples by Tomer Adar (arXiv). Consider a distribution over \(\{0,1\}^n\). Many distribution testing results would not yield useful algorithms for this setting, since the domain is exponentially large. Inspired by this setting, the subcube conditional model was introduced by Bhattacharyya-Chakraborty. In this setting, we can generate samples conditioned on any subcube. The aim of this paper goes beyond property testing. Can we actually estimate specific values of the distribution with a few queries? Can we simulate the distribution, which means to “locally compute” a distribution that is close to the input? Specifically, given a domain point \(x\), we want to compute a value \(\mu(x)\) such that \(\mu\) represents a distribution close to the input. This paper gives such a result, where each \(\mu(x)\) can be computed in \(O(n^2/\varepsilon^2)\) queries. The final distribution has KL-divergence at most \(\varepsilon^2\) from the original.
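
One natural template (a naive sketch, not necessarily the paper’s scheme) is to estimate \(\mu(x)\) via the chain rule, learning each conditional bit probability from samples conditioned on the subcube fixing a prefix of \(x\):

```python
import random

def estimate_mass(cond_sample, x, samples_per_bit=1000):
    # Chain-rule template: mu(x) = prod_i Pr[X_i = x_i | X_1..X_{i-1} fixed].
    # `cond_sample(fixed)` is a hypothetical subcube-conditioning oracle:
    # it returns a sample (a tuple of bits) from the distribution
    # conditioned on coordinate i equalling fixed[i] for all i in `fixed`.
    mass, fixed = 1.0, {}
    for i, bit in enumerate(x):
        hits = sum(cond_sample(dict(fixed))[i] == bit
                   for _ in range(samples_per_bit))
        mass *= hits / samples_per_bit  # estimate of the i-th conditional
        fixed[i] = bit
    return mass
```

The interesting part, which this sketch sweeps under the rug, is controlling how the per-coordinate estimation errors accumulate into the final KL guarantee.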

Sampling and Identity-Testing Without Approximate Tensorization of Entropy by William Gay, William He, Nicholas Kocurek, and Ryan O’Donnell (arXiv). Again, consider a distribution \(D\) over a large product domain \(\Sigma^n\). Our aim is identity testing: we wish to determine whether \(D = D'\) or \(|D - D'| > \varepsilon\), where \(D'\) is some explicitly known distribution. The model is much weaker than subcube conditional access: essentially, we can only get samples from subcubes of dimension (at least) \(n-1\). Blanca-Chen-Štefankovič-Vigoda studied identity testing where \(D\) satisfies a condition called “approximate tensorization of entropy” (ATE). By subadditivity (a consequence of Jensen’s inequality), the entropy of \(D\) is at most the sum of the entropies of its marginal distributions, with equality when \(D\) is a product distribution. Roughly speaking, \(D\) satisfies ATE if a reverse inequality of this flavor holds up to a constant factor. Blanca-Chen-Štefankovič-Vigoda proved that \(\widetilde{O}(n/\varepsilon)\) samples suffice for identity testing under ATE. This paper studies input distributions that are mixtures of (say, a constant number of) distributions satisfying ATE. Interestingly, these mixtures need not satisfy the ATE themselves. But this paper shows that one can still get \(\widetilde{O}(n/\varepsilon)\)-query identity testers.

News for May 2025

Apologies for the delay in publishing this digest. May has seen a lot of activity in property testing, with quite a lot from the quantum side: 8 papers in total!

Improved Sample Upper and Lower Bounds for Trace Estimation of Quantum State Powers (arXiv) by Kean Chen and Qisheng Wang. Say you have access to copies of a quantum state \(\rho\) (lucky you!) and want to estimate some property of \(\rho\): specifically, you’d like an additive estimate of \(\mathrm{Tr}\,\rho^q\), for some \(q > 1\) of your choosing. (This quantity, for various \(q\), has connections to Rényi and Tsallis entropies.) How many copies of the state do you need? The authors significantly strengthen previous bounds on the sample complexity of the task, providing a tight answer for \(q > 2\), and improved results for \(1 < q < 2\). Besides these results, they also provide a list of future directions.
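
For context, these connections are the standard ones: writing \(\lambda_1, \ldots, \lambda_d\) for the eigenvalues of \(\rho\),
\[
\mathrm{Tr}\,\rho^q = \sum_i \lambda_i^q, \qquad H_q(\rho) = \frac{1}{1-q}\log \mathrm{Tr}\,\rho^q, \qquad S_q(\rho) = \frac{1-\mathrm{Tr}\,\rho^q}{q-1},
\]
where \(H_q\) and \(S_q\) are the Rényi and Tsallis entropies of order \(q\); so additive estimates of \(\mathrm{Tr}\,\rho^q\) yield estimates of both.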

On estimating the quantum \(\ell_\alpha\) distance (arXiv), by Yupan Liu and Qisheng Wang. Now you’re given access to two quantum states \(\rho_0,\rho_1\), and you want to estimate their \(\alpha\)-Schatten distance, for a fixed \(\alpha>1\): the “quantum analogue” of estimating the \(\ell_\alpha\) distance between two (classical) discrete probability distributions. The authors provide computationally efficient algorithms with constant sample complexity (for constant additive error), and characterize the computational complexity of the task as a function of \(\alpha\) (as \(\alpha \to 1\)).
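
Here, up to normalization conventions, the \(\alpha\)-Schatten distance is exactly what one would guess from the classical case:
\[
\|\rho_0 - \rho_1\|_\alpha = \left(\mathrm{Tr}\,|\rho_0 - \rho_1|^\alpha\right)^{1/\alpha},
\]
that is, the \(\ell_\alpha\) norm of the spectrum of \(\rho_0 - \rho_1\); for commuting (diagonal) states it reduces to the classical \(\ell_\alpha\) distance between the diagonals.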

Quantum Hamiltonian Certification (arXiv), by Minbo Gao, Zhengfeng Ji, Qisheng Wang, Wenjun Yu, and Qi Zhao. Now, you are given access to the time evolution operator \(e^{-i Ht}\), for an unknown Hamiltonian \(H\) over \(n\) qubits. Can you efficiently test (certify) whether \(H\) is close to (or far from) a reference Hamiltonian \(H_0\)? The paper considers this tolerant certification task with respect to the (normalized) Frobenius distance, and provides tight bounds on the total evolution time (the measure of “sample complexity” in this time-evolution setting). The authors also discuss consequences of their results for other norms, and give an ancilla-free algorithm for the tolerant testing task.

Hamiltonian Locality Testing via Trotterized Postselection (arXiv), by John Kallaugher and Daniel Liang. When it Hamiltonian-rains, it Hamiltonian-pours! This paper considers another tolerant testing task on Hamiltonians where, instead of certifying whether \(H\) is close to a reference \(H_0\), the goal is to tolerantly test whether it is close to being “local” (vs. far from any local Hamiltonian), where a Hamiltonian is “local” if it can be decomposed as a sum of small-weight Pauli operators. The authors provide significantly improved upper bounds on the time evolution needed for this task, as well as a tight lower (and upper) bound for algorithms allowed reverse time evolution.

And for the last quantum paper of the month (that we could find), quantum tomography:

Sample-optimal learning of quantum states using gentle measurements (arXiv), by Cristina Butucea, Jan Johannes, and Henning Stein. A gentle measurement, as formalized by Aaronson and Rothblum, is a measurement which only “mildly” collapses the quantum state. In their paper, Aaronson and Rothblum established two-way connections between gentle measurements and differential privacy. In this work, the authors study quantum tomography with gentle measurements, and provide what looks like a connection to local differential privacy, along with a new information-theoretic tool: a quantum strong data processing inequality (SDPI) for gentle measurements.

And with this perfect segue, we now switch from quantum to local differential privacy:

Locally Differentially Private Two-Sample Testing (arXiv), by Alexander Kent, Thomas B. Berrett, and Yi Yu. This paper considers the question of two-sample testing (maybe more familiar to our readers under the name closeness testing) under local differential privacy, both for discrete and smooth continuous distributions. While some previous results on identity testing under LDP carry over to closeness testing, this work extends them to the smooth continuous case and, with an eye towards practicality of the algorithms, focuses on a specific class of testers: permutation tests.

… and to Boolean functions!

Testing Juntas Optimally with Samples (arXiv), by Lorenzo Beretta, Nathaniel Harms, and Caleb Koch. It is a truth universally acknowledged, that a single month in possession of a good number of property testing results, must be in want of a junta testing paper. And lo and behold: junta testing, indeed! The contributions of this paper are two-fold: first, the authors provide a tight query complexity bound for junta testing in the distribution-free setting (the testing analogue of the PAC learning setting, where distance is measured with respect to an arbitrary, unknown, ambient distribution). Second, they give a lower bound showing that for tolerant junta testing in the distribution-free setting, one may as well learn the whole thing: testing is as hard as learning.

And to conclude… anti-concentration inequalities, and applications.

Algebraic aspects of the polynomial Littlewood-Offord problem (arXiv), by Zhihan Jin, Matthew Kwan, Lisa Sauermann, and Yiting Wang. While this paper is concerned primarily with the (polynomial) Littlewood–Offord problem, and how to improve a recent theorem of Meka, Nguyen and Vu under additional structural assumptions, the authors remark that some of the lemmas they establish along the way have direct implications for property testing of matrices and tensors. This is discussed in Section 1.1.4 of the paper.

Announcing WoLA 2025 in Chicago

The 9th edition of WoLA, the Workshop on Local Algorithms, will be taking place on August 18-20, 2025 at the Toyota Technological Institute (TTIC), Chicago, IL.

For those unfamiliar with WoLA:

Local algorithms, that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input, have been studied in a number of areas in theoretical computer science and mathematics. Some of these areas include sublinear-time algorithms, distributed algorithms, inference in large networks and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities.

You are all invited and we would love to see you there! For more on (free) registration (by ⏰ August 10), how to submit a poster, the list of invited speakers, and local arrangements, see the website.

Program Committee: Arnab Bhattacharyya (University of Warwick), Clément Canonne (University of Sydney), Elena Grigorescu (University of Waterloo), Moti Medina (Bar Ilan University), Rocco Servedio (Columbia University), Asaf Shapira (Tel Aviv University) [Chair], Ali Vakilian (TTIC), Yuichi Yoshida (NII)

News for April 2025

The bonanza of property testing results continues this month, with seven new papers on the arXiv! And with range, too: from regular languages to quantum states, with detours through distribution testing, Boolean functions, and relative error property testing. Oh, and yes, also, (quantum) magic!

The Trichotomy of Regular Property Testing (arXiv), by Gabriel Bathie, Nathanaël Fijalkow, and Corto Mascle. In property testing of regular languages, initiated by Alon, Krivelevich, Newman, and Szegedy in 2001, one has to decide whether an input word belongs to the target language \(L\), or is at distance at least \(\varepsilon\) from every \(x \in L\) (where the distance is typically Hamming or the (easier) edit distance). Surprisingly, it was shown that every regular language could be tested with \(\tilde{O}(1/\varepsilon)\) queries, independent of the input size. But is that tight? And is that \(\tilde{O}\) necessary? Many years of work later, the main result of this paper settles the question, showing that for testing under the Hamming distance there are only three options: either a language is trivial (0 queries needed!), or easy (\(\Theta(1/\varepsilon)\) queries necessary and sufficient), or just plain hard (\(\Theta(\log(1/\varepsilon)/\varepsilon)\) queries necessary and sufficient). Nothing else!

A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions (arXiv), by Xi Chen, Shyamal Patel, and Rocco Servedio. (1) Take two well-studied, notoriously challenging and seemingly unrelated problems about Boolean functions: agnostic learning conjunctions (that is, learning conjunctions with noise), and tolerantly testing juntas. (2) Unearth a new connection between the two tasks. (3) Write a very entertaining, illuminating introduction about this. (4) Oh, also, provide a new \(\tilde{O}(2^{n^{1/3}})\)-time agnostic learning algorithm for conjunctions, improving the previous best known result after 18 years; use it to obtain a \(\tilde{O}(2^{k^{1/3}})\)-query tolerant tester for juntas using this new connection, thus showing a polynomial separation between adaptive and non-adaptive algorithms for this task. (5) You get this paper.

Distribution Testing Meets Sum Estimation (arXiv), by Pinki Pradhan and Sampriti Roy. In the sum estimation problem, there is a set of \(n\) elements, each with a non-negative weight, and the goal is to estimate the total weight \(W\) while minimizing the number of weights queried. In a model which allows both weighted (i.e., proportional) and uniform sampling, this is known to be achievable with \(O(n^{1/3})\) queries (Beretta and Tětek). This paper considers the task under two additional assumptions: first, the weights are non-increasing, and second, the algorithm is allowed conditional weighted and uniform sampling (i.e., the same two types of sampling, but conditioned on any subset \(S\) of its choosing). In this setting, the authors show how to estimate the total weight to \(1\pm\varepsilon\) with only \(O((\log n)\text{poly}(1/\varepsilon))\) queries.
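
For flavour, here is the textbook identity showing why proportional sampling helps at all; a hedged sketch, and certainly not the paper’s \(O(\log n)\)-query algorithm:

```python
import random

def estimate_total_weight(prop_sample, weight, n, trials=1000):
    # Under proportional sampling, E[1/w_i] = sum_i (w_i/W)(1/w_i) = n/W,
    # so W = n / E[1/w_i]. `prop_sample()` is a hypothetical oracle that
    # returns i with probability w_i/W, and `weight(i)` returns w_i
    # (assumed strictly positive). The catch: tiny weights blow up the
    # variance, which is where uniform and conditional sampling come in.
    inv = sum(1.0 / weight(prop_sample()) for _ in range(trials))
    return n * trials / inv
```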

Efficient witnessing and testing of magic in mixed quantum states (arXiv), by Tobias Haug and Poetri Sonya Tarabunga. Magic is real, and it’s quantifiable: roughly speaking, it quantifies the amount of “non-stabilizerness” of a state. I won’t pretend to fully understand what this means, but this paper shows that one can test the magic of low-entropy \(n\)-qubit states (i.e., distinguish between low-magic and high-magic states) with only polynomially (in \(n\)) many copies of the state.

Mildly-Interacting Fermionic Unitaries are Efficiently Learnable (arXiv), by Vishnu Iyer. More quantum property testing! In this paper, the author shows how to test whether an \(n\)-mode fermionic unitary has Gaussian dimension at least \(k\) (or is \(\varepsilon\)-far from it in Frobenius norm) in time \(\text{poly}(n, 1/\varepsilon)\). (This is then used as a building block to efficiently learn such unitaries.)

Testing Juntas and Junta Subclasses with Relative Error (arXiv), by Xi Chen, William Pires, Toniann Pitassi, and Rocco Servedio. In this relatively (!) new model of property testing, which we covered last October, the notion of farness between two Boolean functions is relative to their number of satisfying assignments. Testing with relative error is at least as hard as in the standard setting, and could be strictly harder: this paper shows that, for the case of testing juntas, this is not the case. Even with relative error testing, \(\tilde{O}(k/\varepsilon)\) queries are necessary and sufficient for junta testing! Using ideas from “standard testing” (specifically, from the “testing by implicit learning” framework), their results further extend to testing a large number of subclasses of juntas.

Relative-error testing of conjunctions and decision lists (arXiv), by Xi Chen, William Pires, Toniann Pitassi, and Rocco Servedio. Same team, same model — more results! After testing juntas in the relative error model, the authors continue their systematic exploration of the testing questions, this time focusing on testing decision lists and conjunctions. They are able to obtain a \(\tilde{O}(1/\varepsilon)\)-tester with two-sided error for the former, and an \(O(1/\varepsilon)\)-tester with one-sided error for the latter: both matching the best-known query complexity in the standard model.

News for March 2025

After a good spell last month, the community saw no reason to relent. This month, we had five papers. From tolerant testers for fundamentally important graph problems, to privately testing distributions, to more explorations in the conditional sampling model, to counting and sampling motifs. Also included as a bonus is a result on multipass streaming lower bounds for approximating max-cut.

A Tolerant Independent Set Tester (arXiv) (by Cameron Seth) Consider the setting of testing graph properties in the dense graph model, where you have access to the adjacency matrix of the graph and you may check whether a pair \((i,j)\) is connected by an edge or not. In this model, consider the following task: I want you to design a sublinear time procedure which, on input a dense graph, runs in time \(poly(\rho/\varepsilon)\) and returns YES if the graph is \(\varepsilon_1 = \widetilde{\Omega}(\varepsilon)\)-close to having an independent set of size at least \(\rho n\), and returns NO if the graph is \(\varepsilon_2 = \varepsilon\)-far from all graphs that have an independent set of size at least \(\rho n\). The verdict is required in both cases to hold with probability at least \(2/3\). Alright, that’s what the title pretty much gave away already. So, how does the paper establish this result?

To this end, the paper uses a remarkably simple algorithm. Pick a random subset \(S \subseteq V\) of \(s = \widetilde{O}(\rho^3/\varepsilon^2)\) vertices, and count how many edges are induced by the various subsets of \(S\) of size \(\rho s\). If some subset of size \(\rho s\) induces at most \(\varepsilon_1 s^2\) edges, you accept \(G\); otherwise, you reject \(G\). The analysis presented in the paper is intricate and proceeds by proving a new container lemma for sparse graphs (rather than independent sets). It is a good read and you should totally take a look at this paper.
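
In (hedged) pseudocode, with our own naive parameter choices and thresholds:

```python
import itertools
import random

def tolerant_is_test(edge_query, n, rho, eps1, s):
    # Sketch of the sample-and-scan tester described above. `edge_query(u, v)`
    # is adjacency-matrix access. Only the ~s^2 pair queries count towards
    # query complexity; the brute-force scan over subsets below is expensive
    # in time, and the paper is smarter about this step.
    S = random.sample(range(n), s)
    edges = {frozenset(e) for e in itertools.combinations(S, 2)
             if edge_query(*e)}
    k = max(1, int(rho * s))
    for T in itertools.combinations(S, k):
        induced = sum(frozenset(e) in edges
                      for e in itertools.combinations(T, 2))
        if induced <= eps1 * s * s:
            return True  # found a large, nearly independent subset: accept
    return False  # every rho*s-sized subset is dense: reject
```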

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data (arXiv) (by Maryam Aliakbarpour, Arnav Burudgunte, Clément Canonne (our own!), and Ronitt Rubinfeld) If our January post is fresh in your mind, you might recall the theme: assuming you know a little about the distribution \(\mathbf{p}\) presented to you allows you to give improved algorithms for classic distribution testing tasks such as identity testing and uniformity testing (I am assuming the support of my distributions is \([n]\)). This paper considers these problems in the setting of differential privacy. So, you want to design private algorithms for these classic tasks, where you again get to assume that you know something about the target distribution; that is, it is not completely unknown, just like in our January post. The paper presents sample-efficient private algorithms for both identity testing and closeness testing.

Optimal mass estimation in the conditional sampling model (arXiv) (by Tomer Adar, Eldar Fischer, and Amit Levi) As the title suggests, this paper considers the task of estimating the probability mass of every element in the support of a probability distribution over a finite universe \(\mathcal{U}\). Of course, you have to assume stronger oracle access for this task, and the oracle assumed here is the conditional sampling oracle: you get to specify a subset \(S \subseteq \mathcal{U}\), and you receive a sample distributed according to the underlying distribution conditioned on \(S\). Meel-Kumar-Pote gave an algorithm for this task which requests \(poly(\log n)/poly(\varepsilon)\) samples. The featured paper, on the other hand, presents an algorithm which requests only \(O(\log(\log n)/poly(\varepsilon))\) samples. The authors also show a matching lower bound, effectively closing this problem.
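
To pin down the model, here is a toy implementation of the oracle itself, for an explicitly known distribution (names ours; the testing model only assumes the oracle exists, and never sees the distribution explicitly):

```python
import random

def make_cond_oracle(p):
    # `p` is a dict mapping element -> probability. The returned function
    # answers conditional sampling queries: given a subset S with nonzero
    # mass, it samples x in S with probability p(x) / p(S).
    def cond_sample(S):
        items = [x for x in S if p.get(x, 0) > 0]
        weights = [p[x] for x in items]
        return random.choices(items, weights=weights, k=1)[0]
    return cond_sample

# Example: conditioned on {0, 1}, element 1 is twice as likely as 0.
oracle = make_cond_oracle({0: 0.1, 1: 0.2, 2: 0.7})
print(oracle({0, 1}))
```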

Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time (arXiv) (by Talya Eden, Reut Levi, Dana Ron, and Ronitt Rubinfeld) Suppose I give you a large graph \(G = (V,E)\) and a small subgraph \(H\) whose number of occurrences in \(G\) I wish to (approximately) count. You may assume that you have access to an oracle which can give you a random vertex, the \(i\)-th neighbor of your favorite vertex, the degree of a vertex, and can also answer whether a particular pair of vertices is connected by an edge. This is the standard model we use for coming up with sublinear time algorithms for graph problems. It is sometimes contrasted with the augmented model where, in addition to all of the above, you also get to assume access to an oracle which can give you uniformly random edges in the graph \(G\). As you can imagine, the augmented model allows you to count the number of occurrences of \(H\) for a much richer class of small subgraphs \(H\). Indeed, this used to be the status quo until this paper: in the augmented model, we could count the number of occurrences of pretty much all small subgraphs \(H\), whereas in the standard model, we only knew how to count edges, stars, and cliques. The featured paper makes progress towards closing this divide and presents algorithms for approximately counting the number of copies of \(H\) for a much richer class of small subgraphs \(H\). One thing you might want to try is to simulate the augmented-model algorithms by hacking up a subroutine for returning uniformly random edges. However, that approach is a no-go, as the running time to sample u.a.r. edges can be prohibitive for a sublinear time implementation in the standard model. This paper develops ideas which bypass such annoyances.

Multi-Pass Streaming Lower Bounds for Approximating Max-Cut (arXiv) (by Yumou Fei, Dor Minzer, Shuo Wang) This result, as the title reveals, is from the streaming literature. But it should be of interest to our readers and so we cover it here. The essential summary is this: if you want a better-than-\(1/2\) approximation algorithm for the max-cut problem in the streaming setting, say a \(1/2 + \varepsilon\) approximation, be ready to pay up \(poly(n)\) passes. In more detail, the technical statement reads as follows: any randomized \(k\)-pass \((1/2 + \varepsilon)\)-approximation algorithm for max-cut requires \(\Omega(n^{1/3}/k)\) space. The lower bounds in this paper are inspired by the communication complexity lower bound for a certain problem investigated in the seminal work of Kapralov-Krachun from 2019, where the authors showed that any single-pass streaming algorithm for max-cut aiming to produce a better-than-\(1/2\) approximation must cough up \(\Omega(n)\) space. This requires a lot of care: in the multipass streaming setting, you cannot hope to prove your communication complexity lower bounds by a naive application of the discrepancy method. The delicate finesse needed is brought in by a clever use of hypercontractivity. Looks like this will be an interesting treat for our intrepid readers.