Author Archives: Akash

News for November 2025

Last month the community had a lot to say, and as a result we have a bumper crop featuring eighteen papers. The results range from (almost) tight lower bounds for testing Boolean monotonicity, to low-rank approximation of Hankel matrices, to compressed sensing, to PCPs, to testing and learning convex functions, to a trio of papers with Oded Goldreich as (co)author, with many more covered in this month's potpourri. Featured in this spectacular collection is a rare treat which seeks to understand, in this age of LLMs, how mathematical exploration and discovery work at scale. Without further ado, let us examine our spread.

Boolean function monotonicity testing requires (almost) n^1/2 queries by Mark Chen, Xi Chen, Hao Cui, William Pires, Jonah Stockwell (arXiv) The problem of testing Boolean monotonicity over the hypercube, \(\{0,1\}^n\), is no stranger to regular PTReview readers. Finally, we have a result which puts this venerable problem to rest by proving an adaptive two-sided lower bound of \(\Omega(n^{1/2-c})\) queries for testing Boolean monotonicity over the Boolean hypercube. The authors introduce what they call multilevel Talagrand functions. Before I detail this paper a little more, let me say right off the bat that the paper also uses these functions to obtain a nearly-linear-in-\(n\) two-sided lower bound for testing unateness. Alright, let me say a little about multilevel Talagrand functions. Our starting point is a story from our News for November 2015, where we covered a result by Belovs-Blais proving a polynomial-in-\(n\) two-sided lower bound for the same problem. As mentioned in that post, the key insight of Belovs and Blais was to work with Talagrand's random DNF. Perturbations of this DNF are sufficiently non-monotone and were shown to require \(\Omega(n^{1/4})\) queries to test. Our News for February 2017 post covered the paper by Chen-Waingarten-Xie, which showed that one can adaptively test the Belovs-Blais instance with a budget of \(O(n^{1/4})\) queries, and then went on to present an improved lower bound of \(\Omega(n^{1/3})\) queries by defining what they called two-level Talagrand functions. The featured paper presents an intricate way to define the multilevel extensions of these Talagrand functions, which are then used to squeeze out an almost \(\Omega(\sqrt n)\) lower bound.
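To fix ideas, here is a minimal Python sketch of the most natural tester one might try, the classic edge tester (my illustration for intuition, unrelated to the lower-bound construction in the paper): sample random hypercube edges and look for a violation. The featured lower bound says that, against adaptive two-sided testers, no strategy can get away with substantially fewer than roughly \(\sqrt n\) queries.

```python
import random

def monotone_edge_tester(f, n, queries=200, rng=None):
    """The natural "edge tester" for monotonicity of f: {0,1}^n -> {0,1}
    (a baseline for intuition only, not the paper's construction): sample
    a random hypercube edge and reject iff f decreases along it."""
    rng = rng or random.Random(0)
    for _ in range(queries):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1              # the edge in direction i through x
        if f(tuple(lo)) > f(tuple(hi)):
            return False                 # a violated edge certifies non-monotonicity
    return True
```

Note the one-sided flavor of this particular baseline: it only rejects when it actually sees a violated edge.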

Low-soundness direct-product testers and PCPs from Kaufman–Oppenheim complexes by Ryan O’Donnell and Noah Singer (arXiv) This paper shows an agreement test for direct product testing with two provers in the low soundness regime. Such results were previously known using high-dimensional complexes constructed via deep facts from number theory. This paper proves that a known, simpler, and strongly explicit complex (the KO complex) can be used instead to design the test. Although the methods are elementary, the proof is long and somewhat involved. The featured paper shows a new expansion property of the KO complex in the regime required for building the test with low soundness. Additionally, this paper confirms that the KO complex can be used to build an alternative efficient PCP with only \(poly(\log n)\) blow-up. Since the methods are elementary, the construction is more accessible. And, perhaps, we will have an easier time optimizing the parameters to build more efficient PCPs.

A sufficient condition for characterizing the one-sided testable properties of families of graphs in the Random Neighbour Oracle Model by Christine Awofeso, Patrick Greaves, Oded Lachish, Amit Levi, Felix Reidl (arXiv) The story of this paper begins with a 2019 paper by Czumaj and Sohler which we covered in our September 2019 news. In that paper, Czumaj-Sohler gave efficient property testers for \(H\)-freeness when the input graph is known to be \(K_r\)-minor-free, in the random neighbor query model. The algorithm presented in that result recursively explores a few random neighbors of the current vertex (in particular, you can think of the algorithm as a truncated BFS with bounded recursion depth). The featured paper considers parameterized graph families (in particular, the class of graphs with universally bounded "\(r\)-admissibility"). The key result of the paper is a theorem which asserts that, restricted to the class of graphs with bounded \(r\)-admissibility, for every fixed finite graph \(H\), \(H\)-freeness is testable with one-sided error (assuming access to a random neighbor oracle).
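To make the oracle model concrete, here is a toy Python rendition (my own sketch, not the algorithm of either paper) of a truncated-BFS style one-sided tester for the special case \(H = K_3\): explore a small ball around a random vertex via random-neighbor queries, and reject only upon actually seeing a triangle among the explored edges.

```python
import random
from itertools import combinations

def random_neighbor(adj, v, rng):
    # The model's random-neighbor oracle: a uniformly random neighbor of v.
    return rng.choice(sorted(adj[v]))

def triangle_free_test(adj, n, trials=60, width=3, rng=None):
    """Toy one-sided tester for H-freeness with H = K_3 (an illustrative
    sketch only): explore a depth-2 ball around a random start vertex via
    random-neighbor queries, rejecting only if the explored edges themselves
    contain a triangle; hence rejection always comes with a witness."""
    rng = rng or random.Random(0)
    for _ in range(trials):
        edges, frontier = set(), [rng.randrange(n)]
        for _ in range(2):                      # bounded recursion depth
            nxt = []
            for v in frontier:
                for _ in range(width):          # a few random neighbors of v
                    u = random_neighbor(adj, v, rng)
                    edges.add(frozenset((u, v)))
                    nxt.append(u)
            frontier = nxt
        verts = sorted({x for e in edges for x in e})
        for a, b, c in combinations(verts, 3):
            if all(frozenset(q) in edges for q in ((a, b), (b, c), (a, c))):
                return False                    # saw a copy of K_3: reject
    return True
```

One-sidedness is immediate here: on a triangle-free input the explored edges can never contain a triangle, so the tester always accepts.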

Testing H-freeness on sparse graphs, the case of bounded expansion by Samuel Humeau, Mamadou Moustapha Kanté, Daniel Mock, Timothé Picavet, Alexandre Vigny (arXiv) This paper continues the thread picked up by the previous result. Recall that the Czumaj-Sohler algorithm recursively explores a few random neighbors of the current vertex (essentially a truncated BFS with bounded recursion depth), and that the proof presented in Czumaj-Sohler was fairly intricate. The featured paper uses ideas rooted in the sparsity literature and shows how these ideas can be used to efficiently test \(H\)-freeness even when the input graph is only promised to come from a graph class of polynomially bounded expansion.

Sublinear Time Low-Rank Approximation of Hankel Matrices by Michael Kapralov, Cameron Musco, Kshiteej Sheth (arXiv) Our November 2022 News featured a paper which obtained low-rank approximations of Toeplitz matrices (which, remarkably, were still Toeplitz!) in sublinear time. In this paper, a subset of the authors carries the exploration forward and considers the following task: suppose you are given as input a matrix obtained by noising a PSD Hankel matrix of order \(n\), together with a parameter \(\varepsilon > 0\). The featured paper presents an algorithm which runs in time \(poly(\log n, 1/\varepsilon)\) and returns a bona fide Hankel matrix \(\widehat{H}\) with \(rank(\widehat{H}) \approx poly(\log n)\) which is fairly close to the unknown Hankel matrix we started with (the one which was noised to produce the input).
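For readers who want the objects in front of them: a Hankel matrix has constant anti-diagonals, \(H_{ij} = h_{i+j}\), and moment sequences of the form \(h_t = \mu^t\) produce rank-one PSD instances. A tiny illustrative Python sketch of mine (the paper's sublinear-time algorithm, of course, never writes the matrix down):

```python
def hankel(h, n):
    # Hankel structure: constant anti-diagonals, H[i][j] = h[i+j].
    # (Illustrative only; the sublinear algorithm never materializes this.)
    return [[h[i + j] for j in range(n)] for i in range(n)]

# A moment sequence h_t = mu^t yields a rank-one PSD Hankel matrix:
# every 2x2 minor vanishes identically.
mu = 0.5
H = hankel([mu ** t for t in range(9)], 5)
minors = [H[i][j] * H[i + 1][j + 1] - H[i][j + 1] * H[i + 1][j]
          for i in range(4) for j in range(4)]
```

Sums of a few such geometric sequences similarly give low-rank PSD Hankel matrices, which is the kind of structure the recovery guarantee exploits.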

Support Recovery in One-bit Compressed Sensing with Near-Optimal Measurements and Sublinear Time by Xiaxin Li, Arya Mazumdar (arXiv) Let us begin by reviewing the setup for the One-Bit Compressed Sensing Problem. You want to recover an unknown signal, modeled as a vector \(\boldsymbol{x} \in \mathbb{R}^n\), which we further assume is \(k\)-sparse. To help you out, I give you a Sign of Dot-Product machine which, on input your favorite vector \(\boldsymbol{a} \in \mathbb{R}^n\), tells you \(sign(\boldsymbol{a}^T\boldsymbol{x})\). So, in all, you can look at the quantity \(\boldsymbol{y} = sign(\boldsymbol{A}\boldsymbol{x})\) where \(\boldsymbol{A} \in \mathbb{R}^{m \times n}\). I want to keep \(m\) as small as possible. Another related goal seeks to recover just the support of the unknown \(k\)-sparse signal. The featured paper presents efficient algorithms that return a sparse vector \(\widehat{\boldsymbol{x}}\) whose support is a great proxy for the support of our unknown signal (in the sense that the two supports have very small symmetric difference).
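Here is a small Python sketch of the measurement model, together with a naive correlation decoder for support recovery. Both are my illustrations; in particular, the \(\pm 1\) sensing matrix and the top-\(k\) correlation rule are toy choices, not the paper's carefully structured measurements (which are what enable sublinear-time recovery).

```python
import random

def one_bit_measurements(x, m, rng):
    """One-bit compressed sensing measurement model: each measurement reveals
    only sign(<a, x>) for a random vector a.  The +-1 entries here are an
    illustrative choice, not the paper's structured sensing matrix."""
    n = len(x)
    A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    y = [1 if sum(ai * xi for ai, xi in zip(a, x)) >= 0 else -1 for a in A]
    return A, y

def support_estimate(A, y, k):
    # Naive correlation decoder (again illustrative, not the paper's
    # algorithm): score coordinate j by |sum_i y_i * A[i][j]|, keep top k.
    m, n = len(A), len(A[0])
    scores = [abs(sum(y[i] * A[i][j] for i in range(m))) for j in range(n)]
    return sorted(range(n), key=lambda j: -scores[j])[:k]
```

Even this crude decoder recovers a well-separated support with enough measurements; the paper's point is doing so with near-optimal \(m\) and sublinear decoding time.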

Homomorphism Testing with Resilience to Online Manipulations by Esty Kelman, Uri Meir, Debanuj Nayak, Sofya Raskhodnikova (arXiv) Let us consider the task of testing properties of functions \(f \colon G \to H\) where \((G, +)\) and \((H, \oplus)\) are groups. A lot of classic results in this area (like the linearity test of BLR) assume that we have access to a reliable oracle which, when given any \(x \in G\), returns \(f(x) \in H\). This assumption is no longer valid if the oracle is unreliable or, worse, adversarial. The featured paper considers the situation where the oracle is adversarial and can in fact manipulate the responses (like replacing \(f(x)\) by some value \(y \in H \cup \{\bot\}\)). Moreover, the power of the adversary is parameterized by a number \(t\). One of the results in this paper shows the following: take \(f \colon G \to H\) and fix some \(\varepsilon > 0\). Then one can test whether \(f\) is a group homomorphism using a budget of \(O(1/\varepsilon + \log_2 t)\) queries, even when the adversary can manipulate \(t\) values after every query is answered.
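For concreteness, here is the classic BLR-style homomorphism test over \((\mathbb{Z}_2^n, \oplus)\) with a reliable oracle, sketched in Python (my baseline for reference; the featured paper's contribution is making such tests survive online manipulations):

```python
import random

def blr_test(f, n, trials=100, rng=None):
    """Classic BLR-style test for linearity/homomorphism over (Z_2^n, xor),
    assuming a *reliable* oracle for f (the featured paper asks what happens
    when answers can be manipulated online): pick random x, y and check
    f(x) xor f(y) == f(x xor y).  Rejects only on hard evidence."""
    rng = rng or random.Random(0)
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```

With an adversarial oracle, even this simple test needs rethinking: the adversary can plant consistent-looking answers, which is exactly the difficulty the \(O(1/\varepsilon + \log_2 t)\) bound overcomes.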

Computational Complexity in Property Testing by Renato Ferreira Pinto Jr., Diptaksho Palit, Sofya Raskhodnikova (arXiv) This paper studies the tradeoff between the query complexity and the time complexity of property testing tasks. Most research in property testing has focused on obtaining testers with better and better query bounds. The present work explores the query/time interplay in property testing and develops tools to prove computational hardness of property testers. The paper proves an unconditional and a conditional hierarchy theorem. The first theorem asserts that there are property testing problems which witness a strict separation between the query complexity of the task and the running time required to solve it. The second theorem asserts that, assuming SETH, the separation between the query bounds and the running time bounds can be improved.


Interactive proof systems for FARNESS by Oded Goldreich, Tal Herman, and Guy N. Rothblum (ECCC) This paper continues a line of work on interactive proofs of proximity for distribution testing problems, building on earlier work of Chiesa and Gur (see our October 2017 news). The central question here is whether interaction can reduce the sample complexity barriers that appear unavoidable in standard (non-interactive) distribution testing. The paper answers this in the affirmative for natural promise problems such as deciding whether two distributions are identical (the NO case in the interactive setup) or \(\varepsilon\)-far (the YES case in the interactive setup), as well as testing whether a distribution is far from uniform (the far case again being the YES case in the interactive setup). The authors construct explicit interactive proof systems in which the verifier uses asymptotically fewer samples than required by the best testers, while simultaneously the honest prover uses asymptotically fewer samples than would be needed to learn the distribution. The protocols rely on collision-based ideas and admit clean tradeoffs between the verifier’s and prover’s sample complexities. Conceptually, the result shows that interaction can beat both testing and learning bounds at the same time for concrete, well-studied distribution testing problems.

On doubly-sublinear interactive proofs for distributions by Oded Goldreich and Guy N. Rothblum (ECCC) As I understand it, this paper takes a step back from specific testing problems and asks how general the phenomenon exhibited above really is. So let’s try to place the context for this paper with the preceding paper in mind. Motivated by the interactive protocols presented for FARNESS, the authors introduce the notion of doubly-sublinear interactive proofs of proximity (ds-IPPs), where the verifier’s sample complexity is asymptotically smaller than the testing complexity and the honest prover’s sample complexity is asymptotically smaller than the learning complexity. The main contribution is conceptual rather than algorithmic: the paper shows that ds-IPPs do exist, by constructing distribution properties (algebraically defined and somewhat artificial) that admit such protocols. Together with the above paper, this work shows that interaction changes the game: the sample complexity landscape looks different with interaction than without.

Proving the PCP Theorem with 1.5 proof compositions (or yet another PCP construction) by Oded Goldreich (ECCC) This paper revisits the classic PCP Theorem and offers a new PCP construction of constant query complexity and sublinear randomness complexity \(n^{\alpha}\) for any constant \(\alpha > 0\). The traditional Arora–Lund–Motwani–Sudan–Szegedy proof uses two generic proof compositions (Reed–Muller with itself, then with a Hadamard-based PCP). Inspired by a recent work of Amireddy et al., which constructs a PCP (without composition) with constant query complexity and randomness complexity \(n^{\alpha}\), this work approaches the same challenge. It combines a Reed–Muller based PCP with Hadamard encoding in a novel way. By working with a constant number of dimensions and a large alphabet, then encoding field elements via Hadamard codes and emulating key tests (e.g., low-degree and sum-check), the paper achieves a PCP system that can serve as an inner verifier without full generic composition. In all, this paper presents another route to the PCP Theorem with constant queries and \(n^{\alpha}\) randomness.

Interactive Proofs For Distribution Testing With Conditional Oracles by Ari Biswas, Mark Bun, (Our Own) Clement Canonne, Satchit Sivakumar (arXiv) Another paper whose roots lie in the seminal work of Chiesa and Gur. This paper poses the following twist in distribution testing (on top of the Chiesa-Gur twist):

Can label-invariant properties be verified in a (query-, sample-, and communication-)efficient way when the tester has access to a PCOND oracle?

The paper proves that there exist label-invariant properties of distributions for which the sample complexity of testing is not reduced by the PCOND oracle (that is, it remains the same as the sample complexity of the task without one). However, when interaction is allowed, a different picture emerges. In particular, the paper presents a protocol which can be used to interactively test a label-invariant distribution property (over a universe of size \(N\)) in the PCOND model with a query complexity that grows as \(poly(\log N)\).

Verification of Statistical Properties: Redefining the Possible by (Again, our own) Clement Canonne, Sam Polgar, Aditya Singh, Aravind Thyagarajan, Qiping Yang (ECCC) Let us again start our story from the Chiesa-Gur paper. So, we have an unreliable prover and a verifier, where the verifier is interested in testing some label-invariant properties of a distribution and is allowed to chit-chat with the prover. One problematic aspect which prevents this construct from being applicable to some practical scenarios is that there are cursed properties (e.g., things as basic as UNIFORMITY testing) whose testing requires the verifier to hold \(\Omega(\sqrt n)\) items, where \(n\) denotes the support size of the distribution whose properties we want to test. Undeterred by this news, the featured paper contemplates how to evade this curse. The eye-opening realization is to consider these tasks with the roles of completeness and soundness flipped — e.g., akin to the “Interactive proof systems for FARNESS” paper above, you think of your goal as testing FARNESS from UNIFORMITY. The paper shows this task indeed admits a protocol with sample complexity \(O(1)\). The paper also considers the multiplayer and zero-knowledge variations of these tasks and looks like a nice conceptual read for PTReview readers.

Learning and Testing Convex Functions by Renato Ferreira Pinto Jr., Cassandra Marcussen, Elchanan Mossel, Shivam Nadimpalli (arXiv) The paper studies functions \(f \colon \mathbb{R}^d \to \mathbb{R}\) under the Gaussian measure and formalizes convexity testing as distinguishing convex functions from those that are \(\varepsilon\)-far in the \(L^2\) distance with respect to the Gaussian measure. A central theorem shows that if \(f\) is Lipschitz (or satisfies comparable regularity), then convexity can be tested with sample complexity \(d^{poly(1/\varepsilon^2)}\). The paper supplements these results with a sample-complexity lower bound in a natural query model.

Halfspaces Are Hard to Test with Relative Error by Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, Tianqi Yang (arXiv) This paper considers property testing of Boolean halfspaces in the relative-error model (which we covered earlier; e.g., see here), where the distance between a pair of functions is defined as the fractional symmetric difference between the supports of the two functions. We assume that algorithms have access to a uniformly random element in the support of the input function. Alright, with these details behind us, let us state the main result of this paper. It asserts that for some constant \(\varepsilon_0 > 0\), any relative-error \(\varepsilon_0\)-tester for halfspaces over \(\{0,1\}^n\) must take \(\Omega(\log n)\) samples, showing a super-constant lower bound in this model. This result sharply contrasts with classic absolute-error testing results, where halfspaces admit constant-query testers, and thus demonstrates that the relative-error model differs significantly from the classical model.

Testing Noisy Low-Degree Polynomials for Sparsity by Yiqiao Bao, Anindya De, Shivam Nadimpalli, Rocco A. Servedio, Nathan White (arXiv) Let us start from the setup considered in the Chen-De-Servedio paper we covered in our November 2019 post. In their setup, you are given samples from a noisy linear model \(\boldsymbol{y} = \mathbf{w \cdot x} + \texttt{Noise}\). This paper considers the generalized situation where you are given \(\boldsymbol{y} = \mathbf{p(x)}+\texttt{Noise}\), and you are interested in knowing whether \(\mathbf{p}\) is \(s\)-sparse or far from \(s\)-sparse. The main result gives explicit conditions under which sparsity can be tested with a constant number of samples, depending on the degree, the noise rate, and the sparsity parameter \(s\). Outside this regime, the authors prove logarithmic lower bounds, showing that once noise overwhelms certain structural signals, sparsity testing becomes nearly as hard as learning the polynomial itself.

Efficient Testing Implies Structured Symmetry by Cynthia Dwork, Pranay Tankala (arXiv) This paper addresses a structural question in property testing: which Boolean properties admit testers with small sample complexity and efficient computation? The paper shows that if a property \(P \subseteq \{0,1\}^n\) is testable using (i) few samples and (ii) a circuit of small size, then \(P\) must be close to a property \(\mathcal{Q}\) which enjoys some kind of structured symmetry. In other words, efficient testability forces the property to exhibit a form of low-complexity symmetry. Conversely, the paper shows that properties with such structured symmetry admit efficient testers.

Mathematical exploration and discovery at scale by Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner (arXiv) This paper asks, fairly openly (see Section 1.2), what it might even mean for modern AI systems to participate in mathematical discovery. The authors place their work in the long lineage of computer-assisted mathematics, from early symbolic systems to contemporary automated reasoning, but are careful to frame large language models not as theorem-provers or replacements for mathematicians, but as exploratory engines: components inside a closed-loop pipeline where proposal, evaluation, and refinement happen repeatedly and at scale. Concretely, the paper studies a setup in which LLM-based code generation is coupled with automated evaluators, allowing the system to search large spaces of mathematical constructions and algorithms with minimal human intervention. Rather than optimizing for a single headline result, the authors emphasize breadth, reporting on 44 separate “adventures” spanning combinatorics, number theory, analysis, and even Olympiad-style geometry. In many cases, the system rediscovers known constructions or matches best-known bounds; in others, it produces plausible variants or refinements that look less like finished theorems and more like the kind of halfway-formed ideas one might jot down while exploring a problem for the first time. Quoting from the paper:

“One illustrative example comes from experiments related to the Erdős discrepancy problem. First, when the system was run with no human guidance, it found a sequence of length 200 before progress started to slow down. Next, when the prompt was augmented with the advice to try a function which is multiplicative, or approximately multiplicative, the system performed significantly better and found constructions of length 380 in the same amount of time. These constructions were still far from the optimal value of 1160, and the authors explicitly note that other hints (for example, suggesting the use of SAT solvers) might have improved the score further, but were not explored due to time limitations.”

The broader point of the paper is less about getting optimal bounds and more about scale and exploration. The results are empirical and depend on carefully chosen experimental settings, but the diversity of examples makes this an engaging and unusually readable account of what happens when machines are allowed to roam around mathematical landscapes with a long leash.

News for August 2025

Our press release this month features seven papers. Let’s take a look: a survey on efficiently testable graph properties; two papers on new directions in graph property testing; one that, after some meditations on distribution testing, introduces a novel direction; another on testing properties of strings; and two papers exploring property testing with a quantum twist. Without further delay, let us analyze our spread.

Polynomial Property Testing by Lior Gishboliner and Asaf Shapira (arXiv) This survey traces the development of property testers for dense graph properties through the ages. The motivating question echoes throughout the survey: what is it that makes a dense graph property efficiently testable? Noting that any tester for your favorite property can be simulated by a canonical tester with at most a quadratic overhead in the number of queries, the survey focuses on properties which admit canonical testers that make at most \(poly(1/\varepsilon)\) queries. This part of the story is narrated in Section 3 of the paper. The section begins by reviewing the seminal result of Alon from 2002, which kicked off research in this area and asserts that \(H\)-freeness can be tested efficiently \(\textit{iff}\) \(H\) is bipartite. Earlier this year, Gishboliner and Shapira extended this result to testing \(H\)-freeness in \(k\)-uniform hypergraphs: such a property is efficiently testable \(\textit{iff}\) \(H\) is \(k\)-partite. The survey then presents a result which extends this to the setting of induced subgraphs. In particular, this result characterizes the graphs \(H\) for which testing induced-\(H\)-freeness is efficient. The section concludes with a discussion of testing graph properties of bounded \(\textit{VC}\)-dimension.

Quality control in sublinear time: a case study via random graphs by Cassandra Marcussen, Ronitt Rubinfeld, and Madhu Sudan (arXiv) This paper introduces a general framework in which you can situate the following kinds of questions: So suppose you are given a probability parameter \(p\) and a value \(k \in \mathbb{N}\). Additionally, suppose I give you a graph \(G = (V,E)\) and I ask — is it true that the quantity

\(\rho_k(G) := C_k(G)\big/\mathbf{E}_{H \sim \mathcal{G}_{n,p}}[C_k(H)] \approx 1\)

or is it the case that

\(|\rho_k(G) - 1| > \varepsilon\)

(where the function \(C_k\) counts the number of \(k\)-sized cliques in \(G\)). The paper presents algorithms for this problem which issue at most \(\Big(\frac{1}{p}\Big)^{O(k)}\) queries.
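For intuition, here is a brute-force Python sketch of this quality measure for small graphs, using the fact that under \(\mathcal{G}_{n,p}\) we have \(\mathbf{E}[C_k] = \binom{n}{k} p^{\binom{k}{2}}\). This is my illustration only; the paper's point is to estimate this quantity with \(\big(\frac{1}{p}\big)^{O(k)}\) queries rather than by exhaustive counting.

```python
from itertools import combinations
from math import comb

def clique_count(n, edges, k=3):
    # Brute-force C_k(G): the number of k-subsets of vertices forming a clique.
    e = {frozenset(pair) for pair in edges}
    return sum(1 for S in combinations(range(n), k)
               if all(frozenset(q) in e for q in combinations(S, 2)))

def rho(n, edges, p, k=3):
    """Quality of G relative to G(n,p), measured by k-clique counts:
    rho_k(G) = C_k(G) / E[C_k(G(n,p))], with E[C_k] = C(n,k) * p^C(k,2).
    Exhaustive counting, for intuition only; the paper estimates this
    with (1/p)^O(k) queries without reading all of G."""
    return clique_count(n, edges, k) / (comb(n, k) * p ** comb(k, 2))
```

A graph with quality close to \(1\) has as many \(k\)-cliques as a typical \(\mathcal{G}_{n,p}\) sample, which is the sense of "high quality" used below.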
Now, let me explain the broader framework this paper introduces. You want to think of \(\rho_k(G)\) as measuring the quality of your graph \(G\). You want to understand whether \(G\) looks like a genuine sample from \(\mathcal{G}_{n,p}\) (that is, whether \(G\) is a “high-quality graph”), as measured by the function \(C_k\). In general, as the authors explain via an analogy, this is supposed to capture the following sentiment: you go to a bookstore and you want to buy a book with quality close to \(1\). The goal, thus, is to choose a high-quality book. Note that this goal is weaker than both of the following goals: testing whether

\(\text{quality(book)} \approx 1\) for an adversarial book,

OR whether

\(\text{book} \sim \texttt{Some Distribution of your choice over books in store}\).

The authors situate quite a few other problems in this overarching framework and present algorithms for these.

Sublinear-Time Approximation for Graph Frequency Vectors in Hyperfinite Graphs by Gregory Moroie (arXiv) A useful first step for several algorithms on planar graphs of bounded degree is to exploit the fact that such graphs admit a hyperfinite decomposition. Using a primitive called the partition oracle, one can estimate in sublinear time various important graph parameters — like the max-cut value, independent set size, and so on — to within a multiplicative \((1 \pm \varepsilon)\) factor. Often, a helpful step in these algorithms is a graph simplification step, where one approximates the input graph with a crude sketch which tracks frequency counts of various subgraphs on the pieces produced by the partition oracle, each of which has size at most \(k = O(1/\varepsilon^2)\). The featured paper gives algorithms which find, in time \(poly(1/\varepsilon, d^k)\), a graph \(H\) which can be used to approximate the corresponding frequency vector of \(G\).

Instance-Optimal Uniformity Testing and Tracking by Guy Blanc, Clément L. Canonne, and Erik Waingarten (arXiv) After some meditations on the task of testing uniformity of distributions with support \([n]\), the featured paper suggests that despite the impressive algorithmic success on this task, there is a lot that one still desires. In particular, one fundamental reason for dissatisfaction with the sample-optimal results for this task is that in real-life situations, often the only thing we are interested in is any evidence of non-uniformity — without any notion of distance in the picture. So, the paper formalizes the following question. Suppose we have a setup where at each time step \(t \in \mathbb{N}\), we receive as input a sample \(x_t \sim \boldsymbol{p}\), where \(\boldsymbol{p}\) is an unknown probability distribution supported on \([n]\). We want our algorithm to run continuously and return reject if at any point in time it discovers \(\boldsymbol{p} \neq \texttt{UNIF}[n]\). The goal is to do this in as little time as possible. To rule out trivial algorithms which straight-up reject \(\boldsymbol{p}\), we require that the probability that the algorithm EVER rejects \(\texttt{UNIF}[n]\) itself is at most some sufficiently small \(\delta > 0\). Denote by \(OPT(\boldsymbol{p})\) the sample complexity of the best possible algorithm for this task which knows the entire distribution profile of \(\boldsymbol{p}\) in hindsight. The featured paper devises an algorithm for this problem which is \(poly(\log (OPT))\)-competitive against this best algorithm.

Counting Distinct Square Substrings in Sublinear Time by Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba (arXiv) A square in a string is a substring of the form \(XX\), where \(X\) is a nonempty sequence of characters. The featured paper studies how to count the number of distinct square substrings in a string of length \(n\) over an alphabet of size \(\sigma\), given in packed form. The authors design a sublinear-time algorithm running in \(O(n/\log_\sigma n)\) time in the word-RAM model by exploiting the structure of runs (maximal periodic fragments). To handle these runs systematically, the paper uses sparse Lyndon roots — where a Lyndon root is essentially the lexicographically smallest rotation of a repeated block — so that each repetition has a unique representative. The paper further groups certain overlapping runs into pyramidal structures to make counting efficient. This avoids scanning every position of the string and shows that distinct squares can be counted strictly faster than reading the entire string symbol by symbol, giving the first sublinear-time algorithm for this problem.
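To pin down the object being counted, here is the obvious quadratic-time Python baseline (mine, for definitional clarity only; the paper's algorithm beats even linear time on packed strings):

```python
def distinct_squares(s):
    """Quadratic-time baseline for the problem the paper solves in
    O(n / log_sigma n) time: count distinct substrings of the form XX."""
    found = set()
    n = len(s)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.add(s[i:i + 2 * half])   # record the square XX itself
    return len(found)
```

For example, "aaaa" contains exactly two distinct squares, "aa" and "aaaa", even though "aa" occurs at several positions.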

Resource-efficient algorithm for estimating the trace of quantum state powers by Myeongjin Shin, Junseo Lee, Seungwoo Lee, and Kabgyun Jeong (arXiv) The problem of estimating the trace of quantum state powers refers to computing \(\text{Tr}(\rho^k)\), where \(\rho\) is a quantum state (density matrix) and \(k\) is an integer — this quantity is key for tasks such as evaluating nonlinear functions of quantum states, Rényi entropies, and detecting entanglement. The featured paper presents a resource-efficient algorithm that reduces the quantum hardware requirements by using only \(O(\widetilde r)\) qubits and \(O(\widetilde r)\) multi-qubit gates, where \(\widetilde{r} = \min\big(\text{rank}(\rho), \lceil \ln(2k / \varepsilon)\rceil\big)\). The featured paper builds on tools from the mathematical foundations of entanglement spectroscopy (something called the Newton-Girard method): it estimates the traces of low powers \(\text{Tr}(\rho^i)\) for \(i = 1, \dots, \widetilde r\) using shallow quantum circuits and then uses classical recursion to reconstruct or approximate \(\text{Tr}(\rho^k)\) even when \(k\) is large — achieving rank-dependent complexity that works well for low-rank states.
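The classical half of this pipeline is easy to demo. Below is my Python sketch of the Newton-Girard bookkeeping: given the low power sums \(p_i = \text{Tr}(\rho^i)\) for \(i \le r\) of a rank-\(r\) state (computed here from known eigenvalues rather than estimated on a quantum device), recover the elementary symmetric polynomials of the eigenvalues and extend to \(\text{Tr}(\rho^k)\) for large \(k\). This illustrates the classical identity at play, not the authors' full algorithm.

```python
def extend_power_sums(p, r, k):
    """Classical Newton-Girard post-processing (a sketch of the identity the
    paper leans on, with exact power sums in place of quantum estimates):
    from p[i] = Tr(rho^i) for i = 1..r, recover the elementary symmetric
    polynomials e_1..e_r of the eigenvalues, then extend to Tr(rho^k) for
    k > r via the recurrence imposed by the characteristic polynomial."""
    e = [1.0] + [0.0] * r
    for j in range(1, r + 1):
        # Newton's identity: j * e_j = sum_{i=1..j} (-1)^(i-1) e_{j-i} p_i
        e[j] = sum((-1) ** (i - 1) * e[j - i] * p[i] for i in range(1, j + 1)) / j
    ps = list(p)
    for m in range(r + 1, k + 1):
        # Recurrence from the characteristic polynomial:
        # p_m = sum_{j=1..r} (-1)^(j-1) e_j p_{m-j}
        ps.append(sum((-1) ** (j - 1) * e[j] * ps[m - j] for j in range(1, r + 1)))
    return ps[k]
```

Since a rank-\(r\) state has only \(r\) nonzero eigenvalues, the first \(r\) power sums determine all the rest, which is exactly why measuring only low powers suffices.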

Adversarially robust quantum state learning and testing by Maryam Aliakbarpour, Vladimir Braverman, Nai-Hui Chia, and Yuhan Liu (arXiv) The featured paper studies quantum state learning under a strong corruption model called \(\gamma\)-adversarial corruption, where an adversary can arbitrarily modify a \(\gamma\)-fraction of measurement outcomes, going beyond the usual SPAM noise assumptions. The authors give a non-adaptive measurement algorithm that learns a rank-\(r\) state up to trace-distance error \(\widetilde O(\gamma \sqrt{r})\), and also prove a matching lower bound of \(\Omega(\gamma \sqrt{r})\), showing the guarantee is optimal. As the abstract notes (and piques your curiosity from the get-go), the results are at once optimistic and pessimistic: for constant-rank states, the error bound is dimension-independent, so adversarial corruption can be tolerated without dependence on the Hilbert space dimension; but for general full-rank states, the error necessarily scales as \(\gamma \sqrt{d}\), which means even a vanishingly small corruption rate, around \(1/\sqrt{d}\), is enough to destroy any non-adaptive learning algorithm.

The paper underscores that dimension-dependent error is unavoidable in this adversarial setting because single-copy measurements carry very limited information, so any algorithm must amplify small statistical differences to extract the state. This sensitivity leaves the estimation process inherently vulnerable to adversarially introduced errors unless the state has low rank, in which case the amount of information required to recover the state is substantially smaller.

News for March 2025

After a good spell last month, the community saw no reason to relent. This month, we have five papers: from tolerant testers for fundamentally important graph problems, to privately testing distributions, to more explorations in the conditional sampling model, to counting and sampling motifs. Also included as a bonus is a result on multipass streaming lower bounds for approximating max-cut.

A Tolerant Independent Set Tester (arXiv) (by Cameron Seth) Consider the setting of testing graph properties in the dense graph model, where you have access to the adjacency matrix of the graph and may check whether a pair \((i,j)\) is connected by an edge or not. In this model, consider the following task: design a sublinear-time procedure which, on input a dense graph, runs in time \(poly(\rho/\varepsilon)\) and returns YES if the graph is \(\varepsilon_1 = \widetilde{\Omega}(\varepsilon)\)-close to having an independent set of size at least \(\rho n\), and returns NO if the graph is \(\varepsilon_2 = \varepsilon\)-far from all graphs that have an independent set of size at least \(\rho n\). In both cases, the verdict is required to hold with probability at least \(2/3\). Alright, that’s what the title pretty much gave away already. So, how does the paper establish this result?

To this end, the paper uses a remarkably simple algorithm. You pick a random subset \(S \subseteq V\) of size \(s = \widetilde{O}(\rho^3/\varepsilon^2)\) vertices and focus on counting how many edges you see in various subsets of \(S\), all of size \(\rho s\). If you see some subset of size \(\rho s\) which induces at most \(\varepsilon_1 s^2\) edges, you accept \(G\). If this condition fails, you reject \(G\). The analysis presented in the paper is intricate and proceeds by proving a new container lemma for sparse graphs (rather than independent sets). It is a good read and you should totally take a look at this paper.
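
To make the flavor concrete, here is a tiny and deliberately naive Python sketch of the sample-and-count approach. It searches subsets of the sample exhaustively, which is nothing like the actual analysis (that is where the container lemma earns its keep), and the exact sample size and thresholds below are simplifications of mine.

```python
import itertools
import random

def tolerant_is_test(adj, rho, eps1, s, seed=0):
    """Toy sketch: draw a random vertex sample S of size s, and accept
    iff some subset of S of size rho*s induces at most eps1 * s^2 edges.
    (Exhaustive search over subsets, so only feasible for tiny s; the
    real analysis avoids this via a container lemma.)"""
    n = len(adj)
    rng = random.Random(seed)
    S = rng.sample(range(n), s)
    k = max(1, int(rho * s))
    for subset in itertools.combinations(S, k):
        edges = sum(1 for u, v in itertools.combinations(subset, 2) if adj[u][v])
        if edges <= eps1 * s * s:
            return True   # found a near-independent subset: accept
    return False          # no sparse subset found: reject
```

On the empty graph this accepts, and on the complete graph (with a zero threshold) it rejects, as one would hope.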

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data (arXiv) (by Maryam Aliakbarpour, Arnav Burudgunte, Clément Canonne (our own!), and Ronitt Rubinfeld) If our January post is fresh in your mind, you might recall the theme: assuming you know a little about the distribution \(\mathbf{p}\) presented to you allows you to give improved algorithms for classic distribution testing tasks such as identity testing and uniformity testing (I am assuming the support of my distributions is \([n]\)). This paper considers these problems in the setting of differential privacy. So, you want to design private algorithms for these classic tasks where you get to assume that you know something about the target distribution — that is, you assume it is not completely unknown, just like we had in our January posting. The paper presents sample efficient private algorithms both for identity testing and closeness testing.

Optimal mass estimation in the conditional sampling model (arXiv) (by Tomer Adar, Eldar Fischer, and Amit Levi) As the title suggests, this paper considers the task of estimating the probability mass of all elements in the support of a probability distribution sitting on a finite universe \(\mathcal{U}\). Of course, you have to assume stronger oracle access for this task, and the oracle people assume access to is the conditional sampling oracle: you get to specify a subset \(S \subseteq \mathcal{U}\), and you receive a sample distributed according to the underlying distribution conditioned on lying in \(S\). Meel-Kumar-Pote gave an algorithm for this task which requests \(poly(\log n)/poly(\varepsilon)\) samples. The featured paper, on the other hand, presents algorithms for this problem which request only \(O(\log(\log n))/poly(\varepsilon)\) samples. The authors also show a matching lower bound, effectively closing this problem.
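
For concreteness, here is a minimal Python mock-up of the conditional sampling oracle interface. Representing the distribution as an explicit dictionary is of course an illustration only: in the testing model you never get to see the distribution itself.

```python
import random

def conditional_sample(dist, S, rng=random):
    """Conditional sampling oracle: given a distribution `dist`
    (mapping element -> probability) and a query set S, return one
    sample drawn from dist conditioned on landing in S."""
    support = [x for x in S if dist.get(x, 0) > 0]
    if not support:
        raise ValueError("S has zero mass under dist")
    weights = [dist[x] for x in support]
    return rng.choices(support, weights=weights, k=1)[0]
```

Querying with \(S = \mathcal{U}\) recovers an ordinary sampling oracle; the power of the model comes from being able to zoom in on small sets.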

Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time (arXiv) (by Talya Eden, Reut Levi, Dana Ron, and Ronitt Rubinfeld) Suppose I give you a large graph \(G = (V,E)\) and a small subgraph \(H\) whose number of occurrences in \(G\) I wish to (approximately) count. You may assume that you have access to an oracle which can give you a random vertex, the \(i\)-th neighbor of your favorite vertex, the degree of a vertex, and can also answer whether a particular pair of vertices is connected by an edge. This is the standard model we use for coming up with sublinear time algorithms for graph problems. It is sometimes contrasted with the augmented model where, in addition to all of the above, you also get to assume access to an oracle which can give you uniformly random edges in the graph \(G\). As you can imagine, the augmented model allows you to count the number of occurrences of \(H\) for a much richer class of small subgraphs \(H\). Indeed, this used to be the status quo till this paper: in the augmented model, we could pretty much count the number of occurrences of all small subgraphs \(H\), whereas in the standard model, we only knew how to count edges, stars and cliques. The featured paper makes progress towards bridging this divide and presents algorithms for approximately counting the number of copies of \(H\) for a much richer class of small subgraphs \(H\). One thing you might want to try is to simulate the algorithms in the augmented model by hacking up algorithms for returning uniformly random edges. However, that approach is a no go, as the running time to sample u.a.r. edges can be prohibitive for a sublinear time implementation in the standard model. This paper develops ideas which bypass such annoyances.


Multi-Pass Streaming Lower Bounds for Approximating Max-Cut (arXiv) (by Yumou Fei, Dor Minzer, Shuo Wang) This result, as the title reveals, is from the streaming literature. But it should be of interest to our readers and so we cover it here. The essential summary is this: if you want a better than \(1/2\) approximation algorithm for the max-cut problem in the streaming setting — like a \(1/2 + \varepsilon\) approximation — be ready to pay up with \(poly(n)\) passes. In more detail, the technical statement reads as follows: any randomized \(k\)-pass \(1/2 + \varepsilon\) approximation algorithm for max-cut requires \(\Omega(n^{1/3}/k)\) space. The lower bound results in this paper are inspired by the communication complexity lower bound of a certain problem investigated in the seminal work of Kapralov-Krachun from 2019, where the authors showed that any single pass streaming algorithm for max-cut aiming to produce a better than \(1/2\) approximation must cough up \(\Omega(n)\) space. This requires a lot of care, as in the multipass streaming setting you cannot hope to show your communication complexity lower bounds by a naive application of the discrepancy method. The delicate finesse needed is brought in by a clever use of hypercontractivity. Looks like this will be an interesting treat for our intrepid readers.

News for December 2024

Happy New Year to you and your loved ones! Welcome to the first post of 2025. This month, we feature four papers: one on property testing in the huge object model, two on distribution testing, and a fourth that, while not strictly a property testing paper, was too intriguing to ignore. The last paper recounts the resolution of a fascinating bet between Peter Sarnak and Noga Alon. With that teaser, let’s dive in!

Testing vs Estimation for Index-Invariant Properties in the Huge Object Model by Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Amit Levi, Gopinath Mishra, and Sayantan Sen (arXiv) Let me begin by reminding you of the setup for distribution testing in the huge object model, which was introduced by Goldreich and Ron. Here is a brief refresher of this model. Suppose you want to test whether a distribution \(\mathcal{D}\), supported over the Boolean hypercube, satisfies a certain property \(\mathcal{P}\). To do this, you sample a string \(x \sim \mathcal{D}\), where \(x\) is of length \(n\). The huge object model considers situations where \(n\) is very large, and so you instead assume query access to the sampled strings. In this model, the distribution \(\mathcal{D}\) is considered \(\varepsilon\)-far from \(\mathcal{P}\) if the earthmover distance (EMD) between \(\mathcal{D}\) and the closest distribution satisfying \(\mathcal{P}\), measured with respect to the relative Hamming distance between bitstrings, is at least \(\varepsilon\). In our July 2022 post, we covered a paper by a subset of the authors which presented efficient testers in this model for index-invariant properties whose support had bounded VC dimension.

The main result of the featured paper shows the following: for an index-invariant property, you can basically upgrade a plain tester to a tolerant tester. Thus, the ability to efficiently \(\varepsilon\)-test an index-invariant distribution property in the huge object model translates into the ability to estimate distance from the property.

Optimal Algorithms for Augmented Testing of Discrete Distributions by Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, and Sandeep Silwal (arXiv) Let us take the standard setup of discrete distribution testing supported over \([n]\) with a slight twist. Suppose you can assume that the underlying distribution \(\boldsymbol{p}\) is not entirely unknown. As the paper argues, this might be a realistic assumption in distributions dealing with network traffic data or search engine queries. Among a couple more results, the main result of this paper shows that with a good proxy \(\boldsymbol{p}’\) for the input distribution \(\boldsymbol{p}\), i.e., a situation where, say, \(\|\boldsymbol{p}’-\boldsymbol{u}\|_1 \gg \|\boldsymbol{p}’ – \boldsymbol{p}\|_1\) and \(\|\boldsymbol{p}’ – \boldsymbol{p}\|_1\) is small enough, you get testers for uniformity testing with sample complexity \(O(1)\). (Here, \(\boldsymbol{u}\) denotes the uniform distribution over \([n]\).) In this framework, the authors also present algorithms with improved sample complexity for identity testing and closeness testing.

Settling the complexity of testing grainedness of distributions, and application to uniformity testing in the Huge Object model by Clement Canonne, Sayantan Sen, Qiping Yang (ECCC) A discrete distribution is called \(m\)-grained if all probabilities are integer multiples of \(1/m\). If you are a regular PTReview reader, you might recall our News for September 2021, which featured a paper by Goldreich and Ron proving an \(\Omega(n^c)\) lower bound for testing \(m\)-grainedness for any \(c < 1\). Goldreich and Ron also conjectured that the true lower bound is actually \(\Omega(n/\log n)\) (when \(m = \Theta(n)\)). The current work resolves this conjecture, settling the complexity of this problem.
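
To see the definition in action, here is a small exact check of \(m\)-grainedness for an explicitly given distribution (using rationals to dodge floating-point pitfalls; the `limit_denominator` rounding is my own convenience for float inputs).

```python
from fractions import Fraction

def is_m_grained(probs, m):
    """A distribution is m-grained if every probability is an integer
    multiple of 1/m.  We convert each float to the nearest reasonable
    rational before checking, to avoid binary floating-point noise."""
    return all(Fraction(p).limit_denominator(10**9) * m % 1 == 0
               for p in probs)
```

So \((0.2, 0.2, 0.6)\) is \(5\)-grained, while \((0.5, 0.5)\) is not \(3\)-grained.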

Ramanujan Property and Edge Universality of Random Regular Graphs by Jiaoyang Huang, Theo Mckenzie, and Horng-Tzer Yau (arXiv) So, yes, here is the (non-property testing) paper I wanted to tell you about. Let me start with the juicy bit. Peter Sarnak and Noga Alon had a bet about the following situation: fix \(d \in \mathbb{N}\) and take a random \(d\)-regular graph on \(n\) vertices. Sarnak conjectured that the probability this graph is in fact a Ramanujan expander goes to zero as \(n \to \infty\), whereas Alon conjectured that this probability tends to one as \(n \to \infty\). The featured paper shows that while this probability decreases with increasing \(n\), it approaches a limiting value which is around \(0.69\). You can watch this juicy bit here.

News for September 2024

So, we had a pretty fantastic September. Besides the fact that September saw eight papers, the Computer Science community also bagged two Nobel Prizes(!) — reactions to which are kind of mixed from what I see around me. Anyhow, let us circle back to property testing. So, eight papers, yes. Without further ado, let us look at our spread.

Public Coin Interactive Proofs for Label-Invariant Distribution Properties by Tal Herman (ECCC) Suppose you have an unknown distribution \(\mathcal{D}\) supported over \([N]\). Suppose I claim that this distribution has entropy \(\texttt{blah}\). You have sample access to \(\mathcal{D}\) and you want to check my claim. To this end, you decide to engage me, a suspicious shady prover, in an interactive protocol where you, the verifier, are restricted to using public coin tosses. The main result of this paper asserts that you can do this with a mere \(\widetilde{O}(N^{2/3})\) samples. You also get the same bound on the communication complexity. What’s more, you get algorithms with running time of the same order. What’s even more, you get similar algorithms for all label-invariant properties of \(\mathcal{D}\). You can contrast this with the seminal result of Valiant and Valiant from 2011, which asserts that you can approximate the distance between your input distribution \(\mathcal{D}\) and the label-invariant property of interest (without any prover) using \(\Theta(N/\log N)\) samples. So, this result shows that computation under the interactive model is more efficient than standalone computation even in the public coin toss model.

How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions by Tal Herman and Guy Rothblum (arXiv) Let us consider the same setup as above. Again, you have an unknown distribution (supported over \([N]\)) which you have sample access to, and I assert that this distribution has a certain property. You want to verify whether my claim is correct or bogus. The main algorithmic result of this paper has a cryptographic ring to it: assuming the existence of collision resistant hash functions, the authors show that for any reasonable distribution property, you can set up an interactive protocol where the verifier can decide whether or not the prover’s claim is bogus using \(\widetilde{O}(\sqrt N)\) samples. Also, the communication complexity is of the same order.

Random local access for sampling k-SAT solutions by Dingding Dong and Nitya Mani (arXiv) I find this paper pretty intriguing. So, here is the setup. I give you some \(k\)-SAT instance and I promise that no variable in this instance shows up in more than \(d\) clauses. Recall that for \(d \leq 2^k/(2e k)\), the Lovasz Local Lemma assures you that there exists a satisfying assignment. What’s more, the famous Moser-Tardos algorithm even allows you to find one satisfying assignment in polynomial time. In a beautiful work, in the regime where \(d \leq 2^{ck}\) (where \(0 < c < 1\) is a sufficiently small constant), Moitra gave samplers which return an almost uniformly random satisfying assignment. Note that this is not possible via a direct adaptation of the Moser-Tardos algorithm. Anyhow, back to the setting considered in this work. Consider the following sublinear challenge. Denote by \(\mu\) the uniform distribution supported over all satisfying assignments to the given \(k\)-SAT instance where each variable shows up in at most \(d \leq 2^{ck}\) clauses. I want you to sample the assignment \(\sigma(v)\) for a variable \(v\) from the marginal \(\mu_v\) induced by \(\mu\) on \(v\) fast (in \(poly(\log n)\) time in expectation). Of course, this has to be done in some fashion that is consistent overall, which is very LCA-ish in flavor and I will not detail it here. Rest assured, the featured paper rises up to the challenge.
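
As a quick refresher on the Moser-Tardos algorithm mentioned above, here is a hedged Python sketch; the clause encoding and iteration cap are my own choices. Note that this finds a satisfying assignment, which, as Moitra's work highlights, is very much not the same as sampling one uniformly.

```python
import random

def moser_tardos_ksat(n, clauses, rng=random, max_iters=100000):
    """Moser-Tardos resampling: start from a uniformly random
    assignment and, while some clause is violated, resample all
    variables of one violated clause uniformly at random.
    Clauses are tuples of nonzero ints: literal v asserts that
    x_|v| equals (v > 0).  Under the Local Lemma regime this
    terminates quickly; max_iters is just a safety cap."""
    assign = [rng.random() < 0.5 for _ in range(n + 1)]  # 1-indexed

    def violated(clause):
        # a clause is violated when every one of its literals is false
        return all(assign[abs(l)] != (l > 0) for l in clause)

    for _ in range(max_iters):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign[1:]
        for l in rng.choice(bad):  # resample a violated clause
            assign[abs(l)] = rng.random() < 0.5
    raise RuntimeError("did not converge within max_iters")
```

On the toy instance \((x_1 \vee x_2) \wedge (\bar{x}_1 \vee x_2) \wedge (x_1 \vee \bar{x}_2)\), whose unique satisfying assignment sets both variables to true, the sketch duly finds it.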

New Direct Sum Tests by Alek Westover, Edward Yu, Kai Zheng (arXiv) We say that an \(\mathbb{F}_2\)-valued function over the \(d\)-dimensional grid, \(f \colon [n]^d \to \mathbb{F}_2\), is a direct sum if there are \(d\) one-dimensional functions \(L_i \colon [n] \to \mathbb{F}_2\) such that \(f(x) = \sum_{i=1}^d L_i(x_i)\). In a paper we covered in October 2019, Dinur and Golubev presented an algorithm for direct sum testing and left its analysis for future research. The featured paper analyzes this test and shows that it is indeed a bona fide direct sum tester. I omit the description of the tester. I will just leave you with one note — this looks like an appetizing read for Boolean Fourier Analysis aficionados.
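
While I omit the tester itself, here is a brute-force Python check of the direct sum property on a fully specified function, purely my own illustration (not the query-efficient tester from the paper). It uses the observation that a direct sum is pinned down by its values at the "axis" points.

```python
from functools import reduce
from itertools import product

def is_direct_sum(f, n, d):
    """Brute-force check that f: [n]^d -> F_2 is a direct sum, i.e.
    f(x) = XOR_i L_i(x_i).  If such L_i exist, then necessarily
    f(x) = f(0) XOR XOR_i ( f(0,..,x_i,..,0) XOR f(0) ),
    so it suffices to verify this identity at every point."""
    zero = (0,) * d
    # M[i][a] plays the role of L_i(a) XOR L_i(0)
    M = [[f(zero[:i] + (a,) + zero[i + 1:]) ^ f(zero) for a in range(n)]
         for i in range(d)]

    def predicted(x):
        return reduce(lambda acc, i: acc ^ M[i][x[i]], range(d), f(zero))

    return all(f(x) == predicted(x) for x in product(range(n), repeat=d))
```

For instance, \(f(x) = (x_1 \bmod 2) \oplus (\lfloor x_2/2 \rfloor \bmod 2)\) passes, while the AND function \(f(x) = x_1 x_2 \bmod 2\) fails.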

Directed Hypercube Routing, a Generalized Lehman-Ron Theorem, and Monotonicity Testing by Deeparnab Chakrabarty and C. Seshadhri (arXiv) This is a very refreshing read with (our very own) Seshadhri as one of the authors. Chakrabarty and Seshadhri have teamed up (often in a trio with Hadley Black) in their attempts to understand the deep dark secrets of testing boolean monotonicity over the boolean hypercube. The featured paper conjectures a bold generalization of the Lehman-Ron theorem over the hypercube and suggests that the conjectured routing result, once established, could pave the way towards a more transparent understanding of directed isoperimetric inequalities over the boolean hypercube. In my opinion, it is worth your while to check this one out.

Lempel-Ziv (LZ77) Factorization in Sublinear Time by Dominik Kempa and Tomasz Kociumaka (arXiv) A pretty unique topic. The main result of this paper is what you see in the title — 50 years after its introduction, we can finally compute the LZ77 factorization in sublinear time. If you are like me, you might be asking: but what does that mean? Quoting from the abstract:

Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string T from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the longest prefix of the unprocessed suffix that has another occurrence earlier in T.
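
The greedy rule in the quote above is easy to render in code. Here is a naive quadratic-time Python sketch, purely to illustrate the definition (the whole point of the featured paper is, of course, to do this in sublinear time, which this sketch very much does not).

```python
def lz77_factorize(T):
    """Greedy LZ77 factorization (naive O(n^2) version).  Each phrase
    is either the leftmost occurrence of a letter, or the longest
    prefix of the unprocessed suffix that has an earlier (possibly
    overlapping) occurrence in T."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best = 0
        for j in range(i):  # candidate earlier starting positions
            l = 0
            while i + l < n and T[j + l] == T[i + l]:
                l += 1
            best = max(best, l)
        if best == 0:
            phrases.append(T[i])          # a brand-new letter
            i += 1
        else:
            phrases.append(T[i:i + best])  # longest previously seen prefix
            i += best
    return phrases
```

Note the self-referential twist: an earlier occurrence may overlap the current phrase, which is why "abababab" factorizes as a, b, ababab.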

Indeed, the abstract has more fascinating information about this problem. Quoting again.

Sublinear-time algorithms are known for nearly all other fundamental problems on strings, but LZ77 seems resistant to all currently known techniques.

Looks like the new year came early for PTReview readers. This paper promises to be fun, judging by the contents I sampled so far.

Computing String Covers in Sublinear Time by Jakub Radoszewski and Wiktor Zuba (arXiv) A substring (actually, a prefix) \(C\) is called a cover of a text \(T\) if \(T\) can be constructed by concatenations and superpositions of \(C\). Suppose the text string \(T\) contains \(n\) symbols from some alphabet of fixed size \(\sigma\). The main theorem of the featured paper asserts that given a string \(T\), you can find a representation of all of its covers (at most \(n\) of them — they are all prefixes) in merely \(O(n/\log_{\sigma} n)\) time. This bound is also shown to be optimal — indeed, the authors show that you cannot compute a representation (in a certain model formulated by Charalampopoulos et al in FOCS 2020) for the shortest cover in \(o(n/\log n)\) time.
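
Here is a naive Python check of the cover condition, purely to illustrate the definition (the featured paper, of course, computes all covers in sublinear time, nothing like this quadratic sketch).

```python
def is_cover(C, T):
    """C covers T if T can be built by concatenating and overlapping
    copies of C: C must occur at position 0, some occurrence must end
    exactly at the end of T, and consecutive occurrences can be at
    most |C| apart (so the copies tile T with no gaps)."""
    m, n = len(C), len(T)
    occ = [i for i in range(n - m + 1) if T[i:i + m] == C]
    if not occ or occ[0] != 0 or occ[-1] + m != n:
        return False
    return all(b - a <= m for a, b in zip(occ, occ[1:]))
```

For example, "aba" covers "ababa" (two overlapping copies), and "ab" covers "ababab" (pure concatenation), but "ab" does not cover "abba".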

Quantum Channel Testing in Average-Case Distance by Gregory Rosenthal, Hugo Aaronson, Sathyawageeswar Subramanian, Animesh Datta and Tom Gur (arXiv) This paper considers a new diversion in quantum land. Namely, it considers the task of testing quantum channels. The setup is this: I have a \(d\)-dimensional quantum state, which is just a psd matrix in \(\mathbb{C}^{d \times d}\) with trace \(1\). A quantum channel is just a linear transformation that maps a \(d_{in}\)-dimensional quantum state to a \(d_{out}\)-dimensional quantum state. With respect to an appropriate norm (the so-called diamond norm), one of the results in this paper proves a \(poly(d_{in})/\varepsilon\) lower bound for testing identity to any fixed channel in diamond distance. This lower bound is shown to hold in a very strong query model appropriate for the quantum setting.

News for July 2024

This month we have only one paper. I imagine the community will be on fire when we come back with our news for August.

Constant Degree Direct Product Testers with Small Soundness by Mitali Bafna, Noam Lifshitz, and Dor Minzer (arXiv) (Please excuse my lack of technical comfort with the contents of this paper. Corrections welcome.) As the title indicates, and the authors also emphasize, the primary goal of the featured paper is to construct direct product testers with constant degree. Let us try to unpack this a little by first understanding what a direct product test means. So I have this function \(f \colon [n] \to \{0,1\}\). I give you access to this function in an indirect way via a table \(F\) to which you have query access. The central task in direct product testing is to check whether \(F\) is a valid encoding of \(f\) by querying \(F\) on a small number of locations. This paper focuses on those properties which you can test with two queries. Dinur and Kaufman noted that high dimensional expanders can be leveraged towards obtaining \(2\)-query direct product testers. The main result of this paper shows that there are high dimensional expanders for which the Dinur-Kaufman direct product test has small soundness.

News for April 2024

We have seven papers for you this month. Our potpourri includes two papers apiece on the following themes: distribution testing, property testing with a quantum twist, graph property testing, and testing function properties. A featured paper this month covers progress on optimal non-adaptive junta testers. Without further ado, let’s dig in. As usual, please let us know if I missed any papers.

Testing \(C_k\) freeness in bounded-arboricity graphs by Talya Eden, Reut Levi, and Dana Ron (arXiv) Our post from July 2021 highlighted an open problem posed by Goldreich. This problem asks if it is possible to transform property testers for bounded degree graphs to property testers for unbounded degree graphs with general arboricity. The featured paper answers the question Goldreich posed in the negative. In particular, testing \(C_4\)-freeness in bounded arboricity graphs (with possibly unbounded degree) already admits an \(\Omega(n^{1/4})\) one-sided lower bound. Up to \(\log\) factors, the paper also proves a matching upper bound. The same bounds hold for \(C_5\)-freeness. Further, for every \(k \geq 6\), the paper proves an \(\Omega(n^{1/3})\) one-sided lower bound.

Directed Isoperimetry and Monotonicity Testing: A Dynamical Approach by Renato Ferreira Pinto Jr (arXiv) The featured paper considers a classic of property testing. So, you want to test whether an input real-valued function \(f \colon [0,1]^d \to \mathbb{R}\) over the solid cube is monotone or whether it is \(\varepsilon\)-far from being monotone. It is convenient to require that the input function be differentiable and Lipschitz (for those of you keeping score, this should remind you of our post from July 2023, where we covered a previous work in the same setting, again by Renato). Motivated by the success of testers developed for the boolean hypercube domain, a natural analogy suggests having, in addition to a standard value oracle, an additional derivative oracle which takes as input a point \(x \in [0,1]^d\) and a direction \(\mathbf{v} \in \mathbb{R}^d\). This oracle allows you to query the directional derivative at \(x\) along \(\mathbf{v}\). Renato’s punchline is a directed Poincare inequality which, in the above setup, connects the distance to monotonicity to the square root of a directed analog of a suitable notion of influence. The techniques used in the paper seem intriguing. They are inspired by the original proof of the classic Poincare inequality. In particular, as Renato notes, “the main theme of our proof of this result is the study of the convergence of a partial differential equation.”

Optimal Non-Adaptive Tolerant Junta Testing via Local Estimators by Shivam Nadimpalli, Shyamal Patel (arXiv) The problem of Tolerant Junta testing is no stranger to our regular readers. This paper is fairly intriguing for those of you who care about testing function properties. As the abstract notes, this is the first paper to nail down the true (non-adaptive) query complexity for tolerantly testing some natural property of boolean functions. The main result of this paper presents a lower bound of \(2^{\widetilde{\Omega}(\sqrt{k \log(1/\varepsilon)})}\) on the task of non-adaptive tolerant \(k\)-junta testing. The paper presents an almost matching non-adaptive algorithm as well.

Testing Intersectingness of Uniform Families by Ishay Haviv and Michal Parnas (arXiv) Let \(\mathcal{F}\) denote an intersecting family of sets, all of which are subsets of an underlying \(n\)-element universe. This means that for any \(F_1, F_2 \in \mathcal{F}\), you have \(F_1 \cap F_2 \neq \emptyset\). Some of you might immediately recall the Erdos-Ko-Rado theorem, which asserts an upper bound on the size of such an intersecting family where every set has size \(k\). Another famous result is Lovasz’s (positive) resolution of Kneser’s conjecture, which asserts a lower bound on the number of intersecting families you need to cover all \(k\)-subsets of \([n]\). For ease of discussion, let us follow the authors and cook up a property testing problem \(\textsf{INTERSECTING}_{n,k, \varepsilon}\). In this problem, you are given access to the indicator \(f \colon {[n] \choose k} \to \{0,1\}\) encoding the family \(\mathcal{F}\) and you ask whether \(\mathcal{F}\) is intersecting or whether it is \(\varepsilon\)-far from it. Recently, Chen-De-Li-Nadimpalli-Servedio explored the non-uniform-set-size variant of this problem (which we covered here). They presented one-sided algorithms with a non-adaptive query bound of \(2^{\widetilde{O}(\sqrt{n \log(1/\varepsilon)})}\) for this problem and they also showed an almost matching lower bound. The featured paper contrasts these results with the situation you obtain when all the set sizes are indeed the same (that is, the paper explores the testability of \(\textsf{INTERSECTING}_{n,k, \varepsilon}\)). Of the numerous results in the paper, let me highlight one: you can test \(\textsf{INTERSECTING}_{n,k, \varepsilon}\) with two-sided error with a mere \(O\big(\log n \cdot \frac{1}{\varepsilon} \big)\) queries. This result holds for \(\varepsilon \geq \Omega(k/n)^r\) for a fixed \(r\).
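
The defining condition is a one-liner to check by brute force on an explicitly given family (quadratic in the family size, purely illustrative; the whole point of the testing problem is that you only see the family through the indicator \(f\)).

```python
from itertools import combinations

def is_intersecting(family):
    """Check the defining condition of an intersecting family:
    every two member sets share at least one element."""
    return all(A & B for A, B in combinations(family, 2))
```

A "star" (all sets through a common element) is the canonical intersecting example, and it is exactly the extremal one in the Erdos-Ko-Rado regime.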

Distribution-Free Testing of Decision Lists with a Sublinear Number of Queries by Xi Chen, Yumou Fei and Shyamal Patel (arXiv) Decision lists are a popular and convenient way to represent a boolean function which is learnable from examples. In more detail, a decision list is a collection of pairs \((\alpha_1, \beta_1), (\alpha_2, \beta_2), \ldots (\alpha_k, \beta_k)\) (here, \(\alpha_j\)’s denote literals and \(\beta_j\)’s are bits). This list defines a boolean function \(f \colon \{0,1\}^n \to \{0,1\}\) as follows: for any \(\mathbf{x} \in \{0,1\}^n\), you let \(f(\mathbf{x}) = \beta_j\) where \(j\) is the smallest index such that \(\alpha_j\) is satisfied by \(\mathbf{x}\). With this (not super rigorous) definition out of the way, I can now tell you about the main result of this featured paper. The main theorem of this paper asserts that in the distribution-free framework for property testing, you still get sublinear time algorithms for testing decision lists. In particular, thanks to this paper, you can now engineer a two-sided adaptive distribution-free algorithm for testing decision lists which runs in time \(\widetilde{O}(n^{11/12})\).
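
Here is a small Python evaluator matching the (not super rigorous) definition above. The default output when no literal fires is my own assumption, since the definition leaves that case implicit (often the last rule is taken to be an always-true literal).

```python
def eval_decision_list(pairs, x, default=0):
    """Evaluate a decision list on x in {0,1}^n.  `pairs` is a list of
    ((variable_index, wanted_value), output_bit): the literal alpha_j
    is satisfied when x[variable_index] == wanted_value.  We output
    beta_j for the first satisfied literal, else `default`."""
    for (i, b), beta in pairs:
        if x[i] == b:       # literal alpha_j fires
            return beta
    return default          # no literal fired (assumed fallback)
```

For instance, with rules "if \(x_1\) then 1; else if \(\bar{x}_2\) then 0; else if \(x_3\) then 1", the input \((0,0,1)\) stops at the second rule and outputs 0.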

Simple algorithms to test and learn local Hamiltonians by Francisco Escudero Gutiérrez (arXiv) The featured paper considers the task of (tolerantly) testing and learning an \(n\)-qubit \(k\)-local Hamiltonian from queries to its evolution operator. The main result of the paper asserts that the task of Tolerant Hamiltonian Locality Testing can be done with a mere \(1/(\varepsilon_2 - \varepsilon_1)^8\) queries to the evolution operator.

Local Test for Unitarily Invariant Properties of Bipartite Quantum States by Kean Chen, Qisheng Wang, Zhicheng Zhang (arXiv) Lots of investigations on quantum entanglement consider bipartite quantum states. The featured paper considers the task of testing properties of bipartite pure states. The paper begins by recalling a helpful duality between a property of bipartite pure states being unitarily invariant on one part, and the property being locally testable on the other part. This duality does not offer any insights into the query complexity of the local tester. The main result of the paper proves that the local tester indeed achieves optimal sample complexity over all global testers.

Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds by Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan (arXiv) Consider the standard setup of supervised learning with a twist. Namely, the distribution from which you receive samples in the training phase (\(\mathcal{D}\)) and the distribution from which the samples are taken in the test phase (\(\mathcal{D}’\)) are not necessarily the same (and hence the name — distribution shift). For example, this situation may arise if you use one set of equipment in the training phase and another during the test phase. This model was explored in a previous work by the same set of authors, where they considered the task of obtaining a low-error classifier (on \(\mathcal{D}’\)) when they are additionally told that the training samples pass some helpful test. In this paper, the authors explore the problem of learning intersections of halfspaces with respect to Gaussian training distributions. The main contribution of the paper is a set of new positive results which extend the results from PAC learnability to learnability under the new model. Indeed, under some reasonably mild assumptions, the bounds in the new model match the bounds from the standard PAC model. For quantitative details, please refer to the paper.

Announcing Monotonicity Festival at IIT Bombay

IIT Bombay is organizing an online Monotonicity Festival which is dedicated to understanding the impressive progress over the last couple of decades in Testing Boolean Monotonicity over various partially ordered domains (most notably, the Boolean Hypercube and the Hypergrid). Five of the lectures in this ongoing festival (with six planned lectures in total) have already taken place. Tomorrow is the last lecture where (PTReview’s own) Seshadhri will shed some light on the Directed Talagrand Inequality central to the analysis of the Monotonicity tester presented in the seminal work of Khot-Minzer-Safra.

The talk will begin at 10:30 AM India time tomorrow (April 30, 2024). Here is the zoom link: https://us06web.zoom.us/j/85365027303?pwd=hreOInC1dMwdFYnTOVa0nblSs9kDz6.1

Also, here is the link to the YouTube playlist where we will upload all the talks: https://www.youtube.com/playlist?list=PLuoJqx7PPeVKzhTgFbMQH1zvF40yClpES

Thanks to Hadley Black, Deeparnab Chakrabarty and C. Seshadhri for kindly agreeing to give these talks.

News for January 2024

Welcome to the posting for the first month of 2024! We hope you had a good start to this year. Last month, we had three papers. So, without further delay, let’s get started.

On locally-characterized expander graphs (a survey) by Oded Goldreich (ECCC) In this paper, written in an expository style, Goldreich surveys the result of Adler, Kohler and Peng which we covered in our May 2021 and August 2020 posts. The survey begins by reminding the reader what a locally-characterizable graph property is. A family of graphs \(\mathcal{G}\) is said to be characterized by a finite class \(\mathcal{F}\) of graphs if every graph \(G \in \mathcal{G}\) is \(F\)-free for all \(F \in \mathcal{F}\). Thanks to its connections with expressibility in first order logic, one would expect a locally-characterizable graph property to admit testers depending only on the proximity parameter (in the bounded degree graph model). So, it was quite a surprise when Adler, Kohler and Peng showed that there are locally characterizable graph properties which provably require testers whose query complexity increases with the size of the graph. The main theorem of Adler, Kohler and Peng describes the locally characterizable property that is hard to test. This characterization asserts that you can get your hands on a finite class \(\mathcal{F}\) of graphs so that \(\mathcal{F}\)-freeness of a graph \(G\) means that the graph is a (bounded-degree) expander. One of the key ingredients used in this proof is the Zig-Zag construction of Reingold, Vadhan and Wigderson.

Fast Parallel Sampling under Isoperimetry by Nima Anari, Sinho Chewi, and Thuy-Duong Vuong (arXiv) The featured paper considers the task of sampling (in parallel) from a continuous distribution \(\pi\) supported over \(\mathbb{R}^d\). The main result in the paper shows that for distributions which satisfy a log-Sobolev inequality, you can use parallelized Langevin algorithms and obtain samples from a distribution \(\pi’\) where \(\pi’\) and \(\pi\) are close in KL-divergence. As an application of their techniques, the authors show that their results can be used to obtain RNC samplers for directed Eulerian tours and asymmetric Determinantal Point Processes.

Holey graphs: very large Betti numbers are testable by Dániel Szabó, Simon Apers (arXiv) This paper considers the problem of testing whether an input graph \(G\) has a very large \(k\)-th Betti number (at least \((1-\delta) d_k\), where \(d_k\) denotes the number of \(k\)-cliques in \(G\) and \(\delta > 0\) is sufficiently small) in the dense graph model. As the title indicates, the result says that this property is testable for constant \(k\). Earlier, in 2010, Elek investigated this question in the bounded degree model. Elek’s main result showed that with a number of queries \(q = q(\varepsilon)\), you can obtain an estimate of the \(k\)-th Betti number which is within an additive \(\pm \varepsilon n\) of the true \(k\)-th Betti number. Let us contrast this result with the setting of dense graphs. The authors note that in the dense graph model, getting an estimate of the \(k\)-th Betti number which is within an additive \(\pm \varepsilon d_k\) of the true value needs \(\Omega(n)\) queries. This is the reason why the authors consider the formulation above.

News for September 2023

Sorry for the delay in getting this month’s post out. This time we had seven (EDIT) eight (LATER EDIT) nine papers. Thanks to our readers for pointing out papers we had missed. Do let us know if we missed any others. Alright, without further delay, let us look at this month’s spread.

Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas by Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, and Rocco A. Servedio (arXiv) Let us begin with a paper on property testing of boolean functions over the binary hypercube \(\{0,1\}^n\). Quoting from the abstract:

[This paper gives] the first superpolynomial (in fact, mildly exponential) lower bounds for tolerant testing of monotonicity, unateness, and juntas with a constant separation between yes and no cases.

Let us take a little dip to get a superficial feel for the techniques the paper uses for boolean monotonicity. The key lies in an adaptation of the lower bound techniques pioneered in the paper of Pallavoor-Raskhodnikova-Waingarten (PRW), which considers distributions over “yes functions” (which are \(\varepsilon/\sqrt{n}\)-close to monotone) and “far functions” (which are \(\varepsilon\)-far from being monotone). PRW split the \(n\) variables into \(n/2\) control variables and \(n/2\) action variables, and consider the subcubes indexed by bit-strings where the control variables are balanced. The function value is chosen carefully on these subcubes, which ensures that any tester reliably distinguishing yes and no functions needs to sample bit-strings from the same balanced subcube which differ in lots of action variables. As PRW argue, this event occurs with low probability if the allowed number of queries is small. The key insight of the featured paper is to modify the PRW construction by using the random monotone DNF formulas of Talagrand. If this whets your appetite, go read the paper!
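For context, the baseline that all monotonicity lower bounds are measured against is the classic edge tester, which samples random hypercube edges and looks for violations. A toy sketch (parameters and names mine):

```python
import random

def edge_tester(f, n, num_queries, rng):
    """Reject iff some sampled hypercube edge (x with bit i cleared,
    x with bit i set) violates monotonicity, i.e. f goes 1 -> 0 as
    the coordinate increases."""
    for _ in range(num_queries):
        x = rng.getrandbits(n)
        i = rng.randrange(n)
        lo, hi = x & ~(1 << i), x | (1 << i)
        if f(lo) > f(hi):
            return "reject"
    return "accept"

n = 8
rng = random.Random(1)
majority = lambda x: 1 if 2 * bin(x).count("1") > n else 0  # monotone
antidictator = lambda x: 1 - (x & 1)                        # far from monotone
print(edge_tester(majority, n, 200, rng))       # accept
print(edge_tester(antidictator, n, 200, rng))   # reject (w.h.p.)
```

The edge tester needs far more than \(\sqrt n\) queries in general; the constructions above are designed so that even adaptive testers, which may choose queries cleverly rather than edge by edge, cannot do much better than \(n^{1/2}\).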

Testing Junta Truncation by William He and Shivam Nadimpalli (arXiv)

Let \(f \colon \{0,1\}^n \to \{0,1\}\) be a \(k\)-junta. You are given a set \(\mathcal{U}_{\text{yes}} = \bigl\{0,1\bigr\}^n\) and a set \(\mathcal{U}_{\text{no}} = \bigl\{x \in \{0,1\}^n \colon f(x) = 1 \bigr\}.\) Consider the uniform distributions supported on these two sets, which we call \(\mathcal{D}_{\text{yes}}\) and \(\mathcal{D}_{\text{no}}\). Finally, we define a distribution \(\mathcal{D} = \begin{cases} \mathcal{D}_{\text{yes}} & \text{w.p. } 1/2 \\ \mathcal{D}_{\text{no}} & \text{w.p. } 1/2 \end{cases}.\)

You are given \(t\) samples from the distribution \(\mathcal{D}\). The task is to decide whether \(\mathcal{D}\) is the yes distribution or the no distribution. The featured paper shows you can reliably solve this task with \(t \leq \min(2^k + \log{n \choose k}, 2^{k/2} \log^{1/2}{n \choose k})\) samples. The paper also supplements this result with a lower bound: with fewer than \(t \geq \log{n \choose k}\) samples, no algorithm can reliably distinguish the two distributions. The results suggest that this “testing junta truncation” problem requires learning the set of relevant variables of the junta.
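To see why the unknown junta is the crux, note that if \(f\) were known, the problem would admit a trivial one-sided distinguisher: any sample falling outside \(f^{-1}(1)\) certifies the untruncated (yes) distribution. A toy sketch of this known-\(f\) baseline (names and parameters mine):

```python
import random

def distinguish_known_f(samples, f):
    """Baseline when f is known: a sample outside f^{-1}(1) rules out
    the truncated distribution D_no (a one-sided certificate)."""
    return "D_yes" if any(f(x) == 0 for x in samples) else "D_no"

# Toy instance: n = 10, f = AND of the two lowest bits (a 2-junta).
rng = random.Random(7)
n = 10
f = lambda x: (x >> 0) & (x >> 1) & 1
yes = [rng.getrandbits(n) for _ in range(50)]                      # uniform on {0,1}^n
no = [x | 0b11 for x in (rng.getrandbits(n) for _ in range(50))]   # uniform on f^{-1}(1)
print(distinguish_known_f(yes, f))  # D_yes (w.h.p.)
print(distinguish_known_f(no, f))   # D_no
```

When \(f\) is an unknown \(k\)-junta, no such membership check is available, and the bounds quoted above quantify the price of (implicitly) learning the relevant variables from samples alone.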

Longest Common Substring and Longest Palindromic Substring in \(\mathbf{\widetilde{O}(\sqrt n)}\) Time by Domenico Cantone, Simone Faro, Arianna Pavone, and Caterina Viola (arXiv) I paraphrase from the abstract of this paper. You know the longest common substring and longest palindromic substring problems as classics in computer science, both of which can be solved in linear time using suffix trees. Recently, quantum algorithms were proposed for both of these problems in the query model, each issuing only \(o(n)\) quantum queries. The featured paper notes that this query model has a shortcoming when it comes to real-life implementation on actual hardware, and addresses it by presenting sublinear-time quantum algorithms for both problems in the circuit model of computation.

Testing properties of distributions in the streaming model by Sampriti Roy and Yadu Vasudev (arXiv) Alright, now let us consider a different twist on distribution testing. Suppose you have only a small amount of memory. You obtain a bunch of samples to solve some standard distribution testing task, but the twist, of course, is that you cannot store all the samples. What can you say about how sample complexity trades off against space complexity? The featured paper studies this trade-off in the standard access model and the conditional access model. One of the results of the paper asserts that in the conditional access model, you can do identity testing with \( \widetilde{O}\bigl(\frac{\log^4 n}{\varepsilon^4}\bigr)\) samples while using only \(O\bigl(\frac{\log^2 n}{\varepsilon^2} \bigr)\) bits of memory.
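The paper's streaming algorithms are more delicate, but the flavor of sample-based distribution testing can be seen in the classic collision statistic for uniformity. This is not the featured paper's algorithm, just a sketch (with illustrative parameters of mine) of the kind of statistic such testers maintain:

```python
import random
from itertools import combinations

def collision_rate(samples):
    """Fraction of sample pairs that collide; for a distribution p this
    estimates ||p||_2^2, which is minimized (at 1/n) by the uniform
    distribution over a domain of size n."""
    pairs = list(combinations(samples, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

rng = random.Random(3)
n, m = 100, 1000
uniform = [rng.randrange(n) for _ in range(m)]
skewed = [rng.randrange(n // 10) for _ in range(m)]  # mass on 10% of the domain
print(collision_rate(uniform))  # close to 1/100
print(collision_rate(skewed))   # close to 1/10
```

The streaming question is exactly about statistics like this: counting collisions naively requires remembering samples, and the trade-off is how much of that memory can be avoided.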

Testing Spreading Behavior in Networks with Arbitrary Topologies by Augusto Modanese and Yuichi Yoshida (arXiv) We covered the problem of testing dynamic environments in this March 2014 post and this May 2021 post. The goal here is to check whether a dynamically evolving system evolves according to some fixed rule or whether the system is far from systems that evolve according to that fixed rule. The May 2021 post covered a paper which shows you can test dynamically evolving systems that evolve according to what is called the threshold rule. The featured paper considers rules motivated by models of infection spreading. One of the results in the paper presents one-sided and two-sided testers (with \(O(1/\varepsilon)\) query complexity) for testing a single step of evolution (on bounded degree graphs) with these rules.

A Tight Lower Bound of Ω(log n) for the Estimation of the Number of Defective Items by Nader Bshouty and Gergely Harcos (ECCC) The featured paper considers a problem in group testing. Let us quickly review the setup. You are given some ground set \(X\) of \(|X| = n\) items, of which the items in some unknown set \(I \subseteq X\) are defective. A test specifies some set \(Q \subseteq X\) and comes back positive iff \(Q \cap I \neq \emptyset\). This paper presents lower bounds for non-adaptive algorithms for group testing. And as the title says, if your algorithm wishes to estimate the number of defective items to within a constant factor, you better pay up \(\Omega(\log n)\) tests.
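To complement the lower bound, here is a folklore-style sketch (mine, with illustrative parameters) of how \(O(\log n)\) nonadaptive random-subset tests yield a constant-factor estimate of the number of defectives \(d\): at sampling rate \(2^{-i}\), a test is positive with probability about \(1 - e^{-d/2^i}\), so the rate at which positives become a minority localizes \(d\):

```python
import math
import random

def estimate_defectives(n, is_positive, rng, reps=30):
    """Folklore nonadaptive estimator: at scale i, include each item in
    the test set independently with probability 2^-i.  Return 2^i for
    the first scale at which fewer than half the tests are positive."""
    for i in range(int(math.log2(n)) + 1):
        hits = sum(
            is_positive({x for x in range(n) if rng.random() < 2.0 ** -i})
            for _ in range(reps))
        if hits < reps / 2:
            return 2 ** i
    return 1

n = 1024
defectives = set(range(40))                # d = 40
oracle = lambda Q: bool(Q & defectives)    # positive iff Q hits a defective
rng = random.Random(9)
est = estimate_defectives(n, oracle, rng)
print(est)  # a constant-factor estimate of 40
```

All test sets can be drawn before any answers are seen, so this is genuinely nonadaptive, and the featured paper shows its \(O(\log n)\)-test budget cannot be beaten.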

A tight lower bound on non-adaptive group testing estimation by Tsun-Ming Cheung, Hamed Hatami, and Anthony Ostuni (arXiv) As our readers pointed out, this paper is concurrent with the paper above and achieves the same lower bound. Indeed, this holds for both the one-sided and the two-sided variants. Furthermore, as this paper shows, if one knows that the set \(I\) satisfies \(L \leq |I| \leq U\), then you can show both one-sided and two-sided lower bounds of \(\Omega(U/L)\) non-adaptive queries if you want a constant-factor approximation to \(|I|\).

On Testing Isomorphism to a Fixed Graph in the Bounded-Degree Graph Model by Oded Goldreich and Laliv Tauber (ECCC) This paper looks like a fantastic read for your students, especially since it is written in such an engaging style spanning (only) 18 highly readable pages. As the title indicates, this paper considers the challenge of testing isomorphism to a fixed graph in the bounded degree model. The main result of this paper asserts that for almost all \(d\)-regular graphs \(H\) on \(|V(H)| = n\) vertices, testing isomorphism to \(H\) can be done with roughly \(\sqrt n\) queries. The paper also presents an almost matching query lower bound which likewise holds for almost all graphs \(H\).

Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning by Gunjan Kumar, Kuldeep S. Meel, and Yash Pote (arXiv) Let us consider the distribution testing setup with a twist: Suppose you are given some unknown distribution \(\mathcal{Q}\) supported over \(\{0,1\}^n\) and you want to sample from it conditioned on some predicate \(\varphi\). The question is whether you can efficiently check that these are legit samples (satisfying the predicate) drawn from the distribution \(\mathcal{Q}\). Our October 2020 news covered a tolerant tester for this problem which involved some subset of the authors of the current paper. The featured paper considers what additional leverage you gain if you are given access to a sampling oracle which can sample from “conditioned subcubes.” In this model, you can query some subcube and, after issuing a query, you will receive an element \(x\) from this subcube with probability proportional to the original probability weight of \(x\). The paper provides a tolerant tester in this setup which makes at most \(\widetilde{O}(n^3/(\varepsilon_2 - \varepsilon_1)^5)\) queries to this sampling oracle.
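To picture the oracle, here is a brute-force simulation of subcube conditioning for a small, fully known distribution (in the testing setting \(\mathcal{Q}\) is of course unknown; the names and toy distribution are mine):

```python
import random

def subcube_sample(weights, n, restriction, rng):
    """Subcube-conditioning oracle for a known distribution on {0,1}^n:
    `restriction` maps a subset of coordinates to fixed bits; return a
    point of that subcube with probability proportional to its weight."""
    cube = [x for x in range(2 ** n)
            if all((x >> i) & 1 == b for i, b in restriction.items())]
    total = sum(weights[x] for x in cube)
    r = rng.random() * total
    for x in cube:
        r -= weights[x]
        if r <= 0:
            return x
    return cube[-1]

# Toy distribution on {0,1}^3; condition on coordinate 0 being 1.
n = 3
weights = [1, 3, 1, 3, 1, 3, 1, 3]  # odd points (bit 0 = 1) are 3x heavier
rng = random.Random(2)
draws = [subcube_sample(weights, n, {0: 1}, rng) for _ in range(1000)]
print(all(d % 2 == 1 for d in draws))  # True: every draw lies in the subcube
```

Real samplers implement this without enumerating the cube, but the point of the model is exactly this interface: fix some coordinates, receive a correctly reweighted sample on the rest.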