Announcing WoLA 2025 in Chicago

The 9th edition of WoLA, the Workshop on Local Algorithms, will be taking place on August 18-20, 2025 at the Toyota Technological Institute (TTIC), Chicago, IL.

For those unfamiliar with WoLA:

Local algorithms, that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input, have been studied in a number of areas in theoretical computer science and mathematics. Some of these areas include sublinear-time algorithms, distributed algorithms, inference in large networks and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities.

You are all invited and we would love to see you there! For more on (free) registration (by ⏰ August 10), how to submit a poster, list of invited speakers, and local arrangements, see the website.

Program Committee: Arnab Bhattacharyya (University of Warwick), Clément Canonne (University of Sydney), Elena Grigorescu (University of Waterloo), Moti Medina (Bar Ilan University), Rocco Servedio (Columbia University), Asaf Shapira (Tel Aviv University) [Chair], Ali Vakilian (TTIC), Yuichi Yoshida (NII)

News for April 2025

The bonanza of property testing results continues this month, with seven new papers on the arXiv! And with range, too: from regular languages to quantum states, with detours through distribution testing, Boolean functions, and relative error property testing. Oh, and yes, also, (quantum) magic!

The Trichotomy of Regular Property Testing (arXiv), by Gabriel Bathie, Nathanaël Fijalkow, and Corto Mascle. In property testing of regular languages, initiated by Alon, Krivelevich, Newman, and Szegedy in 2001, one has to decide whether an input word belongs to the target language $L$, or is at distance at least $\varepsilon$ from every $x \in L$ (where the distance is typically Hamming or the (easier) edit distance). Surprisingly, it was shown that every regular language could be tested with $\tilde{O}(1/\varepsilon)$ queries, independent of the input size. But is that tight? And is that $\tilde{O}$ necessary? Many years of work later, the main result of this paper settles the question, showing that for testing under the Hamming distance there are only three options: either a language is trivial (0 queries needed!), or easy ($\Theta(1/\varepsilon)$ queries necessary and sufficient), or just plain hard ($\Theta(\log(1/\varepsilon)/\varepsilon)$ queries necessary and sufficient). Nothing else!

A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions (arXiv), by Xi Chen, Shyamal Patel, and Rocco Servedio. (1) Take two well-studied, notoriously challenging and seemingly unrelated problems about Boolean functions: agnostically learning conjunctions (that is, learning conjunctions with noise), and tolerantly testing juntas. (2) Unearth a new connection between the two tasks. (3) Write a very entertaining, illuminating introduction about this. (4) Oh, also, provide a new $\tilde{O}(2^{n^{1/3}})$-time agnostic learning algorithm for conjunctions, improving the previous best known result after 18 years; use it to obtain a $\tilde{O}(2^{k^{1/3}})$-query tolerant tester for juntas using this new connection, thus showing a polynomial separation between adaptive and non-adaptive algorithms for this task. (5) You get this paper.

Distribution Testing Meets Sum Estimation (arXiv), by Pinki Pradhan and Sampriti Roy. In the sum estimation problem, there is a set of $n$ elements, each with a non-negative weight, and the goal is to estimate the total weight $W$ while minimizing the number of weights queried. Under a model which allows both weighted and uniform sampling, this is known to be achievable with $O(n^{1/3})$ queries (Beretta and Tětek). This paper considers the task under two additional assumptions: first, the weights are non-increasing, and second, the algorithm is allowed conditional weighted and uniform sampling (i.e., the same two types of sampling, but conditioned on any subset $S$ of its choosing). In this setting, the authors show how to estimate the total weight to within a factor $1\pm\varepsilon$ with only $O((\log n)\text{poly}(1/\varepsilon))$ queries.

Efficient witnessing and testing of magic in mixed quantum states (arXiv), by Tobias Haug and Poetri Sonya Tarabunga. Magic is real, and it’s quantifiable: roughly speaking, it quantifies the amount of “non-stabilizerness” of a state. I won’t pretend to fully understand what this means, but this paper shows that one can test the magic of low-entropy $n$-qubit states (i.e., distinguish between low-magic states and high-magic states) with only polynomially (in $n$) many copies of the state.

Mildly-Interacting Fermionic Unitaries are Efficiently Learnable (arXiv), by Vishnu Iyer. More quantum property testing! In this paper, the author shows how to test whether an $n$-mode fermionic unitary has Gaussian dimension at least $k$ (or is $\varepsilon$-far from it in Frobenius norm) in time $\text{poly}(n, 1/\varepsilon)$. (This is then used as a building block to efficiently learn such unitaries.)

Testing Juntas and Junta Subclasses with Relative Error (arXiv), by Xi Chen, William Pires, Toniann Pitassi, and Rocco Servedio. In this relatively (!) new model of property testing, which we covered last October, the notion of farness between two Boolean functions is relative to their number of satisfying assignments. Testing with relative error is at least as hard as in the standard setting, and could be strictly harder: this paper shows that, for the case of testing juntas, this is not the case. Even with relative error testing, $\tilde{O}(k/\varepsilon)$ queries are necessary and sufficient for junta testing! Using ideas from “standard testing” (specifically, from the “testing by implicit learning” framework), their results further extend to testing a large number of subclasses of juntas.

Relative-error testing of conjunctions and decision lists (arXiv), by Xi Chen, William Pires, Toniann Pitassi, and Rocco Servedio. Same team, same model — more results! After testing juntas in the relative error model, the authors continue their systematic exploration of the testing questions, this time focusing on testing decision lists and conjunctions. They are able to obtain a $\tilde{O}(1/\varepsilon)$-tester with two-sided error for the former, and an $O(1/\varepsilon)$-tester with one-sided error for the latter: both matching the best-known query complexity in the standard model.

News for March 2025

After a good spell last month, the community saw no reason to relent. This month, we had five papers: from tolerant testers for fundamentally important graph problems, to privately testing distributions, to more explorations in the conditional sampling model, to counting and sampling motifs. Also included as a bonus is a result on multipass streaming lower bounds for approximating max-cut.

A Tolerant Independent Set Tester (arXiv) (by Cameron Seth) Consider the setting of testing graph properties in the dense graph model, where you have access to the adjacency matrix of the graph and you may check whether a pair $(i,j)$ is connected by an edge or not. In this model, consider the following task: I want you to design a sublinear time procedure which, on input a dense graph, runs in time $\text{poly}(\rho/\varepsilon)$ and returns YES if the graph is $\varepsilon_1 = \widetilde{\Omega}(\varepsilon)$-close to having an independent set of size at least $\rho n$, and returns NO if the graph is $\varepsilon_2 = \varepsilon$-far from all graphs that have an independent set of size at least $\rho n$. The verdict is required in both cases to hold with probability at least $2/3$. Alright, that’s what the title pretty much gave away already. So, how does the paper establish this result?

To this end, the paper uses a remarkably simple algorithm. You pick a random subset $S \subseteq V$ of $s = \widetilde{O}(\rho^3/\varepsilon^2)$ vertices and focus on counting how many edges you see in the various subsets of $S$ of size $\rho s$. If you see some subset of size $\rho s$ which induces at most $\varepsilon_1 s^2$ edges, you accept $G$. If this condition fails, you reject $G$. The analysis presented in the paper is intricate and proceeds by proving a new container lemma for sparse graphs (rather than independent sets). It is a good read and you should totally take a look at this paper.
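The sample-and-check procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not taken from the paper: the function name, the brute-force enumeration of subsets (exponential in the sample size, so only usable on toy inputs), and the reading of the acceptance rule as "at most $\varepsilon_1 s^2$ induced edges" are all choices made here.

```python
import itertools
import random


def tolerant_independent_set_test(adj, rho, eps1, sample_size, rng=None):
    """Toy version of the sample-and-check tester (hypothetical helper).

    adj is an n x n Boolean adjacency matrix (dense graph model).
    Accepts iff some subset of size ~rho*s of a random s-vertex sample
    induces at most eps1 * s^2 edges. Brute force over all subsets.
    """
    rng = rng or random.Random()
    n = len(adj)
    sample = rng.sample(range(n), sample_size)
    k = max(1, round(rho * sample_size))      # target subset size rho * s
    threshold = eps1 * sample_size ** 2       # edge budget eps1 * s^2
    for subset in itertools.combinations(sample, k):
        # Each pair check below is one adjacency-matrix query.
        edges = sum(1 for u, v in itertools.combinations(subset, 2) if adj[u][v])
        if edges <= threshold:
            return True
    return False
```

On an empty graph every subset is independent, so the sketch accepts; on a complete graph every 3-subset of a 6-vertex sample spans 3 edges, which exceeds a budget of $0.01 \cdot 36$, so it rejects.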

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data (arXiv) (by Maryam Aliakbarpour, Arnav Burudgunte, Clément Canonne (our own!), and Ronitt Rubinfeld) If our January post is fresh in your mind, you might recall the theme: assuming you know a little about the distribution $\mathbf{p}$ presented to you allows you to give improved algorithms for classic distribution testing tasks such as identity testing and uniformity testing (I am assuming the support of my distributions is $[n]$). This paper considers these problems in the setting of differential privacy. So, you want to design private algorithms for these classic tasks where you get to assume that you know something about the target distribution: that is, you assume it is not completely unknown, just like in our January posting. The paper presents sample-efficient private algorithms both for identity testing and closeness testing.

Optimal mass estimation in the conditional sampling model (arXiv) (by Tomer Adar, Eldar Fischer, and Amit Levi) As the title suggests, this paper considers the task of estimating the probability mass of all elements in the support of a probability distribution sitting on a finite universe $\mathcal{U}$. Of course, you have to assume stronger oracle access for this task, and the oracle people assume access to is the conditional sampling oracle. You get to specify a subset $S \subseteq \mathcal{U}$ and you will get a sample from $S$, where each element of $S$ is returned with its correct conditional probability under the underlying distribution. Meel-Kumar-Pote gave an algorithm for this task which requests $\text{poly}(\log n)/\text{poly}(\varepsilon)$ samples. The featured paper, on the other hand, presents algorithms for this problem which request only $O(\log(\log n)/\text{poly}(\varepsilon))$ samples. The authors also show a matching lower bound, effectively closing this problem.
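For readers who have not met the conditional sampling oracle before, here is a minimal simulation of it, together with the simplest thing one can do with it: compare the masses of two elements by conditioning on the pair. The names `conditional_sample` and `relative_mass` are hypothetical, the distribution is passed explicitly as a dictionary (a real tester would only see the oracle), and raising an error on a zero-mass query set is just one of several conventions. None of this is the algorithm from the paper.

```python
import random


def conditional_sample(probs, subset, rng=None):
    """Simulated conditional sampling oracle (illustrative only).

    probs maps each universe element to its probability mass.
    Returns x in subset with probability probs[x] / probs(subset).
    """
    rng = rng or random.Random()
    items = [x for x in subset if probs.get(x, 0.0) > 0]
    if not items:
        # Assumption: queries on zero-mass sets are treated as errors.
        raise ValueError("query set has zero probability mass")
    weights = [probs[x] for x in items]
    return rng.choices(items, weights=weights, k=1)[0]


def relative_mass(probs, a, b, trials, rng=None):
    """Estimate p(a) / (p(a) + p(b)) by conditioning on the pair {a, b}."""
    rng = rng or random.Random()
    hits = sum(conditional_sample(probs, [a, b], rng) == a for _ in range(trials))
    return hits / trials
```

With $p(a) = 0.3$ and $p(b) = 0.1$, conditioning on $\{a, b\}$ returns $a$ with probability $0.75$, regardless of how small $p(a)$ and $p(b)$ are in absolute terms. That is precisely the kind of comparison that plain sampling cannot do cheaply.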

Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time (arXiv) (by Talya Eden, Reut Levi, Dana Ron, and Ronitt Rubinfeld) Suppose I give you a large graph $G = (V,E)$ and a small subgraph $H$ whose number of occurrences in $G$ I wish to (approximately) count. You may assume that you have access to an oracle which can give you a random vertex, the $i$-th neighbor of your favorite vertex, the degree of a vertex, and can also answer whether a particular pair of vertices is connected by an edge. This is the standard model we use for coming up with sublinear time algorithms for graph problems. It is sometimes contrasted with the augmented model where, in addition to all of the above, you also get to assume access to an oracle which can give you uniformly random edges in the graph $G$. As you can imagine, the augmented model allows you to count the number of occurrences of $H$ for a much richer class of small subgraphs $H$. Indeed, this used to be the status quo till this paper: before this result came out, we knew that in the augmented model, we can pretty much count the number of occurrences of all small subgraphs $H$, whereas in the standard model, we only knew how to count edges, stars and cliques. The featured paper makes progress towards bridging this divide and presents algorithms, in the standard model, for approximately counting the number of copies of $H$ for a much richer class of small subgraphs $H$. One thing you might want to try is to simulate the algorithms in the augmented model by hacking up algorithms for returning uniformly random edges. However, that approach is a no-go, as the running time to sample u.a.r. edges can be prohibitive for a sublinear time implementation in the standard model. This paper develops ideas which bypass such annoyances.

Multi-Pass Streaming Lower Bounds for Approximating Max-Cut (arXiv) (by Yumou Fei, Dor Minzer, and Shuo Wang) This result, as the title reveals, is from the streaming literature. But it should be of interest to our readers and so we cover it here. The essential summary is this: if you want to get a better than $1/2$ approximation algorithm for the max-cut problem in the streaming setting (like a $1/2 + \varepsilon$ approximation), be ready to pay up $\text{poly}(n)$ passes. In more detail, the technical statement reads as follows: any randomized, $k$-pass $1/2 + \varepsilon$ approximation algorithm for max-cut requires $\Omega(n^{1/3}/k)$ space. The lower bound results in this paper are inspired by the communication complexity lower bound for a certain problem investigated in the seminal work of Kapralov-Krachun from 2019, where the authors showed that any single-pass streaming algorithm for max-cut aiming to produce a better than $1/2$ approximation must cough up $\Omega(n)$ space. This requires a lot of care, as in the multipass streaming setting you cannot hope to show your communication complexity lower bounds by a naive application of the discrepancy method. The delicate finesse needed is brought in by a clever use of hypercontractivity. Looks like this will be an interesting treat for our intrepid readers.

News for February 2025

After a few quiet months on PTReview, the community is back with a bang. We have a selection of nine papers this month, ranging from hypergraphs, vertex coloring, triangle counting, a new take on linearity testing, and a sublinear foray into transformers.

Property Testing in Bounded Degree Hypergraphs by Hugo Aaronson, Gaia Carenini, and Atreyi Chanda (arXiv). We all know about the rich history of property testing and sublinear algorithms for graphs. The space for hypergraphs is far less studied, and likely has an even richer theory. Consider the input being an $n$-vertex $k$-uniform hypergraph, so every hyperedge is a subset of $k$ vertices. This paper is the first to define the bounded degree setting. The hypergraph is represented as an adjacency list, wherein, for each vertex, we have the list of hyperedges containing that vertex. Distance between hypergraphs is defined in terms of the symmetric difference between the sets of hyperedges. The primary focus of the paper is on the fundamental property of $k$-colorability (note that $k$ is also the hyperedge size). The main lower bound result is that for $k \geq 3$, this property requires $\Omega(n)$ queries to test. This is in contrast to the graph setting ($k = 2$), where bipartiteness is known to be testable for bounded degree graphs. This paper also shows that if the inputs have bounded treewidth, then $k$-colorability is testable with one-sided error in constant queries.

Simple Sublinear Algorithms for $(\Delta + 1)$ Vertex Coloring via Asymmetric Palette Sparsification by Sepehr Assadi and Helia Yazdanyar (arXiv). Consider access to the adjacency list of a graph with $n$ vertices and maximum degree $\Delta$. Most of our readers will be familiar with the result of Assadi-Chen-Khanna proving that $(\Delta + 1)$-coloring can be done in $\widetilde{O}(n^{3/2})$ time. The key tool is a palette sparsification lemma, which shows that each vertex can independently pick a small list of colors, such that there is a proper coloring consistent with these lists. This paper proves a weaker version of this lemma with a significantly simpler proof. The result is an asymmetric version, where vertices choose color lists of varying sizes. The average length goes up by a log factor. But the proof is simpler, and it also allows for the final coloring to just use the greedy algorithm (constrained to these lists).
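To make "greedy constrained to the lists" concrete, here is a small sketch. It makes assumptions that are not from the paper: every vertex samples a list of the same size uniformly at random (the paper's lists have varying sizes), vertices are processed in index order, and the function names are hypothetical. The sketch may return `None` when the sampled lists are too small, which is exactly the failure event a palette sparsification lemma has to rule out with high probability.

```python
import random


def sample_palettes(n, num_colors, list_size, rng=None):
    """Each vertex independently samples a list of colors (uniform sizes here)."""
    rng = rng or random.Random()
    size = min(list_size, num_colors)
    return [set(rng.sample(range(num_colors), size)) for _ in range(n)]


def greedy_list_coloring(adj_list, palettes):
    """Color vertices in index order, using only colors from each vertex's list.

    Returns a list of colors, or None if some vertex has no free color left.
    """
    coloring = [None] * len(adj_list)
    for v in range(len(adj_list)):
        used = {coloring[u] for u in adj_list[v] if coloring[u] is not None}
        free = sorted(palettes[v] - used)
        if not free:
            return None  # the sampled lists were too restrictive
        coloring[v] = free[0]
    return coloring
```

When every list is the full palette of $\Delta + 1$ colors, greedy can never get stuck, since a vertex has at most $\Delta$ colored neighbors. The interesting regime is when the lists are much shorter than $\Delta + 1$.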

Improved Sublinear Algorithms for Classical and Quantum Graph Coloring by Asaf Ferber, Liam Hardiman, and Xiaonan Chen (arXiv). Another result on $(\Delta +1)$-coloring. A result of Morris and Song shows that there are simpler algorithms that give a $(1+\varepsilon)\Delta$-coloring. Adapting their techniques, this paper gives a sublinear algorithm for $(\Delta+1)$-coloring that runs in $O(n^{3/2}\sqrt{\log n})$ time. This gives a $\sqrt{\log n}$ factor improvement over the previous results. The algorithm can be accelerated in the quantum model using Grover’s search, to get a running time of $\widetilde{O}(n^{4/3})$. Finally, the paper gives a $\widetilde{O}(n^{5/4})$ time quantum algorithm for getting a $(1+\varepsilon)\Delta$-coloring.

Monotonicity Testing of High-Dimensional Distributions with Subcube Conditioning by Deeparnab Chakrabarty, Xi Chen, Simeon Ristic, C. Seshadhri, and Erik Waingarten (arXiv). In my first foray into distribution testing, this paper studies the problem of testing monotonicity of distributions over the hypercube, $\{-1,1\}^n$. Observe that the domain is considered to be exponentially large, and the aim is to get distribution testers that use $\textrm{poly}(n)$ samples. To get such bounds, this paper assumes subcube conditional samples: given any subhypercube of the domain, the algorithm can generate samples restricted to this subhypercube. A distribution is called monotone if the corresponding probability mass function is monotone (with respect to the standard partial order on $\{-1,1\}^n$). For this setting, this paper shows that the property of monotonicity can be tested with $\widetilde{O}(n/\varepsilon^2)$ samples. Moreover, this bound is shown to be optimal, up to polylog factors. One of the main tools is a new directed isoperimetric theorem for real-valued functions on the hypercube.

Compression Barriers in Autoregressive Transformers by Themistoklis Haris and Krzysztof Onak (arXiv). While this paper does not give a sublinear time algorithm, it uses many techniques from the sublinear algorithms toolkit to prove lower bounds. And it is on transformers, arguably one of the most important components for machine learning on sequences and Large Language Models. Consider a sequence of $n$ vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$, which one can think of as vector embeddings of $n$ words (coming from, say, some generated text). Based on extensive learning and model fine tuning, assume that the model has learned some “key” vectors $k_1, k_2, \ldots, k_n \in \mathbb{R}^d$. The next word is generated from a “query vector” $q$, using which we compute $M^{-1} \sum_{\ell \leq n} \exp(q \cdot k_\ell) v_\ell$ (where $M$ is a normalization factor). Suppose there are $n$ query vectors, and the aim is to generate the above sum for all the query vectors (these are like the next $n$ words). In practice, there are many schemes to speed up computation by compressing the key vectors. This paper shows that, in the worst case, a computation of the next word/vector takes $\Omega(nd)$ time and space. Thus, any sublinear compression/speed up method must fundamentally use properties of the data.
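As a reminder of what is being computed, the sum $M^{-1} \sum_{\ell \leq n} \exp(q \cdot k_\ell) v_\ell$ can be written out directly. The sketch below is plain Python with a hypothetical function name; it takes $M$ to be the sum of the exponentials (the usual softmax normalization, which is an assumption here) and skips the numerical stabilization a real implementation would use.

```python
import math


def attention_output(q, keys, values):
    """Compute M^{-1} * sum_l exp(q . k_l) * v_l for a single query vector q.

    keys and values are lists of n vectors (lists of floats) of dimension d.
    M is assumed to be sum_l exp(q . k_l).
    """
    scores = [math.exp(sum(qi * ki for qi, ki in zip(q, k))) for k in keys]
    M = sum(scores)
    d = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / M for j in range(d)]
```

This naive computation reads every key and every value, so it costs on the order of $nd$ operations per query. The lower bound says that, in the worst case, no compression scheme can do fundamentally better.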

Arboricity and Random Edge Queries Matter for Triangle Counting using Sublinear Queries by Arijit Bishnu, Debarshi Chanda, Gopinath Mishra (arXiv). Triangle counting is a classic problem, with a rich history in sublinear algorithms. The earliest work gives an algorithm running in $\widetilde{O}(n/t^{1/3} + m^{3/2}/t)$ time, where $n, m$ are the usual graph parameters and $t$ is the triangle count. (Polylog and $\varepsilon$ factors are ignored.) This complexity can be improved to $\widetilde{O}(n/t^{1/3} + m\alpha/t)$, where $\alpha$ is the graph arboricity/degeneracy. The standard adjacency list model can be augmented with random edge queries, in which case, it was known that $O(m^{3/2}/t)$ time algorithms exist. This paper shows that $O(m\alpha/t)$-time algorithms exist for the model with random edge queries, which combines the previous results into an even better algorithm.

Improved Sublinear-time Moment Estimation using Weighted Sampling by Anup Bhattacharya and Pinki Pradhan (arXiv). Consider an array of $n$ non-negative weights $w_1, \ldots, w_n$. The $t$-th moment is the sum $\sum_{i \leq n} w^t_i$. The aim is to estimate this moment, with proportional samples. This means that an algorithm can generate index samples where the index $i$ has probability proportional to $w_i$. The first result on this problem, for $t=1$, was by Motwani-Panigrahy-Xu. Beretta-Tětek do a detailed study of this problem for the case $t=1$ (sum estimation), and show that this problem can be solved in $O(\sqrt{n})$ queries. When the weights are vertex degrees and we have graph access, there are results showing the existence of $\widetilde{O}(n^{1-1/t})$ algorithms for $t > 1$. This paper gives such a result for any non-negative weights with $t \geq 1$, and proves a matching lower bound. Moreover, this paper considers the setting where $t < 1$. Interestingly, there is an algorithm of query complexity $O(\sqrt{n} + n^{1/t-1})$, and a lower bound showing that $\Omega(n)$ samples are needed when $t \leq 1/2$.

Biased Linearity Testing in the 1% Regime by Subhash Khot and Kunal Mittal (ECCC). The classic problem of linearity testing never stops yielding new insights. A function $f:\{0,1\}^n \to \{-1,1\}$ is linear if $f(x) = (-1)^{a \cdot x}$ for some $a \in \{0,1\}^n$. The classic 3-query BLR test distinguishes linear functions from those that are far from linear. Specifically, if the tester passes with probability $1-\delta$, then the function $f$ is $O(\delta)$-close to linear. This is called testing in the “99% regime”. Suppose we want a tester with guarantees in the “1% regime”. This means that if the tester passes with probability $> \delta$, then $f$ has at least $\epsilon$ correlation with some linear function (where $\epsilon$ is some function of $\delta$). Such results were known for the uniform distribution on the hypercube. This paper studies product distributions on the hypercube, where all marginals are identically $p$-biased (the probability of choosing a $1$ is $p$). Such 1%-testers were known for $p \in (1/3,2/3)$ from recent work. This paper gives a $(1+1/p)$-query tester for all values of $p$, and proves that the number of queries must be at least $1+1/p$.
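For reference, the classic 3-query BLR test under the uniform distribution fits in a few lines. This is the textbook test, not the $p$-biased tester from the paper; the function name and the choice to repeat the check `trials` times and accept only if all repetitions pass are conventions adopted for this sketch.

```python
import random


def blr_test(f, n, trials, rng=None):
    """Classic BLR linearity test under the uniform distribution.

    f maps a list of n bits to +1 or -1. Each trial draws uniform x, y and
    checks f(x) * f(y) == f(x XOR y), using three queries to f.
    """
    rng = rng or random.Random()
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        z = [a ^ b for a, b in zip(x, y)]
        if f(x) * f(y) != f(z):
            return False  # found a violated triple
    return True
```

A character such as $f(x) = (-1)^{x_1 + x_3}$ passes every trial, since $f(x)f(y) = (-1)^{a \cdot (x \oplus y)}$. The constant function $-1$ fails every trial, since $(-1)(-1) = 1 \neq -1$.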

News for January 2025

January 2025 was by many measures a very… eventful month; as far as property testing is concerned, not so much, with only two papers (and a third we had previously missed). Uneventful is not a bad thing, sometimes!

Many pentagons in triple systems, by Dhruv Mubayi and Jozsef Solymosi (arXiv). This paper is interested in quantifying the number of copies of $C_k$ in 3-uniform hypergraphs. In the process, the authors establish a quantitative result very relevant to property testing, at least for those with an interest in testing triangle-freeness in dense graphs, improving on a result of Gishboliner, Shapira and Wigderson: namely, that if an $n$-vertex graph is $\varepsilon$-far from triangle-freeness, then for every $\ell \geq 2$ it must contain $\Omega(\varepsilon^{3\ell} n^{2\ell+1})$ copies of $C_{2\ell+1}$.

Testing Noise Assumptions of Learning Algorithms, by Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, and Arsen Vasilyan (arXiv). Testable learning has seen a surge of interest since its (recent) introduction by Rubinfeld and Vasilyan (2023). In this framework, a learning algorithm which works under some data distribution assumption (e.g., the data is from a spherical ~~cow~~ Gaussian) is not “off the hook” when that assumption isn’t met, as is the case in classical learning. Instead, the algorithm must put its money where its algorithmic mouth is: if the data does indeed satisfy the assumption, then it must output a hypothesis that satisfies the learning guarantee; if the data does not satisfy the assumption, it is allowed to abort and output an error flag; but if it does output a hypothesis, then, regardless of whether the distributional assumption is met, that hypothesis must satisfy the learning guarantee. In this sense, the algorithm must act like a property tester for the distributional assumption made on the data.
This paper extends the testable learning framework from assumptions on the data distribution to assumptions on the noisy data generation model: the assumption to be tested (and used) is no longer only on the distribution of the data (regardless of the associated labels), but on the distribution of the pair (data, label), including the way the label may be corrupted. In particular, the authors focus, as an application, on learning high-dimensional origin-centered halfspaces, where the assumption is that the data is from a Gaussian distribution, with labels perturbed by Massart noise.

Learning multivariate Gaussians with imperfect advice, by Arnab Bhattacharyya, Davin Choo, Philips George John, and Themis Gouleakis (arXiv). Suppose you want to learn the mean (or, if the covariance isn’t known, even better, the mean and covariance) of a high-dimensional Gaussian from i.i.d. samples. You’re in luck: we know how to do it, and the algorithm is very simple! You’re not in luck, though: you’ll need a lot of these i.i.d. samples to achieve non-trivial accuracy. A number either linear or quadratic in the dimension, depending on whether you’re learning only the mean vector or the whole thing.
But say you’re in “luck”: a “good” (but not very trustworthy) friend comes to your help, claiming they already know the mean and covariance, and telling you what they (claim they) are. Can you use this possibly unreliable advice to learn the Gaussian better? This is the setting of learning with advice, and, in this paper, the authors show that yes, when learning Gaussians, you can! And, what’s even better (for this blog), the algorithm they design uses as a core subroutine a tolerant tester, which allows them to carefully check the quality of the “advice”.

News for December 2024

Happy New Year to you and your loved ones! Welcome to the first post of 2025. This month, we feature four papers: one on property testing in the huge object model, two on distribution testing, and a fourth that, while not strictly a property testing paper, was too intriguing to ignore. The last paper recounts the resolution of a fascinating bet between Peter Sarnak and Noga Alon. With that teaser, let’s dive in!

Testing vs Estimation for Index-Invariant Properties in the Huge Object Model by Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Amit Levi, Gopinath Mishra, and Sayantan Sen (arXiv) Let me begin by reminding you of the setup for distribution testing in the huge object model, which was introduced by Goldreich and Ron. Suppose you want to test whether a distribution $\mathcal{D}$, supported over the Boolean hypercube, satisfies a certain property $\mathcal{P}$. To do this, you sample a string $x \sim \mathcal{D}$, where $x$ is of length $n$. The huge object model considers situations where $n$ is very large, so that you cannot afford to read $x$ in full and instead assume query access to the sampled strings. In this model, the distribution $\mathcal{D}$ is considered $\varepsilon$-far from $\mathcal{P}$ if the earthmover distance (EMD) between $\mathcal{D}$ and the closest distribution satisfying $\mathcal{P}$, measured with respect to the relative Hamming distance between bitstrings, is at least $\varepsilon$. In our July 2022 post, we covered a paper by a subset of the authors which presented efficient testers in this model for index-invariant properties whose support had bounded VC dimension.

The main result of the featured paper shows the following. For an index invariant property, you can basically upgrade a plain tester to a tolerant tester. Thus, the ability to efficiently $\varepsilon$-test an index-invariant distribution property in the huge object model translates into an ability of being able to estimate distance from the property.

Optimal Algorithms for Augmented Testing of Discrete Distributions by Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, and Sandeep Silwal (arXiv) Let us take the standard setup of testing discrete distributions supported over $[n]$, with a slight twist. Suppose you can assume that the underlying distribution $\boldsymbol{p}$ is not entirely unknown. As the paper argues, this might be a realistic assumption for distributions arising from network traffic data or search engine queries. Among a couple of other results, the main result of this paper shows that indeed, with a good proxy $\boldsymbol{p}'$ for the input distribution $\boldsymbol{p}$, i.e., a situation where, say, $\|\boldsymbol{p}'-\boldsymbol{u}\|_1 \gg \|\boldsymbol{p}' - \boldsymbol{p}\|_1$ and $\|\boldsymbol{p}' - \boldsymbol{p}\|_1$ is small enough, you get testers for uniformity testing with sample complexity $O(1)$. (Here, $\boldsymbol{u}$ denotes the uniform distribution over $[n]$.) In this framework, the authors also present algorithms with improved sample complexity for identity testing and closeness testing.

Settling the complexity of testing grainedness of distributions, and application to uniformity testing in the Huge Object model by Clément Canonne, Sayantan Sen, and Qiping Yang (ECCC) A discrete distribution is called $m$-grained if all probabilities are integer multiples of $1/m$. If you are a regular PTReview reader, you might recall our News for September 2021, which featured a paper by Goldreich and Ron which proved an $\Omega(n^c)$ lower bound for testing $m$-grainedness for any $c < 1$. Goldreich and Ron also conjectured that the true lower bound is actually $\Omega(n/\log n)$ (when $m = \Theta(n)$). The current work resolves this conjecture, settling the complexity of this problem.

Ramanujan Property and Edge Universality of Random Regular Graphs by Jiaoyang Huang, Theo Mckenzie, and Horng-Tzer Yau (arXiv) So, yes, here is the (non-property-testing) paper I wanted to tell you about. Let me start with the juicy bit. Peter Sarnak and Noga Alon had a bet about the following situation: fix $d \in \mathbb{N}$ and take a random $d$-regular graph on $n$ vertices. Sarnak conjectured that the probability this graph is in fact a Ramanujan expander goes to zero as $n \to \infty$, whereas Alon conjectured that this probability tends to one as $n \to \infty$. The featured paper shows that while this probability decreases with increasing $n$, it approaches a limiting value which is around $0.69$. You can watch this juicy bit here.

News for November 2024

Alas, another late post! But we compensate for that with a rich hoard of ten papers this month, covering quantum property testing, distribution testing, matchings, Steiner trees, linearity testing, and dealing with corruptions. And some papers cover multiple topics simultaneously.

Uniformity testing when you have the source code by Clément L. Canonne, Robin Kothari, and Ryan O’Donnell (arXiv). Consider the classic problem of uniformity testing, where the domain is $[d]$. We all know by now that the complexity of uniformity testing is $\Theta(\sqrt{d}/\varepsilon^2)$, where $\varepsilon$ is the proximity parameter. Suppose the distribution was the output of an algorithm (like, a hash function) and we have access to the source code. (The source code is thought of as a circuit that outputs a single domain element.) Can we beat this bound? Surprisingly, the answer is yes, when the tester is allowed to be quantum. A single “sample” is basically a run of the code. The main result is an upper bound of $O(d^{1/3}/\varepsilon^{4/3})$ (with some additional terms for the low $\varepsilon$ setting). Tight lower bounds for this setting are still unknown, so this research direction of “distribution testing with the source code” may lead to many more results.

Coherence in Property Testing: Quantum-Classical Collapses and Separations by Fernando Jeronimo, Nir Magrafta, Joseph Slote, and Pei Wu (ECCC). The distribution testing problem again, where the domain is $\{0,1\}^n$ and thought of as exponentially large. To distinguish (say) a uniform distribution with support size $2^{n/8}$ vs support size $2^{n/4}$ would require exponentially many ($2^{n/16}$) samples. From a quantum standpoint, we can restrict the input distribution to be coherent. Intuitively, this is analogous to being restricted to a subcube. Even then, the paper shows that the testing problem requires exponentially many samples. To show a separation between classical and quantum algorithms, the paper gives a model of interactive protocols for distribution testing (which has been studied before in the classical setting, like Chiesa-Gur). In this setting, the paper gives a quantum algorithm that runs in $\textrm{poly}(n)$ time with polynomially many proof strings, and can approximate the support size of coherent distributions. Classical algorithms, on the other hand, still require exponentially many queries, even with access to proof strings.

Testing classical properties from quantum data by Matthias C. Caro, Preksha Naik, and Joseph Slote (arXiv). Property testing gains its power because of query access, since the tester can “zero in” on the relevant portion of the data. Sample based testers often have far worse complexities. Such testers only get access to random samples of the data/function. Consider access to a function $f:\{0,1\}^n \rightarrow \{0,1\}$. The classic property of monotonicity can be tested in $\sqrt{n}$ queries (ignoring the proximity parameter dependencies), but requires $2^{\Theta(\sqrt{n})}$ samples. The paper studies sample based property testing, except with quantum “samples”. In the quantum world, the function $f$ is stored/represented as a quantum superposition of states. Quantum samples can obtain richer information, like sampling the Fourier coefficients of $f$. (Quantum queries give even stronger access.) This allows for many classical query-based property testers to become quantum sample-based testers. This paper gives results for many fundamental properties like monotonicity, symmetry, and triangle freeness.

Tolerant Testing of Stabilizer States with Mixed State Inputs by Vishnu Iyer and Daniel Liang (arXiv). Another quantum testing paper, but here, the property itself is quantum. The input is a quantum state, and the aim is to test the property of being a stabilizer state. In all previous work on property testing of being a stabilizer, it was assumed that the input is a pure state. (One should think of a pure state as a sort of “basis” state, while a mixed state is a linear combination of these states.) Given noise, one would typically expect the input to be mixed. This paper gives the first tolerant tester for stabilizer states, where the input is allowed to be a mixed state.

Stochastic Matching via In-n-Out Local Computation Algorithms by Amir Azarmehr, Soheil Behnezhad, Alma Ghafari, and Ronitt Rubinfeld (arXiv). This is not a sublinear result by itself, but has connections that should interest our readers. The main problem is stochastic matching. We are given a graph $G$. An input $G_p$ is generated by keeping each edge independently with probability $p$. The aim is to construct a sparse graph $H$ such that the expected matching size of $H \cap G_p$ is close to the expected matching size of $G_p$. After a long line of work, this paper shows that there exists a subgraph $H$ of $\textrm{poly}(1/p)$ degree whose expected matching size is a $(1-\varepsilon)$-approximation of the overall expectation. The main analysis technique is to study a local computation algorithm (LCA) for computing the maximum matching. An LCA essentially gives local access to a large output (say, a matching or coloring) such that each matching edge (say) of the output can be computed with sublinear lookups to the input. The standard complexity measure is the number of lookups of the input to compute a matching edge. But this paper looks at the “in complexity”, which is the number of queries that look up a given edge. When both complexities are small, one can show a “decorrelation” of the overall output, which is used in the analysis.

Nearly Tight Bounds on Testing of Metric Properties by Yiqiao Bao, Sampath Kannan, and Erik Waingarten (arXiv). Consider an $n \times n$ matrix of “distances” between $n$ points. The aim is to test the fundamental property of the distances forming a metric. Despite a long line of work studying property testing of distances, points, etc., there has surprisingly been no result on this problem. The earliest work in this line is by Parnas-Ron, studying the property of being a tree metric or ultrametric (which satisfy a stronger version of triangle inequality). This paper gives an $O(n^{2/3}/\varepsilon^{4/3})$ query algorithm for property testing metrics. There is also a nearly matching lower bound, if one assumes that the dependence on $\varepsilon$ is polynomial. For the problems of testing tree metrics (and ultrametrics), the paper gives an $O(1/\varepsilon^2)$ query algorithm, an improvement from the previous bound of $O(1/\varepsilon^6)$.
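To make the problem concrete, here is a naive one-sided sketch: sample random triples of points and reject if any of them violates the triangle inequality. This is only an illustrative baseline, not the algorithm from the paper, and no query bound is claimed for it. The function names are hypothetical, and the matrix `D` is assumed to be symmetric with a zero diagonal.

```python
import random


def triple_violates(D, i, j, k):
    """True if some side of the 'triangle' i, j, k exceeds the sum of the other two."""
    a, b, c = D[i][j], D[j][k], D[i][k]
    return a > b + c or b > a + c or c > a + b


def naive_metric_test(D, num_triples, rng=None):
    """Reject (return False) as soon as a sampled triple violates the
    triangle inequality; a true metric is always accepted."""
    rng = rng or random.Random(0)
    n = len(D)
    for _ in range(num_triples):
        i, j, k = rng.sample(range(n), 3)
        if triple_violates(D, i, j, k):
            return False
    return True
```

Each sampled triple costs three queries to the matrix. The difficulty, of course, is that a matrix can be far from every metric while only a small fraction of its triples are violated.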

Sublinear Metric Steiner Tree via Improved Bounds for Set Cover by Sepideh Mahabadi, Mohammad Roghani, Jakub Tarnawski, Ali Vakilian (arXiv). This paper is on the classic metric Steiner tree problem. The input is an $n \times n$ matrix of distances in a metric and a subset $S$ of points. The aim is to estimate the minimum weight Steiner tree (a tree that connects all the terminals $S$). Getting a $2$-approximation is quite easy, since one can just consider a spanning tree on $S$. A result of Chen-Khanna-Tan gives a $(2-\eta)$-approximation algorithm that runs in $O(n^{13/7})$ queries (for some positive constant $\eta$). This paper gets the same approximation with $O(n^{5/3})$ queries. The main technical work goes into designing a sublinear algorithm for a version of the set cover problem, where the input set system is given as a matrix.
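The easy $2$-approximation mentioned above can be sketched in a few lines: run Prim's algorithm on the terminals only and report the weight of the resulting spanning tree. This is a sketch of the folklore baseline, not of the paper's algorithm; the function name is hypothetical. Note that it reads $\Theta(|S|^2)$ entries of the distance matrix.

```python
def terminal_mst_weight(D, terminals):
    """Weight of a minimum spanning tree on the terminals (Prim's algorithm).
    In a metric, this is at most twice the minimum Steiner tree weight."""
    terminals = list(terminals)
    if len(terminals) <= 1:
        return 0
    root = terminals[0]
    # best[t] = cheapest known connection from terminal t to the current tree
    best = {t: D[root][t] for t in terminals[1:]}
    total = 0
    while best:
        t = min(best, key=best.get)
        total += best.pop(t)
        for u in best:
            if D[t][u] < best[u]:
                best[u] = D[t][u]
    return total
```

On a star metric with one center and three leaves at distance $1$ from it (so leaves are at distance $2$ from each other), the spanning tree on the three leaves has weight $4$, while the optimal Steiner tree uses the center and has weight $3$.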

On Optimal Testing of Linearity by Vipul Arora, Esty Kelman, Uri Meir (arXiv). In property testing, it’s hard to get more fundamental than linearity testing. What is there new to say about this well-studied problem? This paper studies linearity testing under the online manipulations model of Kalemaj-Raskhodnikova-Varma. In this model, after every query of the tester, an adversary corrupts up to $t$ entries of the input. Previous results show that linearity testing can be done in $O(\log t + 1/\varepsilon)$ queries. But, the paper notes, all previous results require $t \leq 2^{n/c}$ for some $c \geq 2$. How far can we push $t$? Remarkably, the main result shows that $t$ can be as large as $2^n/\varepsilon^2$ and linearity testing is still feasible under online manipulations. The next result of this paper studies the property of low degree polynomials over the reals. The paper gives an optimal $O(1/\varepsilon)$-query tester for the property of being a $d$-degree polynomial.
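As a reminder of what linearity testing looks like in the standard model (no adversary), here is a minimal sketch of the classic Blum-Luby-Rubinfeld test over $\mathbb{F}_2^n$. It is not the tester from the paper, which must additionally cope with online manipulations. Inputs are encoded as $n$-bit integers, so addition in $\mathbb{F}_2^n$ is bitwise XOR; the function name is hypothetical.

```python
import random


def blr_linearity_test(f, n, num_trials, rng=None):
    """Classic BLR test: check f(x) + f(y) = f(x + y) over F_2
    on random pairs (x, y). A linear f is always accepted."""
    rng = rng or random.Random(0)
    for _ in range(num_trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```

Each trial makes three queries, and running $O(1/\varepsilon)$ trials rejects functions that are $\varepsilon$-far from linear with constant probability.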

Online versus Offline Adversaries in Property Testing by Esty Kelman, Ephraim Linder, and Sofya Raskhodnikova (arXiv). This paper is related to the online manipulations model discussed above. There is also an offline “erasure” model, wherein some coordinates/bits of the input are erased by an adversary. The original paper of Kalemaj-Raskhodnikova-Varma showed that sortedness of arrays can be tested with offline erasures, but cannot be tested with online erasures even when $t=1$. This paper proves a complementary theorem. There are properties that can be tested with $O(1)$ queries for $t = O(n/\log n)$ in the online manipulations model, but require $\Omega(\log n)$ queries in the offline setting with $\Theta(n/\log\log n)$ erasures. So the online and offline models are incomparable in power. There is a similar result for a setting where $t$ is constant. The lower bounds are shown using a property regarding repeated characters in strings.

News for October 2024

Four* papers on property testing last month! Lower bounds, upper bounds, distribution testing, quantum, and a new testing model!

* at least four. If we missed any, please let us know in the comments!

Lower Bounds for Convexity Testing, by Xi Chen, Anindya De, Shivam Nadimpalli, Rocco Servedio, and Erik Waingarten (arXiv). You’re given a membership oracle to a set $S$ in $\mathbb{R}^n$ (that is, query access to its indicator function $f_S\colon \mathbb{R}^n\to \{0,1\}$), and asked to decide if this set is convex, or “far from it”. This is a very natural and seemingly basic question— of course, we need to define what “far” means here, and the natural (normal, one may say) choice of underlying measure in $\mathbb{R}^n$ is the standard Gaussian measure: $d(S,T) = \Pr_{\mathcal{N}(0,I_n)}[ x \in S\Delta T]$.
Previous algorithms for this convexity testing question (and its tolerant testing analogue) are non-adaptive, and have $2^{\tilde{O}(\sqrt{n})}$ query complexity. This paper shows that this is not just unfortunate, but also necessary: every non-adaptive tolerant tester for this question must make $2^{\Omega(\sqrt[4]{n})}$ queries, and every (possibly adaptive) one-sided tester must have polynomial query complexity.

Replicable Uniformity Testing, by Sihan Liu and Christopher Ye (arXiv). In property testing, the algorithm must say YES with high probability on inputs which have the property, and NO with high probability on those which are far. On anything else, the algorithm is off the hook and can output either. This is typically considered to be fine, and, in any case, necessary to be able to obtain ultra-efficient algorithms. But what if, in this third case, we wanted to put the algorithm partially back on that hook, and required it to be consistent? The algorithm can answer either YES or NO, sure, but if I run it again on that same input, it should give the same answer with high probability. This is in line with a recent line of works on replicable algorithms, and is non-trivial to achieve in (the standard model of) distribution testing, where a distribution testing algorithm only gets to see random samples from the distribution, and thus needs to have a replicable behavior over that randomness. This paper introduces the question of replicable distribution testing, and provides both upper and lower bounds (essentially matching, with an asterisk) for the flagship task of uniformity testing.

Quantum property testing in sparse directed graphs, by Simon Apers, Frédéric Magniez, Sayantan Sen, and Dániel Szabó (arXiv). Graph property testing has a long and rich history in the classical setting, spanning more than two decades. There are several testing models, depending on whether the graph is dense or sparse, and directed or not; and even in the sparse, directed case, it is sometimes important to only allow outgoing edge queries. All these variants capture different meaningful scenarios, and relations and separations between them are known. This paper opens the direction of quantum testing for sparse graphs, either directed or not. The authors investigate what advantage quantum computers can bring for graph testing in this setting, and show one natural property for which a quadratic speedup exists: $o(\sqrt{n})$ quantum queries in the outgoing-edge-query-only (unidirectional) sparse model, while classically $\Omega(n)$ are necessary. They also show that this is not always the case: quantum testing of 3-colorability, as in the classical case, does not admit a $o(n)$-query tester.

Relative-error monotonicity testing, by Xi Chen, Anindya De, Yizhi Huang, Yuhao Li, Shivam Nadimpalli, Rocco Servedio, and Tianqi Yang (arXiv). Property testing of Boolean functions is defined “absolutely”: the distance between two functions is the fraction of the domain on which they differ, i.e., $\displaystyle\frac{|f^{-1}(\{1\})\Delta g^{-1}(\{1\})|}{2^n}$
This makes sense when the functions have a reasonable number of satisfying assignments: but may be much less meaningful for sparse functions, which only are non-zero on a $o(1)$ fraction of the inputs—for instance, functions where “all the action” is concentrated in a tiny subcube of the hypercube. All these functions are vanishingly close to each other! To address this, the authors introduce a new distance notion, relative-error, where the distance from $g$ to $f$ is scaled by the sparsity of $f$:
$\displaystyle\frac{|f^{-1}(\{1\})\Delta g^{-1}(\{1\})|}{|f^{-1}(\{1\})|}$
This requires a slightly different access model to avoid trivial impossibility results, so the tester is augmented with sampling access to satisfying assignments of $f$, on top of query access to $f$ (as otherwise it may just never even find one satisfying assignment). After introducing and motivating this testing model, the paper initiates its study in the specific case of testing monotonicity of Boolean functions.
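To see the difference between the two distance notions on a toy example, here is a brute-force sketch that computes both by enumerating the hypercube. It is only meant for tiny $n$ and is purely illustrative (a tester would never enumerate the domain); the function names are hypothetical.

```python
from itertools import product


def satisfying_set(f, n):
    """All x in {0,1}^n with f(x) = 1, found by brute force."""
    return {x for x in product((0, 1), repeat=n) if f(x)}


def absolute_distance(f, g, n):
    """|f^{-1}(1) symmetric-difference g^{-1}(1)| / 2^n."""
    return len(satisfying_set(f, n) ^ satisfying_set(g, n)) / 2 ** n


def relative_distance(f, g, n):
    """Same numerator, but scaled by |f^{-1}(1)| instead of 2^n."""
    ones_f = satisfying_set(f, n)
    if not ones_f:
        raise ValueError("f has no satisfying assignment")
    return len(ones_f ^ satisfying_set(g, n)) / len(ones_f)
```

For instance, with $n=4$, take $f$ to be the AND of all four bits and $g$ the AND of the first three. They differ on a single input, so their absolute distance is $1/16$, but since $f$ has only one satisfying assignment, the relative-error distance from $g$ to $f$ is $1$.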

News for September 2024

So, we had a pretty fantastic September. Besides the fact that September saw eight papers, the Computer Science community also bagged two Nobel Prizes(!), reactions to which are kind of mixed from what I see around me. Anyhow, let us circle back to property testing. So, eight papers, yes. Without further ado, let us look at our spread.

Public Coin Interactive Proofs for Label-Invariant Distribution Properties by Tal Herman (ECCC) Suppose you have an unknown distribution $\mathcal{D}$ supported over $[N]$. Suppose I claim that this distribution has entropy $\texttt{blah}$. You have sample access to $\mathcal{D}$ and you want to check my claim. To this end, you decide to engage me, a suspicious shady prover, in an Interactive Protocol where you, the verifier, are restricted to using public coin tosses. The main result of this paper asserts that you can do this with a mere $\widetilde{O}(N^{2/3})$ samples. You also get the same bound on the communication complexity. What’s more is that you get algorithms with running time of the same order. What’s even more is that you get similar algorithms for all label invariant properties of $\mathcal{D}$. You can contrast this with the seminal result of Valiant and Valiant from 2011 which asserts that you can approximate the distance between your input distribution $\mathcal{D}$ and the label invariant property of interest (without any prover) using $\Theta(N/\log N)$ samples. So, this result shows that the computation under the interactive model is more efficient than standalone computation even in the public coin toss model.

How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions by Tal Herman and Guy Rothblum (arXiv) Let us consider the same setup as above. Again, you have an unknown distribution (supported over $[N]$) which you have sample access to, and I assert that this distribution has a certain property. You want to verify whether my claim is correct or bogus. The main algorithmic result of this paper has a cryptographic ring to it: assuming the existence of collision resistant hash functions, the authors show that for any reasonable distribution property, you can set up an interactive protocol where the verifier can decide whether or not the prover’s claim is bogus using $\widetilde{O}(\sqrt N)$ samples. Also, the communication complexity is of the same order.

Random local access for sampling k-SAT solutions by Dingding Dong and Nitya Mani (arXiv) I find this paper pretty intriguing. So, here is the setup. I give you some $k$-SAT instance and I promise that no variable in this instance shows up in more than $d$ clauses. Recall that for $d \leq 2^k/(2e k)$, the Lovász Local Lemma assures you that there exists a satisfying assignment. What’s more is that the famous Moser-Tardos algorithm even allows you to find one satisfying assignment in polynomial time. In a beautiful work, in the regime where $d \leq 2^{ck}$ (where $0 < c < 1$ is a sufficiently small constant), Moitra gave samplers which return an almost uniformly random satisfying assignment. Note that this is not possible by a direct adaptation of the Moser-Tardos algorithm. Anyhow, back to the setting considered in this work. So, consider the following sublinear challenge. Denote by $\mu$ the uniform distribution supported over all satisfying assignments to the given $k$-SAT instance, where each variable shows up in at most $d \leq 2^{ck}$ clauses. I want you to sample the assignment $\sigma(v)$ for a variable $v$ from the marginal $\mu_v$ induced by $\mu$ on $v$ fast (in $poly(\log n)$ time in expectation). Of course this has to be done in some consistent fashion overall, which is very LCAish in flavor, and I will not detail that here. Rest assured the featured paper rises up to the challenge.

New Direct Sum Tests by Alek Westover, Edward Yu, Kai Zheng (arXiv) We say that a $\mathbb{F}_2$-valued function $f$ over the $d$-dimensional grid $f \colon [n]^d \to \mathbb{F}_2$ is a direct sum if there are $d$ one-dimensional functions $L_i \colon [n] \to \mathbb{F}_2$ such that $f(x) = \sum_{i=1}^d L_i(x_i)$. In a paper we covered in October 2019, Dinur and Golubev presented an algorithm for direct sum testing and left its analysis for future research. The featured paper analyzes this test and shows that it is indeed a bona fide direct sum tester. I omit the description of the tester. I will just leave with one note: this looks like an appetizing read for Boolean Fourier Analysis aficionados.

Directed Hypercube Routing, a Generalized Lehman-Ron Theorem, and Monotonicity Testing by Deeparnab Chakrabarty and C. Seshadhri (arXiv) This is a very refreshing read with (our very own) Seshadhri as one of the authors. Chakrabarty and Seshadhri have teamed up (often in a trio with Hadley Black) in their attempts to understand the deep dark secrets of testing boolean monotonicity over the boolean hypercube. The featured paper conjectures a bold generalization of Lehman-Ron theorem over the hypercube and suggests that the conjectured routing once established could pave the path forward towards a more transparent understanding of directed isoperimetric inequalities over the boolean hypercube. In my opinion, it is worth your while to check this one out.

Lempel-Ziv (LZ77) Factorization in Sublinear Time by Dominik Kempa and Tomasz Kociumaka (arXiv) A pretty unique topic. The main result of this paper is what you see in the title: 50 years after its introduction, we can finally compute the LZ77 factorization in sublinear time. If you are like me, you might be asking: but what does that mean? Quoting from the abstract:

Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string T from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the longest prefix of the unprocessed suffix that has another occurrence earlier in T.

Indeed, the abstract has more fascinating information about this problem. Quoting again.

Sublinear-time algorithms are known for nearly all other fundamental problems on strings, but LZ77 seems resistant to all currently known techniques.

Looks like new year came early for PTReview readers. This paper promises to be fun by the contents I sampled so far.
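To unpack the definition quoted from the abstract, here is a naive quadratic-time sketch of the greedy factorization. It has nothing to do with the sublinear-time algorithm of the paper. It assumes the common convention that the earlier occurrence only needs to start before the current position, so it may overlap the phrase being produced; the function name is hypothetical.

```python
def lz77_factorize(T):
    """Naive greedy LZ77 factorization of the string T.
    Each phrase is either a single new letter or the longest prefix of the
    unprocessed suffix that also occurs starting at an earlier position."""
    phrases = []
    n = len(T)
    i = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and T[j + length] == T[i + length]:
                length += 1
            longest = max(longest, length)
        step = max(longest, 1)  # no earlier occurrence: emit a single letter
        phrases.append(T[i:i + step])
        i += step
    return phrases
```

For example, `lz77_factorize("abcabcd")` produces the phrases `a`, `b`, `c`, `abc`, `d`: the first three letters are new, `abc` is copied from the start of the string, and `d` is new again.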

Computing String Covers in Sublinear Time by Jakub Radoszewski and Wiktor Zuba (arXiv) A substring (actually prefix) $C$ is called a cover of text $T$ if $T$ can be constructed by concatenations and superpositions of $C$. Suppose the text string $T$ contains $n$ symbols from some alphabet of fixed size $\sigma$. The main theorem of the featured paper asserts that given a string $T$, you can find a representation of all of its covers (at most $n$ of them, since they are all prefixes) in merely $O(n/\log_{\sigma} n)$ time. This bound is also shown to be optimal: indeed, the authors show that you cannot compute a representation (in a certain model formulated by Charalampopoulos et al in FOCS 2020) for the shortest cover in $o(n/\log n)$ time.
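The definition of a cover is easy to check naively: a prefix $C$ covers $T$ if its occurrences in $T$, possibly overlapping, leave no position uncovered. The sketch below does this in roughly quadratic time per candidate and is only meant to illustrate the definition, not the sublinear algorithm of the paper; the function names are hypothetical.

```python
def is_cover(T, length):
    """True if the prefix T[:length] covers T, i.e. every position of T
    lies inside some occurrence of the prefix. Assumes 1 <= length <= len(T)."""
    C = T[:length]
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    start = T.find(C)
    while start != -1:
        if start > covered_up_to:
            return False  # gap between consecutive occurrences
        covered_up_to = start + length
        start = T.find(C, start + 1)
    return covered_up_to == len(T)


def all_covers(T):
    """All covers of T, listed by increasing length."""
    return [T[:k] for k in range(1, len(T) + 1) if is_cover(T, k)]
```

For example, the covers of `abaabaaba` are `aba`, `abaaba`, and the string itself, whereas a string such as `abc` is covered only by itself.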

Quantum Channel Testing in Average-Case Distance by Gregory Rosenthal, Hugo Aaronson, Sathyawageeswar Subramanian, Animesh Datta and Tom Gur (arXiv) This paper considers a new diversion in quantum land. Namely, it considers the task of testing quantum channels. The setup is this: I have a $d$-dimensional quantum system which is just a $\mathbb{C}^{d \times d}$ psd matrix over complex numbers with trace $1$. A quantum channel is just a linear transformation that maps $d_{in}$-dimensional quantum systems to $d_{out}$-dimensional quantum systems. With respect to some appropriate norm (the so-called diamond norm), one of the results in this paper proves a $poly(d_{in})/\varepsilon$ lower bound for testing identity to any fixed channel in diamond distance. This lower bound is shown to hold in a very strong query model appropriate for the quantum setting.

New for August 2024

Apologies for the late post! After the relative silence of last month, we’re getting back to a moderate cadence of papers. Some distribution testing, quantum testing, learning and testing. We’ve also added a non-sublinear paper on distributions that should be of interest. And here’s the roundup.

Efficient Testable Learning of General Halfspaces with Adversarial Label Noise by Ilias Diakonikolas, Daniel M. Kane, Sihan Liu, and Nikos Zarifis (arXiv). This paper is on the recent model of testable learning introduced by Rubinfeld-Vasilyan. Consider learning halfspaces over the Gaussian distribution. We have sample access to an input function $f:\mathbf{R}^d \to \{0,1\}$, where our aim is to learn the closest halfspace to $f$. The samples come from some fixed, underlying distribution $\mathcal{D}$. But it is often infeasible to validate this distributional assumption, even in the property testing framework. A tester-learner pair will try to test this assumption, and if it accepts, apply the learning algorithm. The guarantee is: if $\mathcal{D}$ is indeed (say) Gaussian, then the learning algorithm works as desired. If the tester accepts and the learner outputs a hypothesis (say, halfspace) $h$, then it is guaranteed that $h$ is close to $f$ according to $\mathcal{D}$, even if $\mathcal{D}$ is far from Gaussian. This last part makes the whole setup interesting; the distribution tester might fail to reject the distribution, but we can trust the output hypothesis! There have been many results on learning homogeneous halfspaces under the Gaussian distribution. So the hypothesis class consists of halfspaces going through the origin. This paper is about general (inhomogeneous) halfspaces. At first blush, this might seem like a simple issue; we’re just adding a constant term to the halfspace. But existing techniques break down, often because they’re solving an optimization problem of minimizing error around a band close to the origin. This paper gives a careful reduction to the “nearly” homogeneous case, and gives the first polynomial sample tester-learner pair for this problem.

Tolerant testing stabilizer states by Srinivasan Arunachalam and Arkopal Dutt (arXiv). Let us start with a familiar classical problem, low degree testing. Given access to a function $f:\{0,1\}^n \to \{0,1\}$, we wish to test if $f$ is a quadratic polynomial (over $\mathbb{F}_2$). There exist testers based on the Gowers norm: basically, compute various (constant dimensional) discrete derivatives and check they are consistent with a quadratic function. These discrete derivatives can be analyzed using Fourier analysis, and the main aim is to show that a function that “locally” (in terms of derivatives) behaves like a quadratic is indeed globally close to being one. This method can be used to get tolerant testers for quadratic polynomials. This paper is on a quantum generalization of this testing result. The input is an $n$-qubit state $|\psi_f\rangle$, promised to be a “phase state”. A phase state has a Boolean function $f$ embedded in it, because one can write a phase state as a linear combination of $2^n$ “base” states, with the coefficients being Boolean. A “stabilizer state” is one where the function $f$ is a quadratic (or so I believe, I’m probably exposing my ignorance of quantum mechanics). This paper uses the Gowers norm techniques to give the first quantum tolerant tester for stabilizer states.

Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries by Tomer Adar, Eldar Fischer, and Amit Levi (arXiv). Consider distribution testing on high-dimensional data, so the domain is $\{0,1\}^n$. A nice model for distribution testers is the subcube conditioning model of Bhattacharyya and Chakraborty. Suppose we fix the coordinates in any subset $S \subseteq [n]$ to values $x_S$. We can generate a sample $y$ from the distribution, conditioned on $y_S = x_S$ (meaning, $y$ agrees with $x$ on $S$). The problem is to perform equivalence testing of distributions in this model. Previous results got $O(n^2/\varepsilon^2)$ query algorithms, and this paper gives a significantly improved algorithm making $O(n/\varepsilon^2)$ queries. Interestingly, the algorithm only makes weaker queries. One distribution is only accessed by “marginal queries”. So, we are given $x_S$ as before, but we only sample the marginal distribution over coordinate $i$, conditioned on $S$ being fixed as $x_S$. (Hence, the output is a single bit. Also, we note that the other distribution is accessed by prefix queries, a weaker version of subcube queries.) These generalizations lead to more results, on testing equivalence in the interval query model, and for testing the property of product distributions. This paper also proves an $\Omega(\sqrt{n}/\varepsilon^2)$ lower bound for testing product distributions.

Parallel Sampling via Counting by Nima Anari, Ruiquan Gao, and Aviad Rubinstein (arXiv). This isn’t a “typical sublinear algorithm” per se, but I think it is quite interesting to those of us who think about sublinearity, adaptivity, and distributions. This result has connections to the previous paper. Our aim is to sample from an unknown distribution $\mu$ over $[q]^n$. We have access to “marginal queries” as described above. This problem appears in large language models, wherein neural nets are trained on various marginals (next word), but the output is a sentence (list of words). Observe there is a simple “$O(n)$ round” algorithm. Without any fixing, query the marginal of the first coordinate. Fix the first coordinate, query the marginal of the second coordinate. Fix the first two coordinates, rinse and repeat. This requires $n$ rounds with the marginal query. In this model, polynomially many marginal queries can be made in a single round, and the aim is to minimize the number of rounds (basically, bounding the adaptivity). This paper gives a $\widetilde{O}(n^{2/3})$ round procedure for sampling, and shows an $\Omega(n^{1/3})$ lower bound.
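The simple $n$-round baseline described above can be sketched as follows. The marginal oracle interface is a hypothetical choice made for illustration: `marginal(prefix)` is assumed to return the list of $q$ conditional probabilities for the next coordinate, given the coordinates already fixed. This is the sequential procedure only, not the parallel algorithm from the paper.

```python
import random


def sequential_sample(marginal, n, q, rng=None):
    """Sample x in [q]^n one coordinate at a time.
    Each loop iteration is one adaptive round: it issues a single
    marginal query whose answer depends on all earlier coordinates."""
    rng = rng or random.Random(0)
    prefix = []
    for _ in range(n):
        probs = marginal(tuple(prefix))  # conditional law of the next coordinate
        next_symbol = rng.choices(range(q), weights=probs)[0]
        prefix.append(next_symbol)
    return prefix
```

The loop makes the adaptivity bottleneck visible: round $i$ cannot start before the outcomes of rounds $1,\dots,i-1$ are known, which is exactly what a procedure with fewer rounds has to get around.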

Property Testing Review

The latest in property testing and sublinear time algorithms
