News for March 2025

After a good spell last month, the community saw no reason to relent. This month, we had five papers: from tolerant testers for fundamentally important graph problems, to privately testing distributions, to more explorations in the conditional sampling model, to counting and sampling motifs. Also included as a bonus is a result on multipass streaming lower bounds for approximating max-cut.

A Tolerant Independent Set Tester (arXiv) (by Cameron Seth) Consider the setting of testing graph properties in the dense graph model, where you have access to the adjacency matrix of the graph and may check whether a pair \((i,j)\) is connected by an edge or not. In this model, consider the following task: I want you to design a sublinear time procedure which, on input a dense graph, runs in time \(poly(\rho/\varepsilon)\) and returns YES if the graph is \(\varepsilon_1 = \widetilde{\Omega}(\varepsilon)\)-close to having an independent set of size at least \(\rho n\), and returns NO if the graph is \(\varepsilon_2 = \varepsilon\)-far from all graphs that have an independent set of size at least \(\rho n\). In both cases, the verdict is required to hold with probability at least \(2/3\). Alright, that's pretty much what the title gave away already. So, how does the paper establish this result?

To this end, the paper uses a remarkably simple algorithm. You pick a random subset \(S \subseteq V\) of \(s = \widetilde{O}(\rho^3/\varepsilon^2)\) vertices and focus on counting how many edges you see in subsets of \(S\) of size \(\rho s\). If you find some subset of size \(\rho s\) which induces at most \(\varepsilon_1 s^2\) edges, you accept \(G\); if no such subset exists, you reject \(G\). The analysis presented in the paper is intricate and proceeds by proving a new container lemma for sparse graphs (rather than independent sets). It is a good read and you should totally take a look at this paper.
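To make the sample-and-search step above concrete, here is a minimal sketch. The oracle name `adj(i, j)`, the choice of \(\varepsilon_1\), the constant in the sample size, and the brute-force search over subsets are all illustrative stand-ins, not the paper's actual parameters or implementation.

```python
import itertools
import math
import random

def tolerant_is_tester(adj, n, rho, eps):
    """Sketch of the sampling-based tolerant independent set tester.

    adj(i, j) -> bool is the dense-model edge oracle on vertices {0, ..., n-1}.
    The closeness parameter eps1, the constant in the sample size, and the
    exhaustive subset search are illustrative, not the paper's exact choices.
    """
    eps1 = eps / 10.0                                    # stand-in for the Omega~(eps) parameter
    s = min(n, max(int(rho**3 / eps**2), int(math.ceil(2.0 / rho))))
    sample = random.sample(range(n), s)

    # Query every pair inside the sample once: O(s^2) adjacency-matrix queries.
    edges = {frozenset((u, v)) for u, v in itertools.combinations(sample, 2) if adj(u, v)}

    k = max(1, int(rho * s))                             # target subset size within the sample
    threshold = eps1 * s * s                             # "few induced edges" cutoff

    # Look for a k-subset of the sample spanning few edges. Brute force is fine
    # for a sketch; the point of the paper is the query complexity, not this search.
    for T in itertools.combinations(sample, k):
        induced = sum(1 for u, v in itertools.combinations(T, 2) if frozenset((u, v)) in edges)
        if induced <= threshold:
            return True                                  # evidence of a large near-independent set
    return False
```

On a toy instance you could pass something like `adj = lambda i, j: frozenset((i, j)) in E`; just note that the subset search blows up quickly, so this mirrors the description rather than the paper's efficient implementation.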

Better Private Distribution Testing by Leveraging Unverified Auxiliary Data (arXiv) (by Maryam Aliakbarpour, Arnav Burudgunte, ClĂ©ment Canonne (our own!), and Ronitt Rubinfeld) If our January post is fresh in your mind, you might recall the theme: assuming you know a little about the distribution \(\mathbf{p}\) presented to you allows you to give improved algorithms for classic distribution testing tasks such as identity testing and uniformity testing (I am assuming the support of my distributions is \([n]\)). This paper considers these problems in the setting of differential privacy. So, you want to design private algorithms for these classic tasks where you get to assume that you know something about the target distribution (that is, it is not completely unknown), just like we had in our January posting. The paper presents sample-efficient private algorithms for both identity testing and closeness testing.
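For readers new to the private testing setup, here is a bare-bones illustration of the generic "noise the test statistic" recipe, applied to a collision-based uniformity tester. This is only the standard Laplace-mechanism template, not the featured paper's algorithm, and the threshold below is an illustrative choice.

```python
import random

def private_uniformity_test(samples, n, eps_dist, eps_dp):
    """Generic template: compute a statistic, add Laplace noise scaled to its
    sensitivity, then threshold. NOT the featured paper's algorithm; the
    acceptance threshold here is an illustrative choice.
    """
    m = len(samples)
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    collisions = sum(c * (c - 1) // 2 for c in counts.values())

    # Changing a single sample changes the collision count by at most m - 1.
    sensitivity = m - 1
    # A standard Laplace(0, 1) draw: difference of two independent Exp(1) draws.
    lap = random.expovariate(1.0) - random.expovariate(1.0)
    noisy = collisions + (sensitivity / eps_dp) * lap

    # Under the uniform distribution over [n], the expected collision count is
    # C(m, 2)/n; distributions that are eps_dist-far have noticeably more collisions.
    baseline = m * (m - 1) / (2.0 * n)
    return noisy <= (1.0 + 2.0 * eps_dist ** 2) * baseline
```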

Optimal mass estimation in the conditional sampling model (arXiv) (by Tomer Adar, Eldar Fischer, and Amit Levi) As the title suggests, this paper considers the task of estimating the probability mass of all elements in the support of a probability distribution supported on a finite universe \(\mathcal{U}\). Of course, you have to assume stronger oracle access for this task, and the oracle people assume access to is the conditional sampling oracle: you get to specify a subset \(S \subseteq \mathcal{U}\), and you receive a sample drawn from the underlying distribution conditioned on landing in \(S\). Meel-Kumar-Pote gave an algorithm for this task which requests \(poly(\log n)/poly(\varepsilon)\) samples. The featured paper, on the other hand, presents algorithms for this problem which request only \(O(\log(\log n)/poly(\varepsilon))\) samples. The authors also show a matching lower bound, effectively closing this problem.
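To make the access model concrete, here is a tiny sketch of what a conditional sampling oracle looks like, with a made-up distribution plugged in. This is only the oracle interface, not the mass-estimation algorithm of the paper.

```python
import random

class ConditionalSampler:
    """Conditional sampling oracle for a distribution p over a finite universe.

    Given a subset S, cond_sample(S) returns an element x of S with probability
    p(x) / p(S). This only illustrates the access model; it is not the paper's
    estimation algorithm.
    """
    def __init__(self, p):
        self.p = dict(p)                      # element -> probability mass

    def cond_sample(self, S):
        S = [x for x in S if self.p.get(x, 0.0) > 0.0]
        total = sum(self.p[x] for x in S)
        if total == 0.0:
            raise ValueError("conditioning set has zero mass")
        r = random.uniform(0.0, total)
        acc = 0.0
        for x in S:
            acc += self.p[x]
            if r <= acc:
                return x
        return S[-1]                          # guard against floating-point slack

# Example: conditioning on a pair lets you compare p(x) and p(y) head to head,
# the basic primitive behind algorithms in this model.
oracle = ConditionalSampler({0: 0.5, 1: 0.3, 2: 0.2})
samples = [oracle.cond_sample({0, 1}) for _ in range(1000)]
print(sum(1 for x in samples if x == 0) / 1000)   # should hover around 0.5 / 0.8
```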

Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time (arXiv) (by Talya Eden, Reut Levi, Dana Ron, and Ronitt Rubinfeld) Suppose I give you a large graph \(G = (V,E)\) and a small subgraph \(H\) whose number of occurrences in \(G\) I wish to (approximately) count. You may assume access to an oracle which can give you a random vertex, the \(i\)-th neighbor of your favorite vertex, and the degree of a vertex, and which can also answer whether a particular pair of vertices is connected by an edge. This is the standard model we use for coming up with sublinear time algorithms for graph problems. It is sometimes contrasted with the augmented model, where in addition to all of the above, you also get access to an oracle which returns uniformly random edges of the graph \(G\). As you can imagine, the augmented model allows you to count the number of occurrences of \(H\) for a much richer class of small subgraphs \(H\). Indeed, this used to be the status quo until this paper: in the augmented model, we knew how to count the occurrences of pretty much all small subgraphs \(H\), whereas in the standard model, we only knew how to count edges, stars, and cliques. The featured paper makes progress on bridging this divide and presents standard-model algorithms for approximately counting the number of copies of \(H\) for a much richer class of small subgraphs \(H\). One thing you might want to try is to simulate the augmented-model algorithms by hacking up a subroutine that returns uniformly random edges. However, that approach is a no-go, as the running time needed to sample edges u.a.r. can be prohibitive for a sublinear time implementation in the standard model. This paper develops ideas which bypass such annoyances.
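If the two query models feel abstract, here is a minimal sketch of the interfaces, with `random_edge()` marking the extra power of the augmented model. Everything here (the class name, the adjacency-list representation) is just for illustration; it is not from the paper.

```python
import random

class GraphOracle:
    """Standard sublinear-time query model for a graph given as adjacency lists.

    Supports: random vertex, degree, i-th neighbor, and pair (adjacency) queries.
    random_edge() illustrates the extra query available only in the augmented
    model. This makes the access models concrete; it is not the counting algorithm.
    """
    def __init__(self, adj_lists):
        self.adj = adj_lists                               # vertex -> list of neighbors
        self.edges = [(u, v) for u in adj_lists for v in adj_lists[u] if u < v]

    def random_vertex(self):
        return random.choice(list(self.adj))

    def degree(self, v):
        return len(self.adj[v])

    def neighbor(self, v, i):
        return self.adj[v][i]                              # i-th neighbor of v

    def pair_query(self, u, v):
        return v in self.adj[u]

    def random_edge(self):                                 # augmented model only
        return random.choice(self.edges)

# Toy usage: a triangle with a pendant vertex.
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
oracle = GraphOracle(G)
print(oracle.degree(2), oracle.pair_query(0, 3), oracle.random_edge())
```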


Multi-Pass Streaming Lower Bounds for Approximating Max-Cut (arXiv) (by Yumou Fei, Dor Minzer, Shuo Wang) This result, as the title reveals, is from the streaming literature. But it should be of interest to our readers and so we cover it here. The essential summary is this: if you want a better-than-\(1/2\) approximation algorithm for the max-cut problem in the streaming setting, say a \(1/2 + \varepsilon\) approximation, be ready to pay up \(poly(n)\) passes. In more detail, the technical statement reads as follows: any randomized \(k\)-pass \(1/2 + \varepsilon\) approximation algorithm for max-cut requires \(\Omega(n^{1/3}/k)\) space. The lower bound results in this paper are inspired by the communication complexity lower bound for a certain problem investigated in the seminal work of Kapralov-Krachun from 2019, where the authors showed that any single-pass streaming algorithm for max-cut aiming to produce a better-than-\(1/2\) approximation must cough up \(\Omega(n)\) space. This requires a lot of care, as in the multipass streaming setting you cannot hope to show your communication complexity lower bounds by a naive application of the discrepancy method. The delicate finesse needed is brought in by a clever use of hypercontractivity. Looks like this will be an interesting treat for our intrepid readers.
