News for January 2026

After some insanely busy months, we have a merely busy month. A couple of neat results on sublinear graph algorithms (a subject dear to me!), DNF testing, and two expositions that should be of interest to our readers.

When Local and Non-Local Meet: Quadratic Improvement for Edge Estimation with Independent Set Queries by Tomer Adar, Yahel Hotam, Amit Levi (arXiv). We’ve seen a lot of recent activity on the fundamental problem of edge estimation in graphs. Given a simple, connected, undirected graph \(G = (V,E)\) with \(n\) vertices, we want to get a \((1+\varepsilon)\)-approximation to the number of edges \(m\). In the standard adjacency list access model, a classic result of Goldreich-Ron proves that the query complexity of this problem is \(\Theta(n/\sqrt{m})\) (ignoring \(poly(\varepsilon^{-1} \log n)\) factors). A completely different access model is the Independent Set (IS) model. For any set \(S\), an oracle outputs a single bit indicating whether \(S\) contains an edge (i.e., whether \(S\) fails to be an independent set). Again, in this model, the optimal bound is \(\Theta(n/\sqrt{m})\). This paper shows that with access to both oracles (standard adjacency list and IS queries), one gets a quadratic improvement: the complexity becomes \(\Theta(\sqrt{n/\sqrt{m}})\).
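To make the two access models concrete, here is a minimal toy sketch in Python. The function names (`degree`, `neighbor`, `is_query`) are illustrative, not from the paper:

```python
def make_oracles(edges, n):
    """Toy versions of the two access models discussed above:
    adjacency-list queries (degree and i-th neighbor) and
    Independent Set (IS) queries (does S span at least one edge?)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    edge_set = {frozenset(e) for e in edges}

    def degree(v):          # adjacency-list model: degree query
        return len(adj[v])

    def neighbor(v, i):     # adjacency-list model: i-th neighbor of v
        return adj[v][i]

    def is_query(S):        # IS model: one bit -- does S contain an edge?
        S = list(S)
        return any(frozenset((S[i], S[j])) in edge_set
                   for i in range(len(S)) for j in range(i + 1, len(S)))

    return degree, neighbor, is_query

# example: the path graph 0 - 1 - 2
degree, neighbor, is_query = make_oracles([(0, 1), (1, 2)], 3)
```

An estimation algorithm in the combined model gets to interleave both kinds of queries, which is the source of the quadratic speedup.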

Spectral Clustering in Birthday Paradox Time by Michael Kapralov, Ekaterina Kochetkova, Weronika Wrzos-Kaminska (arXiv). Consider a (bounded degree) graph \(G\) that can be partitioned into \(k\) roughly equal “expanding clusters”. This means that one can remove an \(\varepsilon\)-fraction of the edges to get \(k\) connected components of size around \(n/k\), each of which has expansion at least \(\phi\). The aim is to get a sublinear oracle that determines the cluster that a vertex belongs to. The origins of this problem go back to problems in expansion testing and local expansion reconstruction. From a spectral perspective, the ideal “cluster classifier” simply takes the bottom \(k\) eigenvectors of the Laplacian and projects every vertex onto the span of these eigenvectors. The corresponding embeddings of vertices in the same cluster will be close. This paper effectively implements this procedure in sublinear time, specifically \(\widetilde{O}(\sqrt{n})\) time, using a birthday paradox type collision argument to estimate the embedding similarities.
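The ideal (and decidedly non-sublinear) classifier described above is just the textbook spectral embedding; a minimal sketch, with a toy graph of two triangles joined by a single edge:

```python
import numpy as np

def spectral_embedding(adj, k):
    """Embed each vertex via the bottom-k eigenvectors of the graph
    Laplacian. This is the 'ideal' classifier that the paper's sublinear
    algorithm approximates -- computing it exactly takes poly(n) time."""
    A = np.array(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A   # unnormalized Laplacian D - A
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, :k]               # row i = embedding of vertex i

# two triangles {0,1,2} and {3,4,5} joined by the edge (2,3):
# two "expanding clusters" connected by few crossing edges
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

emb = spectral_embedding(A, k=2)
same = np.linalg.norm(emb[0] - emb[1])  # vertices in the same cluster
diff = np.linalg.norm(emb[0] - emb[5])  # vertices in different clusters
```

As the summary says, same-cluster embeddings end up much closer than cross-cluster ones (`same` is far smaller than `diff` here); the paper's contribution is estimating these embedding similarities without ever computing the eigenvectors.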

DNF formulas are efficiently testable with relative error by Xi Chen, William Pires, Toniann Pitassi, Rocco A. Servedio (arXiv). The relative error model for Boolean functions was recently proposed as an analogue of sparse graph testing. The standard notion of distance between two functions \(f, g: \{0,1\}^n \to \{0,1\}\) is just \(\|f - g\|_0/2^n\). But this is not meaningful when \(|f^{-1}(1)| \ll 2^n\). A natural notion of distance is to consider the sets \(f^{-1}(1)\) and \(g^{-1}(1)\) and measure the size of their symmetric difference, normalized by \(|f^{-1}(1)|\). One can now define distances to properties analogously. There have been numerous property testing results with this notion of distance, called the “relative error” setting. This paper considers the property of being a (small) DNF. Learning an \(s\)-term DNF requires \(n^{O(\log(s/\varepsilon))}\) queries. This paper shows that (relative error) testing of \(s\)-term DNFs can be done with \(poly(s/\varepsilon)\) queries.
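As a sanity check on the definition, here is a brute-force computation of the relative distance for tiny \(n\), assuming the normalization by \(|f^{-1}(1)|\) (the function names are mine, for illustration):

```python
from itertools import product

def relative_distance(f, g, n):
    """|f^{-1}(1) symmetric-difference g^{-1}(1)| / |f^{-1}(1)|,
    computed by exhaustive enumeration (tiny n only)."""
    f_ones = {x for x in product((0, 1), repeat=n) if f(x)}
    g_ones = {x for x in product((0, 1), repeat=n) if g(x)}
    return len(f_ones ^ g_ones) / len(f_ones)

# f is the 1-term DNF (x0 AND x1); g is the single literal x0.
f = lambda x: x[0] and x[1]
g = lambda x: x[0]

# Here f^{-1}(1) = {(1,1)} and the symmetric difference is {(1,0)}:
# the standard distance is only 1/4, but the relative distance is 1.
rel = relative_distance(f, g, 2)
```

The example shows why the relative notion is stricter for sparse functions: a single disagreement is negligible against \(2^n\) but huge against a small \(f^{-1}(1)\).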

A short note on (distribution) testing lower bounds via polynomials by Clément Canonne (ECCC). As the title says, this is a short note on a fundamental question: proving lower bounds for distribution testing. For your favorite distribution testing problem, to prove a lower bound, you start with a “Yes” distribution \(\mathcal{Y}\) and a “No” distribution \(\mathcal{N}\). So \(\mathcal{Y}\) would satisfy the desired property (say, uniformity) and \(\mathcal{N}\) would not. Then, you need to argue some sort of “statistical indistinguishability” of samples: the distribution of a set of \(s\) samples from \(\mathcal{Y}\) is almost the same as that from \(\mathcal{N}\). How does one prove the latter? A convenient lemma shows that if the first \(\ell\) moments of the distributions match, then we get an \(\Omega(k^{1-1/\ell})\) sample lower bound (\(k\) is the support size). The key insight is that such “matching moment” distributions/random variables can be generated from specific univariate polynomials. It’s a really clever trick that looks at the powers of the roots, scaled by the inverse of the derivative at those roots. A really nice read overall!
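The polynomial trick can be checked numerically. For \(p(x) = \prod_i (x - r_i)\) with distinct roots, Lagrange interpolation gives \(\sum_i r_i^j / p'(r_i) = 0\) for all \(0 \le j \le d-2\); splitting these signed weights by sign (and normalizing) yields two distributions whose first \(d-2\) moments agree. A minimal sketch with exact arithmetic (function names are mine, not from the note):

```python
from fractions import Fraction

def matching_moment_pair(roots):
    """Split the Lagrange weights 1/p'(r_i), for p(x) = prod_i (x - r_i),
    by sign and normalize: the two resulting distributions have
    identical moments up to order d-2."""
    w = []
    for i, ri in enumerate(roots):
        deriv = Fraction(1)            # p'(r_i) = prod_{j != i} (r_i - r_j)
        for j, rj in enumerate(roots):
            if j != i:
                deriv *= (ri - rj)
        w.append(Fraction(1) / deriv)
    pos = {r: wi for r, wi in zip(roots, w) if wi > 0}
    neg = {r: -wi for r, wi in zip(roots, w) if wi < 0}
    zp, zn = sum(pos.values()), sum(neg.values())
    P = {r: v / zp for r, v in pos.items()}
    N = {r: v / zn for r, v in neg.items()}
    return P, N

def moment(dist, j):
    return sum(p * r**j for r, p in dist.items())

# d = 4 distinct roots: the first d-2 = 2 moments of P and N coincide,
# while the third moment differs.
P, N = matching_moment_pair([0, 1, 2, 3])
```

The `Fraction` arithmetic makes the moment identities exact rather than approximate, which is handy for convincing yourself the cancellation is real.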

Mathematical and computational perspectives on the Boolean and binary rank and their relation to the real rank by Michal Parnas (arXiv). This is a long survey on the notions of Boolean and binary rank, with an overview of their use in TCS as well as methods to bound these ranks. Section 7.3 gives an excellent discussion of property testing problems on Boolean rank, and lists some of the main open questions regarding property testing of low Boolean rank.
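For readers new to the notion: the Boolean rank of a 0/1 matrix \(M\) is the smallest \(r\) such that \(M = A \circ B\) over the (OR, AND) semiring, with \(A\) of size \(n \times r\) and \(B\) of size \(r \times m\). Computing it is NP-hard in general, but a brute-force sketch for tiny matrices makes the definition concrete:

```python
from itertools import product

def boolean_rank(M):
    """Smallest r such that M = A o B over the Boolean (OR-of-ANDs)
    semiring, with A an n x r and B an r x m 0/1 matrix.
    Exhaustive search -- only feasible for very small matrices."""
    n, m = len(M), len(M[0])
    if not any(any(row) for row in M):
        return 0
    for r in range(1, min(n, m) + 1):
        for A_bits in product((0, 1), repeat=n * r):
            A = [A_bits[i * r:(i + 1) * r] for i in range(n)]
            for B_bits in product((0, 1), repeat=r * m):
                B = [B_bits[k * m:(k + 1) * m] for k in range(r)]
                if all(int(any(A[i][k] and B[k][j] for k in range(r)))
                       == M[i][j]
                       for i in range(n) for j in range(m)):
                    return r
    return min(n, m)  # always reachable: M = M o I works with r = m

# the all-ones matrix has Boolean rank 1; the identity needs full rank,
# since no all-ones rectangle can contain two diagonal 1s
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Equivalently, the Boolean rank is the minimum number of all-ones combinatorial rectangles whose union is exactly the set of 1-entries of \(M\), which is the view most useful in communication complexity.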