News for June 2025

Another (sigh) delayed post. A moderately busy month, with 4 papers spanning sublinear graph algorithms and, of course, distribution testing.

A 0.51-Approximation of Maximum Matching in Sublinear \(n^{1.5}\) Time by Sepideh Mahabadi, Mohammad Roghani, and Jakub Tarnawski (arXiv). The title says it all. This paper gives a 0.51-approximation for the maximum matching in, well, \(O(n^{1.5})\) time. Here’s the back story. There are two kinds of sublinear maximum matching algorithms. The first kind (achieved after a long line of impressive results) gets \((1- \varepsilon)\)-approximations in time \(O(n^{2 - \Omega_{\varepsilon}(1)})\). This means that in sublinear \(o(n^2)\) time, one can get arbitrarily good approximations. The second kind beats the simple \(0.5\)-approximation (obtained from maximal matchings; see the sketch below), but aims for running time closer to \(O(n)\). Behnezhad-Roghani-Rubinstein-Saberi gave a \((0.5+\varepsilon)\)-approximation algorithm running in \(n^{1+\varepsilon}\) time. But the largest value of \(\varepsilon\) possible in these results is tiny. Can one get a “significant” improvement over 0.5-approximations in time “significantly less” than \(n^2\)? The answer is yes, as the title explains.
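
As a point of reference, here is the textbook greedy baseline behind the \(0.5\)-approximation mentioned above: any maximal matching has at least half as many edges as a maximum matching. This is only an illustration of the baseline (it reads every edge, so it is not sublinear, and it is not the paper’s algorithm).

def greedy_maximal_matching(edges):
    """Scan the edges once and keep an edge whenever both endpoints are
    still unmatched. The result is a maximal matching, which is always
    at least half the size of a maximum matching."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# On the path 1-2-3-4, scanning the middle edge first yields a single edge,
# exactly half of the maximum matching {(1,2), (3,4)}.
print(greedy_maximal_matching([(2, 3), (1, 2), (3, 4)]))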

Testing (Conditional) Mutual Information by Jan Seyfried, Sayantan Sen, and Marco Tomamichel (arXiv, ECCC). Consider a distribution over pairs \((A, B)\). The mutual information \(I(A,B)\) of \(A\) and \(B\) is the minimum KL-divergence between the original distribution and any product distribution over the support of \(A\) and \(B\). So the mutual information is zero iff \((A,B)\) is a product distribution, or equivalently, \(A\) and \(B\) are independent. A natural distribution testing question is to distinguish \(A\) and \(B\) being independent from \(I(A,B) > \varepsilon\). One can take this problem a step further. Suppose the distribution is over triples \((A,B,C)\). Can one distinguish conditional independence (\(I(A, B \mid C) = 0\)) from substantial conditional dependence (\(I(A, B \mid C) > \varepsilon\))? This problem was studied by (including our very own) Canonne-Diakonikolas-Kane-Stewart, and it is known that \(O(d^{7/4})\) samples suffice. Here, \(d\) denotes the maximum alphabet size for each coordinate, so the overall domain size is \(d^3\). This paper gives a much more nuanced complexity bound that depends separately on the support size of each coordinate. It is interesting that each coordinate plays a different role in the final complexity.
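
For concreteness (standard information-theoretic notation, not from the paper): the minimizing product distribution is the product of the marginals, and the conditional version averages over \(C\):

\[
I(A;B) \;=\; \min_{Q_A,\,Q_B} D_{\mathrm{KL}}\!\left(P_{AB}\,\middle\|\,Q_A \times Q_B\right) \;=\; D_{\mathrm{KL}}\!\left(P_{AB}\,\middle\|\,P_A \times P_B\right),
\qquad
I(A;B \mid C) \;=\; \mathop{\mathbb{E}}_{c \sim P_C}\!\left[\,I(A;B \mid C=c)\,\right].
\]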

Tight simulation of a distribution using conditional samples by Tomer Adar (arXiv). Consider a distribution over \(\{0,1\}^n\). Many distribution testing results would not yield useful algorithms for this setting, since the domain is exponentially large. Inspired by this setting, the subcube conditional model was introduced by Bhattacharyya-Chakraborty. In this model, we can generate samples conditioned on any subcube. The aim of this paper goes beyond property testing. Can we actually estimate specific values of the distribution with a few queries? Can we simulate the distribution, meaning “locally compute” a distribution that is close to the input? Specifically, given a domain point \(x\), we want to compute a value \(\mu(x)\) such that \(\mu\) represents a distribution close to the input. This paper gives such a result, where each \(\mu(x)\) can be computed with \(O(n^2/\varepsilon^2)\) queries. The resulting distribution \(\mu\) has KL-divergence at most \(\varepsilon^2\) from the original.
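
To make the “locally compute \(\mu(x)\)” goal concrete, here is a hypothetical sketch of the natural chain-rule approach one could run with a subcube-conditional oracle: estimate each conditional probability \(\Pr[X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}]\) from samples of the corresponding subcube and multiply the estimates. The oracle below is simulated by rejection sampling on a toy product distribution; this is not the paper’s algorithm, and its query count is not the \(O(n^2/\varepsilon^2)\) bound.

import random

def subcube_sample(dist_sampler, fixed):
    """Sample from the distribution conditioned on the subcube where the
    coordinates in `fixed` (index -> bit) are pinned. Implemented here by
    naive rejection sampling, purely for illustration."""
    while True:
        x = dist_sampler()
        if all(x[i] == b for i, b in fixed.items()):
            return x

def chain_rule_estimate(dist_sampler, x, samples_per_coord=2000):
    """Estimate mu(x) = prod_i Pr[X_i = x_i | X_1..X_{i-1} = x_1..x_{i-1}]
    from subcube-conditional samples (a hypothetical sketch, not the
    algorithm from the paper)."""
    estimate = 1.0
    fixed = {}
    for i in range(len(x)):
        hits = sum(subcube_sample(dist_sampler, fixed)[i] == x[i]
                   for _ in range(samples_per_coord))
        estimate *= hits / samples_per_coord
        fixed[i] = x[i]
    return estimate

# Toy input distribution: each bit is independently 1 with probability 0.3,
# so the true value of mu([1, 0, 1, 0]) is 0.3 * 0.7 * 0.3 * 0.7 = 0.0441.
sampler = lambda: [1 if random.random() < 0.3 else 0 for _ in range(4)]
print(chain_rule_estimate(sampler, [1, 0, 1, 0]))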

Sampling and Identity-Testing Without Approximate Tensorization of Entropy by William Gay, William He, Nicholas Kocurek, and Ryan O’Donnell (arXiv). Again, consider a distribution \(D\) over a large product domain \(\Sigma^n\). Our aim is identity testing, so we wish to determine if \(D = D'\) or \(|D - D'| > \varepsilon\), where \(D'\) is some explicitly known distribution. The model is much weaker than subcube conditional access: essentially, we can only get samples from subcubes of dimension (at least) \(n-1\). Blanca-Chen-Štefankovič-Vigoda studied identity testing where \(D\) satisfies a condition called “approximate tensorization of entropy” (ATE). By Jensen’s inequality, the entropy of \(D\) is at most the sum of the entropies of its marginal distributions, with equality when \(D\) is a product distribution (see the display below). If the entropy of \(D\) is within (say) a constant factor of this sum, we say it satisfies ATE. Blanca-Chen-Štefankovič-Vigoda prove that \(\widetilde{O}(n/\varepsilon)\) queries suffice for identity testing. This paper studies input distributions that are mixtures of (say, a constant number of) distributions satisfying ATE. Interestingly, these mixtures do not satisfy ATE in general. But this paper shows that one can still get \(\widetilde{O}(n/\varepsilon)\)-query identity testers.
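
For reference, the subadditivity fact invoked above, written out (with \(D_i\) denoting the marginal of \(D\) on coordinate \(i\); standard notation, not from the paper):

\[
H(D) \;\le\; \sum_{i=1}^{n} H(D_i), \qquad \text{with equality if and only if } D = D_1 \times \cdots \times D_n .
\]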
