# News for December 2022

All the best to everyone for a Happy 2023. The holiday season brought a crop of papers to engage our readers: we have nine this month, and we hope some of them catch your fancy. As a new year treat, we also feature Gilmer's constant lower bound for the union-closed sets conjecture of Frankl. On we go.

Sublinear Time Algorithms and Complexity of Approximate Maximum Matching by Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein (arXiv) This paper significantly advances our understanding of the maximum matching problem in the sublinear regime. Your goal is to estimate the size of the maximum matching, and you may assume that you have query access to the adjacency list of your graph. Our posts from Dec 2021 and June 2022 reported some impressive progress on this problem; the upshot of those works is that you can beat greedy matching and obtain a $$\frac{1}{2} + \Omega(1)$$ approximation to the maximum matching size in sublinear time. The featured paper shows the following two algorithmic results:

(1) An algorithm that runs in time $$n^{2 - \Omega_{\varepsilon}(1)}$$ and returns a $$(2/3 - \varepsilon)$$-approximation to the maximum matching size in general graphs, and

(2) An algorithm that runs in time $$n^{2 - \Omega_{\varepsilon}(1)}$$ and returns a $$(2/3 + \varepsilon)$$-approximation to the maximum matching size in bipartite graphs.

The question remained: can we show a lower bound that grows superlinearly with $$n$$? The current work achieves this and shows that even on bipartite graphs, you must make at least $$n^{1.2 - o(1)}$$ queries to the adjacency list to obtain a $$(2/3 + \Omega(1))$$-approximation. (An aside: a concurrent work by Bhattacharya-Kiss-Saranurak from December obtains similar algorithmic results for approximating the maximum matching size in general graphs.)
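For readers who want to see the baseline these works beat: a maximal matching found greedily is always at least half the size of a maximum matching. The sketch below is a full-graph implementation for intuition only; the sublinear algorithms of course never scan all edges, and only estimate the matching size via local queries.

```python
def greedy_matching(edges):
    """Classic greedy maximal matching: scan the edges in order and keep
    an edge if both endpoints are still unmatched. Any maximal matching
    has at least half the size of a maximum matching, which is the
    1/2-approximation baseline mentioned above."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Path 0-1-2-3: the maximum matching has size 2, but greedy scanning the
# middle edge first gets stuck at size 1, so the factor 1/2 is tight.
print(len(greedy_matching([(1, 2), (0, 1), (2, 3)])))  # 1
```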

Directed Isoperimetric Theorems for Boolean Functions on the Hypergrid and an $$\widetilde{O}(n \sqrt d)$$ Monotonicity Tester by Hadley Black, Deeparnab Chakrabarty, C. Seshadhri (arXiv) Boolean monotonicity testing is as classic as classic gets in property testing. Encouraged by the success of directed isoperimetric theorems over the hypercube domain, and of the monotonicity testers these isoperimetries power, one may wish to obtain efficient monotonicity testers for the hypergrid $$[n]^d$$. Indeed, the same gang of authors showed in a previous work that a Margulis style directed isoperimetry extends from the lowly hypercube to the hypergrid, which resulted in a tester with $$\widetilde{O}(d^{5/6})$$ queries. The more intricate task of proving the directed Talagrand style isoperimetry that underlies the Khot-Minzer-Safra breakthrough was a challenge. Was. The featured work extends this isoperimetry from the hypercube to the hypergrid, which yields a tester with query complexity $$\widetilde{O}(n \sqrt d)$$, an improvement over the $$d^{5/6}$$ bound for domains where $$n$$ is (say) a small constant. But as they say, when it rains, it pours. This brings us to a concurrent paper with the same result.

Improved Monotonicity Testers via Hypercube Embeddings by Mark Braverman, Subhash Khot, Guy Kindler, Dor Minzer (arXiv) Similar to the paper above, this paper also obtains monotonicity testers over the hypergrid domain, $$[n]^d$$, with $$\widetilde{O}(n^3 \sqrt d)$$ queries. This paper also presents monotonicity testers over the standard hypercube domain — $$\{0,1\}^d$$ in the $$p$$-biased setting. In particular, their tester issues $$\widetilde{O}(\sqrt d)$$ queries to successfully test monotonicity on the $$p$$-biased cube. Coolly enough, this paper also proves directed Talagrand style isoperimetric inequalities both over the hypergrid and the $$p$$-biased hypercube domains.
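To ground the discussion: the testers in both papers are descendants of the textbook edge tester over the hypercube, whose soundness analysis is exactly what directed isoperimetric inequalities power. Below is that classic tester (not the new hypergrid testers from either paper), just to show what "querying a function on random edges" means.

```python
import random

def edge_tester(f, d, queries):
    """Classic edge tester on {0,1}^d: sample a uniformly random
    hypercube edge (x with bit i set to 0, x with bit i set to 1) and
    reject if it violates monotonicity, i.e. if f(lower) > f(upper)."""
    for _ in range(queries):
        x = [random.randint(0, 1) for _ in range(d)]
        i = random.randrange(d)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            return False  # found a violated edge: f is not monotone
    return True  # no violation observed

# The anti-monotone function 1 - x[0] on one bit violates every edge,
# so the tester rejects immediately; a monotone f is never rejected.
print(edge_tester(lambda x: 1 - x[0], d=1, queries=5))    # False
print(edge_tester(lambda x: sum(x), d=4, queries=200))    # True
```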

Toeplitz Low-Rank Approximation with Sublinear Query Complexity by Michael Kapralov, Hannah Lawrence, Mikhail Makarov, Cameron Musco, Kshiteej Sheth (arXiv) Another intriguing paper for the holiday month. So, take a Toeplitz matrix. Did you know that any PSD Toeplitz matrix admits a (near-optimal in the Frobenius norm) low-rank approximation which is itself Toeplitz? This is a remarkable statement. The featured paper proves this result and uses it to get more algorithmic mileage. In particular, suppose you are given a $$d \times d$$ PSD Toeplitz matrix $$T$$ and a target rank $$k$$. Armed with the techniques from the paper, you get algorithms that return a Toeplitz matrix $$\widetilde{T}$$ with rank slightly larger than $$k$$ which approximates $$T$$ in the Frobenius norm nearly as well as the best rank-$$k$$ approximation. Moreover, the algorithm only issues a number of queries sublinear in the size of $$T$$.
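One reason sublinear query complexity is even conceivable here: a Toeplitz matrix is constant along its diagonals, so a $$d \times d$$ Toeplitz matrix is determined by just $$2d - 1$$ numbers rather than $$d^2$$ entries. A minimal sketch of that structure:

```python
def toeplitz_from_diagonals(c):
    """Build a d x d Toeplitz matrix (as nested lists) from its 2d - 1
    diagonal values: entry (i, j) equals c[j - i + d - 1], so the matrix
    is constant along every diagonal j - i = const."""
    d = (len(c) + 1) // 2
    return [[c[j - i + d - 1] for j in range(d)] for i in range(d)]

# A 3 x 3 Toeplitz matrix is fully determined by 5 numbers.
T = toeplitz_from_diagonals([1, 2, 3, 4, 5])
print(T)  # [[3, 4, 5], [2, 3, 4], [1, 2, 3]]
```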

Sampling an Edge in Sublinear Time Exactly and Optimally by Talya Eden, Shyam Narayanan and Jakub Tětek (arXiv) Regular readers of PTReview are no strangers to the fundamental task of sampling a random edge from a graph to which you only have query access via its vertices. Of course, you don't have direct access to the edges of this graph. This paper considers the task of sampling a truly uniform edge from the graph $$G = (V,E)$$ with $$|V| = n, |E| = m$$. In STOC 22, Tětek and Thorup presented an algorithm for a relaxation of this problem where you want an $$\varepsilon$$-approximately uniform edge; it runs in time $$O\left(\frac{n}{\sqrt{m}} \cdot \log(1/\varepsilon) \right)$$. The featured paper presents an algorithm that samples an honest to goodness uniform edge in expected time $$O(n/\sqrt{m})$$. This closes the problem, as we already know a matching lower bound. Indeed, consider a graph in which $$O(\sqrt m)$$ vertices induce a clique and all remaining vertices are isolated: you need to sample $$\Omega(n/\sqrt m)$$ vertices before you see any edge.
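To give a flavor of how vertex queries can yield a uniform edge, here is the standard rejection-sampling baseline (in the spirit of earlier work by Eden and Rosenbaum, not the featured algorithm, and it assumes you know the maximum degree $$d_{max}$$): sample a uniform vertex, accept it with probability proportional to its degree, then output a uniform neighbor.

```python
import random

def sample_edge_rejection(adj, d_max):
    """Rejection-sampling baseline for uniform edge sampling. Each
    ordered pair (v, u) with u a neighbor of v is output with probability
    (1/n) * (deg(v)/d_max) * (1/deg(v)) = 1/(n * d_max), so every
    undirected edge is returned with probability exactly 2/(n * d_max)."""
    vertices = list(adj)
    while True:
        v = random.choice(vertices)
        deg = len(adj[v])
        # keep v with probability deg(v)/d_max, else resample
        if deg > 0 and random.random() < deg / d_max:
            return (v, random.choice(adj[v]))

# Path 0-1-2 plus an isolated vertex 3: the sampler only ever
# returns one of the two real edges, each with probability 1/2.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
e = sample_edge_rejection(adj, d_max=2)
print(sorted(e) in ([0, 1], [1, 2]))  # True
```

The expected number of vertex samples is $$\Theta(n \cdot d_{max} / m)$$, which is why the real algorithms need cleverer ideas to hit $$O(n/\sqrt m)$$ without degree assumptions.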

Support Size Estimation: The Power of Conditioning by Diptarka Chakraborty, Gunjan Kumar, Kuldeep S. Meel (arXiv) This work considers the classic problem of support size estimation with a slight twist. You are given access to a stronger (conditioning based) sampling oracle. Let me highlight one of the results from this paper. So, you are given a distribution $$D$$ with $$supp(D) \subseteq [n]$$. You want an estimate of $$|supp(D)|$$ that lies within $$|supp(D)| \pm \varepsilon n$$ with high probability. Suppose you are also given access to the following sampling oracle: you may choose any subset $$S \subseteq [n]$$ and request a sample $$x \sim D\vert_S$$. An element $$x \in S$$ is returned with probability $$D\vert_S(x) = D(x)/D(S)$$ (for simplicity of this post, let us assume $$D(S) > 0$$). In addition, this oracle also reveals the value $$D(x)$$. The paper shows that the algorithmic task of obtaining such a high probability estimate of the support size (to within $$\pm \varepsilon n$$) with this sampling oracle requires $$\Omega(\log \log n)$$ calls to the oracle.
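A minimal simulation of the conditioning oracle described above (the function name and representation of $$D$$ are mine, purely for illustration):

```python
import random

def cond_sample(D, S):
    """Simulate the conditioning oracle: D maps elements of [n] to their
    probabilities, S is a query subset with D(S) > 0. Returns a sample
    x ~ D|_S, i.e. x in S with probability D(x)/D(S), together with the
    value D(x), which the oracle also reveals."""
    items = [x for x in S if D.get(x, 0) > 0]
    weights = [D[x] for x in items]
    x = random.choices(items, weights=weights)[0]
    return x, D[x]

D = {1: 0.5, 2: 0.3, 3: 0.2}
x, px = cond_sample(D, S={2, 3})  # x = 2 w.p. 0.6, x = 3 w.p. 0.4
print(x in {2, 3} and px == D[x])  # True
```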

Computing (1+epsilon)-Approximate Degeneracy in Sublinear Time by Valerie King, Alex Thomo, Quinton Yong (arXiv) Degeneracy is an important graph parameter relevant to several problems in algorithmic graph theory. A graph $$G = (V,E)$$ is $$\delta$$-degenerate if every induced subgraph of $$G$$ contains a vertex with degree at most $$\delta$$; the degeneracy of $$G$$ is the smallest such $$\delta$$. The featured paper presents algorithms that compute a $$(1 + \varepsilon)$$-approximation to the degeneracy of $$G$$, where you are given access to $$G$$ via its adjacency list.
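For reference, exact degeneracy is computed by the standard peeling procedure, which of course reads the whole graph; the point of the featured paper is to approximate this quantity without doing so.

```python
def degeneracy(adj):
    """Exact degeneracy via peeling: repeatedly delete a vertex of
    minimum degree; the degeneracy is the largest degree observed at
    deletion time. (Linear-time overall with better bookkeeping; this
    plain version is just for clarity.)"""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    delta = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        delta = max(delta, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return delta

# A triangle is 2-degenerate: every induced subgraph has a vertex of degree <= 2.
print(degeneracy({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # 2
```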

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, Anthimos Vardis Kandiros (arXiv) Ising models are emerging as a rich and fertile frontier for Property Testing and Learning Theory researchers (at least to the uninitiated ones like me). This paper considers latent-tree Ising models. These are Ising models defined on a tree that can only be observed at their leaf nodes. One of the results in this paper gives an algorithm for testing whether the leaf distributions attached to two latent-tree Ising models are close or far in the TV distance.
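To make "observed only at the leaves" concrete, here is a toy sampler for a tree Ising model in a simple parameterization of my own (each child copies its parent's spin with probability $$(1+\theta)/2$$ for an edge correlation $$\theta$$); this is an illustration of the setting, not the paper's model or algorithm.

```python
import random

def sample_latent_tree_ising(children, edge_corr, root=0):
    """Toy latent-tree Ising sampler (my own parameterization): sample a
    uniform +/-1 spin at the root, then propagate down the tree, each
    child agreeing with its parent with probability (1 + theta)/2 where
    theta = edge_corr[(parent, child)]. Only leaf spins are returned,
    since in a latent-tree model the internal spins are unobserved."""
    spins = {root: random.choice([-1, 1])}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            agree = random.random() < (1 + edge_corr[(v, c)]) / 2
            spins[c] = spins[v] if agree else -spins[v]
            stack.append(c)
    return {v: s for v, s in spins.items() if not children.get(v)}

tree = {0: [1, 2], 1: [3, 4]}  # internal: 0, 1; leaves: 2, 3, 4
corr = {(0, 1): 0.9, (0, 2): 0.9, (1, 3): 0.9, (1, 4): 0.9}
print(sorted(sample_latent_tree_ising(tree, corr)))  # [2, 3, 4]
```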

A constant lower bound for the union-closed sets conjecture by Justin Gilmer (arXiv) The union-closed sets conjecture of Frankl states that for any union-closed set system $$\mathcal{F} \subseteq 2^{[n]}$$, there is a mysterious element $$i \in [n]$$ that shows up in at least a $$c = 1/2$$ fraction of the sets in $$\mathcal{F}$$. Gilmer took a first swipe at this problem and, in a paper that takes all of only 9 pages, gave a constant lower bound of $$c = 0.01$$. This has already been improved by at least four different groups to $$\frac{3-\sqrt{5}}{2}$$, a bound which is the limit of Gilmer's method.

The key lemma Gilmer proves is the following. Suppose you sample two sets $$A, B \sim \mathcal{D}_n$$ (iid) from some distribution $$\mathcal{D}_n$$ over the subsets of $$[n]$$, and suppose that for every index $$i \in [n]$$, the probability that the element $$i$$ shows up in the random set $$A$$ is at most $$0.01$$. Then you have $$H(A \cup B) \geq 1.26 H(A)$$. This is all you need to finish Gilmer's proof (of $$c = 0.01$$). The remaining argument is as follows. Suppose, by way of contradiction, that no element shows up in at least a $$0.01$$ fraction of the sets in the union-closed family $$\mathcal{F}$$, and let $$A, B$$ be chosen uniformly from $$\mathcal{F}$$. The key lemma then gives $$H(A \cup B) > H(A) = \log_2 |\mathcal{F}|$$. But since $$\mathcal{F}$$ is union-closed, $$A \cup B$$ is itself a member of $$\mathcal{F}$$, so $$H(A \cup B) \leq \log_2 |\mathcal{F}|$$, a contradiction. The proof of the key lemma is also fairly slick and uses pretty simple information theoretic tools.
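A quick sanity check of the key lemma (mine, not from the paper) in the simplest case of a product distribution, where each element appears in $$A$$ independently with probability $$p \leq 0.01$$: then $$H(A) = n \cdot h(p)$$ and, since each element appears in $$A \cup B$$ with probability $$2p - p^2$$, $$H(A \cup B) = n \cdot h(2p - p^2)$$, where $$h$$ is the binary entropy function. The ratio comfortably exceeds $$1.26$$:

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

# Product distribution: each coordinate of A is 1 independently w.p. p,
# so per coordinate H(A) = h(p) and H(A ∪ B) = h(2p - p^2), since the
# coordinate appears in the union w.p. 1 - (1 - p)^2 = 2p - p^2.
p = 0.01
ratio = h(2 * p - p * p) / h(p)  # roughly 1.74
print(ratio > 1.26)  # True: consistent with Gilmer's key lemma
```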