
News for July 2022

Last month saw a flurry of activity in Property Testing. We had thirteen papers!! Without further ado, let us dig in.

Testing of Index-Invariant Properties in the Huge Object Model (by Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen)(arXiv) This paper explores a class of distribution testing problems in the Huge Object Model introduced by Goldreich and Ron (see our coverage of the model here). A quick refresher on this model: suppose you want to test whether a distribution \(\mathcal{D}\) supported over, say, the boolean hypercube \(\{0,1\}^n\) has a certain property \(\mathcal{P}\). You pick a string \(x \sim \mathcal{D}\) of length \(n\). In situations where \(n\) is really large, you might not want to read all of \(x\); instead, you may want to read only a few bits of it. To this end, Goldreich and Ron formulated a model where you have query access to the strings you sample. The distribution \(\mathcal{D}\) is deemed to be \(\varepsilon\)-far from \(\mathcal{P}\) if \(EMD(\mathcal{D}, \mathcal{P}) \geq \varepsilon\) (here \(EMD\) denotes the earthmover distance with respect to the relative Hamming distance between bitstrings). In this model, one parameter of interest is the query complexity of your tester.
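
(A minimal Python sketch of the access model; the class and method names are ours, purely for illustration.)

```python
class HugeObjectOracle:
    """Toy interface for the Huge Object model: the tester may draw fresh
    samples from D and then read individual bits of any string sampled so
    far. Query complexity counts the bit-reads, not the sampled strings."""

    def __init__(self, sample_fn):
        self.sample_fn = sample_fn  # returns one n-bit string drawn from D
        self.samples = []
        self.bit_queries = 0

    def sample(self):
        self.samples.append(self.sample_fn())
        return len(self.samples) - 1  # handle to the fresh sample

    def query(self, handle, i):
        self.bit_queries += 1
        return self.samples[handle][i]
```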

One of the results in the featured paper shows the following: Let \(\sf{MONOTONE}\) denote the class of monotone distributions supported over \(\{0,1\}^n\) (a distribution \(D\) belongs to the class \(\sf{MONOTONE}\) if \(D(x) \leq D(y)\) whenever \(x \preceq y\) coordinatewise). Let \(\mathcal{B}_d\) denote the class of distributions supported over \(\{0,1\}^n\) whose supports have VC dimension at most \(d\). Let \(\mathcal{P} = \sf{MONOTONE} \cap \mathcal{B}_d\). Then, for any \(\varepsilon > 0\), you can test whether a distribution \(\mathcal{D} \in \mathcal{P}\) or whether it is \(\varepsilon\)-far from \(\mathcal{P}\) with query complexity \(poly(1/\varepsilon)\). In fact, the paper shows this for a much richer class \(\mathcal{P}\), namely the class of so-called index-invariant distributions with bounded VC dimension. The paper also shows the necessity of both of these conditions for efficient testability. Do check it out!

Identity Testing for High-Dimensional Distributions via Entropy Tensorization (by Antonio Blanca, Zongchen Chen, Daniel Štefankovič, and Eric Vigoda)(arXiv)

This paper considers a classic in distribution testing: the problem of testing whether the hidden input distribution \(\pi\) is identical to an explicitly given distribution \(\mu\). Both distributions are supported over a set \(\Omega\). The caveat is that \(\Omega\) is a high-dimensional set (think \(\Omega = [k]^n\)) whose size grows exponentially in \(n\). In this case, identity testing has sample complexity \(\Omega(k^{n/2})\) even when \(\mu\) is the uniform distribution. In an attempt to overcome this apparent intractability of identity testing in high dimensions, this paper takes the following route: in addition to the standard sample access to \(\pi\), you also assume access to a stronger sampling oracle for \(\pi\). You would then like to understand for which classes of explicitly given distributions \(\mu\) you can expect algorithms with efficient sample complexity (assuming the algorithm is equipped with this stronger sampling oracle). For any \(i \in [n]\) and \(\omega \in \Omega\), the stronger oracle considered in this work allows you to sample \(x \sim \pi_{\omega(-i)}\), where \(\pi_{\omega(-i)}\) denotes the conditional marginal distribution of \(\pi\) over the \(i\)-th coordinate when the remaining coordinates have been fixed according to \(\omega\).
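
(To make the oracle concrete, here is a toy Python sketch; in the actual model the oracle is simply given, and the tester would of course not have `pi` explicitly.)

```python
import random

def coordinate_oracle(pi, omega, i):
    """Sample x_i where x ~ pi conditioned on x_j = omega_j for all j != i.
    `pi` is an explicit dict mapping tuples in [k]^n to probabilities
    (a toy stand-in for the hidden distribution)."""
    values, weights = [], []
    for x, p in pi.items():
        if all(x[j] == omega[j] for j in range(len(x)) if j != i):
            values.append(x[i])
            weights.append(p)
    return random.choices(values, weights=weights)[0]
```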

The paper shows that if the known distribution \(\mu\) satisfies a certain approximate tensorization of entropy criterion, then identity testing with such a \(\mu\) can be done with \(\tilde{O}(n/\varepsilon)\) queries. Thanks to the spectral independence toolkit pioneered by Anari et al., it turns out that the approximate tensorization property holds for a rich class of distributions. (A side note to self: it looks like I am running out of reasons to postpone learning about new tools like spectral independence.)

Near-Optimal Bounds for Testing Histogram Distributions (by Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Sihan Liu)(arXiv) Histograms comprise one of the most natural and widely used ways of summarizing relevant aspects of massive datasets. Let \(\Omega\) denote an \(n\)-element domain (with elements \(\{1,2, \ldots, n \}\)). A \(k\)-histogram is a function that is piecewise constant over \(k\) interval pieces. This paper studies the sample complexity of the following fundamental task: given a distribution \(\mathcal{P}\) supported over \(\Omega\), is \(\mathcal{P}\) a \(k\)-histogram or is it \(\varepsilon\)-far from being a \(k\)-histogram? The main result of the paper is a (near) sample-optimal algorithm for this problem. Specifically, this paper shows that \(k\)-histogram testing has sample complexity \(\Theta\left(\sqrt{nk}/\varepsilon + k/\varepsilon^2 + \sqrt{n}/\varepsilon^2\right)\).
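
(For an explicitly given distribution, being a \(k\)-histogram is trivial to check by counting maximal constant runs; the whole difficulty of the testing problem is that we only get samples. A sketch of the exact check:)

```python
def is_k_histogram(p, k):
    """Check whether an explicit probability vector p over {1, ..., n}
    is piecewise constant on at most k contiguous intervals."""
    pieces = 1 + sum(1 for a, b in zip(p, p[1:]) if a != b)
    return pieces <= k
```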

Comments on “Testing Conditional Independence of Discrete Distributions” (by Ilmun Kim)(arXiv) Probability is full of subtleties and conditional probability is perhaps the biggest landmine of subtleties in this venerable discipline. The featured paper closely examines some subtleties in Theorem 1.3 of the CDKS18 paper on testing conditional independence of discrete distributions. Essentially, this theorem undertakes the following endeavor: you would like to test whether a bivariate discrete distribution has independent marginals conditioned on values assumed by a third random variable. Theorem 1.3 of CDKS18 asserts that there exists a computationally efficient tester for conditional independence with small sample complexity. The featured paper fixes the sample complexity bound claimed in Theorem 1.3 of CDKS18.

Cryptographic Hardness of Learning Halfspaces with Massart Noise (by Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi, and Lisheng Ren)(arXiv) The study of robust supervised learning in high dimensions has seen a lot of impressive progress in the last few years. The paper under review presents computational hardness results for the task of learning halfspaces in this overarching framework. Let us unpack this paper slowly. So, let us recall the classic task of learning halfspaces in \(\mathbb{R}^n\). You know the drill. I have a known concept class \(\mathcal{C}\) (comprising boolean functions) in my hand. Unbeknownst to you, I have a boolean function \(f \in \mathcal{C}\). You get as input a multiset \(\{(x_i, f(x_i))\}_{i \in [s]}\) of labeled examples where \(x_i \sim \mathcal{D}_x\) and \(\mathcal{D}_x\) is fixed but arbitrary. Your goal is to develop an algorithm that returns a hypothesis with a small misclassification rate. The classic stuff.

Now, consider the same setup with a little twist: the so-called Massart noise setup. The labels \(f(x_i)\) are no longer reliable, and the label on each \(x_i\) gets flipped adversarially with probability \(\eta_i \leq \eta < 1/2\). In a breakthrough, Diakonikolas, Gouleakis, and Tzamos made the first algorithmic progress on this problem and gave algorithms with running time \(poly(n/\varepsilon)\) and misclassification rate \(\eta + \varepsilon\). The current paper shows a lower-bound result: assuming the hardness of the so-called “Learning With Errors” problem, it is not possible for a polynomial-time learning algorithm to achieve a misclassification rate of \(o(\eta)\) under Massart noise.

Locally-iterative (Δ+1)-Coloring in Sublinear (in Δ) Rounds (by Xinyu Fu, Yitong Yin, and Chaodong Zheng)(arXiv) A time-honored problem in distributed computing is distributed graph coloring. Let us first understand the problem this paper studies. You are given a graph \(G = (V,E)\) with maximum degree \(\Delta\). In a seminal work, Szegedy and Vishwanathan introduced the framework of locally-iterative algorithms as a natural family of distributed graph coloring algorithms. These algorithms proceed in \(r\) rounds; in each round, every vertex \(v\) updates its color, where the new color of \(v\) is a function of the current color of \(v\) and the current colors of its neighbors. The current paper shows that, in the locally-iterative framework, you can in fact obtain a proper coloring of \(G\) with \(\Delta(G) + 1\) colors in \(r = O(\Delta^{3/4} \log \Delta) + \log^* n\) rounds.

Learning Hierarchical Structure of Clusterable Graphs (by Michael Kapralov, Akash Kumar, Silvio Lattanzi, Aida Mousavifar)(arXiv) [Disclaimer: I am one of the authors of this paper.] Hierarchical clustering of graph data is a fundamentally important task in the current big data era. In 2016, Dasgupta introduced a cost function, now called the Dasgupta cost, which essentially allows one to measure the quality of a hierarchical clustering. This paper presents algorithms that estimate the Dasgupta cost of a graph coming from a special family of \(k\)-clusterable graphs in the semi-supervised setting. These graphs have \(k\) clusters, which are essentially subsets of vertices that induce expanders and are sparsely connected to each other. We are given query access to the adjacency list of \(G\). Also, for an initial “warmup” set of randomly chosen vertices, we are told the clusters they belong to. Armed with this setup, this paper presents algorithms that run in time \(\approx \sqrt{n}\) and return an estimate of the Dasgupta cost of \(G\) which is within a \(\approx \sqrt{\log k}\) factor of the optimum cost.

Finding a Hidden Edge (by Ron Kupfer and Noam Nisan)(arXiv) Let us consider as a warmup (as done in the paper) the following toy problem. You have a graph on \(n\) vertices whose edge set \(E\) is hidden from you. Your objective is to return any \((i,j) \in E\). The only queries you are allowed are of the following form: you may consider any subset \(Q \subseteq V \times V\) and ask whether \(Q\) contains any edge. A simple binary search solves this problem with \(\log m\) queries (where \(m = {n \choose 2}\)); see the sketch after this paragraph. However, if you want a non-adaptive algorithm (unlike binary search), you can show that any deterministic algorithm must issue \(m\) non-adaptive queries. It turns out randomness can help you get away with only \(O(\log^2 m)\) non-adaptive queries for this toy problem. Now, let me describe the problem considered in this work in earnest. Suppose the only queries you are allowed are of the following form: you may pick any \(S \subseteq V\) and ask whether the graph induced on \(S\) contains an edge. The paper’s main result is an algorithm for finding an edge in \(G\) which issues a nearly linear (in \(n\)) number of non-adaptive queries. The paper also presents an almost matching lower bound.
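
(Here is the warmup binary search as a Python sketch; `has_edge` stands in for the query oracle and the name is ours.)

```python
def find_edge(pairs, has_edge):
    """Find one edge with ~log2(m) adaptive queries: `pairs` lists all
    m = C(n, 2) candidate vertex pairs, and has_edge(Q) answers whether
    the set Q of pairs contains an edge of the hidden graph."""
    assert has_edge(pairs), "the hidden graph has no edge"
    while len(pairs) > 1:
        half = pairs[: len(pairs) // 2]
        # Keep whichever half still contains an edge.
        pairs = half if has_edge(half) else pairs[len(pairs) // 2 :]
    return pairs[0]
```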

On One-Sided Testing Affine Subspaces (by Nader Bshouty)(ECCC) Dictatorship testing is one of the classics in property testing of boolean functions. A more general problem considers testing whether the presented function is a \(k\)-monomial. If you are a regular reader of the posts on PTReview, you might have seen that this problem essentially asks you to test whether a boolean function \(f \colon \mathcal{F}^n \to \{0,1\}\) is an indicator of an \((n-k)\)-dimensional affine/linear subspace of \(\mathcal{F}^n\) (here \(\mathcal{F}\) denotes a finite field). Namely, you would like to test whether the set \(f^{-1}(1)\) is an \((n-k)\)-dimensional affine subspace of \(\mathcal{F}^n\). The paper under review improves the state-of-the-art query complexity for this problem from a previous value of \(O\left(|\mathcal{F}|/\varepsilon\right)\) to \(\tilde{O}\left(1/\varepsilon\right)\).

Non-Adaptive Edge Counting and Sampling via Bipartite Independent Set Queries (by Raghavendra Addanki, Andrew McGregor, and Cameron Musco)(arXiv) If you have been around the PTReview corner for a while, you know that sublinear-time estimation of graph properties is one of our favorite pastimes here. Classic work in this area considers the following queries: vertex degree queries, \(i\)-th neighbor queries, and edge existence queries. This classic query model has received a lot of attention, and thanks to the work of Eden and Rosenbaum we know algorithms for near-uniform edge sampling with query complexity \(O(n/\sqrt{m}) \cdot poly(\log n) \cdot poly(1/\varepsilon)\). Motivated by a desire to obtain more query-efficient algorithms, Beame et al. introduced an augmented query model where you are also allowed the following queries: you may pick \(L, R \subseteq V\) and you get a yes/no response indicating whether there exists an edge in \(E(L, R)\). These are called bipartite independent set (BIS) queries. The featured paper shows that with BIS queries you get non-adaptive algorithms for near-uniform edge sampling with query complexity a mere \(\widetilde{O}(\varepsilon^{-4} \log^6 n)\). The main result of the paper is a non-adaptive algorithm for estimating the number of edges in \(G\) with query complexity (under BIS) a mere \(\widetilde{O}(\varepsilon^{-5} \log^5 n)\).

A Query-Optimal Algorithm for Finding Counterfactuals (by Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan)(arXiv) Given an abstract space \(X^d\), an instance \(x^* \in X^d\) and a model \(f\) (which you think of as a boolean function over \(X^d\)), a point \(x' \in X^d\) is called a counterfactual to \(x^*\) if \(x^*\) and \(x'\) differ in few features (i.e., have a small Hamming distance) and \(f(x^*) \neq f(x')\). Ideally, you would like to find counterfactuals that are as close to \(x^*\) in Hamming distance as possible. The main result of this paper is the following: take a monotone model \(f \colon \{0,1\}^d \to \{0,1\}\) and an instance \(x^* \in \{0,1\}^d\) with small sensitivity (say \(\alpha\)). Then there exists an algorithm that makes at most \(\alpha^{\Delta(x^*)}\) queries to \(f\) and returns all optimal counterfactuals of \(x^*\). Here \(\Delta(x^*) = \min_{x \in \{0,1\}^d} \{\Delta_H(x, x^*) \colon f(x) \neq f(x^*) \}\). The paper also proves a matching lower bound on the query complexity, witnessed by some monotone model \(f\).
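
(For intuition, here is a brute-force reference implementation of the task; it is exponential in \(d\), so purely illustrative. The point of the paper is to achieve this with few queries to \(f\).)

```python
from itertools import combinations

def optimal_counterfactuals(f, x_star):
    """Return all x' at minimum Hamming distance from x_star such that
    f(x') != f(x_star), i.e., all optimal counterfactuals. x_star is a
    tuple of bits; f maps such tuples to {0, 1}."""
    d = len(x_star)
    for dist in range(1, d + 1):
        found = []
        for flips in combinations(range(d), dist):
            x = list(x_star)
            for i in flips:
                x[i] ^= 1  # flip the chosen coordinates
            if f(tuple(x)) != f(x_star):
                found.append(tuple(x))
        if found:
            return found  # dist equals Delta(x_star) at this point
    return []  # f is constant
```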

A Sublinear-Time Quantum Algorithm for Approximating Partition Functions (by Arjan Cornelissen and Yassine Hamoudi)(arXiv) For a classical Hamiltonian \(H \colon \Omega \to \{0,1, \ldots, n\}\), at inverse temperature \(\beta\), the probability assigned to a state \(x \in \Omega\) under the so-called Gibbs distribution is proportional to \(\exp(-\beta H(x))\). The partition function is given by \(Z(\beta) = \sum_{x \in \Omega} \exp(-\beta H(x))\). At high temperatures (i.e., low values of \(\beta\)) the partition function is typically easy to compute; the low-temperature regime, however, is often challenging. The standard approach uses MCMC methods to compute \(Z(\infty)\): write it as the telescoping product \(Z(\infty) = Z(0) \cdot \prod_{i = 0}^{\ell - 1} \frac{Z(\beta_{i+1})}{Z(\beta_i)}\), where \(0 = \beta_0 < \beta_1 < \ldots < \beta_{\ell} = \infty\) is some increasing sequence of inverse temperatures with limited fluctuation in the Gibbs distribution between consecutive values, and use MCMC methods to estimate each of the \(\ell\) ratios in the product. The main result of this paper is a quantum algorithm that, on input a Gibbs distribution generated by a Markov chain with a large spectral gap, performs a number of steps of the quantum walk operator sublinear in the logarithm of the state-space size and returns an estimate of \(Z(\infty)\) accurate to within \(\pm \varepsilon Z(\infty)\).
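
(A classical toy sketch of the telescoping-product idea, assuming a small explicit state space so we can sample from the Gibbs distributions exactly; real algorithms, classical or quantum, would instead run a Markov chain, and the schedule `betas` is a finite stand-in for an endpoint at \(\beta = \infty\).)

```python
import math
import random

def telescoping_partition_estimate(H, states, betas, samples=10_000):
    """Estimate Z(betas[-1]) via the identity
    Z(b1)/Z(b0) = E_{x ~ Gibbs(b0)}[exp(-(b1 - b0) * H(x))]."""
    z = float(len(states))  # Z(0) = |Omega|
    for b0, b1 in zip(betas, betas[1:]):
        gibbs_weights = [math.exp(-b0 * H(x)) for x in states]
        xs = random.choices(states, weights=gibbs_weights, k=samples)
        z *= sum(math.exp(-(b1 - b0) * H(x)) for x in xs) / samples
    return z
```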

A Near-Cubic Lower Bound for 3-Query Locally Decodable Codes from Semirandom CSP Refutation (by Omar Alrabiah, Venkatesan Guruswami, Pravesh Kothari, and Peter Manohar)(ECCC) If you made it this far, it is time for a treat. Let us close (hopefully, I did not miss any papers this time!) with a breakthrough in locally decodable codes. For 2-query LDCs, we know fairly tight bounds on the block length. For 3-query LDCs, on the other hand, we know a sub-exponential upper bound on the block length, while the best-known lower bound was merely quadratic. The featured paper improves this to a near-cubic lower bound on the block length. The main tool used to achieve this is a surprising connection between the existence of locally decodable codes and the refutation of Boolean CSP instances with limited randomness. This looks like a fantastic read to close off this month’s report!

News for June 2022

We have four papers this month — three on sublinear-time graph algorithms and one on distribution testing!

Beating Greedy Matching in Sublinear Time, by Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein, and Amin Saberi (arXiv). Designing sublinear-time algorithms to estimate the size of maximum matching in a graph is a well-studied problem. This paper gives the first \(\frac{1}{2} + \Omega(1)\) approximation algorithm that runs in time sublinear in the size of the input graph. Specifically, given a graph on \(n\) vertices and maximum degree \(\Delta\) in the adjacency list model, and a parameter \(\epsilon >0\), the algorithm runs in time \(\tilde{O}(n + \Delta^{1+\epsilon})\) and produces a \(\frac{1}{2} + f(\epsilon)\) approximation to the maximum matching for some function \(f\). It must be noted that a seminal work of Yoshida, Yamamoto and Ito (STOC, 2009) also gives a better than \(\frac{1}{2}\) approximation sublinear-time algorithm for the same problem. However, the result of Yoshida et al. requires assumptions on the maximum degree of the input graph. An additional point worth mentioning is that the authors do not believe that their techniques will yield an approximation guarantee better than \(0.51\), i.e., \(f(\epsilon) < 0.01\) for all \(\epsilon\).

Sublinear-Time Clustering Oracle for Signed Graphs, by Stefan Neumann and Pan Peng (arXiv). Consider a large signed graph on \(n\) vertices where vertices represent users of a social network and signed edges (+/-) denote the type of interactions (friendly or hostile) between users. Assume that the vertices of the social network can be partitioned into \(O(\log n)\) large clusters, where each cluster has a sparse cut with the rest of the graph. Further, each cluster is a minimal set (w.r.t. inclusion) that can be partitioned into roughly equal-sized opposing sub-communities, where a sub-community opposes another if most of the edges going across are negatively signed and most of the edges within the sub-communities are positively signed. This work provides a local oracle that, given probe access to a signed graph with such a hidden cluster structure, answers queries of the form “What cluster does vertex \(v\) belong to?” in time \(\tilde{O}(\sqrt{n} \cdot \text{poly}(1/\epsilon))\) per query. This result generalizes the corresponding result for unsigned graphs (Peng, 2020). The authors additionally show that their method works well in practice using both synthetic and real-world datasets. They also provide the first public real-world datasets of large signed graphs with a small number of large ground-truth communities having this property.

Sublinear Algorithms for Hierarchical Clustering, by Arpit Agarwal, Sanjeev Khanna, Huan Li, and Prathamesh Patil (arXiv). Consider a weighted graph \(G = (V,E,w)\), where the set \(V\) of vertices denotes datapoints and the weight \(w(e) > 0\) of edge \(e \in E\) denotes the similarity between the endpoints of \(e\). A hierarchical clustering of \(V\) is a tree \(T\) whose root is the set \(V\) and whose leaves are the singleton sets corresponding to individual vertices. An internal node of the tree corresponds to the cluster containing all the leaf vertices that are descendants of that node. A hierarchical clustering tree provides us with a scheme to cluster datapoints at multiple levels of granularity. The cost of a hierarchical clustering tree is \(\sum_{(u,v) \in E} |T_{u,v}| \cdot w(u,v)\), where \(T_{u,v}\) denotes the subtree rooted at the lowest common ancestor of the leaves \(u\) and \(v\), and \(|T_{u,v}|\) is the number of leaves it contains. In this paper, the authors present sublinear algorithms for determining a hierarchical clustering tree with the minimum cost. In the query model with degree queries and neighbor queries to the graph, they give an algorithm that outputs an \(\tilde{O}(1)\)-approximate hierarchical clustering and makes \(\tilde{O}(n^{4-2\gamma})\) queries, when the number of edges is \(m = \Theta(n^{\gamma})\) for \(1.5 \geq \gamma > 4/3\). When the input graph is sparse, i.e., \(\gamma \leq 4/3\), the algorithm makes \(\tilde{O}(\max\{n, m\})\) queries, and when the graph is dense, i.e., \(\gamma > 1.5\), the algorithm makes \(\tilde{O}(n)\) queries. They complement their upper bounds with nearly tight lower bounds. In order to obtain their upper bounds, they design a sublinear-time algorithm for the problem of obtaining a weak cut sparsifier that approximates cut sizes up to an additive term in addition to the usual multiplicative factor. They also design sublinear algorithms for hierarchical clustering in the MPC and streaming models of computation.
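
(Here is a small Python sketch that computes the Dasgupta-style cost of a fixed tree, to make the definition concrete; the data layout is ours.)

```python
def dasgupta_cost(children, root, weight):
    """`children[v]` lists v's children (leaves have none); `weight` maps
    frozenset({u, v}) to w(u, v). An edge (u, v) whose endpoints are split
    apart at an internal node has its LCA there, so it is charged that
    node's leaf-count."""
    def leaves(v):
        kids = children.get(v, [])
        return {v} if not kids else set().union(*(leaves(c) for c in kids))

    total, stack = 0, [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        stack.extend(kids)
        sets = [leaves(c) for c in kids]
        size = sum(len(s) for s in sets)  # |T_{u,v}| for edges split here
        for a in range(len(sets)):
            for b in range(a + 1, len(sets)):
                for u in sets[a]:
                    for x in sets[b]:
                        total += size * weight.get(frozenset((u, x)), 0)
    return total
```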

Sharp Constants in Uniformity Testing via the Huber Statistic, by Shivam Gupta and Eric Price (arXiv). This paper revisits the fundamental problem of uniformity testing — i.e., to decide whether an unknown distribution over \(n\) elements is uniform or \(\epsilon\)-far from uniform. This problem is known to be solvable optimally with probability at least \(1 - \delta\) using \(s = \Theta\left(\frac{\sqrt{n \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2}\right)\) independent samples from the unknown distribution. Multiple testers are known for the problem; they all compute a statistic of the form \(\sum_{i \in [n]} f(s_i)\), where \(s_i\) is the number of samples equal to element \(i\) and \(f\) is some function, and make their decision based on whether or not the value of the statistic is above a threshold. For instance, the earliest known uniformity tester (Batu, Fortnow, Rubinfeld, Smith and White 2000; Goldreich and Ron 2011), also called the collisions tester, uses \(f(k) = \frac{k(k-1)}{2}\). The current paper proposes a new tester based on the Huber loss. For \(\beta > 0\), let \(h_\beta(x) := \min\{x^2, 2\beta |x| - \beta^2\}\). The statistic that the authors use is defined by the function \(f(k) := h_\beta(k - s/n)\), where \(s\) is the number of samples and \(n\) is the support size of the distribution. The authors show that their tester is better than all previously known testers, in that it achieves the best constants in the sample complexity.
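
(For concreteness, here is a sketch of the venerable collisions tester; the threshold below is the standard one up to constants, not the sharp constant this paper is after.)

```python
from collections import Counter

def collision_uniformity_test(samples, n, eps):
    """The collision rate estimates ||p||_2^2, which equals 1/n iff p is
    uniform and is at least (1 + 4*eps^2)/n when p is eps-far from uniform
    in total variation distance."""
    s = len(samples)
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    rate = collisions / (s * (s - 1) / 2)
    return rate <= (1 + 2 * eps**2) / n  # True means "looks uniform"
```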

News for May 2022

The crazy numbers from last month are not quite gone: we have five papers this month, not bad at all!

Codes! Distributed computing! Probability distributions!

Improved local testing for multiplicity codes, by Dan Karliner and Amnon Ta-Shma (ECCC). Take the Reed–Muller code with parameters \(m, d\), whose codewords are the evaluation tables of all degree-\(d\) polynomials over \(\mathbb{F}^m\). RM codes are great, they are everywhere, and they are locally testable: one can test whether a given input \(x\) is a valid codeword (or far from every codeword) with only very few queries to \(x\). Now, take the multiplicity code: instead of just the evaluation table of the polynomial itself, a codeword includes the evaluations of all its derivatives, up to order \(s\). These beasts generalize RM codes: are they also locally testable? Yes they are! And this work improves our understanding of this aspect, by providing better bounds on the locality (how few queries are necessary to test), and simplifies the argument from previous work by Karliner, Salama, and Ta-Shma (2022).

Overcoming Congestion in Distributed Coloring, by Magnús M. Halldórsson, Alexandre Nolin, Tigran Tonoyan (arXiv). Two of the main distributed computing models, LOCAL and CONGEST, differ in how they model bandwidth constraints. In the former, nodes can send messages of arbitrary size, and the limiting quantity is the number of rounds of communication; in the latter, each node can only send a logarithmic number of bits at each round. This paper introduces a new technique that allows for communication-efficient distributed (coordinated) sampling, which as a direct application enables porting several LOCAL algorithms to the CONGEST model at a small cost: for instance, \((\Delta+1)\)-List Coloring. This new technique also has applications beyond these distributed models, to graph property testing – in a slightly non-standard setting where we define farness from the property in a “local” sense (detect vertices or edges which contribute to many violations, i.e., are “locally far” from the property considered).

Robust Testing in High-Dimensional Sparse Models, by Anand Jerry George and Clément L. Canonne (arXiv). In the Gaussian mean testing problem, you are given samples from a high-dimensional Gaussian \(N(\mu, I_d)\), where \(\mu\) is either zero or has \(\ell_2\) norm greater than \(\varepsilon\), and you want to decide which of the two holds. This “mean testing” is equivalent (due to, erm, “standard facts”) to testing in total variation distance, and captures the setting where one wants to figure out whether an underlying signal \(\mu\), subject to white noise, is null or significant. Now, what if this \(\mu\) was promised to be \(s\)-sparse? Can we test more efficiently? But what if a small fraction of the samples were arbitrarily corrupted — how much harder does the testing task become? For some related tasks, it is known that being robust against adversarial corruptions makes testing as hard as learning… This paper addresses this “robust sparse mean testing” question, providing matching upper and lower bounds, as well as answers to the related question of (robust, sparse) linear regression.

Sequential algorithms for testing identity and closeness of distributions, by Omar Fawzi, Nicolas Flammarion, Aurélien Garivier, and Aadil Oufkir (arXiv). Consider the two “usual suspects” of distribution testing, identity and closeness testing, where we must test if an unknown distribution is equal to some reference one or \(\varepsilon\)-far (in total variation distance) from it; or the same thing, but with two unknown distributions (no reference one). These are, by now, quite well understood… but the algorithms for them take a worst-case number of samples, a function of the distance parameter \(\varepsilon\). But if the two distributions are much further apart than \(\varepsilon\), fewer samples should be required! This is the focus of this paper, which shows that a sequential test can achieve this type of guarantee: a number of samples which, in the “far” case, depends on the actual distance, not on its worst-case lower bound \(\varepsilon\). One could achieve this by combining known algorithms with a “doubling search;” however, this would still lose some constant factors in the sample complexity. The authors provide sequential tests which improve on this “doubling search technique” by constant factors, and back this up with empirical evaluations of their algorithms.

Estimation of Entropy in Constant Space with Improved Sample Complexity, by Maryam Aliakbarpour, Andrew McGregor, Jelani Nelson, and Erik Waingarten (arXiv). Suppose that, given samples from an unknown distribution \(p\) over \(n\) elements, your task is to estimate its (Shannon) entropy \(H(p)\) up to \(\pm\Delta\). You’re in luck! We know that \(\Theta(n/(\Delta\log n)+ (\log^2 n)/\Delta^2)\) samples are necessary and sufficient. But what if you had to do that under strict memory constraints? Say, using only a constant number of words of memory? Previous work by Acharya, Bhadane, Indyk, and Sun (2019) shows that it is still possible, but the number of samples required shoots up, with their algorithm requiring (up to polylog factors) \(n/\Delta^3\) samples. This work improves upon the dependence on \(\Delta\), providing a constant-memory algorithm with sample complexity \(O(n/\Delta^2 \cdot \log^4(1/\Delta))\); they further conjecture this to be optimal, up to the polylog factors.

News for April 2022

We have…I don’t know, I’ve lost count of the number of papers this month. It’s a big bonanza. Sublinear algorithms for edit distance, planar graphs, distributions, bipartite graphs, groups, error correcting codes, Bayesian nets, polynomials…

Let’s proceed with the spread.

Improved Sublinear-Time Edit Distance for Preprocessed Strings by Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos (arXiv). The edit distance between strings is a classic and important problem in algorithms. You might recall the classic \(O(n^2)\) algorithm to compute the edit distance between strings of length \(n\) (sketched below). It has been shown that getting an \(O(n^{2-\delta})\)-time algorithm is SETH-hard. But what can be done in sublinear time? This paper considers the preprocessed version: suppose we can perform near-linear preprocessing on the strings. We then want to distinguish between edit distance \(\leq k\) and \(\geq k\cdot n^{o(1)}\). This paper shows that, with near-linear preprocessing of the strings, one can solve this problem in \(k \cdot n^{o(1)}\) time.
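
(The classic dynamic program, for reference:)

```python
def edit_distance(a: str, b: str) -> int:
    """The O(n^2)-time, O(n)-space dynamic program for edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]
```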

Optimal Closeness Testing of Discrete Distributions Made Complex Simple by (our own) Clément L. Canonne and Yucheng Sun (arXiv). Given two distributions \(p, q\) over support \([k]\), the aim is to distinguish between (i) the distributions being equal, and (ii) the total variation distance between \(p, q\) being at least \(\epsilon\). The tester should have a failure probability of at most \(\delta\). A recent work nails down the sample complexity with respect to all parameters. This paper gives a simpler proof of the main result. Earlier proofs used Poissonization tricks and fairly clever arguments about Poisson random variables. This proof is much more transparent, and uses an identity that relates the expectation of a random variable to its characteristic function. A nice feature of this proof is that it works directly with the multinomial distribution, which means a fixed number of samples (rather than choosing the number of samples from a distribution).

Tolerant Bipartiteness Testing in Dense Graphs by Arijit Ghosh, Gopinath Mishra, Rahul Raychaudhury, and Sayantan Sen (arXiv). Testing bipartiteness of dense graphs is about as classic as it gets. We wish to distinguish a bipartite graph from one that requires \(\varepsilon n^2\) edge removals to make it bipartite. Readers of this blog should know that there is a \(\widetilde{O}(\varepsilon^{-2})\)-query property tester for this problem. (Ok, so now you know.) This paper studies the tolerant version of bipartiteness testing. Note that this is equivalent to approximating the maxcut, up to additive error \(\varepsilon n^2\). Classic approximation algorithms show that the latter can be done in \(\widetilde{O}(\varepsilon^{-6})\) queries and \(\exp(\widetilde{O}(\varepsilon^{-2}))\) time. This paper considers the easier problem of distinguishing whether the distance to bipartiteness is at most \(\varepsilon\) or at least \(2 \varepsilon\). This problem is solved in \(\widetilde{O}(\varepsilon^{-3})\) queries and \(\exp(\widetilde{O}(\varepsilon^{-1}))\) time.

Properly learning monotone functions via local reconstruction by Jane Lange, Ronitt Rubinfeld, Arsen Vasilyan (arXiv). Ah yes, monotone functions. An ongoing love (obsession? interest?) for property testing people. This paper studies the problem of proper learning of Boolean valued monotone functions over the Boolean hypercube. Given access to uniform random evaluations of a monotone function \(f:\{0,1\}^n \to \{0,1\}\), we wish to compute a monotone function \(g\) that approximates the original function. Classic results from Fourier analysis show that an approximation can be learned using \(\exp(\sqrt{n}/\varepsilon)\) queries. But this approximation function might not be monotone, and only yields improper learning. This paper gives a proper learner that outputs a monotone approximation, in roughly the same query complexity. This result directly gives a constant tolerance monotonicity tester for Boolean functions. The paper uses recent results from distributed algorithms and local computation. It also leads to tolerant testers for monotonicity over posets with small diameter.

Massively Parallel Computation and Sublinear-Time Algorithms for Embedded Planar Graphs by Jacob Holm and Jakub Tětek (arXiv). Sublinear algorithms for planar graphs are another ongoing love (at least for me). This paper considers a new take on this problem: suppose we have access to a geometric embedding of a planar graph \(G\). Can we get sublinear algorithms for a variety of problems? This paper first shows how to construct a convenient decomposition, called an \(r\)-division, in sublinear time. This division can be used to approximate Lipschitz graph parameters, such as the maximum matching size, the maximum independent set size, etc. The paper also shows how to compute an \(r\)-division in the MPC model, which solves many classic graph problems (connected components, matchings, etc.) in \(O(1)\) rounds. There is a (conditional) lower bound showing that, without an embedding, it is not possible to solve such problems in \(O(1)\) rounds (and sublinear space per processor).

Independence Testing for Bounded Degree Bayesian Network by Arnab Bhattacharyya, Clément L. Canonne (again, our own), and Joy Qiping Yang (arXiv). Given a distribution \(\mathcal{P}\) on the Boolean hypercube \(\{0,1\}^n\), the problem is to determine whether \(\mathcal{P}\) is a product distribution. In general, this problem requires \(\Omega(2^n)\) samples. Suppose \(\mathcal{P}\) has a sparse, “efficient” description. Can we do better? This paper shows that when \(\mathcal{P}\) is generated by a Bayesian network (with bounded indegree), then the independence testing problem can be solved with \(\widetilde{O}(n/\varepsilon^2)\) samples. Think of a Bayesian network as a DAG, where each vertex generates a Bernoulli random variable. The variable at a vertex depends only on the outcomes at its parents (in-neighbors).

Low Degree Testing over the Reals by Vipul Arora, Arnab Bhattacharyya, Noah Fleming, Esty Kelman, and Yuichi Yoshida (arXiv, ECCC). The problem of testing low degree polynomials goes back to the birth of property testing. This paper studies real-valued polynomials, in the distribution-free setting. Formally, we have query access to a function \(f: \mathbb{R}^d \to \mathbb{R}\). The distance is measured with respect to an unknown distribution \(\mathcal{D}\) over the domain. This paper shows that the real low degree testing problem can be solved in \(poly(d\varepsilon^{-1})\) queries (under some reasonableness conditions on the distribution). The approach goes via “self-correct and test”: try to compute a low degree polynomial that fits some sampled data, and then check how far the self-corrected version is from another sample.

Testing distributional assumptions of learning algorithms by Ronitt Rubinfeld and Arsen Vasilyan (arXiv). Consider the problem of learning a halfspace over \(\mathbb{R}^n\). If the underlying distribution is Gaussian, then this class can be learned with \(n^{poly(\varepsilon^{-1})}\) samples. If the distribution is arbitrary, no \(2^{o(n)}\) algorithm is known despite much research. This paper introduces the notion of a tester-learner pair. The tester first checks if the input distribution is “well-behaved” (Gaussian-like); if the tester passes, then we run the learner. Indeed, this perspective goes back to some of the original motivations for property testing (when is testing faster than learning?). The intriguing aspect of this problem is that we do not have efficient testers for determining if an input distribution is Gaussian. This paper circumvents that problem by estimating certain moments of the distribution: if these moments agree with the moments of a Gaussian, then the learner is guaranteed to succeed. We get the best of both worlds: if the input distribution is Gaussian, the learning is done correctly; and if the learner succeeds, then the output (hypothesis) is guaranteed to be correct, regardless of the input distribution.

Testability in group theory by Oren Becker, Alexander Lubotzky, and Jonathan Mosheiff (arXiv). This paper is the journal version of a result of the authors, and it gives a group-theoretic presentation of a property testing result. Consider the following problem: the input is a pair of permutations \((\sigma_1, \sigma_2)\) over \([n]\), and the aim is to test whether they commute: \(\sigma_1 \sigma_2 = \sigma_2 \sigma_1\). Another result of the authors gives a tester that makes \(O(\varepsilon^{-1})\) queries (sketched below). They refer to this problem as “testing the relation” \(XY = YX\). This paper gives a grand generalization of that result, best explained by another example. Consider another relation/property denoted \(\{XZ = ZX, YZ = ZY\}\). This property consists of all triples of permutations \((\sigma_1, \sigma_2, \sigma_3)\) where \(\sigma_3\) commutes with the other two. A consequence of the main theorem is that this property is not testable with query complexity independent of \(n\). The main result of this paper is a characterization of testable relations, which goes via studying the expansion of an infinite graph associated with the relation.
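
(The natural sampling tester is easy to state; a sketch follows, with an arbitrary constant. Completeness is obvious; the real content of the testability result is that any pair passing such a test is close to an exactly commuting pair of permutations.)

```python
import random

def commuting_tester(sigma1, sigma2, eps):
    """O(1/eps)-query tester: check sigma1(sigma2(i)) == sigma2(sigma1(i))
    at random points i. Permutations are lists mapping i -> sigma[i]."""
    n = len(sigma1)
    for _ in range(int(10 / eps)):  # the constant 10 is an arbitrary choice
        i = random.randrange(n)
        if sigma1[sigma2[i]] != sigma2[sigma1[i]]:
            return False
    return True
```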

Testing Positive Semidefiniteness Using Linear Measurements by Deanna Needell, William Swartworth, and David P. Woodruff (arXiv). The input is a \(d \times d\) real, symmetric matrix \(M\) and we wish to determine if it is positive semidefinite (all eigenvalues are nonnegative). For the testing problem, we reject when the minimum eigenvalue is at most \(-\varepsilon \|M\|_2\). (The paper also considers general Schatten \(p\)-norms.) This paper gives a list of results for non-adaptive vs adaptive, and one-sided vs two-sided testers. Two access models are considered: a single query consists of either (i) a matrix-vector product \(Mx\) or (ii) a vector-matrix-vector product \(y^TMx\). Typical models that query entries of the matrix require strong bounds on the entries, which is less reasonable in practical situations. An interesting discovery is that the non-adaptive, one-sided complexity is \(\Theta(\sqrt{d}\varepsilon^{-1})\) while the two-sided bound is independent of \(d\).

Relaxed Locally Decodable and Correctable Codes: Beyond Tensoring by Gil Cohen and Tal Yankovitz (ECCC). Locally decodable and correctable codes are a fundamental object of study in property testing (and TCS in general). Consider a locally correctable code (LCC). Given a string \(x\), the decoder/corrector makes \(q\) queries to \(x\), and outputs a symbol. We can think of the output collectively as a string \(y\). If \(x\) is a codeword, then \(y = x\). Otherwise, \(dist(y,z) \leq \varepsilon\), where \(z\) is some codeword close to \(x\). In the relaxed version, the corrector is allowed to output \(\bot\), denoting that it has discovered corruption. The distance is only measured in the coordinates where the corrector does not output \(\bot\). Thus, the corrector gets a “free pass” if it outputs \(\bot\). But note that when \(x\) is a codeword, the output must be exactly \(x\). This paper gives a Relaxed LCC with query complexity \((\log n)^{O(\log\log\log n)}\), a significant improvement over the previous best \((\log n)^{O(\log\log n)}\). It is known from previous work that the query complexity must be \(\Omega(\sqrt{\log n})\).

Verifying The Unseen: Interactive Proofs for Label-Invariant Distribution Properties by Tal Herman and Guy Rothblum (arXiv). This paper considers the distribution testing problem in the context of interactive proofs. The verifier, who wishes to test a property of a distribution \(\mathcal{P}\), interacts with a prover who knows the distribution. The guarantee required is the standard one for interactive proof systems: in the YES case, an honest prover should be able to convince the verifier; in the NO case, no prover can convince the verifier with high probability. There are two important parameters of interest: the sample complexity of the verifier, and the communication complexity of the messages. It is useful to consider the two extremes. In one extreme, the verifier simply solves the problem herself, ignoring the prover. This could require \(\Theta(n/\log n)\) samples (for the hardest properties like entropy and support size). In the other extreme, the honest prover simply sends an approximate description of the distribution, which takes \(O(n)\) bits; the verifier can then just test equality to the prover’s message, which only takes \(\Theta(\sqrt{n})\) samples. This paper shows a 2-round protocol for any (label-invariant) property where both the communication and the sample complexity can be made \(\Theta(\sqrt{n})\). This result shows the power of interaction for distribution testing problems.

News for March 2022

This was a relatively sleepy month with only two property testing papers. Do let us know if we missed any. Let us dig in. (EDIT: Two updates.)

  1. I missed two papers. One on the estimation of quantum entropies and the other on algorithms and lower bounds for estimating MST and TSP costs.
  2. Finally, I forgot to welcome our new editor. Welcome onboard, Nithin Varma!!

Private High-Dimensional Hypothesis Testing by Shyam Narayanan (arXiv) This paper continues the novel study of distribution testing under the constraints brought forth by differential privacy, extending the work of Canonne-Kamath-McMillan-Ullman-Zakynthinou (henceforth CKMUZ, covered in our May 2019 post). In particular, the paper presents algorithms with optimal sample complexity for private identity testing of \(d\)-dimensional Gaussians. In more detail, the paper shows that this can be done with a mere \(\widetilde{O}\left( \frac{d^{1/2}}{\alpha^2} + \frac{ d^{1/3} }{ \alpha^{4/3} \cdot \varepsilon^{2/3}} + \frac{1}{\alpha \cdot \varepsilon} \right)\) samples. Here \(\alpha\) is the proximity parameter and \(\varepsilon\) is the privacy parameter. Combined with a previous result of Acharya-Sun-Zhang, the paper proves that private identity testing of \(d\)-dimensional Gaussians is doable with a sample complexity smaller than that of private identity testing of discrete distributions over a domain of size \(d\), thereby refuting a conjecture of CKMUZ.

Differentially Private All-Pairs Shortest Path Distances: Improved Algorithms and Lower Bounds by Badih Ghazi, Ravi Kumar, Pasin Manurangsi and Jelani Nelson (arXiv) Adam Sealfon considered the classic All-Pairs Shortest Path problem (the APSP problem) with privacy considerations in 2016. In the \((\varepsilon, \delta)\)-DP framework, Sealfon presented an algorithm which, on input an edge-weighted graph \(G=(V,E,w)\), adds Laplace noise to all edge weights and computes the shortest paths on this noisy graph. The output of the algorithm satisfies that the estimated distance between every pair is within an additive \(\pm O(n \log n/\varepsilon)\) of the actual distance (the absolute value of this error term is called the accuracy of the algorithm). Moreover, this error is tight up to a logarithmic factor if the algorithm is required to release the shortest paths. The current paper shows that you can privately release all the pairwise distances while suffering only sublinear accuracy if you additionally release the edge weights (in place of releasing the shortest paths). In particular, this paper presents an \(\varepsilon\)-DP algorithm with sublinear \(\widetilde{O}(n^{2/3})\) accuracy.
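
(A sketch of the Sealfon-style recipe, purely for illustration: perturb the weights, then answer everything from the noisy graph. Calibrating the noise scale to the exact privacy model and dealing with negative noisy weights are glossed over here.)

```python
import numpy as np

def private_apsp(weights, epsilon, rng=None):
    """`weights` is an (n x n) array with np.inf marking non-edges.
    Add Laplace noise to each edge weight, then run Floyd-Warshall."""
    rng = rng or np.random.default_rng()
    noisy = weights + rng.laplace(scale=1.0 / epsilon, size=weights.shape)
    noisy[np.isinf(weights)] = np.inf  # non-edges stay non-edges
    np.fill_diagonal(noisy, 0.0)
    dist = noisy.copy()
    for k in range(dist.shape[0]):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist
```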

Quantum algorithms for estimating quantum entropies by Youle Wang, Benchi Zhao, Xin Wang (arXiv) So, remember our post from December on sublinear quantum algorithms for estimation of quantum (von Neumann) entropy? The current paper begins by noting that the research so far (along the lines of the work above) assumes access to a quantum query model for the input state which we do not yet know how to construct efficiently. This paper addresses this issue and gives quantum algorithms to estimate the von Neumann entropy of an \(n\)-qubit quantum state \(\rho\) by using independent copies of the input state.

Sublinear Algorithms and Lower Bounds for Estimating MST and TSP Cost in General Metrics by Yu Chen, Sanjeev Khanna, Zihan Tan (arXiv) As mentioned in the title, this paper studies sublinear algorithms for the metric MST and the metric TSP problem. The paper obtains a wide assortment of results and shows that both these problems admit an \(\alpha\)-approximation algorithm which uses \(O(n/\alpha)\) space. This algorithm assumes that the input is given as a stream of \({n \choose 2}\) metric entries. Under this model, the paper also presents an \(\Omega(n/\alpha^2)\) space lower bound. Let me highlight one more result from the paper. In a previous post (from June 2020), we covered a result detailing a better-than-\(2\) approximation for graphic TSP and \((1,2)\)-TSP which runs in sublinear time. This paper extends this result and obtains a better-than-\(2\) approximation for TSP on a relatively richer class of metrics.

News for February 2022

This month has seen a flurry of activity in sublinear algorithms and a diverse collection of papers have come up, with topics ranging from differentially private sublinear algorithms to local testers for multiplicity codes. Apologies to the readers for the delay in putting this post together!

Almost-Optimal Sublinear-Time Edit Distance in the Low Distance Regime by Karl Bringmann, Alejandro Cassis, Nick Fischer and Vasileios Nakos (arXiv)

This paper considers the problem of gap edit distance, i.e., of determining if the edit distance between two strings \(x\) and \(y\) is at most \(k\) or at least \(K\). Their main result is an algorithm that runs in time \(O(n/k + \text{poly}(k))\) and solves the problem for \(K = k \cdot 2^{\tilde{O}(\sqrt{\log k})}\). The paper improves upon earlier results of Goldenberg, Krauthgamer and Saha (2019) and Kociumaka and Saha (2020) who solved the problem for \(K = k^2\) with the same asymptotic guarantee on the query complexity.

One of the interesting takeaways from the paper is that the complexity of solving the gap Hamming distance and gap edit distance are similar in the low distance regime. For both, the complexity of solving the \((k, k^{1+o(1)})\)-gap problem is \(n/k^{1 \pm o(1)}\). This needs to be contrasted with the fact that solving \((k, \Omega(n))\)-gap edit distance requires \(\Omega(\sqrt{k})\) queries as shown by Batu, Ergun, Kilian, Magen and Raskhodnikova (2003), whereas \((k, \Omega(n))\)-gap Hamming distance can be solved in \(O(1)\) time.

These results are incomparable to those obtained by Goldenberg, Kociumaka, Krauthgamer and Saha (which was discussed in our November 2021 post), where they give a nonadaptive algorithm with complexity \(O(n/k^{1.5})\) for \((k, k^2)\)-gap edit distance problem. The algorithm in the present paper is adaptive and works faster for smaller values of \(k\).

Privately Estimating Graph Parameters in Sublinear time by Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee (arXiv)

Differentially private approximation algorithms for optimization problems on graphs are a well-studied topic. This paper opens up an exciting research direction by initiating a systematic study of the design of differentially private sublinear-time algorithms. The setting is that graphs are viewed as databases, and two graphs are neighboring if they differ in an edge (or a node). An algorithm \(A\) is \(\epsilon\)-differentially private if for every pair of edge-neighboring (or node-neighboring) graphs \(G, G'\) and for every subset \(S\) of outputs, \(\Pr[A(G) \in S] \leq \exp(\epsilon) \cdot \Pr[A(G') \in S]\).

The paper presents \(\epsilon\)-differentially private sublinear-time algorithms for well-studied problems such as estimating the average degree, the size of a min vertex cover and the size of a maximum matching. These algorithms access the input graphs via neighbor queries and degree queries.

In addition to providing a strong privacy guarantee, their algorithms nearly match the approximation and complexity guarantees of their non-differentially private counterparts. The main idea seems to be the formalization of a sensitivity notion, which they refer to as Global Coupled Sensitivity, and bounding it for the known sublinear-time algorithms for the aforementioned problems. Finally, they add Laplace noise calibrated with this sensitivity value to the output of the algorithms to make them differentially private.
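
(The last step is the textbook Laplace mechanism; a minimal sketch, with the paper's coupled sensitivity playing the role of the sensitivity bound.)

```python
import numpy as np

def laplace_release(value, sensitivity, epsilon, rng=None):
    """Releasing value + Lap(sensitivity / epsilon) is epsilon-DP whenever
    `sensitivity` upper-bounds how much `value` can change between
    neighboring inputs."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / epsilon)
```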

Testability and Local Certification of Monotone Properties in Minor-closed Classes by Louis Esperet And Sergey Norin (arXiv)

One of the major interests in graph property testing is to characterize which properties are testable, i.e., can be \(\epsilon\)-tested with query complexity that depends only on the parameter \(\epsilon\). The question of testability is well understood in the dense graph model as well as the bounded degree model. This paper concerns itself with testability questions in the general model (or the sparse model) of graph property testing, where graphs are represented as adjacency lists with no bound on the maximum degree.

The authors prove that every monotone property of minor-closed graph classes is testable with one-sided error, where a property is monotone if it is closed under taking subgraphs and a graph class is minor-closed if it is closed under taking minors. A crucial fact to be noted here is that the tester is allowed to make only uniformly random neighbor queries.

This result is a significant generalization of a 2019 result by Czumaj and Sohler, who proved that for every finite set of graphs \(\mathcal{H}\), every \(\mathcal{H}\)-free property of minor-closed graph classes is testable with one-sided error, where a graph satisfies \(\mathcal{H}\)-freeness if none of its subgraphs belong to \(\mathcal{H}\).

They show an interesting consequence of their results for the design of short local certification schemes for monotone properties of minor-closed graph classes. Roughly speaking, they show the existence of a prover-verifier system for the aforementioned testing problem where proofs of length \(O(\log n)\) are assigned to each vertex and the verifier needs to observe only the proofs assigned to a vertex and its neighbors.

The plane test is a local tester for Multiplicity Codes by Dan Karliner, Roie Salama, and Amnon Ta-Shma (ECCC)

Multiplicity codes are a generalization of Reed-Muller codes and were first studied by Guruswami and Wang (2013) and Kopparty, Saraf and Yekhanin (2014). The messages here are polynomials of degree \(d\) over \(m\) variables, and the codeword corresponding to a polynomial \(p\) is the evaluation of \(p\) and of all of its directional derivatives of order up to \(s\) over all the points in \(\mathbb{F}_q^m\), where \(q\) is a prime power.

Even though multiplicity codes are known to be locally decodable, it was open whether they are locally testable. Local testers for Reed-Muller codes work by restricting the evaluations to a uniformly random line in \(\mathbb{F}_q^m\) and checking whether the restriction corresponds to the evaluations of a degree-\(d\) univariate polynomial. The authors first show that such a tester does not work for multiplicity codes when \(d\) is large. They then show that a plane test is a good local tester for multiplicity codes even for larger values of \(d\). Specifically, the plane test checks whether the restriction of a given word, which is purportedly the evaluation of a polynomial of degree \(d\) and of its derivatives, to a uniformly random plane in \(\mathbb{F}_q^m\) is a codeword of a bivariate multiplicity code of degree \(d\).

We conclude the post with a short note from Nader Bshouty and Oded Goldreich on a fundamental characterization result in property testing.

On properties that are non-trivial to test by Nader H. Bshouty and Oded Goldreich (ECCC)

A property on binary strings is nontrivial if for infinitely many \(n\), the property contains at least one string of length \(n\) and at most \(2^{n – \Omega(n)}\) strings of length \(n\). The note shows that every nontrivial property requires \(\Omega(1/\epsilon)\) queries to \(\epsilon\)-test.

News for January 2022

A slow month to start 2022, as far as property testing (and myself) are concerned — “only” 3 papers, and a delay of several days in posting this. Let’s jump in with quantum testing!

Testing matrix product states, by Mehdi Soleimanifar and John Wright (arXiv). Suppose you are given a state \(|\psi\rangle\) of \(n\) qudits, and want to know “how entangled” this whole thing is: for instance, is \(|\psi\rangle\) a product state (no entanglement between the \(n\) qudits)? More generally, the “amount of entanglement” allowed is captured by an integer \(r\), the bond dimension, where a product state corresponds to \(r=1\), and larger \(r\) allows for more entanglement. This paper then considers the following property testing question: how many copies of \(|\psi\rangle\) are needed to test whether it has bond dimension at most \(r\), or is \(\varepsilon\)-far from every such state (in trace distance)? While the case \(r=1\) had been previously considered, this paper considers the general case; and, in particular, shows a qualitative gap between \(r=1\) (for which a constant number of copies, \(O(1/\varepsilon^2)\), suffices) and \(r\geq 2\) (for which they show the number of copies needed is \(\Omega(\sqrt{n}/\varepsilon^2)\) and \(O(n r^2/\varepsilon^2)\)).

Constant-time one-shot testing of large-scale graph states, by Hayata Yamasaki and Sathyawageeswar Subramanian (arXiv). In this paper, the authors consider the task of testing whether the physical error rate of a given system is below a given threshold — namely, the threshold below which fault-tolerant measurement-based quantum computation (MBQC) becomes feasible. Casting this into the framework of property testing, the paper shows that measuring very few (a constant number!) of the qubits of the input state is enough to test whether the error rate is low.

And, to conclude, a paper which escaped us in December, on private distribution testing:

Pure Differential Privacy from Secure Intermediaries, by Albert Cheu and Chao Yan (arXiv). Throwback to April 2020 and August 2021, which covered results on distribution testing (uniformity testing!) under the shuffle model of differential privacy. Namely, there was an upper bound of $$ O( k^{2/3}/(\alpha^{4/3}\varepsilon^{2/3})\log^{1/3}(1/\delta) + k^{1/2}/(\alpha\varepsilon) \log^{1/2}(1/\delta) + k^{1/2}/\alpha^2)$$ samples for testing uniformity of distributions over \([k]\), to distance \(\alpha\), under \((\varepsilon,\delta)\)-shuffle privacy (so, approximate privacy: \(\delta>0\)). A partial lower bound existed for pure differential privacy, i.e., when \(\delta=0\); however, no upper bound was known for pure shuffle privacy.
Until now: this new paper shows that pure DP basically comes at no cost, by providing an \((\varepsilon,0)\)-shuffle private testing algorithm with sample complexity $$ O( k^{2/3}/(\alpha^{4/3}\varepsilon^{2/3}) + k^{1/2}/(\alpha\varepsilon) + k^{1/2}/\alpha^2).$$ The paper actually does a lot more, focusing on a different problem, private summation; the testing upper bound is a corollary of the new methods they develop in the process.

News for December 2021

Happy 2022 everyone! We now cover the last papers of 2021. Our readers have pointed out a number of papers from previous months that we missed (sorry). To adequately cover (and publicize) these papers, we’ve decided to just incorporate those papers on this post rather than update previous posts.

We have a breakthrough on Locally Testable Codes, sublinear algorithms for testing pattern freeness, new thoughts on convexity testing, and a perspective on sampling community structure. And here’s the final list for 2021.

Locally Testable Codes with constant rate, distance, and locality by Irit Dinur, Shai Evra, Ron Livne, Alexander Lubotzky, and Shahar Mozes (ECCC, posted Nov 2021). A Locally Testable Code (LTC) is one where the set of codewords has a property tester. The study of LTCs started with the PCP theorem, but blossomed into a deep research agenda. The holy grail is a \(c^3\)-LTC: one with constant rate, constant distance, and a constant-query property tester. Just to recall, the rate \(r \in (0,1)\) is the ratio of the message length to the encoding length. The distance \(\delta \in (0,1)\) is the minimum (normalized) Hamming distance between codewords (we typically expect this to be constant for an error correcting code). And, most familiar to us, \(q\) is the number of queries required to property test codewords. The aim is to get all these parameters to be constants independent of the message length. The classic linearity test of BLR (sketched below) basically proves that the Hadamard code is 3-query testable. But the Hadamard code has poor (exponentially small) rate. Low-degree tests prove that Reed-Muller codes are constant-query testable, but they have inverse polynomial rate. Further work constructed LTCs with inverse polylogarithmic rate, meaning that the encoding has length \(n\cdot poly(\log n)\) for message length \(n\). There was even some belief that any LTC must have a rate going to zero. This paper invalidates that belief, and gives a truly \(c^3\)-construction. The construction is based on a 2-dimensional simplicial complex built from Cayley graphs; this is basically a graph with hyperedges corresponding to “squares” of vertices. Concurrent and independent work of Breuckmann-Eberhardt and Panteleev-Kalachev also gives similar constructions, though from the perspective of quantum codes.
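
(As promised, the BLR test as a sketch; here \(f\) maps \(n\)-bit integers to \(\{0,1\}\), and the number of trials governs the soundness.)

```python
import random

def blr_linearity_test(f, n, trials=100):
    """The classic 3-query BLR test: pick random x, y and check
    f(x) XOR f(y) == f(x XOR y). Functions far from linear fail a single
    trial with probability proportional to their distance from linearity."""
    for _ in range(trials):
        x, y = random.getrandbits(n), random.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```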

On the Locally Testable Code of Dinur et al. (2021) by Oded Goldreich (ECCC). As the title says, this short paper gives a high-level discussion of the previous result. It abstracts out some of the graph-theoretic aspects, to explain connections to more combinatorial constructions. The construction and analysis are based on expander codes. These codes (e.g. low-density parity check, LDPC, codes) have constant rate and distance, but we do not know how to test them: a string far from a codeword might only fail on (say) one parity check constraint. Interpreting the Dinur et al. construction as an expander code, we essentially get a collection of correlated parity constraints. Hence, a string that is far from a codeword violates many constraints, leading to a tester. (Of course, this is an extremely watered down version of the analysis idea.) The construction takes two Cayley graphs on a non-Abelian group (which form the vertices). Specific 4-cycles in this graph (together with a constant-sized tensor code) are used to make the final codewords.

Strongly Sublinear Algorithms for Testing Pattern Freeness by Ilan Newman and Nithin Varma (arXiv, posted Jun 2021). Pattern freeness is a major generalization of monotonicity testing. The input is (query access to) a function \(f: [n] \mapsto \mathbb{R}\). Let \(\pi: [k] \mapsto [k]\) be a permutation. The input is \(\pi\)-free if no restriction of \(f\) to \(k\) elements induces the permutation \(\pi\). Newman-Rabinovich-Rajendraprasad-Sohler introduced this problem. When \(k=2\), this property is precisely monotonicity, for which (many) \(O(\varepsilon^{-1} \log n)\)-query property testers are known. Is such a bound achievable for \(k \geq 3\)? Interestingly, for non-adaptive algorithms, an optimal bound of \(\Theta(n^{1-1/(k-1)})\) (ignoring \(\varepsilon\) and \(k\) dependencies) was achieved by Ben-Eliezer and Canonne. This paper shows that there is an adaptive algorithm that makes \(n^{o(1)}\) queries, thereby proving a separation between adaptive and non-adaptive algorithms. A key idea is to visualize the input as a set of points in the plane, and construct a coarse grid that partitions the points. There is a highly non-trivial case analysis on top of this that leads to an \(O(\sqrt{n})\)-query algorithm (which already beats the non-adaptive lower bound).
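To make the problem statement concrete, here is a small Python sketch of what “a \(k\)-tuple induces \(\pi\)” means, together with the naive one-sided sampling tester. This is emphatically not the paper’s adaptive algorithm, just the definition in executable form; it assumes distinct function values, and the trial count is arbitrary.

```python
import random

def induces_pattern(values, pi):
    """Do these k (distinct) values realize the permutation pi?
    That is: values[a] < values[b] exactly when pi[a] < pi[b]."""
    order = sorted(range(len(values)), key=lambda a: values[a])
    ranks = [0] * len(values)
    for r, a in enumerate(order, start=1):
        ranks[a] = r
    return ranks == list(pi)

def naive_pattern_tester(f, n, pi, trials=1000):
    """One-sided sampling tester: never rejects a pi-free input, and catches
    inputs with many pi-occurrences with good probability."""
    k = len(pi)
    for _ in range(trials):
        idx = sorted(random.sample(range(n), k))
        if induces_pattern([f(i) for i in idx], pi):
            return False  # found a pi-occurrence
    return True

# (2, 1) is a "descent", so (2, 1)-freeness is exactly monotonicity.
assert naive_pattern_tester(lambda i: i * i, n=100, pi=(2, 1))
```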

Parameterized Convexity Testing by Abhiruk Lahiri, Ilan Newman, and Nithin Varma (arXiv, posted Oct 2021). Given a function \(f:[n] \mapsto \mathbb{R}\), the property of interest is (discrete) convexity: the discrete derivatives should be monotonically non-decreasing. It is known that there is an \(O(\varepsilon^{-1}\log n)\)-query non-adaptive tester for this property. This paper studies the problem from the perspective of parameterized property testing, where the range is restricted. The main result is that if the number of distinct discrete derivatives is at most \(s\), there is an \(O(\varepsilon^{-1} \log s)\)-query non-adaptive tester. Moreover, this bound is optimal even for adaptive algorithms. This picture parallels that of monotonicity. Interestingly, when there are only two distinct discrete derivatives, the tester can be made deterministic.
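For concreteness, here is a tiny Python sketch of the definitions in play; this is the exact, linear-time check, not a tester.

```python
def discrete_derivatives(f):
    """f is the list of values f(1), ..., f(n); returns the n-1 differences."""
    return [f[i + 1] - f[i] for i in range(len(f) - 1)]

def is_discretely_convex(f):
    """Discretely convex iff the discrete derivatives are non-decreasing."""
    d = discrete_derivatives(f)
    return all(d[i] <= d[i + 1] for i in range(len(d) - 1))

def num_distinct_derivatives(f):
    """The parameter s appearing in the O(log s / epsilon) query bound."""
    return len(set(discrete_derivatives(f)))

assert is_discretely_convex([0, 1, 3, 6, 10])      # derivatives 1, 2, 3, 4
assert not is_discretely_convex([0, 2, 3, 6, 10])  # derivatives 2, 1, 3, 4
```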

Modularity and edge sampling by Colin McDiarmid and Fiona Skerman (arXiv). Not your typical sublinear algorithms paper, but certainly of interest to us. The main problem is to understand when a random sample of edges in a graph can reveal information about the “community structure”. The modularity of a graph is a measure of this structure. Essentially, a partition has low modularity if the number of edges crossing the partition is close to that of a random graph with the same degree distribution. A partition of high modularity has significantly fewer crossing edges, and is thus a valid “clustering” of the graph. The modularity of the graph is obtained by maximizing over partitions. (The normalization of modularity is chosen so that the maximizing partition generally has a constant number of sets.) This paper proves a concentration inequality for the modularity of a constant-size sample of uniformly random edges: the modularity of a sample of \(\Theta(\varepsilon^{-5})\) edges is an additive \(\varepsilon\)-approximation of the overall modularity, whp.
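If you want to play with this, here is a short Python sketch computing the standard (Newman-Girvan) modularity of a fixed partition; the modularity of the graph is the maximum of this quantity over partitions. A hedged illustration of the definition under the usual normalization, not anything from the paper’s proof.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a fixed partition.
    edges: list of (u, v) pairs; community: dict mapping vertex -> part id.
    Q = sum over parts c of [m_c/m - (d_c/(2m))^2], where m_c counts
    intra-part edges and d_c is the total degree of part c."""
    m = len(edges)
    intra = defaultdict(int)   # m_c
    degree = defaultdict(int)  # d_c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two disjoint triangles split into their natural parts: high modularity.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(modularity(edges, {v: v // 3 for v in range(6)}))  # 0.5
```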

News for November 2021

The holiday season is just around the corner. Best wishes to you and your loved ones in advance from us here at PTReview. This month, we had five papers in all. To prepare for the festive mood associated with the onset of December, also included are brief updates on the recently disproven implicit graph conjecture. Let’s dig in. (And please, do let us know if we missed your paper).

(EDIT: Thanks to our readers for pointing out a paper we missed.)

Downsampling for Testing and Learning in Product Distributions by Nathaniel Harms, Yuichi Yoshida (arXiv). Contemplating connections among algorithms used for testing a diverse collection of function properties, this paper provides a unified and generalized view of a technique, which the authors call downsampling. As the paper explains, the name is motivated by analogy to image/signal processing tasks. Very often, these tasks involve two steps. In the first step, you break the input domain into “grid cells”. You use oracle calls to the function to obtain a crude approximation over all these cells. In the second step, you learn this gridded-up or coarsened function cell-by-cell.

This two-step attack could just be what the doctor ordered for your favorite function property testing problem: in particular, it has been put to work for approximately testing visual properties, approximating distance to monotonicity in high dimensions, testing \(k\)-monotone functions, and more. However, if you want to obtain property testers using this approach in the distribution-free setup, your ordeal might be far from over. The unknown distribution your domain is equipped with can mess with the geometric arguments your gridding approach hopes to exploit. This is precisely the setup considered in the paper (i.e., distribution-free testing of function properties).

The essence of downsampling is captured by a slick proposition that prescribes coarsening as your go-to weapon whenever both of the following are small:
1) the fraction of cells on which \(f\) is not constant, and
2) a measure of how non-uniform the unknown distribution \(D\) your domain is equipped with is.

Equipped with this machinery, the paper tackles the task of designing distribution-free testers for boolean monotonicity with the underlying domain being \(\mathbb{R}^d\). The argument is pretty short and improves upon the sample complexity of the corresponding result in the paper by Black-Chakrabarti-Seshadhri. Do check it out, looks like some nice ammo to add to your toolkit.
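To give a flavor of the coarsening step (and only that: this is a toy sketch, not the paper’s tester, and the per-cell sampling budget is an arbitrary choice), here is what downsampling might look like for a function on \([0,1]^2\), under the uniform distribution for simplicity.

```python
import random

def coarsen(f, r, samples_per_cell=5):
    """Grid [0,1]^2 into r*r cells; in each cell, sample f a few times and
    record a value only if the samples agree. Returns the coarsened
    (partial) function and the fraction of cells that did not look constant."""
    grid, nonconstant = {}, 0
    for i in range(r):
        for j in range(r):
            vals = {f(random.uniform(i / r, (i + 1) / r),
                      random.uniform(j / r, (j + 1) / r))
                    for _ in range(samples_per_cell)}
            if len(vals) == 1:
                grid[(i, j)] = vals.pop()
            else:
                nonconstant += 1
    return grid, nonconstant / r ** 2

# A halfplane indicator: only the O(r) cells straddling the boundary look mixed.
grid, frac = coarsen(lambda x, y: int(x + y > 1), r=20)
print(frac)  # roughly proportional to 1/r
```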

Let’s stay on board for sublinear time algorithms for gap-edit distance.

Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal by Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, Barna Saha (arXiv). The edit distance problem needs no introduction. This paper studies the problem of approximating edit distance in sublinear time. In particular, this is formalized by introducing the gapped version of the problem, which entails the following computational task: fix \(k \in \mathbb{N}\) and some \(c > 0\); given a pair \(x, y\) of strings over a finite alphabet, decide whether the edit distance between \(x\) and \(y\) is at most \(k\) or at least \(k^c\). This paper resolves the non-adaptive query complexity of edit distance, proving that the above gapped version can be decided in \(O\left(\frac{n}{k^{c-1/2}}\right)\) queries, and that this bound is optimal up to polylogarithmic factors.
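For reference, here is the exact computation that the sublinear algorithms get to avoid, together with the gap-decision wrapper; a plain Python restatement of the problem, not the paper’s query-efficient algorithm.

```python
def edit_distance(x, y):
    """Classic O(|x||y|) dynamic program, kept to two rows of memory."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i]
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute (or match)
        prev = cur
    return prev[-1]

def gap_decide(x, y, k, c):
    """The promise problem: 'close' if ED <= k, 'far' if ED >= k**c;
    inputs falling in the gap may be answered either way."""
    d = edit_distance(x, y)
    return "close" if d <= k else ("far" if d >= k ** c else "either")

assert edit_distance("kitten", "sitting") == 3
```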

Next up, we have time-optimal algorithms for maximum matching and vertex cover! Sweet, right?

Time-Optimal Sublinear Algorithms for Matching and Vertex Cover by Soheil Behnezhad (arXiv). This paper gives algorithms for the classic problems of estimating the size of a minimum vertex cover and a maximum matching of a graph in sublinear time (in both the adjacency matrix and adjacency list models). This is another nice read for the holiday season: after all, the paper obtains time-optimal algorithms (up to polylog factors) in both of these models for multiplicative-additive \((2, \varepsilon n)\) estimates. Let me set up some historical context to better appreciate the maximum matching result.

In a classic work, Parnas and Ron gave sublinear-time algorithms that obtain a \((2, \varepsilon n)\) estimate of the size of a maximum matching in a graph using query access to the adjacency list. Their algorithm is inspired by ideas with roots in distributed algorithms. In particular, it returns, in time \(\Delta^{O(\log (\Delta/\varepsilon))}\) (where \(\Delta = \Delta(G)\) denotes the degree bound), an estimate \(\text{EST}\) of the size of the maximum matching satisfying \(OPT \leq \text{EST} \leq 2 \cdot OPT + \varepsilon n\). Unfortunately, algorithms in this Parnas-Ron framework must necessarily suffer a quasi-polynomial running time because of lower bounds from distributed algorithms. Using a randomized greedy approach to estimating the size of a maximum matching, Yoshida, Yamamoto, and Ito gave the first algorithm that returns a \((2, \varepsilon n)\)-estimate in time \(poly(\Delta/\varepsilon)\). This is where the story essentially froze, though there were results that improved the dependence on \(\Delta\) to \(O(\Delta^2)\). Unfortunately, this is not truly sublinear for large \(\Delta\). In another direction, Kapralov-Mitrović-Norouzi-Fard-Tardos gave an algorithm that estimates the maximum matching size in time \(O(\Delta)\) (which is truly sublinear), but it no longer guarantees a \((2, \varepsilon n)\)-approximation and instead returns an \((O(1), \varepsilon n)\)-approximation. The current paper, as remarked above, achieves the best of both worlds.
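If you would like to see the local-simulation idea in code, here is a toy Python sketch in the spirit of the randomized greedy approach mentioned above. Hedging loudly: ranks are precomputed globally for clarity, so unlike the real algorithms this version is not sublinear, and the vertex-sampling estimator and sample count are illustrative choices.

```python
import random

def estimate_matching_size(adj, samples=500, seed=0):
    """Toy local simulation of randomized greedy maximal matching.
    adj: adjacency lists for vertices 0..n-1. Each edge gets a random rank;
    an edge is matched iff all adjacent edges of smaller rank are unmatched.
    The matching size is estimated by sampling vertices and asking whether
    some incident edge is matched."""
    rng = random.Random(seed)
    n = len(adj)
    rank = {}
    for u in range(n):
        for v in adj[u]:
            rank.setdefault((min(u, v), max(u, v)), rng.random())
    memo = {}

    def in_matching(e):
        # e is in the greedy matching iff no adjacent lower-rank edge is.
        # Ranks strictly decrease along the recursion, so it terminates.
        if e in memo:
            return memo[e]
        result = True
        for w in e:                           # both endpoints of e
            for x in adj[w]:                  # edges sharing endpoint w
                ep = (min(w, x), max(w, x))
                if ep != e and rank[ep] < rank[e] and in_matching(ep):
                    result = False
                    break
            if not result:
                break
        memo[e] = result
        return result

    matched = 0
    for _ in range(samples):
        v = rng.randrange(n)
        matched += any(in_matching((min(v, u), max(v, u))) for u in adj[v])
    return (matched / samples) * n / 2  # each matching edge covers 2 vertices
```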

Final stop, updates on the implicit graph conjecture.

The Implicit Graph Conjecture is False by Hamed Hatami, Pooya Hatami (arXiv). While not a property testing paper per se, it is always good to see an old conjecture resolved one way or the other. In this paper, the Hatami brothers (Pooya and Hamed) refute a three-decade-old conjecture: the one mentioned in the title. Here is a brief history. Sampath Kannan, Moni Naor and Steven Rudich defined the notion of implicit representation of graphs. Take a family \(\mathcal{F}\) of graphs and consider an \(n\)-vertex graph \(G \in \mathcal{F}\). You say this family admits an efficient implicit representation if there exists an assignment of \(O(\log n)\)-length labels to the vertices in \(V(G)\) such that the adjacency between every pair of vertices is a function of the labels of the corresponding pair. Crucially, the labeling function may depend on the family, but not on individual graphs in the family. What is cool about families admitting such an efficient implicit representation is that the number of \(n\)-vertex graphs in such a family cannot be any bigger than \(2^{O(n \log n)}\); that is, such families have at most a factorial growth rate. The implicit graph conjecture asserts that for every hereditary graph family, the converse also holds: namely, if the hereditary family has at most factorial growth rate, then the family admits efficient implicit representations. The key, as shown in this paper (which spans all of six pages!), is to choose as your hereditary graph class the closure of a random collection of graphs. The authors show that your hand is now forced and the representation can no longer be efficient. However, annoyingly, such a hereditary class need not have factorial growth rate: the closure of a random collection of arbitrary graphs expands to contain all graphs, and thus has growth rate \(2^{\Omega(n^2)}\). The cool thing is, you can avoid this issue by choosing a random collection of slightly sparse random graphs (with \(n^{2-\varepsilon}\) edges). Interestingly, this gives enough ammo to control the growth rate, which in turn allows the authors to slay this conjecture.
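For a concrete feel of implicit representations, here is a toy Python sketch for one classic hereditary family, interval graphs: the label of a vertex is its pair of interval endpoints (which fit in \(O(\log n)\) bits once endpoints are normalized to integers in \([2n]\)), and adjacency is decoded from the two labels alone, by a decoder that depends only on the family.

```python
def interval_labels(intervals):
    """Implicit representation of an interval graph: label vertex v by the
    endpoints of its interval."""
    return {v: pair for v, pair in enumerate(intervals)}

def adjacent(label_u, label_v):
    """Adjacency from the two labels alone: do the intervals intersect?"""
    (lu, ru), (lv, rv) = label_u, label_v
    return not (ru < lv or rv < lu)

labels = interval_labels([(1, 4), (3, 6), (5, 8)])
assert adjacent(labels[0], labels[1]) and not adjacent(labels[0], labels[2])
```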

(EDIT: a stop we had omitted, added retroactively.)

Sublinear quantum algorithms for estimating von Neumann entropy by Tom Gur, Min-Hsiu Hsieh, Sathyawageeswar Subramanian (arXiv). This paper presents quantum algorithms for the problem of obtaining multiplicative estimates of the Shannon entropy of familiar classical distributions and the more exotic von Neumann entropy of mixed quantum systems. In particular, the paper presents an \(\widetilde{O}(n^{\frac{1+\eta}{2\gamma^2}})\)-query quantum algorithm achieving a \(\gamma\)-multiplicative approximation of the Shannon entropy of an input distribution \(\mathbf{p}\) supported on a universe of size \(n\), where \(H(\mathbf{p}) \geq \gamma/\eta\). As if this were not already cool enough, the paper also presents sublinear quantum algorithms for estimating the von Neumann entropy as well. This is supplemented by a lower bound of \(\Omega(n^{\frac{1}{3\gamma^2}})\) queries for achieving a \(\gamma\)-factor approximation for either of these entropies.

News for October 2021

The month of October was quite busy, with seven papers, spanning (hyper)graphs, proofs, probability distributions, and sampling.

Better Sum Estimation via Weighted Sampling, by Lorenzo Beretta and Jakub Tětek (arXiv). This paper considers the following question: “given a large universe of items, each with an unknown weight, estimate the total weight to a multiplicative \(1\pm \varepsilon\).” The key is in the type of access you have to those items: here, the authors consider the setting where items can be sampled proportionally to their unknown weights, and show improved bounds on the sample/query complexity in this model. And there is something for everyone: they also discuss connections to edge estimation in graphs (assuming random edge queries) and to distribution testing (specifically, in the “dual” or “probability-revealing” models of Canonne–Rubinfeld and Onak–Sun).
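As a taste of this access model, here is one natural (textbook-style) estimator in Python, not the paper’s algorithm: assuming the universe size \(n\) is known and all weights are positive, the identity \(\mathbb{E}[1/w_I] = \sum_i (w_i/W)(1/w_i) = n/W\) under proportional sampling lets the harmonic average of sampled weights recover \(W\).

```python
import random

def estimate_total_weight(weights, n, samples=20000, seed=0):
    """Estimate W = sum(weights) given only (i) samples drawn with
    probability w_i / W and (ii) the weight of each sampled item,
    via the identity E[1/w_I] = n/W."""
    rng = random.Random(seed)
    # Simulating the proportional-sampling oracle; the estimator itself
    # only ever looks at the weights of the sampled items.
    draws = rng.choices(range(n), weights=weights, k=samples)
    inv_mean = sum(1.0 / weights[i] for i in draws) / samples
    return n / inv_mean

demo_rng = random.Random(1)
w = [demo_rng.uniform(1, 10) for _ in range(1000)]
print(estimate_total_weight(w, n=1000), sum(w))  # the two should be close
```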

This gives us an easy segue to distribution testing, which is the focus of the next two papers.

As Easy as ABC: Adaptive Binning Coincidence Test for Uniformity Testing, by Sudeep Salgia, Qing Zhao, and Lang Tong (arXiv). Most of the work in distribution testing (from the computer science community) focuses on discrete probability distributions, for several reasons, including a technical one: total variation distance is rather fickle with continuous distributions, unless one makes some assumption on the unknown distribution. This paper does exactly this: assuming the unknown distribution has a Lipschitz density function, it shows how to test uniformity by adaptively discretizing the domain, achieving a near-optimal sample complexity.
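To convey the flavor of “discretize, then run a coincidence test” (a hedged toy, emphatically not the paper’s adaptive binning scheme), here is the binned collision statistic in Python: uniform data over \([0,1)\) gives a rate of about \(1/k\), and since the collision probability \(\sum_i p_i^2\) is minimized by the uniform distribution, a binned distribution far from uniform inflates it.

```python
import random
from itertools import combinations

def binned_collision_rate(samples, k):
    """Bin samples from [0,1) into k equal cells and return the empirical
    collision rate; uniform data gives roughly 1/k."""
    bins = [int(s * k) for s in samples]
    m = len(bins)
    collisions = sum(a == b for a, b in combinations(bins, 2))
    return collisions / (m * (m - 1) / 2)

print(binned_collision_rate([random.random() for _ in range(500)], k=20))
# about 0.05 = 1/20 for uniform samples; higher for skewed data
```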

Exploring the Gap between Tolerant and Non-tolerant Distribution Testing, by Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen (arXiv). It is known that tolerant testing of distributions can be much harder than “standard” testing: for instance, for identity testing, the sample complexity can blow up by a nearly quadratic factor, from \(\sqrt{n}\) to \(\frac{n}{\log n}\)! But is that the worst that can happen, in general, for other properties? This work explores this question, and answers it in some notable cases of interest, such as for label-invariant (symmetric) properties.

And now, onto graphs!

Approximating the Arboricity in Sublinear Time, by Talya Eden, Saleet Mossel, and Dana Ron (arXiv). The arboricity of a graph is the minimal number of spanning forests required to cover all its edges. Many graph algorithms, especially sublinear-time ones, can be parameterized by this quantity, which is very useful; but what do you do if you don’t know the arboricity of your graph? Well, then you estimate it. Which this paper shows how to do efficiently, given degree and neighbor queries. Moreover, the bound they obtain, \(\tilde{O}(n/\alpha)\) queries to compute a constant-factor approximation of the unknown arboricity \(\alpha\), is optimal up to logarithmic factors in the number of vertices \(n\).
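As a full-graph (decidedly non-sublinear) reference point: the degeneracy, computable by repeatedly peeling a minimum-degree vertex, pins the arboricity down within a factor of 2, since \(\alpha \leq \text{degeneracy} \leq 2\alpha - 1\). A Python sketch of that baseline, for contrast with the paper’s \(\tilde{O}(n/\alpha)\)-query estimator:

```python
import heapq

def degeneracy(adj):
    """Degeneracy via min-degree peeling, using a lazy-deletion heap.
    adj: dict mapping each vertex to a list of its neighbors.
    Satisfies arboricity <= degeneracy <= 2 * arboricity - 1, but reads
    the entire graph."""
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, best = set(), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry, skip
        best = max(best, d)
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return best

# A triangle has arboricity 2 and degeneracy 2.
print(degeneracy({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # 2
```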

Sampling Multiple Nodes in Large Networks: Beyond Random Walks, by Omri Ben-Eliezer, Talya Eden, Joel Oren, and Dimitris Fotakis (arXiv). Another thing which one typically wants to do with very large graphs is sample nodes from them, either uniformly or according to some prescribed distribution. This is a core building block in many other algorithms; unfortunately, approaches to do so via random walks will typically require a number of queries scaling with the mixing time \(t_{\rm mix}(G)\) of the graph \(G\), which might be very small for nicely expanding graphs, but not so great in many practical settings. This paper proposes and experimentally evaluates a different algorithm which bypasses this linear dependence on \(t_{\rm mix}(G)\), by first going through a random-walk-based “learning” phase (learn something about the structure of the graph) before using this learned structure to perform faster sampling, focusing on small connected components.
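For context, here is the random-walk baseline in a toy Python sketch: a Metropolis-Hastings correction makes the walk’s stationary distribution uniform over vertices, but each (near-)uniform sample still costs on the order of the mixing time in steps, which is exactly the dependence the paper works to bypass.

```python
import random

def mh_uniform_node_sample(adj, start, steps, rng=random):
    """Random walk with a Metropolis-Hastings correction: the stationary
    distribution is uniform over vertices (connected graph, no isolated
    vertices assumed), but mixing still takes t_mix steps per sample."""
    v = start
    for _ in range(steps):
        u = rng.choice(adj[v])
        # Accept the move with probability min(1, deg(v)/deg(u)); this
        # cancels the plain walk's bias toward high-degree vertices.
        if rng.random() < min(1.0, len(adj[v]) / len(adj[u])):
            v = u
    return v
```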

Why stop at graphs? Hypergraphs!

Hypergraph regularity and random sampling, by Felix Joos, Jaehoon Kim, Daniela Kühn, Deryk Osthus (arXiv). The main result in this paper is a hypergraph analogue of a result of Alon, Fischer, Newman and Shapira (for graphs), which roughly states that if a hypergraph satisfies some regularity condition, then with high probability so does a randomly sampled sub-hypergraph, and conversely. This in turn has direct implications for characterizing which hypergraph properties are testable: see the companion paper, by the same authors.
(Note: this paper is a blast from the past, as the result it shows was originally established in the linked companion paper, from 2017; however, the authors split this paper in two this October, leading to this new, standalone paper.)

And, to conclude, Arthur, Merlin, and proofs:

Sample-Based Proofs of Proximity, by Guy Goldberg, Guy Rothblum (ECCC). Finally, consider the setting of interactive proofs of proximity (IPPs), where the prover is as usual computationally unbounded, but the verifier must run in sublinear time (à la property testing). This has received significant interest in the past years: but what if the verifier didn’t even get to make queries, and only got access to uniformly random locations of the input? These SIPPs (sample-based IPPs) and their non-interactive counterparts, SAMPs (sample-based Merlin-Arthur proofs of proximity), are the objects of study of this paper, which introduces and motivates them in the context, for instance, of delegation of computation for sample-based algorithms.