Author Archives: Seshadhri

News for March 2024

We have a rich bounty of papers this March (with a Feb 29 submission that got associated with this month). Local Computations Algorithms, Error Correction, PageRank computations, Quantum testing, Distribution testing, Graph property testing, and nice note on a statement we all knew (but never bothered to prove!). (Ed: We missed a paper on graph property testing, and updated the blurb on “A basic lower bound for property testing”.)

Average-Case Local Computation Algorithms by Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, and Ronitt Rubinfeld (arXiv). Readers of this blog are likely familiar with Local Computation Algorithms (LCA) model. Imagine the input is (say) a large undirected graph \(G\), to which an algorithm has query access to. The aim is to give sublinear access to a large, linear sized output, such as a graph spanner. So each bit/coordinate/edge of the output should be computable with only sublinear access to \(G\) (these are called probes). There is a rich line of results for computing \(k\)-spanners, which are sparse subgraphs of \(G\) that approximate the shortest path distances up to a factor \(k\). Previous LCA lower bounds give a barrier of \(\Omega(\sqrt{n})\) probes per spanner query. To beat this bound, this paper introduces average-case LCAs. Imagine that \(G\) is generated according to graph distribution, like Erdős-Rényi, and the LCA only needs to succeed with high probability over the input. For such settings, this paper gives a number of LCAs for different parameter settings that beats the \(\sqrt{n}\) probe barrier. An additional concept introduced is that of joint sampling. Here, the LCA is expected to generate the random input \(G\) and gives access to the spanner. It is shown that one can beat the trivial bound obtained by just coupling together LCAs for input generation and spanner construction.

Local Correction of Linear Functions over the Boolean Cube by Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, and Madhu Sudan (ECCC, arXiv). Speaking of LCAs, a classic example of an LCA is a local error correcting code (LCC). The input is a function \(f:\{0,1\}^n \to G\) where \(G\) is an Abelian group. Think of codewords as linear functions over this space. A \((\delta, q)\)-LCC would provide local access to the closest keyword to \(f\) with distance \(< \delta\) from the property of linear functions. Each coordinate of the closest keyword is obtained with \(q\) queries/probes to \(f\). Classic results provide a \((\Omega(1), O(1))\)-LCC when \(G = \mathbb{Z}_2\). The aim of this paper is to consider general range groups, such as the reals. Not much is known in this space for LCCs. This paper gives a \((1/4-\varepsilon, \widetilde{O}(\log n))\)-LCC for this setting. (Note that \(1/4\) is the decoding radius.) Most previous LCC results use “common” symmetries among the (hypercube) domain and range, while this result works when the range is totally unrelated to the domain. Moreover the \(\widetilde{O}(\log n)\) query complexity is nearly matches the latex \(\Omega(\log n)\)-query lower bound for this problem.

Revisiting Local Computation of PageRank: Simple and Optimal by Hanzhi Wang, Zhewei Wei, Ji-Rong Wen, and Mingji Yang (arXiv). PageRank is one of the fundamental computations in modern network science. Given a (possibly) directed graph \(G = (V,E)\), PageRank defines a “truncated random walk” that works as follows. It starts from the uniform distribution. At each step, independently with probability \(\alpha\), the walk just stops. With the remaining probability, it performs a single random walk step by going to a random (out) neighbor. (This definition is equivalent to the stationary distribution of the random walk Markov Chain with a teleport probability \(\alpha\).) The PageRank value of vertex \(t\), denoted \(\pi(t)\), is the probability of ending at \(t\). The obvious algorithm for estimating \(t\) is to simply perform a bunch of random walks, requiring \(\Theta(1/\pi(t))\) to get non-trivial estimates. The average PageRank value is \(1/n\), so this leads to a linear time algorithm for an average vertex. This is a rich history of work on beating this bound, and getting a sublinear query complexity for (almost) all vertices. The approach is to get an estimator with running time (ideally) \(O(n \pi(t))\), By combining both estimators, we get \(O(\sqrt{n})\) time algorithms for estimating PageRank. This paper gets as close as possible to this bound, and achieves (the almost optimal) bound of \(O(\sqrt{n} \cdot m^{1/4})\). The actual bound is more involved and depends on the maximum in and out degrees. It is worth noting that random walks on directed graphs are nasty objects, so getting these algorithms is extremely challenging. Moreover, this paper nails down the upper and lower bounds in terms of the \(n, m\), and the max degrees.

Efficient Algorithms for Personalized PageRank Computation: A Survey by Mingji Yang, Hanzhi Wang, Zhewei Wei, Sibo Wang, and Ji-Rong Wen (arXiv). This survey isn’t meant for a TCS audience per se, but has bearing to the previous paper. And it has many explicit mentions to sublinear algorithms for PageRank computations. This survey focuses more on Personalized PageRank (PPR), wherein the walks starts from a single source vertex. Section 7 talks about sublinear algorithms for estimating individual PPR values, and discusses the techniques involved in these algorithms. This is a good survey for getting a sense of the interesting problems in estimating (P)PR values.

New Graph and Hypergraph Container Lemmas with Applications in Property Testing by Eric Blais and Cameron Seth (arXiv). The container method is a combinatorial approach that is now seeing many property testing applications. The best way to motivate this is to consider the classic problem of property testing if a dense graph \(G\) has a clique of size \(\rho n\). The canonical tester samples a small set of vertices (of size \(f(\rho)/poly(\varepsilon)\), where \(f(\cdot)\) is some explicit function), computes the largest clique in this set, and then extrapolates that to guess the largest clique size in \(G\). If \(G\) has a clique of size at least \(\rho n\) (the YES case), the analysis is an easy exercise. If \(G\) is \(\varepsilon\)-far from having a large clique (the NO case), the analysis needs to deal with probability that small cliques in \(G\) lead to large cliques in the sample (just by random deviation). This argument needs a union bound over “all possible (small) cliques” of \(G\). And this is always the hard part. The container method is about proving that there is a small (polynomial) number of “containers”, that contain all small cliques. The union bound then works out with stronger parameters. This was introduced to property testing in a beautiful, previous work of the authors, which led to substantial improvements for many property testing problems over dense graphs. In this paper, they generalize the container method to more settings, and prove stronger property testing bounds for satisfiability and \(k\)-colorability.

Hamiltonian Property Testing by Andreas Bluhm, Matthias C. Caro, and Aadil Oufkir (arXiv). This paper is on property testing the quantum notion of Hamiltonian locality. This description will only highlight my poor knowledge of quantum physics, but let me try. A Hamiltonian is an operator on \(n\) qubits, each of which should be thought of as a complex number with magnitude \(1\) (you can think of this a 2D vector, with each axis representing a classical bit). On a single qubit, there are four possible operations: identity, negate the complex part, switch the real and complex parts, or negate the complex and switch real/complex parts. These operations can be represented as \(2 \times 2\) matrices (the qubit is a 2D vector), and form a basis. We can generalize this basis to \(n\) qubits as follows. Pick a string in \(\{0,1,2,3\}^n\), where \(0\) indicates the identity operation, and the others map to the three non-trivial qubit transforms above. Apply the corresponding operation to the corresponding qubit. Mathematically, we represent this operation as a tensor product of the corresponding \(2 \times 2\) matrices. These operators form the Pauli basis; the locality of a basis operator is the number of non-trivial qubit operators used. Any Hamiltonian can be written in the Pauli basis, and it is called \(k\)-local if it is spanned by at most \(k\)-local basis operators. This paper sets up a natural property testing question. Distinguish the input Hamilton being \(k\)-local from the input being \(\varepsilon\)-far from being \(k\)-local. Access to the Hamiltonian (and exponentially large object to “write down”) is achieved by applying the operator/transformation to a vector of qubits. If distance between operators is measured by \(l_\infty\)-norm, then the property testing problem requires exponentially many queries to the Hamiltonian. The main result is that if distance is measure by \(l_2\)-norm, then only polynomially many queries suffice to test \(k\)-locality, for constant \(k\).

Equivalence Testing: The Power of Bounded Adaptivity by Diptarka Chakraborty, Sourav Chakraborty, Gunjan Kumar, Kuldeep S. Meel (arXiv). We all know the problem of equivalence/identity testing of distributions. This work is on the conditional sampling model. Consider two input distributions \(\mathcal{P}\) and \(\mathcal{Q}\) over domain \([n]\). For any subset \(S\) of the domain, we can generate a sample from the input distributions, conditioned on being in \(S\). In this model, existing results show that equivalence (\(\mathcal{P} = \mathcal{Q}\)) can property tested using \(O(\log\log n)\) samples (ignoring \(\varepsilon\) dependencies), a significant demonstration of the power of conditional sampling. But this is a sequential, and hence adaptive, algorithm. The best known bound for non-adaptive adaptive algorithms is \(O(\log^{12}n)\), while the known lower bound is \(\Omega(\log n)\). This paper shows that with a single round of adaptivity, one can get a \(\widetilde{O}(\log n)\)-query equivalence tester. This result opens up a rich source of problems for the conditional sampling model, wherein we look at the power of bounded adaptivity.

On the query complexity of testing local graph properties in the bounded-degree graph model by Oded Goldreich (arXiv). Consider property testing in bounded degree graph model. The input graph \(G = ([n], E)\) is represented as an adjacency list. In the dense graph setting, a vast majority of “interesting” properties are testable in time independent to graph size. For bounded-degree graphs, the story is much more complicated. A fundamental question is to characterize such “constant-time” testable properties. A local property (in this context) is one that is basically defined as subgraph freeness, with some augmentations by vertex degree. Previous work suggested that all local properties can be tested in time independent on graph size. This conjecture was subsequently refuted, but with a non-constructive argument that does not give an explicit query lower bound. This paper overcomes the non-constructive barrier, and gives a query lower bound of \(\Omega(\log^{(4)} n)\) for an explicit local property (\(\log^{(i)}\) is the iterated log). In a slightly weaker model, where the algorithm is not given the exact size of \(G\), the lower bound can be significantly improved to \(\Omega(\log n)\).

A basic lower bound for property testing by Eldar Fischer (arXiv). We end with a fundamental statement that you must have known. In any (discrete) setting, consider some non-trivial property \(\mathcal{P}\) with distance measured by Hamming distance. The setting that covers almost all property testing settings. Of course, any property tester requires \(\Omega(1/\varepsilon)\) queries. Intuitively, if a tester makes less \(O(1/\varepsilon)\) input queries, it might not “touch” the portion of the input that helps it in the distinguishing task. This basic fact has not been proven explicitly, and this note resolves that issue. The short paper gives a full self-contained proof of this fact, and does so without resorting to Yao’s minimax lemma. The proof is not difficult, but needs some care because the tester could be adaptive. Great read for students. (Further addendum: Nader Bshouty and Oded Goldreich pointed out that a previous note by Bshouty and Goldreich also proves such a lower bound. These results differ on the precise notation of a “non-trivial property”. Fischer’s definition is more direct and slightly stronger than the Bshouty-Goldreich definition. The Bshouty-Goldreich proof uses a construction of a “sufficiently separated” set of strings, which is different from the proof in this note. Again, both proofs are excellent reads for students to learn more about property testing. These results give a good perspective on crafting the right definitions.)

News for December 2023

Happy New Year! This post covers the final property testing results for 2023. We have just two three papers, one on group testing, a new bound on sample-based testers, and a distribution testing result over higher dimensions.

On Testing Group Properties by Oded Goldreich and Laliv Tauber (ECCC). Group testing is a classic problem going back to the early days of property testing. We are given access to a “multiplication table” \(f: [n]^2 \to [n]\). Each element in \([n]\) is (supposedly) an element of a group, and \(f(i,j)\) is the product of elements \(i\) and \(j\). Our aim is to determine, in the property testing sense, whether \(f\) is the multiplication table of a group. The earliest result, from the seminal Spot-checkers paper, gave an \(\widetilde{O}(n^{3/2}/\varepsilon)\) time tester (where \(\varepsilon\) is the proximity parameter). This paper significantly improves that classic bound with a one-sided \(\widetilde{O}(n/\varepsilon)\) tester. Moreover, this tester can be adapted for testing of \(f\) is Abelian. The result is obtained by a series of testers, starting from extremely simple (yet time inefficient) to more complex versions. The best known lower bound is just \(\Omega(\log n)\), and that leaves a tantalizing (and wide open) gap to be reduced.

A Structural Theorem for Local Algorithms with Applications to Coding, Testing, and Verification by Marcel Dall’Agnol, Tom Gur, and Oded Lachish (arXiv). One of the powers of property testers is their choice of query. This is unlike the typical setting in learning, where one only gets access to random samples (or evaluations). Sample-based testers were defined to bridge this gap; so it is a property tester that only has access to (evaluations of) uniform random domain points. A sample-based tester simply checks if the sample is consistent with the property. It is natural to ask if the existence of a (vanilla) property tester implies the existence of a sample based tester. For the simplest setting, consider a \(q\)-query non-adaptive tester for some property \(\mathcal{P}\). One can visualize this as a collection of query sets of size \(q\) in a universe of size \(n\) (the domain). Naively, one might hope that with a \(q\)-way collision argument, a random sample of size \(O(n^{1-1/q})\) would contain a query set, yielding a sample based tester. Previous work showed that, for any property with a \(q\)-query non-adaptive tester, there is a sample-based tester with complexity \(O(n^{1-1/(q^2\log q)})\). Remarkably, this work gives such a bound for even adaptive testers (the best previous bound was \(O(n^{1-1/\exp(q)})\)). The result is placed in a broader framework of robust local algorithms, which subsume \(q\) query property testers, locally decodable codes (LDC), and MA proofs of proximity.

Testing Closeness of Multivariate Distributions via Ramsey Theory by Ilias Diakonikolas, Daniel M. Kane, and Sihan Liu (arXiv). (Missed from last month. -Ed) This paper considers distribution testing where the universe is \(\mathbb{R}^d\). The notion of closeness is called \(\mathcal{A}_k\) distance: we cover the universe with \(k\) axis-parallel rectangles, and “reduce” the distribution to a discrete universe of size \(k\). We then take TV-distance over these reduced distributions. When \(d=1\), the complexity of testing closeness was known to be \(\Theta(k^{4/5}/poly(\epsilon))\). For \(d > 1\), this paper gives the first non-trivial closeness testing result. The (optimal in \(k\)) bound achieved is \(\Theta(k^{6/7}/poly(\epsilon))\). Interestingly, there is a jump in the exponent on \(k\) from \(d=1\) to \(d=2\), but no jump for larger \(d\).

News for July 2023

We’re now back to our regular cadence of 4+ monthly papers on property testing and sublinear time algorithms. July brings us a new sublinear PageRank computation and numerous results on our trending topics of distribution and monotonicity testing. Without further delay, let’s see the July spread.

Estimating Single-Node PageRank in \(\widetilde{O}(\min(d_t,\sqrt{m}))\) Time by Hanzhi Wang and Zhewei Wei (arXiv). PageRank is one of the most important quantities computed on graphs, especially in today’s networked world. Consider an undirected graph \(G = (V,E)\). The PageRank value of vertex \(t\) is the probability that a short random walk, starting from the uniform distribution, reaches the vertex \(t\). (Ok ok, I’m fudging details, but this is close enough to the truth.) The aim is to estimate this probability to within additive error \(O(1/n)\), where \(n\) is the number of vertices. A standard simulation would give an \(O(n)\) time algorithm; can we do sublinear in graph size? Previous work (which actually has implementations!) has led to \(O(\sqrt{n\cdot d_t})\) for undirected graphs [LBGS15] and (roughly) \(O(n^{2/3} d^{1/3}_{max})\) for all graphs [BPP18]. Here, \(d_t\) is the degree of vertex \(t\) and \(d_{max}\) is the maximum degree. This paper gets a bound of \(\widetilde{O}(\min(d_t,\sqrt{m}))\). A lower bound is still open for the fundamental problem! (A nice problem for any students reading…?)

Directed Poincare Inequalities and \(L_1\) Monotonicity Testing of Lipschitz Functions by Renato Ferreira Pinto Jr. (arXiv, ECCC). We all (or at least me) love monotonicity testing. The paper studies the continuous version, where \(f:[0,1]^n \to \mathbb{R}\). To have a reasonable notion of distance and testers, we will require functions to be differentiable and \(L\)-Lipschitz. We measure distance using \(\ell_1\) distance, so the distance between \(f,g\) is \(\int_{[0,1]^n} |f-g| d\nu\) (where we integrate over the uniform measure) [BRY14]. To access \(f\), we are also provided a “derivative oracle”: given a point \(x\) and a direction \(\vec{v}\), we can query the directional derivative along \(\vec{v}\) at \(x\). One of the key insights in (discrete, Boolean) monotonicity testing is the connection to directed isoperimetric inequalities. These inequalities are directed versions of classic inequalities that relate boundaries (or gradients) to volumes. For \(L\)-Lipschitz functions, the classic Poincare inequality states that \(E[\|\nabla f\|_1] \geq \textrm{var}(f)\), where \(\nabla f\) is the gradient and \(\textrm{var}(f)\) is (up to constant factors) the distance to being constant. This paper proves the directed version \(E[\|\nabla^- f\|_1] \geq \varepsilon(f)\). Roughly speaker, \(\nabla^-\) is the “negative gradient” (\(min(\nabla f, 0)\)) and \(\varepsilon(f)\) is the \(L_1\)-distance to monotonicity. This result directly yields a \(O(n/\varepsilon)\) query tester for monotonicity. The paper asks the tantalizing question: can we do better?

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers by Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan (arXiv). It is best to go backwards in discussing this paper, starting from the implications and going to the core results. The problem at hand is the classic one of junta testing. Given \(f:\{0,1\}^n \to \{0,1\}\), distinguish if \(f\) only depends on \(r\) variables (an \(r\)-junta) or is \(\varepsilon\)-far from having that property. This problem is well studied, has optimal (efficient) testers, and even has results in the distribution-free setting. In the latter setting, there exists some (unknown) distribution \(\mathcal{D}\) on the domain according to which distance is defined. The tester can access to queries according to \(\mathcal{D}\). The clarity of junta testing disappears when we consider tolerant testing: can we distinguish \(f\) is (say) \(1/4\) close to an \(r\)-junta from being \(1/3\)-far (where distances are measured according to \(\mathcal{D}\))? A remarkable consequence of this paper is that this tolerant testing version is unlikely to have a \(poly(n)\) time algorithm. (Note that the sample complexity might still be small.) The main tool is a composition theorem that gives reductions between low \(\varepsilon\) testers and constant \(\varepsilon\) testers. The details are intricate, but here’s the gist. Suppose we can construct composing functions \(g\) such that the distance to “junta-ness” of \(g \circ f\) is much larger than the distance of \(f\). Then a tester (that can only deal with large \(\varepsilon\)) on \(g \circ f\) can effectively property test \(f\). (Obviously, I’m greatly oversimplifying and mischaracterizing. So go read the paper!)

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination by Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, and Shyam Narayanan (arXiv). Consider the fundamental hypothesis testing problem of Gaussian testing. We wish to distinguish the \(d\)-dimensional, zero-mean, unit covariance Gaussian \(\mathcal{N}(0,I)\), from the “shifted” version \(\mathcal{N}(\mu,I)\), where \(\mu\) is a vector of length at least $latex  \alpha$. Existing results give a simple algorithm using \(\Theta(\sqrt{d}/\alpha^2)\) samples. Suppose there is an adversary who can corrupt samples. The adaptive adversary can look at the samples generated, and arbitrarily change an \(\varepsilon\) fraction of entries. The weaker, oblivious adversary can also arbitrarily change an \(\varepsilon\) fraction of entries, but cannot look at the samples generated. Can we perform Gaussian testing in this setting? A previous result gave an optimal bound for adaptive adversaries [N22]. This paper gives the optimal sample complexity bound for oblivious adversaries. The expression is somewhat complicated. But the punchline is that for many regimes of parameters \(d, \alpha, \varepsilon\), the oblivious adversary is strictly weaker. Meaning, there is a testing algorithm (against an oblivious adversary) whose sample complexity is strictly less than the optimal bound against adaptive adversaries. This comes as a surprise, because in typical distribution testing settings, these adversaries are basically equivalent.

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). A problem of generalizing from hypercube to hypergrids, an issue I have much obsessed over. Consider the problem of uniformity testing, where the domain is the hypergrid \([m]^n\). We wish to test if a distribution \(\mathcal{D}\) over the hypergrid is uniform. Unlike the standard distribution testing setting, the domain size is exponentially large. To aid us, we can perform conditional sampling: we can select any sub-hypergrid, and get samples from \(\mathcal{D}\) restricted to this sub-hypergrid. Previous work solved this problem for the hypercube domain (when \(m=2\)), by providing a tester with sample complexity \(O(\sqrt{n}/\varepsilon^2)\) [CCKLW22]. The previous work did not work for any other hypergrid (even \(m=3\)). This paper gives the first solution for uniformity testing on hypergrids with a tester of sample complexity \(O(poly(m)n/\varepsilon^2)\). The dependence on \(m\) has to be at least \(\sqrt{m}\), from existing results. One of the important tools is getting robust versions of classic isoperimetric inequalities over the hypergrid. An important open question is to determine the optimal dependence on \(\mathcal{m}\).

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, and Anthimos Vardis Kandiros (arXiv). Depending on your standpoint, one may or may not consider this a “sublinear time” paper (it does not show that testing \(\ll\) learning). But given the interest in distribution testing, I think it’s germane to our blog. We have a high-dimensional distribution \(\mathcal{D}\) over \((x_1, x_2, \ldots, x_n)\). Without any assumption, learning (or even identity testing) requires exponential in \(n\) many samples. This paper assumes that \((x_1, x_2, \ldots, x_n)\) is generated from a latent-tree Ising model. There is a tree where each node is associated with a number (think \(x_i\)), and the joint distribution satisfies some dependencies according to the edges. We only observe the values of the leaves; hence, the term “latent” model. The main result shows that identity testing can be done in with polynomial in \(n\) samples. The authors also prove that one can learn the distribution in (albeit larger) \(poly(n)\) samples.

News for December 2022

Happy New Year! Apologies for the late post; I was stuck for far too long in holiday mode. We haven’t missed much. There were only two papers in the month of December, since, I’m assuming, many of us were entering holiday mode.

Testing in the bounded-degree graph model with degree bound two by Oded Goldreich and Laliv Tauber (ECCC). One of great, central results in graph property testing is that all monotone properties are testable (with query complexity independent on graph size) on dense graphs. The sparse graph universe is far, far, more complicated and interesting. Even for graphs with degree bound 3, natural graph properties can have anywhere from constant to linear (in \(n\)) query complexity. This note shows that when considering graphs with degree bound at most 2, the landscape is quite plain. The paper shows that all properties are testable in \(poly(\varepsilon^{-1})\). Any graph with degree at most 2 is a collection of paths and cycles. In \(poly(\varepsilon^{-1})\) queries, one can approximately learn the graph. (After which the testing problem is trivial.) The paper gives a simple \(O(\varepsilon^{-4})\) query algorithm, which is improved to the nearly optimal \(\widetilde{O}(\varepsilon^{-2})\) bound.

On the power of nonstandard quantum oracles by Roozbeh Bassirian, Bill Fefferman, and Kunal Marwaha (arXiv). This paper is on the power of oracles in quantum computation. An important question in quantum complexity theory is whether \(QCMA\) is a strict subset of \(QMA\). The former consists of languages decided by Merlin-Arthur quantum protocols with a classical witness (the string that Merlin provides). The latter class allows Merlin to be a quantum witness. This paper shows a property testing problem where such a separation is shown. The property is essentially graph non-expansion (does there exist a set of low conductance?). The input graph should be thought of as an even (bounded) degree with “exponentially many” vertices. So it has \(N = 2^n\) vertices. The graph is represented through a special “graph-coded” function. The paper shows that there is a \(poly(n)\)-sized quantum witness for non-expansion that can be verified in \(poly(n)\) time, which includes queries to the graph-coded function. On the other hand, there is no classic \(poly(n)\)-sized witness that can be verified in \(poly(n)\) queries to the graph-coded function. (Informally speaking, any \(QCMA\) protocol needs exponentially many queries to the graph.)

News for August 2022

A eerily quiet month, this August of ’22. We only found a single paper to cover!

Training Overparametrized Neural Networks in Sublinear Time by Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo (arXiv). Think of a classification problem where the inputs are in \(\mathbb{R}^d\). We have \(n\) such points (with their true labels, as training data) and wish to train a Neural Network. A two layer Rectified Linear Unit (ReLU) Neural Network (NN) works as follows. The first layer has \(m\) vertices, where each vertex has vector weight \(\vec{w}_i \in \mathbb{R}^d\). The second “hidden layer” has \(m\) vertices, each with a scalar weight \(a_1, a_2, \ldots, a_m\). This network is called overparametrized when \(m \gg n\). The output of this NN on input vector \(\vec{x}\) is (up to scaling) \(\sum_{i \leq m} a_i \phi(\vec{w_i} \cdot \vec{x})\) (where \(\phi\) is a thresholded linear function). Observe that to compute the value on a single input takes \(O(md)\) time, so the total time to compute all values on \(n\) training inputs takes \(O(mnd)\) time. The training is done by gradient descent methods; given a particular setting of weights, we compute the total loss, and then modify the weights along the gradient. Previous work showed how a single iteration can be done in time \(O(mnd + n^3)\). When \(m \gg n^2\), this can be thought of as linear in computing the loss function (which requires evaluating the NN on all the \(n\) points). This paper shows how to implement a single iteration in \(O(m^{1-\alpha}nd + n^3)\) time, for some \(\alpha > 0\). Hence, the time for an iteration is sublinear in the trivial computation. The techniques used are sparse recovery methods and random projections.

News for April 2022

We have…I don’t know, I’ve lost count of the number of papers this month. It’s a big bonanza. Sublinear algorithms for edit distance, planar graphs, distributions, bipartite graphs, groups, error correcting codes, Bayesian nets, polynomials…

Let’s proceed with the spread.

Improved Sublinear-Time Edit Distance for Preprocessed Strings by Karl Bringmann, Alejandro Cassis, Nick Fischer, and Vasileios Nakos (arXiv). The edit distance between strings is a classic and important problem in algorithms. You might recall that classic \(O(n^2)\) algorithm to compute the edit distance between strings of length \(n\). It has been show that getting a \(O(n^{2-\delta})\) time algorithm is SETH-hard. But what can be done in sublinear time? This paper considers the preprocessed version: suppose we can perform near-linear preprocessing on the strings. We now want to distinguish between edit distance between \(\leq k\) and \(\geq k\cdot n^{o(1)}\). This paper shows that with near-linear preprocessing on strings, one can solve this problem in \(k \cdot n^{o(1)}\) time.

Optimal Closeness Testing of Discrete Distributions Made Complex Simple by (our own) Clément L. Canonne and Yucheng Sun (arXiv). Given two distributions \(p, q\) over support \([k]\), the aim is to distinguish between (i) the distributions being equal, and (ii) the total variation distance between \(p, q\) being at least \(\epsilon\). The tester should has a failure probability of at most \(\delta\). A recent work nails down the sample complexity with respect to all parameters. This paper gives a simpler proof of the main result. Earlier proofs used Poissonization tricks and fairly clever arguments about Poisson random variables. This proof is much more transparent, and uses an identity that relates the expectation of a random variable to its characteristic function. A nice feature of this proof is that it works directly with the multinomial distribution, which means a fixed number of samples (rather than choosing the number of samples from a distribution).

Tolerant Bipartiteness Testing in Dense Graphs by Arijit Ghosh, Gopinath Mishra, Rahul Raychaudhury, and Sayantan Sen (arXiv). Testing bipartiteness of dense graphs is about as a classic as it gets. We wish to distinguish a bipartite graph from one that requires \(\varepsilon n^2\) edge removals to make it bipartite. Readers of this blog should know that there is a \(\widetilde{O}(\varepsilon^{-2})\)-query property tester for this problem. (Ok, so now you know.) This paper studies the tolerant version of bipartiteness testing. Note that this is equivalent to approximating the maxcut, up to additive error \(\varepsilon n^2\). Classic approximation algorithms show that the latter can be done in \(\widetilde{O}(\varepsilon^{-6})\) queries and \(\exp(\widetilde{O}(\varepsilon^{-2}))\) time. This paper considers the easier problem of distinguishing whether the distance to bipartiteness is at most \(\varepsilon\) or at least \(2 \varepsilon\). This problem is solved in \(\widetilde{O}(\varepsilon^{-3})\) queries and \(\exp(\widetilde{O}(\varepsilon^{-1}))\).

Properly learning monotone functions via local reconstruction by Jane Lange, Ronitt Rubinfeld, Arsen Vasilyan (arXiv). Ah yes, monotone functions. An ongoing love (obsession? interest?) for property testing people. This paper studies the problem of proper learning of Boolean valued monotone functions over the Boolean hypercube. Given access to uniform random evaluations of a monotone function \(f:\{0,1\}^n \to \{0,1\}\), we wish to compute a monotone function \(g\) that approximates the original function. Classic results from Fourier analysis show that an approximation can be learned using \(\exp(\sqrt{n}/\varepsilon)\) queries. But this approximation function might not be monotone, and only yields improper learning. This paper gives a proper learner that outputs a monotone approximation, in roughly the same query complexity. This result directly gives a constant tolerance monotonicity tester for Boolean functions. The paper uses recent results from distributed algorithms and local computation. It also leads to tolerant testers for monotonicity over posets with small diameter.

Massively Parallel Computation and Sublinear-Time Algorithms for Embedded Planar Graphs by Jacob Holm and Jakub Tětek (arXiv). Sublinear algorithms for planar graphs is another ongoing love (at least for me). This paper considers a new take of this problem: suppose we have access to a geometric embedding of a planar graph \(G\). Can we get sublinear algorithms for a variety of problems? This paper first shows how to construct a convenient decomposition, called an \(r\)-division, in sublinear time. This division can be used to approximate Lipschitz graph parameters, such as maximum matching sizes, maximum independent set, etc. The paper also shows how to compute an \(r\)-division in the MPC model, which solves many classic graph problems (connected components, matchings, etc.) in \(O(1)\) rounds. There is a (conditional) lower bound showing that, without an embedding, it is not possible to solve such problems in \(O(1)\) rounds (and sublinear space per processor).

Independence Testing for Bounded Degree Bayesian Network by Arnab Bhattacharyya, Clément L. Canonne (again, our own), and Joy Qiping Yang (arXiv). Given a distribution \(\mathcal{P}\) on the Boolean hypercube \(\{0,1\}^n\), the problem is to determine whether \(\mathcal{P}\) is a product distribution. In general, this problem requires \(\Omega(2^n)\) samples. Suppose \(\mathcal{P}\) has a sparse, “efficient” description. Can we do better? This paper shows that when \(\mathcal{P}\) is generated by a Bayesian network (with bounded indegree), then the independence testing problem can be solved with a \(\widetilde{O}(n/\varepsilon^2)\) samples. Think of a Bayesian network as a DAG, where each vertex generates a Bernoulli random variable. The variable at a vertex depends only the outcomes at its neighborhood.

Low Degree Testing over the Reals by Vipul Arora, Arnab Bhattacharyya, Noah Fleming, Esty Kelman, and Yuichi Yoshida (arXiv, ECCC). The problem testing low degree polynomials goes back to the birth of property testing. This paper studies real valued polynomials, in the distribution free setting. Formally, we have query access to a function \(f: \mathbb{R}^d \to \mathbb{R}\). The distance is measured with respect to an unknown distribution \(\mathcal{D}\) over the domain. This paper shows that the real low degree testing problem can be solved in \(poly(d\varepsilon^{-1})\) queries (under some reasonableness conditions on the distribution). The approach is go to via the “self-correct and test” approach: try to compute a low degree polynomial that fits some sampled data, and then check how far the self-corrected version is from another sample.

Testing distributional assumptions of learning algorithms by Ronitt Rubinfeld and Arsen Vasilyan (arXiv). Consider the problem of learning a halfspace over \(\mathbb{R}^n\). If the underlying distribution is Gaussian, then this class can be learned in \(n^{poly(\varepsilon^{-1})}\) samples. If the distribution is arbitrary, no \(2^{o(n)}\) algorithm is known despite much research. This paper introduces the notion of having a tester-learner pair. The tester first checks if the input distribution is “well-behaved” (Gaussian-like). If the tester passes, then we run the learner. Indeed, this perspective goes back to some of the original motivations for property testing (when is testing faster than learning). The intriguing aspect of this problem is that we do not have efficient testers for determining if an input distribution is Gaussian. This paper circumvents that problem by estimating certain moments of the distribution. If these moments agree with the moments of a Gaussian, then the learner is guaranteed to succeed. We get the best of both worlds: if the input distribution is Gaussian, the learning is done correctly. If the learner succeeds, then then output (hypothesis) is guaranteed to be correct, regardless of the input distribution.

Testability in group theory by Oren Becker, Alexander Lubotzky, and Jonathan Mosheiff (arXiv). This paper is the journal version of a result of the authors, and it gives a group theoretic presentation of a property testing result. Consider the following problem. The input is a pair permutations \((\sigma_1, \sigma_2)\) over \([n]\). The aim is to test whether they commute: \(\sigma_1 \sigma_2 = \sigma_2 \sigma_1\). Another result of the authors gives a tester that makes \(O(\varepsilon^{-1})\) queries. They refer to this problem as “testing the relation” \(XY = YX\). This paper gives a grand generalization of that result, best explained by another example. Consider another relation/property denoted \(\{XZ = ZX, YZ = ZY\}\). This property consists of all triples of permutations \((\sigma_1, \sigma_2, \sigma_3)\), where \(\sigma_3\) commutes with the other two. A consequence of the main theorem is that this property is not testable with query complexity independent of \(n\). The main result of this paper is a characterization of testable relations, which goes via studying the expansion of an infinite graph associated with the relation.

Testing Positive Semidefiniteness Using Linear Measurements by Deanna Needell, William Swartworth, and David P. Woodruff (arXiv). The input is a \(d \times d\) real, symmetric matrix \(M\) and we wish to determine if it is positive semidefinite (all eigenvalues are positive). For the testing problem, we reject when the minimum eigenvalue is at most \(-\varepsilon \|M\|_2\). (The paper also considers general Schatten \(p\)-norms.) This paper gives a list of results for non-adaptive vs adaptive, and one-sided vs two-sided testers. There are two access models considered: a single query consists of either a (i) matrix-vector product \(Mx\) or (ii) vector-matrix-vector product \(y^TMx\). Typical models that query entries of the matrix require strong bounds on the entries, which is less reasonable in practical situations. An interesting discovery is that the non-adaptive, one-sided complexity is \(\Theta(\sqrt{d}\varepsilon^{-1})\) while the two-sided bound is independent of \(d\).

Relaxed Locally Decodable and Correctable Codes: Beyond Tensoring by Gil Cohen and Tal Yankovitz (ECCC). Locally decodable and correctable codes are a fundamental object of study in property testing (and TCS in general). Consider a locally correctable code (LCC). Given a string \(x\), the decoder/corrector makes \(q\) queries to \(x\), and outputs a symbol. We can think of the output collectively as a string \(y\). If \(x\) is a codeword, then \(y = x\). Otherwise, \(dist(y,z) \leq \varepsilon\), where \(z\) is some codeword close to \(x\). In the relaxed version, the corrector is allowed to output \(\bot\), denoting that it has discovered corruption. The distance is only measured in the coordinates where the corrector does not output \(\bot\). Thus, the corrector gets a “free pass” if it outputs \(\bot\). But note that when \(x\) is a codeword, the output must be exactly \(x\). This paper gives a Relaxed LCC with query complexity \((\log n)^{O(\log\log\log n)}\), a significant improvement over the previous best \((\log n)^{O(\log\log n)}\). It is know from previous work that the query complexity must be \(\Omega(\sqrt{\log n})\).

Verifying The Unseen: Interactive Proofs for Label-Invariant Distribution Properties by Tal Herman and Guy Rothblum (arXiv). This paper considers the distribution testing problem in the context of interactive proofs. The verifier, who wishes to test a property of a distribution \(\mathcal{P}\), interacts with a prover who knows the distribution. The guarantee required is the standard one for interactive proof systems: in the YES case, an honest prover should be able to convince the verifier. In the NO case, no prover can convince the verifier with high probability. There are two important parameters of interest: the sample complexity of the verifier, and the communication complexity of the messages. It is useful to consider the two extremes. In one extreme, the verifier can simply solve the problem herself, ignoring the prover. This could require \(\Theta(n/\log n)\) queries (for the hardest properties like entropy and support size). Another extreme is for the honest prover to simply send an approximate description of the distribution, which takes \(O(n)\) bits. The prover can just test equality to the prover message, which only takes \(\Theta(\sqrt{n})\) queries. This paper shows a 2-round protocol for any (label-invariant) property where both the communication and the sample complexity can be made \(\Theta(\sqrt{n})\). This result shows the power of interaction for distribution testing problems.

News for December 2021

Happy 2022 everyone! We now cover the last papers of 2021. Our readers have pointed out a number of papers from previous months that we missed (sorry). To adequately cover (and publicize) these papers, we’ve decided to just incorporate those papers on this post rather than update previous posts.

We have a breakthrough on Locally Testable Codes, sublinear algorithms for testing pattern freeness, new thoughts on convexity testing, and a perspective on sampling community structure. And here’s the final list for 2021.

Locally Testable Codes with constant rate, distance, and locality by Irit Dinur, Shai Evra, Ron Livne, Alexander Lubotzky, and Shahar Mozes (ECCC, posted Nov 2021). A Locally Testable Code (LTC) is one where the set of codewords has a property tester. The study of LTCs started with the PCP theorem, but blossomed into a deep research agenda. The holy grail is a \(c^3\)-LTC: one with constant rate, constant distance, and constant query property tester. Just to recall, the rate \(r \in (0,1)\) is the ratio of the message length to the encoding length. The distance \(\delta \in (0,1)\) is the (normalized) Hamming distance between codewords (we typically expect this to be constant for an error correcting code). And, most familiar to us, \(q\) is the number of queries required to property test codewords. The aim is to get all these parameters to be constants independent of the message length. The classic linearity test of BLR basically proves that the Hadamard code is 3-query testable. But the Hadamard code has poor (exponentially small) rate. Low-degree tests prove that Reed-Muller codes are constant-query testable, but they have inverse polynomial rate. Further work constructed LTCs with inverse polylogarithmic rate, meaning that the encoding has length \(n\cdot poly(\log n)\) for message length \(n\). There was even some belief that any LTC must have a rate going to zero. This paper invalidates that belief, and gives a truly \(c^3\)-construction. The construction is based on a 2-dimensional simplicial complex based on Cayley graphs. This is basically a graph with hyperedges corresponding to “squares” of vertices. Concurrent and independent work of Breuckmann-Ebedhardt and Panteleev-Kalachev also have similar constructions, though their perspective is of quantum codes.

On the Locally Testable Code of Dinur et al. (2021) by Oded Goldreich (ECCC). As the title says, this short paper gives a high level discussion of the previous result. It abstracts out some of the graph theoretic aspects, to explain connections to more combinatorial constructions. The construction and analysis is based on expander codes. These codes (e.g. low-density parity check, LDPC codes) have constant rate and distance, but we do not know how to test them. A string far from a codeword might only fail on (say) one parity check constraint. Interpreting the Dinur et al as an expander code, we essentially get a collection of correlated parity constraints. Hence, a string that is far from a codeword violates many constraints, leading to a tester. (Of course, this is an extremely watered down version of the analysis idea.) The construction takes two Cayley graphs on a non-Abelian group (which form the vertices). Specific 4-cycles in this graph (together with a constant sized tensor code) are used to make the final codewords.

Strongly Sublinear Algorithms for Testing Pattern Freeness by Ilan Newman and Nithin Varma (arXiv, posted Jun 2021). Pattern freeness is a major generalization of monotonicity testing. The input is (query access to) a function \(f: [n] \mapsto \mathbb{R}\). Let \(\pi: [k] \mapsto [k]\) be a permutation. The input is \(pi\)-free if no restriction of \(f\) to \(k\) elements induces the permutation \(pi\). Newman-Rabinovich-Rajendraprasad-Sohler introduced this problem. When \(k=2\), this property is precisely monotonicity, for which (many) \(O(\varepsilon^{-1} \log n)\) property testers are known. Is such a bound achievable for \(k \geq 3\)? Interestingly, for non-adaptive algorithms, an optimal bound of \(\Theta(n^{1-1/(k-1)})\) (ignoring, \(\varepsilon, k\) dependencies) was achieved by Ben-Eliezer and Canonne. This papers shows that there is an adaptive algorithm that makes \(n^{o(1)}\) queries, thereby proving a separation between adaptive and non-adaptive algorithms. A key idea is to visualize the input as a set of points in the plane, and construct a coarse grid that partitions the points. There is a highly non-trivial case analysis on top of this that leads to a \(O(\sqrt{n})\)-query algorithm (this already beats the non-adaptive lower bound).

Parameterized Convexity Testing by Abhiruk Lahiri, Ilan Newman, and Nithin Varma (arXiv, posted Oct 2021). Given a function \(f:[n] \mapsto \mathbb{R}\), the property of interest is (discrete) convexity. The discrete derivatives should be monotonically non-decreasing. It is known that there is a \(O(\varepsilon^{-1}\log n)\) query non-adaptive tester for this property. This paper studies this problem from the perspective of parameterized property testing, where the range is restricted. The main result is that if the number of distinct discrete derivates is at most \(s\), there is a \(O(\varepsilon^{-1} \log s)\) query non-adaptive tester. Moreover, this bound is optimal even for adaptive algorithms. This picture parallels that of monotonicity. Interestingly, when there are only two distinct discrete derivatives, the tester can be made deterministic.

Modularity and edge sampling by Colin McDiarmid and Fiona Skerman (arXiv). Not your typical sublinear algorithms paper, but certainly of interest to us. The main problem is to understand when a random samples of edges in a graph can reveal information about the “community structure”. The modularity of a graph is a measure of this structure. Essentially, a partition has low modularity if the number of edges crossing the partition is close to that of a random graph with the same degree distribution. A partition of high modularity has significant fewer edges, and is thus a valid “clustering” of the graph. The modularity of the graph is obtained by maximizing over partitions. (The normalization of modularity is chosen so that the partition generally has a constant number of sets.) This paper proves a concentration inequality for the modularity of a constant sample of uniform random edges. The modularity of a sample of \(\Theta(\varepsilon^{-5})\) edges is an additive \(\varepsilon\)-approximation of the overall modularity, whp.

News for September 2021

A nice array of papers for this month, ranging from distribution testing, dense graph property testing, sublinear graph algorithms, and various mixtures of them. Let us now sample our spread! (Adding another semistreaming paper who achieved similar results to another posted paper. -Ed)

A Lower Bound on the Complexity of Testing Grained Distributions by Oded Goldreich and Dana Ron (ECCC). A discrete distribution is called \(m\)-grained if all probabilities are integer multiples of \(1/m\). This paper studies the complexity of testing this property of distributions. For simplicity, consider the property of being \(n/2\)-grained, where the support size is \(n\). The classic lower bound for testing uniformity shows that \(\Omega(\sqrt{n})\) samples are required to distinguish the uniform distribution from a distribution uniform on \(n/2\) elements. Thus, we get a lower bound of \(\Omega(\sqrt{n})\) for testing \(n/2\)-grainedness (if I am permitted to use that word). This paper proves a lower bound of \(\Omega(n^c)\), for all constant \(c < 1\). It is conjectured that the lower bound is actually \(\Omega(n/\log n)\), which would match the upper bound (for any label-invariant property).

Testing Distributions of Huge Objects by Oded Goldreich and Dana Ron (ECCC). This paper introduced a new model that marries distribution testing with property testing on strings. The “object” of interest is a distribution \(\mathcal{D}\) over strings of length \(n\). We wish to test if \(\mathcal{D}\) possesses some property. The tester can get a random string \(x\) from the distribution, and can query any desired index of \(x\). The distance between distributions is defined using the earthmover distance (where we use the Hamming distance between strings). This model is called the DoHO (Distributions of Huge Objects) model. There are many questions posed and connections drawn to classical property testing and distribution testing. What I find interesting is a compelling application: the distribution \(\mathcal{D}\) may represent noisy or perturbed versions of a single object. The DoHO model gives a natural generalization of standard property testing to noisy objects. This paper considers problems such as testing if \(\mathcal{D}\) is: a random perturbation of a string, or a random cyclic shift, or a random isomorphism of a graph.

Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions by Sepehr Assadi and Chen Wang (arXiv). Correlation clustering is a classic problem where edges in a graph are labeled ‘+’ or ‘-‘, denoting whether these edges should be uncut or cut. The aim is to cluster the graph minimizing the total “disagreements” (cut ‘+’ edges or uncut ‘-‘ edges). This paper gives an \(O(1)\)-approximation algorithm that runs in \(O(n\log^2n)\) time; this is the first sublinear time approximation algorithm for this problem. Correlation clustering has seen results for the property testing/sublinear algorithms community, first by Bonchi, Garcia Soriano, and Kutzkov. But previous results were essentially on the dense graph model, giving \(O(\epsilon n^2)\) error assuming adjacency matrix input. This paper considers access to the adjacency list of ‘+’ edges. Interestingly (from a technical standpoint), the key tool is a new sparse-dense decomposition. Such decompositions emerged from the seminal work of Assadi-Chen-Khanna for sublinear \((\Delta+1)\)-colorings, and it is great to see applications beyond coloring.

Sublinear-Time Computation in the Presence of Online Erasures by Iden Kalemaj, Sofya Raskhodnikova, and Nithin Varma (arXiv). Can property testing be done when portions of the input are hidden? This question was first raised by Dixit-Raskhodnikova-Thakurta-Varma, who gave a model of erasure-resilient testing. There is an adversary who hides (erases) part of the input function; queries to those parts just yield a dummy symbol. This paper defines an online version of this model. There is an erasure parameter \(t\). On each query by the property tester, the adversary can erase \(t\) values of the function. Consider the property of classic linearity of functions \(f:\{0,1\}^d \rightarrow \{0,1\}\). The BLR tester queries triples of pairs \((x,y, x \oplus y)\). Observe how this tester is easily defeated by our adversary, by erasing the value \(f(x\oplus y)\). One of the main results of this paper is a \(O(1/\varepsilon)\)-query tester for linearity, that works for any constant erasure parameter \(t\). Note that this matches the bound for the standard setting. There are a number of results for other classic properties, such as monotonicity (sortedness) and Lipschitz.

Sublinear Time Eigenvalue Approximation via Random Sampling by Rajarshi Bhattacharjee, Cameron Musco, and Archan Ray (arXiv). Consider the problem of estimating all the eigenvalues of a real, symmetric \(n \times n\) matrix \(M\) with bounded entries, in sublinear time. The main result shows that the eigenvalues of a uniform random \(O(\epsilon^{-4}\log n)\) principal submatrix can be used to approximate all eigenvalues of \(M\) up to additive error \(\epsilon n\). One can think of this as a sort of concentration inequality for eigenvalues. This result follows (and builds upon) work of Bakshi-Chepurko-Jayaram on property testing semidefiniteness. The key idea is that eigenvectors corresponding to large eigenvalues have small infinity norm: intuitively, since all entries in \(M\) are bounded, such an eigenvector must have its mass spread out among many coordinates. Hence, we can get information about it by randomly sampling a few coordinates. The paper also shows that approach of taking principal submatrices requires taking \(\Omega(\epsilon^{-2})\) columns/rows.

Deterministic Graph Coloring in the Streaming Model by Sepehr Assadi, Andrew Chen, and Glenn Sun (arXiv). This is technically not a sublinear algorithms paper (well ok, streaming is sublinear, but we tend not to cover the streaming literature. Maybe we should? – Ed.) But, I think that the connections are of interest to our community of readers. The main tool of sublinear \((\Delta+1)\)-coloring algorithm of Assadi-Chen-Khanna is the palette sparsification lemma (\(\Delta\) is the maximum degree). This lemma shows that vertices can randomly shorten their ”palette” of colors, after which all colorings from these palettes lead to very few monochromatic edges. This is an immensely powerful tool, since one can get immediately sublinear complexity algorithms in many models: adjacency list, streaming, distributed. Is the randomness necessary? Note that these algorithms run in \(\Omega(n)\) time/space, so it is conceivable that deterministic sublinear algorithms exists. This paper shows that randomization is necessary in the semi-streaming model (space \(O(n poly(\log n))\)). Indeed, there exist no deterministic semi-streaming algorithms that can achieve even \(\exp(\Delta^{o(1)})\) colorings.

Adversarially Robust Coloring for Graph Stream by Amit Chakrabarti, Prantar Ghosh, and Manuel Stoeckl (arXiv). This paper studies the same problem as the above, but presents the results in a different way. In a randomized algorithm, we normally think of an adversary that fixes a (hard) input, and the algorithm then makes its random decisions. An adaptive adversary is one that changes the input (stream) based on the decisions of an algorithm. In this definition, a robust algorithm is one that can give correct answers, even for adversarially generated output. A deterministic algorithm is automatically robust. This paper show that there do not exist semi-streaming algorithms that can achieve \((\Delta+1)\)-colorings. The quantitative lower bound is weaker (\(\Omega(\Delta^2)\) colors), but it is against a stronger adversary.

News for June 2021

A quieter month after last month’s bonanza. One (applied!) paper on distribution testing, a paper on tolerant distribution testing, and a compendium of open problems. (Ed: alas, I missed the paper on tolerant distribution testing, authored by one of our editors. Sorry, Clément!)

Learning-based Support Estimation in Sublinear Time by Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, and Tal Wagner (arXiv). A classic problem in distribution testing is that of estimating the support size \(n\) of an unknown distribution \(\mathcal{D}\). (Assume that all elements in the support have probability at least \(1/n\).) A fundamental result of Valiant-Valiant (2011) proves that the sample complexity of this problem is \(\Theta(n/\log n)\). A line of work has emerged in trying to reduce this complexity, with additional sources of information. Canonne-Rubinfeld (2014) showed that, if one can query the exact probabilities of elements, then the complexity can be made independent of \(n\). This paper studies a robust version of this assumption: suppose, we can get constant factor approximations to the probabilities. Then, the main result is that we can get a query complexity of \(n^{1-1/\log(\varepsilon^{-1})} \ll n/\log n\) (where the constant \(\varepsilon\) denotes the additive approximation to the support size). This paper also does empirical experiments to show that the new algorithm is indeed better in practice. Moreover, it shows that existing methods degraded rapidly with poorer probability estimates, while the new algorithm maintains its accuracy even with such estimates.

The Price of Tolerance in Distribution Testing by Clément L. Canonne, Ayush Jain, Gautam Kamath, and Jerry Li (arXiv). While we have seen many results in distribution testing, the subject of tolerance is one that hasn’t received as much attention. Consider the problem of testing if unknown distribution \(\mathcal{p}\) (over domain \([n]\)) is the same as known distribution \(\mathcal{q}\). We wish to distinguish \(\varepsilon_1\)-close from \(\varepsilon_2\)-far, under total variation distance. When \(\varepsilon_1\) is zero, this is the standard property testing setting, and classic results yield \(\Theta(\sqrt{n})\) sample complexity. If \(\varepsilon_1 = \varepsilon_2/2\), then we are looking for a constant factor approximation to the distance. And the complexity is \(\Theta(n/\log n)\). Surprisingly, nothing was known in better. Until this paper, that is. The main result gives a complete characterization of sample complexity (up to log factors), for all values of \(\varepsilon_1, \varepsilon_2\). Remarkably, the sample complexity has an additive term \((n/\log n) \cdot (\varepsilon_1/\varepsilon^2_2)\). Thus, when \(\varepsilon_1 > \sqrt{\varepsilon_2}\), the sample complexity is \(\Theta(n/\log n)\). When \(\varepsilon_1\) is smaller, the main result gives a smooth dependence on the sample complexity. One the main challenges is that existing results use two very different techniques for the property testing vs constant-factor approximation regimes. The former uses simpler \(\ell_2\)-statistics (e.g. collision counting), while the latter is based on polynomial approximations (estimating moments). The upper bound in this paper shows that simpler statistics based on just the first two moments suffice to getting results for all regimes of \(\varepsilon_1, \varepsilon_2\).

Open Problems in Property Testing of Graphs by Oded Goldreich (ECCC). As the title clearly states, this is a survey covering a number of open problems in graph property testing. The broad division is based on the query model: dense graphs, bounded degree graphs, and general graphs. A reader will see statements of various classic open problems, such as the complexity of testing triangle freeness for dense graphs and characterizing properties that can be tested in \(poly(\varepsilon^{-1})\) queries. Arguably, there are more open problems (and fewer results) for testing in bounded degree graphs, where we lack broad characterizations of testable properties. An important, though less famous (?), open problem is that of the complexity of testing isomorphism. It would appear that the setting of general graphs, where we know the least, may be the next frontier for graph property testing. A problem that really caught my eye: can we transform testers that work for bounded degree graphs into those that work for bounded arboricity graphs? The latter is a generalization of bounded degree that has appeared in a number of recent results on sublinear graph algorithms.