Author Archives: Nithin Varma

News for November 2023

November does not seem to have been a busy month for sublinear-time algorithms! As usual, please let us know if we missed any paper.

Nearly Optimal Bounds for Sample-Based Testing and Learning of k-Monotone Functions, by Hadley Black (arXiv). (This paper is from Oct, which we unfortunately missed. -Ed) Monotone functions are a common object of study in property testing. One of the big promises of property testing is that it is potentially easier than learning. Consider a monotone function \(f:\{0,1\}^n \to \{0,1\}\), and the problem of sampled-based monotonicity testing. This means that the tester only has access to uniform random samples. The classic Bshouty-Tamon Fourier spectrum result proves that that \(\exp(O(\sqrt{d}/\varepsilon))\) samples suffice to learn \(f\) up to error \(\varepsilon\). But could testing be significantly easier? The only non-trivial lower bound for sample-based testers was \(\Omega(\sqrt{2^d/\varepsilon})\) for \(\varepsilon \ll d^{-3/2}\). Until this paper. One of the main result is a lower bound of \(\exp(\Omega(\sqrt{d}/\varepsilon))\), which is optimal up to polynomial factors. Thus, with random samples, the learning and testing complexities are within polynomials of each other. Interestingly, sampled-based testing with one-sided error requires \(2^{\Omega(d)}\) queries, and is therefore harder than two-sided testing. That runs counter to most monotonicity testing results, wherein the one-sided and two-sided tester complexities are (roughly) the same. The paper also gives analogous results for the property of \(k\)-monotonicity, a generalization of monotonicity.

Testing Intersecting and Union-Closed Families, by Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, Rocco A. Servedio (arXiv). A function \(f:2^{[n]} \to \{0,1\}\) is monotone if \(f^{-1}(1)\) is a monotone set system. Testing monotonicity of functions of the form \(f:2^{[n]} \to \{0,1\}\) is a cornerstone of research in property testing and the work has resulted in novel insights on isoperimetry of the Boolean hypercube. Khot, Minzer and Safra (FOCS 2015; SICOMP 2018) gave a \(\tilde{\Theta}(\sqrt{n}/\epsilon^2)\)-query nonadaptive \(\epsilon\)-tester for monotonicity, which is complemented by a \(\tilde{\Omega}(n^{1/3})\) lower bound by Chen, Waingarten and Xie (STOC 2017). The current paper studies testing the properties of `intersectingness’ and union-closedness of the set system \(f^{-1}(1)\) defined by functions of the form \(f:2^{[n]} \to \{0,1\}\). They show that neither of these properties can have nonadaptive \(\epsilon\)-testers with \(\text{poly}(n,1/\epsilon)\) query complexity, which happens to be the case for monotonicity as discussed above. In addition to the strong lower bounds they exhibit, they also give nonadaptive one-sided error \(\epsilon\)-testers with query complexity \(\text{poly }(n^{\sqrt{n \log (1/\epsilon)}},1/\epsilon)\) for these properties.

Next, we make a leap to the world of quantum sublinear-time algorithms.

(Quantum) complexity of testing signed graph clusterability, by Kuo-Chin Chen, Simon Apers, and Min Hsieu Hsieh (arXiv). A signed graph is one where edges are either labeled positive or negative and such a graph is clusterable, if there is a way to partition the vertex set so that edges with endpoints in the same part are labeled positive and edges whose endpoints are in different parts are labeled negative. This paper proves two main results on property testing clusterability of signed graphs presented in the bounded degree model. First, they show that every classical tester for the problem requires \(\Omega(\sqrt{n})\) queries. Additionally, they also prove that there is a quantum algorithm for the same problem with query complexity \(\tilde{O}(n^{1/3})\), thereby implying that quantum algorithms are more powerful than their classical counterparts for the problem of clusterability testing.

Quantum speedups for linear programming via interior point methods, by Simon Apers and Sander Gibling (arXiv). This paper gives a quantum algorithm that, given a linear program over \(d\) variables containing \(n\) constraints, outputs a feasible solution with value at most \(\epsilon\) more than that of the optimal solution, and runs in time \(\tilde{O}(\sqrt{n} \cdot\text{ poly}\log(1/\epsilon))\) for constant \(d\), i.e., in time sublinear in the “height” of the program. Other quantum algorithms for solving linear systems in the same sense have running times that either depend on \(\text{poly}(1/\epsilon)\) or on the condition number of the linear system being solved.

News for June 2023

June was a somewhat quiet month for sublinear-time algorithms with only one paper coming to our attention.

Relaxed Local Correctability from Local Testing by Vinayak M. Kumar and Geoffrey Mon (arXiv). As the name indicates, the paper proposes a scheme to construct good relaxed locally correctable codes (RLCC) from good locally testable codes (LTC). A relaxed local decoder for an RLCC is a sublinear algorithm that, given query access to a corrupted word \(w\) of length \(n\) and an index \(i \in [n]\) as input, either returns \(c[i]\), where \(c\) is the closest codeword to \(w\), or returns a failure symbol \(\perp\). In the regime of constant distance and constant rate, the prior best known lower and upper bounds on the query complexity of RLCCs is \(\tilde{\Omega}(\sqrt{\log n})\) (Dall’Agnol, Gur, and Lachish; 2021) and \((\log n)^{O(\log \log \log n)}\) (Cohen and Yankovitz; 2022), respectively. Using their scheme to convert LTCs to RLCCs, this paper improves the state of the art by providing constant distance constant rate RLCCs with query complexity \(\log^{O(1)} n\).

News for February 2023

Despite being a short month, February 2023 has witnessed a significant amount of activity under the sublinear “regime”. Let us know if we have missed anything!

Dynamic \((1 + \epsilon)\)-Approximate Matching Size in Truly Sublinear Update Time by Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak (arXiv). This work throws light on connections between the dynamic and query models of computation and uses them for making advances on approximating the size of a maximum cardinality matching (MCM) in a general graph. In particular, as the main technical ingredient in obtaining an improved dynamic algorithm for maintaining an approximation to the size of MCM, the authors provide a \(\pm \epsilon n\) approximation algorithm for estimating the size of MCM in a general \(n\)-vertex graph by making \(n^{2 – \Omega_{\epsilon}(1)}\) adjacency queries. Prior to this result, the state of the art (Behnezhad, Roghani & Rubinstein; STOC’23) was a \(n^{2 – \Omega(1)}\)-query algorithm for the same problem with a multiplicative approximation guarantee of \(1.5\) and an additive guarantee of \(o(n)\).

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). As the name indicates, the paper studies the fundamental problem of testing uniformity of distributions supported over hypergrids \([m]^n\). The tester that they present make \(O(\text{poly}(m)\sqrt{n}/\epsilon^2)\) queries to a conditional subcube sampling oracle, which, when given a subcube of \([m]^n\), returns a point sampled from the distribution conditioned on the point belonging to the subcube. The result is a generalization of the uniformity tester for distributions supported over the \(n\)-dimensional hypercube (Canonne, Chen, Kamath, Levi and Waingarten; SODA ’21).

Easy Testability for Posets by Panna Timea Fekete and Gabor Kun (arXiv). This paper deals with testing properties of directed graphs in the adjacency matrix model. The main characters of the story are posets, or directed acyclic graphs (DAGs) that are transitively closed. Given a family \(\mathcal{F}\) of finite posets, let \(\mathcal{P}_\mathcal{F}\) denote the set of all finite posets that do not contain any element of \(\mathcal{F}\) as a subposet. The main result of the paper is an \(\epsilon\)-tester with query complexity \(\text{poly}(1/\epsilon)\) for \(\mathcal{P}_\mathcal{F}\). The authors obtain this result by proving a removal lemma for posets. The result is placed in the larger context of understanding what properties of graphs can be tested with query complexity that has a polynomial dependence on \(1/\epsilon\) in the adjacency matrix model.

Compressibility-Aware Quantum Algorithms on Strings by Daniel Gibney and Sharma V. Thankachan (arXiv). Lastly, we have a paper on quantum string algorithms that run in sublinear time. In short, the authors present quantum algorithms with optimal running times for computing the Lempel-Ziv (LZ) encoding and Burrows Wheeler Transform (BWT) of highly compressible strings. A main consequence of these results is a faster quantum algorithm for computing the longest common subsequence (LCS) of two strings when the concatenation of the strings is highly compressible. It is to be noted that sublinear-time algorithms do not exist for these problems in the classical model of computation. More details follow.

Factoring a string into disjoint substrings (factors) in an specific manner is the main step in the LZ compression algorithm. The smaller the number of factors, the more compressible the string is. This paper gives a quantum algorithm for the problem of computing the LZ factorization of a string in time \(\tilde{O}(\sqrt{nz})\), where \(z\) is the number of factors in the string. They also show that their algorithm is optimal. Using this algorithm, they obtain a fast algorithm for computing the BWT of an input string, as well as an algorithm running in time \(\tilde{O}(\sqrt{nz})\) to compute the LCS of two strings, where \(n\) is the length and \(z\) is the number of factors in the concatenation of the two strings. When \(z\) is \(o(n^{1/3})\), this algorithm gives an improvement over the previous best quantum algorithm running in time \(\tilde{O}(n^{2/3})\).

News for October 2022

Yet another month that is kind of quiet! If we missed any work, please let us know in the comments.

Gaussian Mean Testing Made Simple, by Ilias Diakonikolas, Daniel Kane and Ankit Pensia (arXiv). Consider an unknown distribution distribution \(p\) over \(\mathbb{R}^d\) that we have sample access to. The paper studies the problem of determining whether \(p\) is a standard Gaussian with zero mean or whether it is a Gaussian with large mean. More formally, the task is to distinguish between the case that \(p\) is \(\mathcal{N}(0, I_d)\) and the case that \(p\) is a Gaussian of the form \(\mathcal{N}(\mu, \Sigma)\), where \(||\mu||_2 \geq \epsilon\) and \(\Sigma\) is an unknown covariance matrix. Canonne, Chen, Kamath, Levi and Weingarten (2021) gave a sample-optimal algorithm for this problem with sample complexity \(\Theta(\sqrt{d}/\epsilon^2)\) sample complexity. The current paper gives another sample-optimal algorithm for the same problem with a simpler analysis. In addition to being sample-optimal, the algorithm in the current paper also runs in time linear in the total sample size, which is an improvement over the work of Canonne et al.

Superpolynomial lower bounds for decision tree learning and testing, by Caleb Koch, Carmen Strassle and Li-Yang Tan (arXiv). Roughly speaking, the paper studies the problems of testing if a function has a low-depth decision tree and learning a low-depth decision tree approximating a function (provided that one such tree exists). In what follows, we summarize the testing results in the paper. Given an explicit representation of a function \(f:\{0,1\}^n \to \{0,1\}\) and access to samples from a known distribution \(\mathcal{D}\) over \(\{0,1\}^n\), one can aim to determine, with probability at least \(2/3\), if \(f\) has a decision tree of depth at most \(d\) or whether \(f\) is \(\epsilon\)-far from having a decision tree of depth at most \(d\log d\), where the distance is measured with respect to \(\mathcal{D}\). The paper shows that, under the randomized exponential time hypothesis, this problem cannot be solved in time \(\exp(d^{\Omega(1)})\). An immediate corollary is that the same lower bound holds for the problem of distribution-free testing of the property of having depth-\(d\) decision trees. The bound in the current paper is an improvement over the recent work of Blais, Ferreira Pinto Jr., and Harms (2021), who give a lower bound of \(\tilde{\Omega}(2^d)\) on the query complexity of testers for the same problem. However, the advantage of the latter result is that it is unconditional, as opposed to the result in the current paper.

News for June 2022

We have four papers this month — three on sublinear-time graph algorithms and one on distribution testing!

Beating Greedy Matching in Sublinear Time, by Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein, and Amin Saberi (arXiv). Designing sublinear-time algorithms to estimate the size of maximum matching in a graph is a well-studied problem. This paper gives the first \(\frac{1}{2} + \Omega(1)\) approximation algorithm that runs in time sublinear in the size of the input graph. Specifically, given a graph on \(n\) vertices and maximum degree \(\Delta\) in the adjacency list model, and a parameter \(\epsilon >0\), the algorithm runs in time \(\tilde{O}(n + \Delta^{1+\epsilon})\) and produces a \(\frac{1}{2} + f(\epsilon)\) approximation to the maximum matching for some function \(f\). It must be noted that a seminal work of Yoshida, Yamamoto and Ito (STOC, 2009) also gives a better than \(\frac{1}{2}\) approximation sublinear-time algorithm for the same problem. However, the result of Yoshida et al. requires assumptions on the maximum degree of the input graph. An additional point worth mentioning is that the authors do not believe that their techniques will yield an approximation guarantee better than \(0.51\), i.e., \(f(\epsilon) < 0.01\) for all \(\epsilon\).

Sublinear-Time Clustering Oracle for Signed Graphs, by Stefan Neumann and Pan Peng (arXiv). Consider a large signed graph on \(n\) vertices where vertices represent users of a social network and signed edges (+/-) denote the type of interactions (friendly or hostile) between users. Assume that the vertices of the social network can be partitioned into \(O(\log n)\) large clusters, where each cluster has a sparse cut with the rest of the graph. Further, each cluster is a minimal set (w.r.t. inclusion) that can be partitioned into roughly equal-sized opposing sub-communities, where a sub-community opposes another sub-community if most of the edges going across are negatively signed and most of the edges within the sub-communities are positively signed. This work provides a local oracle that, given probe access to a signed graph with such a hidden cluster structure, answers queries of the form “What cluster does vertex \(v\) belong to?” in time \(\tilde{O}(\sqrt{n} \cdot \text{poly}(1/\epsilon))\) per query. This result is a generalization of the same problem studied for unsigned graphs (Peng, 2020). The authors additionally show that their method works well in practice using both synthetic and real-world datasets. They also provide the first public real-world datasets of large signed graphs with a small number of large ground-truth communities having this property.

Sublinear Algorithms for Hierarchical Clustering, by Arpit Agarwal, Sanjeev Khanna, Huan Li, and Prathamesh Patil (arXiv). Consider a weighted graph \(G = (V,E,w)\), where the set \(V\) of vertices denotes datapoints and the weight \(w(e) > 0\) of edge \(e \in E\) denotes the similarity between the endpoints of \(e\). A hierarchical clustering of \(V\) is a tree \(T\) whose root is the set \(V\) and leaves are the singleton sets corresponding to individual vertices. An internal node of the tree corresponds to a cluster containing all the leaf vertices that are descendants of that node. A hierarchical clustering tree provides us with a scheme to cluster datapoints at multiple levels of granularity. The cost of a hierarchical clustering tree is \(\sum_{(u,v) \in E} |T_{u,v}| \cdot w(u,v)\), where \(T_{u,v}\) denotes the lowest common ancestor of the leaves \(u\) and \(v\). In this paper, the authors present sublinear algorithms for determining a hierarchical clustering tree with the minimum cost. In the query model with degree queries and neighbor queries to the graph, they give an algorithm that outputs an \(\tilde{O}(1)\)-approximate hierarchical clustering and makes \(\tilde{O}(n^{4-2\gamma})\) queries, when the number of edges \(m = \Theta(n^{\gamma})\) for \(1.5 \geq \gamma > 4/3\). When the input graph is sparse, i.e., \(\gamma \leq 4/3\), the algorithm makes \(\tilde{O}(\max\{n, m\})\) queries, and when the graph is dense, i.e., \(\gamma >1.5\), the algorithm makes \(\tilde{O}(n)\) queries. They complement their upper bounds with nearly tight lower bounds. In order to obtain their upper bounds, they design a sublinear-time algorithm for the problem of obtaining a weak cut sparsifier that approximates cuts sizes upto an additive term in addition to the usual multiplicative factor. They also design sublinear algorithms for hierarchical clustering in the MPC and streaming models of computation.

Sharp Constants in Uniformity Testing via the Huber Statistic, by Shivam Gupta and Eric Price (arXiv). This paper revisits the fundamental problem of uniformity testing — i.e., to decide whether an unknown distribution over \(n\) elements is uniform or \(\epsilon\)-far from uniform. This problem is known to be solvable optimally with probability at least \(1 – \delta\) using \(s = \Theta\left(\frac{\sqrt{n \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2}\right)\) independent samples from the unknown distribution. Multiple testers are known for the problem and they all compute a statistic of the form \(\sum_{i \in [n]} f(s_i)\), where \(s_i\) for \(i \in [n]\) and \(f\) is some function and make their decision based on whether or not the value of the statistic is above or below a threshold. For instance, the earliest known uniformity tester (Batu, Fortnow, Rubinfeld, Smith and White 2000; Goldreich and Ron 2011), also called the collisions tester, uses \(f(k) = \frac{k(k-1)}{2}\). The current paper proposes a new tester based on the Huber loss. For \(\beta > 0\), let \(h_\beta(x) := \min\{x^2, 2\beta x – \beta^2\}\). The statistic that the authors use in their test is defined by the function \(f(k) := k – s/n\), where \(s\) is the number of samples and \(n\) is the support size of the distribution. The authors show that their tester is better than all previously known testers as they achieve the best constants in the sample complexity.

News for February 2022

This month has seen a flurry of activity in sublinear algorithms and a diverse collection of papers have come up, with topics ranging from differentially private sublinear algorithms to local testers for multiplicity codes. Apologies to the readers for the delay in putting this post together!

Almost-Optimal Sublinear-Time Edit Distance in the Low Distance Regime by Karl Bringmann, Alejandro Cassis, Nick Fischer and Vasileios Nakos (arXiv)

This paper considers the problem of gap edit distance, i.e., of determining if the edit distance between two strings \(x\) and \(y\) is at most \(k\) or at least \(K\). Their main result is an algorithm that runs in time \(O(n/k + \text{poly}(k))\) and solves the problem for \(K = k \cdot 2^{\tilde{O}(\sqrt{\log k})}\). The paper improves upon earlier results of Goldenberg, Krauthgamer and Saha (2019) and Kociumaka and Saha (2020) who solved the problem for \(K = k^2\) with the same asymptotic guarantee on the query complexity.

One of the interesting takeaways from the paper is that the complexity of solving the gap Hamming distance and gap edit distance are similar in the low distance regime. For both, the complexity of solving the \((k, k^{1+o(1)})\)-gap problem is \(n/k^{1 \pm o(1)}\). This needs to be contrasted with the fact that solving \((k, \Omega(n))\)-gap edit distance requires \(\Omega(\sqrt{k})\) queries as shown by Batu, Ergun, Kilian, Magen and Raskhodnikova (2003), whereas \((k, \Omega(n))\)-gap Hamming distance can be solved in \(O(1)\) time.

These results are incomparable to those obtained by Goldenberg, Kociumaka, Krauthgamer and Saha (which was discussed in our November 2021 post), where they give a nonadaptive algorithm with complexity \(O(n/k^{1.5})\) for \((k, k^2)\)-gap edit distance problem. The algorithm in the present paper is adaptive and works faster for smaller values of \(k\).

Privately Estimating Graph Parameters in Sublinear time by Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee (arXiv)

Differentially private approximation algorithms for optimization problems on graphs is a well-studied topic. This paper opens up an exciting research direction by initiating a systematic study of the design of differentially private sublinear-time algorithms. The setting is that graphs are viewed as databases and two graphs are neighboring if they differ in an edge (or a node). An algorithm \(A\) is \(\epsilon\)-differentially private if for every pair of edge-neighboring(or node-neighboring) graphs \(G, G’\) and for every subset \(S\) of outputs, \(\Pr[A(G) \in S] \leq \exp(\epsilon) \cdot \Pr[A(G’) \in S]\).

The paper presents \(\epsilon\)-differentially private sublinear-time algorithms for well-studied problems such as estimating the average degree, the size of a min vertex cover and the size of a maximum matching. These algorithms access the input graphs via neighbor queries and degree queries.

In addition to providing a strong privacy guarantee, their algorithms nearly match the approximation and complexity guarantees of their non-differentially private counterparts. The main idea seems to be the formalization of a sensitivity notion, which they refer to as Global Coupled Sensitivity, and bounding it for the known sublinear-time algorithms for the aforementioned problems. Finally, they add Laplace noise calibrated with this sensitivity value to the output of the algorithms to make them differentially private.

Testability and Local Certification of Monotone Properties in Minor-closed Classes by Louis Esperet And Sergey Norin (arXiv)

One of the major interests in graph property testing is to characterize which properties are testable, i.e, can be \(\epsilon\)-tested with query complexity that depends only on the parameter \(\epsilon\). The question of testability is well-understood in the dense graph model as well as the bounded degree model. This paper concerns itself with testability questions in the general model or the sparse model of graph property testing, where graphs are represented as adjacency lists with no bound on the maximum degree.

The authors prove that every monotone property of minor-closed graph classes is testable with one-sided error, where a property is monotone if it is closed under taking subgraphs and a graph class is minor-closed if it is closed under taking minors. A crucial fact to be noted here is that a tester is allowed to make only uniformly random nerighbor queries.

This result is a significant generalization of a 2019 result by Czumaj and Sohler, who proved that for every finite set of graphs \(\mathcal{H}\), every \(\mathcal{H}\)-free property of minor-closed graph classes is testable with one-sided error, where a graph satisfies \(\mathcal{H}\)-freeness if none of its subgraphs belong to \(\mathcal{H}\).

They show an interesting consequence of their results to designing a short local certification scheme for monotone properties of minor-closed graph classes. Roughly speaking, they show the existence of a prover-verifier system for the aforementioned testing problem where proofs of length \(O(\log n)\) are assigned to each vertex and that verifier needs to observe only the proofs assigned to a vertex and its neighbors.

The plane test is a local tester for Multiplicity Codes by Dan Karliner, Roie Salama, and Amnon Ta-Shma (ECCC)

Multiplicity codes are a generalization of Reed-Muller codes and was first studied by Guruswami and Wang (2013) and Kopparty, Saraf and Yekhanin (2014). The messages here are polynomials of degree \(d\) over \(m\) variables and the codeword corresponding to a polynomial \(p\) is the evaluation of \(p\) and of all of its directional derivatives of order upto \(s\) over all the points in \(\mathbb{F}_q^m\), where \(q\) is a prime power.

Even though multiplicity codes are known to be locally decodable, it was open whether they are locally testable. Local testers for Reed Muller codes work by restricting the evaluations to a uniformly random line in \(\mathbb{F}_q^m\) and checking whether it corresponds to the evaluations of a degree \(d\) univariate polynomial. The authors first show that such a tester does not work for the case of multiplicity codes when \(d\) is large. They then show that a plane test is a good local tester for multiplicity codes even for larger values of \(d\). Specifically, a plane test checks whether the restriction of a given word, which is purportedly the evaluation of a polynomial of degree \(d\) and of its derivatives, to a uniformly random plane in \(\mathbb{F}_q^m\) is a bivariate multiplicity code of degree \(d\).

We conclude the post with a short note from Nader Bshouty and Oded Goldreich on a fundamental characterization result in property testing.

On properties that are non-trivial to test by Nader H. Bshouty and Oded Goldreich (ECCC)

A property on binary strings is nontrivial if for infinitely many \(n\), the property contains at least one string of length \(n\) and at most \(2^{n – \Omega(n)}\) strings of length \(n\). The note shows that every nontrivial property requires \(\Omega(1/\epsilon)\) queries to \(\epsilon\)-test.