We have a rich bounty of papers this March (with a Feb 29 submission that got associated with this month). Local Computations Algorithms, Error Correction, PageRank computations, Quantum testing, Distribution testing, Graph property testing, and nice note on a statement we all knew (but never bothered to prove!). (Ed: We missed a paper on graph property testing, and updated the blurb on “A basic lower bound for property testing”.)
Average-Case Local Computation Algorithms by Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, and Ronitt Rubinfeld (arXiv). Readers of this blog are likely familiar with Local Computation Algorithms (LCA) model. Imagine the input is (say) a large undirected graph \(G\), to which an algorithm has query access to. The aim is to give sublinear access to a large, linear sized output, such as a graph spanner. So each bit/coordinate/edge of the output should be computable with only sublinear access to \(G\) (these are called probes). There is a rich line of results for computing \(k\)-spanners, which are sparse subgraphs of \(G\) that approximate the shortest path distances up to a factor \(k\). Previous LCA lower bounds give a barrier of \(\Omega(\sqrt{n})\) probes per spanner query. To beat this bound, this paper introduces average-case LCAs. Imagine that \(G\) is generated according to graph distribution, like Erdős-Rényi, and the LCA only needs to succeed with high probability over the input. For such settings, this paper gives a number of LCAs for different parameter settings that beats the \(\sqrt{n}\) probe barrier. An additional concept introduced is that of joint sampling. Here, the LCA is expected to generate the random input \(G\) and gives access to the spanner. It is shown that one can beat the trivial bound obtained by just coupling together LCAs for input generation and spanner construction.
Local Correction of Linear Functions over the Boolean Cube by Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, and Madhu Sudan (ECCC, arXiv). Speaking of LCAs, a classic example of an LCA is a local error correcting code (LCC). The input is a function \(f:\{0,1\}^n \to G\) where \(G\) is an Abelian group. Think of codewords as linear functions over this space. A \((\delta, q)\)-LCC would provide local access to the closest keyword to \(f\) with distance \(< \delta\) from the property of linear functions. Each coordinate of the closest keyword is obtained with \(q\) queries/probes to \(f\). Classic results provide a \((\Omega(1), O(1))\)-LCC when \(G = \mathbb{Z}_2\). The aim of this paper is to consider general range groups, such as the reals. Not much is known in this space for LCCs. This paper gives a \((1/4-\varepsilon, \widetilde{O}(\log n))\)-LCC for this setting. (Note that \(1/4\) is the decoding radius.) Most previous LCC results use “common” symmetries among the (hypercube) domain and range, while this result works when the range is totally unrelated to the domain. Moreover the \(\widetilde{O}(\log n)\) query complexity is nearly matches the latex \(\Omega(\log n)\)-query lower bound for this problem.
Revisiting Local Computation of PageRank: Simple and Optimal by Hanzhi Wang, Zhewei Wei, Ji-Rong Wen, and Mingji Yang (arXiv). PageRank is one of the fundamental computations in modern network science. Given a (possibly) directed graph \(G = (V,E)\), PageRank defines a “truncated random walk” that works as follows. It starts from the uniform distribution. At each step, independently with probability \(\alpha\), the walk just stops. With the remaining probability, it performs a single random walk step by going to a random (out) neighbor. (This definition is equivalent to the stationary distribution of the random walk Markov Chain with a teleport probability \(\alpha\).) The PageRank value of vertex \(t\), denoted \(\pi(t)\), is the probability of ending at \(t\). The obvious algorithm for estimating \(t\) is to simply perform a bunch of random walks, requiring \(\Theta(1/\pi(t))\) to get non-trivial estimates. The average PageRank value is \(1/n\), so this leads to a linear time algorithm for an average vertex. This is a rich history of work on beating this bound, and getting a sublinear query complexity for (almost) all vertices. The approach is to get an estimator with running time (ideally) \(O(n \pi(t))\), By combining both estimators, we get \(O(\sqrt{n})\) time algorithms for estimating PageRank. This paper gets as close as possible to this bound, and achieves (the almost optimal) bound of \(O(\sqrt{n} \cdot m^{1/4})\). The actual bound is more involved and depends on the maximum in and out degrees. It is worth noting that random walks on directed graphs are nasty objects, so getting these algorithms is extremely challenging. Moreover, this paper nails down the upper and lower bounds in terms of the \(n, m\), and the max degrees.
Efficient Algorithms for Personalized PageRank Computation: A Survey by Mingji Yang, Hanzhi Wang, Zhewei Wei, Sibo Wang, and Ji-Rong Wen (arXiv). This survey isn’t meant for a TCS audience per se, but has bearing to the previous paper. And it has many explicit mentions to sublinear algorithms for PageRank computations. This survey focuses more on Personalized PageRank (PPR), wherein the walks starts from a single source vertex. Section 7 talks about sublinear algorithms for estimating individual PPR values, and discusses the techniques involved in these algorithms. This is a good survey for getting a sense of the interesting problems in estimating (P)PR values.
New Graph and Hypergraph Container Lemmas with Applications in Property Testing by Eric Blais and Cameron Seth (arXiv). The container method is a combinatorial approach that is now seeing many property testing applications. The best way to motivate this is to consider the classic problem of property testing if a dense graph \(G\) has a clique of size \(\rho n\). The canonical tester samples a small set of vertices (of size \(f(\rho)/poly(\varepsilon)\), where \(f(\cdot)\) is some explicit function), computes the largest clique in this set, and then extrapolates that to guess the largest clique size in \(G\). If \(G\) has a clique of size at least \(\rho n\) (the YES case), the analysis is an easy exercise. If \(G\) is \(\varepsilon\)-far from having a large clique (the NO case), the analysis needs to deal with probability that small cliques in \(G\) lead to large cliques in the sample (just by random deviation). This argument needs a union bound over “all possible (small) cliques” of \(G\). And this is always the hard part. The container method is about proving that there is a small (polynomial) number of “containers”, that contain all small cliques. The union bound then works out with stronger parameters. This was introduced to property testing in a beautiful, previous work of the authors, which led to substantial improvements for many property testing problems over dense graphs. In this paper, they generalize the container method to more settings, and prove stronger property testing bounds for satisfiability and \(k\)-colorability.
Hamiltonian Property Testing by Andreas Bluhm, Matthias C. Caro, and Aadil Oufkir (arXiv). This paper is on property testing the quantum notion of Hamiltonian locality. This description will only highlight my poor knowledge of quantum physics, but let me try. A Hamiltonian is an operator on \(n\) qubits, each of which should be thought of as a complex number with magnitude \(1\) (you can think of this a 2D vector, with each axis representing a classical bit). On a single qubit, there are four possible operations: identity, negate the complex part, switch the real and complex parts, or negate the complex and switch real/complex parts. These operations can be represented as \(2 \times 2\) matrices (the qubit is a 2D vector), and form a basis. We can generalize this basis to \(n\) qubits as follows. Pick a string in \(\{0,1,2,3\}^n\), where \(0\) indicates the identity operation, and the others map to the three non-trivial qubit transforms above. Apply the corresponding operation to the corresponding qubit. Mathematically, we represent this operation as a tensor product of the corresponding \(2 \times 2\) matrices. These operators form the Pauli basis; the locality of a basis operator is the number of non-trivial qubit operators used. Any Hamiltonian can be written in the Pauli basis, and it is called \(k\)-local if it is spanned by at most \(k\)-local basis operators. This paper sets up a natural property testing question. Distinguish the input Hamilton being \(k\)-local from the input being \(\varepsilon\)-far from being \(k\)-local. Access to the Hamiltonian (and exponentially large object to “write down”) is achieved by applying the operator/transformation to a vector of qubits. If distance between operators is measured by \(l_\infty\)-norm, then the property testing problem requires exponentially many queries to the Hamiltonian. The main result is that if distance is measure by \(l_2\)-norm, then only polynomially many queries suffice to test \(k\)-locality, for constant \(k\).
Equivalence Testing: The Power of Bounded Adaptivity by Diptarka Chakraborty, Sourav Chakraborty, Gunjan Kumar, Kuldeep S. Meel (arXiv). We all know the problem of equivalence/identity testing of distributions. This work is on the conditional sampling model. Consider two input distributions \(\mathcal{P}\) and \(\mathcal{Q}\) over domain \([n]\). For any subset \(S\) of the domain, we can generate a sample from the input distributions, conditioned on being in \(S\). In this model, existing results show that equivalence (\(\mathcal{P} = \mathcal{Q}\)) can property tested using \(O(\log\log n)\) samples (ignoring \(\varepsilon\) dependencies), a significant demonstration of the power of conditional sampling. But this is a sequential, and hence adaptive, algorithm. The best known bound for non-adaptive adaptive algorithms is \(O(\log^{12}n)\), while the known lower bound is \(\Omega(\log n)\). This paper shows that with a single round of adaptivity, one can get a \(\widetilde{O}(\log n)\)-query equivalence tester. This result opens up a rich source of problems for the conditional sampling model, wherein we look at the power of bounded adaptivity.
On the query complexity of testing local graph properties in the bounded-degree graph model by Oded Goldreich (arXiv). Consider property testing in bounded degree graph model. The input graph \(G = ([n], E)\) is represented as an adjacency list. In the dense graph setting, a vast majority of “interesting” properties are testable in time independent to graph size. For bounded-degree graphs, the story is much more complicated. A fundamental question is to characterize such “constant-time” testable properties. A local property (in this context) is one that is basically defined as subgraph freeness, with some augmentations by vertex degree. Previous work suggested that all local properties can be tested in time independent on graph size. This conjecture was subsequently refuted, but with a non-constructive argument that does not give an explicit query lower bound. This paper overcomes the non-constructive barrier, and gives a query lower bound of \(\Omega(\log^{(4)} n)\) for an explicit local property (\(\log^{(i)}\) is the iterated log). In a slightly weaker model, where the algorithm is not given the exact size of \(G\), the lower bound can be significantly improved to \(\Omega(\log n)\).
A basic lower bound for property testing by Eldar Fischer (arXiv). We end with a fundamental statement that you must have known. In any (discrete) setting, consider some non-trivial property \(\mathcal{P}\) with distance measured by Hamming distance. The setting that covers almost all property testing settings. Of course, any property tester requires \(\Omega(1/\varepsilon)\) queries. Intuitively, if a tester makes less \(O(1/\varepsilon)\) input queries, it might not “touch” the portion of the input that helps it in the distinguishing task. This basic fact has not been proven explicitly, and this note resolves that issue. The short paper gives a full self-contained proof of this fact, and does so without resorting to Yao’s minimax lemma. The proof is not difficult, but needs some care because the tester could be adaptive. Great read for students. (Further addendum: Nader Bshouty and Oded Goldreich pointed out that a previous note by Bshouty and Goldreich also proves such a lower bound. These results differ on the precise notation of a “non-trivial property”. Fischer’s definition is more direct and slightly stronger than the Bshouty-Goldreich definition. The Bshouty-Goldreich proof uses a construction of a “sufficiently separated” set of strings, which is different from the proof in this note. Again, both proofs are excellent reads for students to learn more about property testing. These results give a good perspective on crafting the right definitions.)