Category Archives: Monthly digest

News for April 2024

We have seven papers for you this month. Our potpourri includes two papers apiece on each of the following themes: Distribution Testing, property testing with a quantum twist, graph property testing, and finally two papers on testing function properties. A featured paper this month covers progress on optimal non-adaptive junta testers. Without further ado, let’s dig in. As usual, please let us know if I missed any papers.

Testing \(C_k\) freeness in bounded-arboricity graphs by Talya Eden, Reut Levi, and Dana Ron (arXiv) Our post from July 2021 highlighted an open problem posed by Goldreich. This problem asks if it is possible to transform property testers for bounded degree graphs to property testers for unbounded degree graphs with general arboricity. The featured paper answers the question Goldreich posed in the negative. In particular, testing \(C_4\) freeness in bounded arboricity graphs (with possibly unbounded degree) already admits an \(\Omega(n^{1/4})\) one-sided lowerbound. Up to \(\log\) factors, the paper also proves a matching upperbound. The same bounds hold for \(C_5\)-freeness. Further, for every \(k \geq 6\), the paper proves an \(\Omega(n^{1/3})\) one-sided lower-bound.

Directed Isoperimetry and Monotonicity Testing: A Dynamical Approach by Renato Ferreira Pinto Jr (arXiv) The featured paper considers a classic of property testing. So, you want to test whether an input real-valued function \(f \colon [0,1]^d \to \mathbb{R}\) over the solid cube is monotone or whether it is \(\varepsilon\)-far from being monotone. It is convenient to require that the input function better be differentiable and Lipschictz (for those of you keeping scores, this should remind you of our post from July 2023 where we covered a previous work in the same setting again by Renato). Motivated by the success of testers developed for the boolean hypercube domain, a natural analogy suggests to have in addition to a standard value oracle, an additional derivative oracle which takes as input a point \(x \in [0,1]^d\) and a direction \(\mathbf{v} \in \mathbb{R}^d\). This oracle allows you to query directional derivative at \(x\) along \(\mathbf{v}\). Renato’s punchline is a directed Poincare Inequality which in the above setup, connects the distance to monotonicity to the square root of directed analog of a suitable notion of influence. The techniques used in the paper seem intriguing. They are inspired by the original proof of the classic Poincare Inequality. In particular, as Renato notes “the main theme of our proof of this result is the study of the convergence of a partial differential equation.”

Optimal Non-Adaptive Tolerant Junta Testing via Local Estimators by Shivam Nadimpalli, Shyamal Patel (arXiv) The problem of Tolerant Junta testing is no stranger to our regular readers. This paper is fairly intriguing for those of you who care about testing function properties. As the abstract notes, this is the first paper to nail down the true (non-adaptive) query complexity for tolerantly testing some natural property of boolean functions. The main result of this paper presents a lower bound of \(2^{\widetilde{\Omega}(\sqrt{k \log(1/\varepsilon)})}\) on the task of non-adaptive tolerant \(k\)-junta testing. The paper presents an almost matching non-adaptive algorithm as well.

Testing Intersectingness of Uniform Families by Ishay Haviv and Michal Parnas (arXiv) Let \(\mathcal{F}\) denote an intersecting family all sets in which are subsets of an underlying \(n\)-element universe. This means that for any \(F_1, F_2 \in \mathcal{F}\), you have that \(F_1 \cap F_2 \neq \emptyset\). Some of you might immediately recall Erdos-Ko-Rado theorem which asserts an upperbound on the size of such an intersecting family where every set has size \(k\). Another famous result is Lovasz’s (positive) resolution of Kneser’s conjecture which asserts an lowerbound on the number of intersecting families you need to cover all \(k\)-subsets of \([n]\). For ease of discussion, let us follow the authors and cook up a property testing problem \(\textsf{INTERSECTING}_{n,k, \varepsilon}\). In this problem, you are given access to the indicator \(f \colon {[n] \choose k} \to \{0,1\}\) encoding the family \(\mathcal{F}\) and you ask whether \(\mathcal{F}\) is intersecting or whether it is \(\varepsilon\)-far from it. Recently, Chen-De-Li-Nadimpalli-Servedio explored the non-uniform-set-size variant of this problem (which we covered here). They presented one-sided algorithms with a non-adaptive query bound of \(2^{\widetilde{O}(\sqrt{n \log(1/\varepsilon)})}\) for this problem and they also showed an almost matching lowerbound. The featured paper contrasts these results with the situation you obtain when all the set sizes are indeed the same (that is, the paper explores the testability of \(\textsf{INTERSECTING}_{n,k, \varepsilon}\). Of the numerous results in the paper, let me highlight one: you can test \(\textsf{INTERSECTING}_{n,k, \varepsilon}\) with two-sided error with a mere \(O\big(\log n \cdot \frac{1}{\varepsilon} \big)\) queries. This result holds for \(\varepsilon \geq \Omega(k/n)^r\) for a fixed \(r\).

Distribution-Free Testing of Decision Lists with a Sublinear Number of Queries by Xi Chen, Yumou Fei and Shyamal Patel (arXiv) Decision lists are a popular and convenient way to represent a boolean function which is learnable from examples. In more detail, a decision list is a collection of pairs \((\alpha_1, \beta_1), (\alpha_2, \beta_2), \ldots (\alpha_k, \beta_k)\) (here, \(\alpha_j\)’s denote literals and \(\beta_j\)’s are bits). This list defines a boolean function \(f \colon \{0,1\}^n \to \{0,1\}\) as follows: for any \(\mathbf{x} \in \{0,1\}^n\), you let \(f(x) = \beta_j\) where \(j\) is the smallest index such that \(\alpha_j\) is satisfied by \(\mathbf{x}\). With this (not super rigorous) definition out of the way, I can now tell you about the main result of this featured paper. The main theorem of this paper asserts that in the distribution-free framework for property testing, you still get sublinear time algorithms for testing decision lists. In particular, thanks to this paper, now you can engineer a two-sided adaptive distribution free algorithm for testing decision lists which makes runs in time \(\widetilde{O}(n^{11/12})\).

Simple algorithms to test and learn local Hamiltonians by Francisco Escudero Gutiérrez (arXiv) The featured paper considers the task of (tolerantly) testing and learning an \(n\)-qubit \(k\)-local Hamiltonian from queries to its evolution operator. The main result of the paper asserts that the task of Tolerant Hamiltonian Locality Testing can be done with a mere \(1/(\varepsilon_2 – \varepsilon_1)^8\) queries to the evolution operator.

Local Test for Unitarily Invariant Properties of Bipartite Quantum States by Kean Chen, Qisheng Wang, Zhicheng Zhang (arXiv) Lots of investigations on quantum entanglement consider bipartite quantum states. The featured paper considers the task of testing properties of bipartite pure states. The paper begins by recalling a helpful duality between a property of bipartite pure states being unitarily invariant on one part, and the property being locally testable on the other part. This duality does not offer any insights into the query complexity of the local tester. The main result of the paper proves that the local tester indeed achieves optimal sample complexity over all global testers.

Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds by Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan (arXiv) Consider the standard setup of supervised learning with a twist. Namely, the distribution from which you receive samples in the training phase (\(\mathcal{D}\)) and the distribution from which the samples are taken in the test phase (\(\mathcal{D}’\)) are not necessarily the same (and hence the name — distribution shift). For example, this situation may arise if you use one set of equipments in the training phase and another during the test phase. This model was explored in a previous work by the same set of authors where they considered the task of obtaining a low-error classifier (on \(\mathcal{D}’\)) when they are additionally told that the training samples pass some helpful test. In this paper, the authors explore the problem of testing intersections of halfspaces with respect to Gaussian Training Distributions. The main contribution of the paper is a set of new positive results which extend the results from PAC learnability to the learnability under the new model. Indeed, under some reasonably mild assumptions, the bounds in the new model match the bounds from the standard PAC model. For quantitative details, please refer to the paper.

News for March 2024

We have a rich bounty of papers this March (with a Feb 29 submission that got associated with this month). Local Computations Algorithms, Error Correction, PageRank computations, Quantum testing, Distribution testing, Graph property testing, and nice note on a statement we all knew (but never bothered to prove!). (Ed: We missed a paper on graph property testing, and updated the blurb on “A basic lower bound for property testing”.)

Average-Case Local Computation Algorithms by Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, and Ronitt Rubinfeld (arXiv). Readers of this blog are likely familiar with Local Computation Algorithms (LCA) model. Imagine the input is (say) a large undirected graph \(G\), to which an algorithm has query access to. The aim is to give sublinear access to a large, linear sized output, such as a graph spanner. So each bit/coordinate/edge of the output should be computable with only sublinear access to \(G\) (these are called probes). There is a rich line of results for computing \(k\)-spanners, which are sparse subgraphs of \(G\) that approximate the shortest path distances up to a factor \(k\). Previous LCA lower bounds give a barrier of \(\Omega(\sqrt{n})\) probes per spanner query. To beat this bound, this paper introduces average-case LCAs. Imagine that \(G\) is generated according to graph distribution, like Erdős-Rényi, and the LCA only needs to succeed with high probability over the input. For such settings, this paper gives a number of LCAs for different parameter settings that beats the \(\sqrt{n}\) probe barrier. An additional concept introduced is that of joint sampling. Here, the LCA is expected to generate the random input \(G\) and gives access to the spanner. It is shown that one can beat the trivial bound obtained by just coupling together LCAs for input generation and spanner construction.

Local Correction of Linear Functions over the Boolean Cube by Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, and Madhu Sudan (ECCC, arXiv). Speaking of LCAs, a classic example of an LCA is a local error correcting code (LCC). The input is a function \(f:\{0,1\}^n \to G\) where \(G\) is an Abelian group. Think of codewords as linear functions over this space. A \((\delta, q)\)-LCC would provide local access to the closest keyword to \(f\) with distance \(< \delta\) from the property of linear functions. Each coordinate of the closest keyword is obtained with \(q\) queries/probes to \(f\). Classic results provide a \((\Omega(1), O(1))\)-LCC when \(G = \mathbb{Z}_2\). The aim of this paper is to consider general range groups, such as the reals. Not much is known in this space for LCCs. This paper gives a \((1/4-\varepsilon, \widetilde{O}(\log n))\)-LCC for this setting. (Note that \(1/4\) is the decoding radius.) Most previous LCC results use “common” symmetries among the (hypercube) domain and range, while this result works when the range is totally unrelated to the domain. Moreover the \(\widetilde{O}(\log n)\) query complexity is nearly matches the latex \(\Omega(\log n)\)-query lower bound for this problem.

Revisiting Local Computation of PageRank: Simple and Optimal by Hanzhi Wang, Zhewei Wei, Ji-Rong Wen, and Mingji Yang (arXiv). PageRank is one of the fundamental computations in modern network science. Given a (possibly) directed graph \(G = (V,E)\), PageRank defines a “truncated random walk” that works as follows. It starts from the uniform distribution. At each step, independently with probability \(\alpha\), the walk just stops. With the remaining probability, it performs a single random walk step by going to a random (out) neighbor. (This definition is equivalent to the stationary distribution of the random walk Markov Chain with a teleport probability \(\alpha\).) The PageRank value of vertex \(t\), denoted \(\pi(t)\), is the probability of ending at \(t\). The obvious algorithm for estimating \(t\) is to simply perform a bunch of random walks, requiring \(\Theta(1/\pi(t))\) to get non-trivial estimates. The average PageRank value is \(1/n\), so this leads to a linear time algorithm for an average vertex. This is a rich history of work on beating this bound, and getting a sublinear query complexity for (almost) all vertices. The approach is to get an estimator with running time (ideally) \(O(n \pi(t))\), By combining both estimators, we get \(O(\sqrt{n})\) time algorithms for estimating PageRank. This paper gets as close as possible to this bound, and achieves (the almost optimal) bound of \(O(\sqrt{n} \cdot m^{1/4})\). The actual bound is more involved and depends on the maximum in and out degrees. It is worth noting that random walks on directed graphs are nasty objects, so getting these algorithms is extremely challenging. Moreover, this paper nails down the upper and lower bounds in terms of the \(n, m\), and the max degrees.

Efficient Algorithms for Personalized PageRank Computation: A Survey by Mingji Yang, Hanzhi Wang, Zhewei Wei, Sibo Wang, and Ji-Rong Wen (arXiv). This survey isn’t meant for a TCS audience per se, but has bearing to the previous paper. And it has many explicit mentions to sublinear algorithms for PageRank computations. This survey focuses more on Personalized PageRank (PPR), wherein the walks starts from a single source vertex. Section 7 talks about sublinear algorithms for estimating individual PPR values, and discusses the techniques involved in these algorithms. This is a good survey for getting a sense of the interesting problems in estimating (P)PR values.

New Graph and Hypergraph Container Lemmas with Applications in Property Testing by Eric Blais and Cameron Seth (arXiv). The container method is a combinatorial approach that is now seeing many property testing applications. The best way to motivate this is to consider the classic problem of property testing if a dense graph \(G\) has a clique of size \(\rho n\). The canonical tester samples a small set of vertices (of size \(f(\rho)/poly(\varepsilon)\), where \(f(\cdot)\) is some explicit function), computes the largest clique in this set, and then extrapolates that to guess the largest clique size in \(G\). If \(G\) has a clique of size at least \(\rho n\) (the YES case), the analysis is an easy exercise. If \(G\) is \(\varepsilon\)-far from having a large clique (the NO case), the analysis needs to deal with probability that small cliques in \(G\) lead to large cliques in the sample (just by random deviation). This argument needs a union bound over “all possible (small) cliques” of \(G\). And this is always the hard part. The container method is about proving that there is a small (polynomial) number of “containers”, that contain all small cliques. The union bound then works out with stronger parameters. This was introduced to property testing in a beautiful, previous work of the authors, which led to substantial improvements for many property testing problems over dense graphs. In this paper, they generalize the container method to more settings, and prove stronger property testing bounds for satisfiability and \(k\)-colorability.

Hamiltonian Property Testing by Andreas Bluhm, Matthias C. Caro, and Aadil Oufkir (arXiv). This paper is on property testing the quantum notion of Hamiltonian locality. This description will only highlight my poor knowledge of quantum physics, but let me try. A Hamiltonian is an operator on \(n\) qubits, each of which should be thought of as a complex number with magnitude \(1\) (you can think of this a 2D vector, with each axis representing a classical bit). On a single qubit, there are four possible operations: identity, negate the complex part, switch the real and complex parts, or negate the complex and switch real/complex parts. These operations can be represented as \(2 \times 2\) matrices (the qubit is a 2D vector), and form a basis. We can generalize this basis to \(n\) qubits as follows. Pick a string in \(\{0,1,2,3\}^n\), where \(0\) indicates the identity operation, and the others map to the three non-trivial qubit transforms above. Apply the corresponding operation to the corresponding qubit. Mathematically, we represent this operation as a tensor product of the corresponding \(2 \times 2\) matrices. These operators form the Pauli basis; the locality of a basis operator is the number of non-trivial qubit operators used. Any Hamiltonian can be written in the Pauli basis, and it is called \(k\)-local if it is spanned by at most \(k\)-local basis operators. This paper sets up a natural property testing question. Distinguish the input Hamilton being \(k\)-local from the input being \(\varepsilon\)-far from being \(k\)-local. Access to the Hamiltonian (and exponentially large object to “write down”) is achieved by applying the operator/transformation to a vector of qubits. If distance between operators is measured by \(l_\infty\)-norm, then the property testing problem requires exponentially many queries to the Hamiltonian. The main result is that if distance is measure by \(l_2\)-norm, then only polynomially many queries suffice to test \(k\)-locality, for constant \(k\).

Equivalence Testing: The Power of Bounded Adaptivity by Diptarka Chakraborty, Sourav Chakraborty, Gunjan Kumar, Kuldeep S. Meel (arXiv). We all know the problem of equivalence/identity testing of distributions. This work is on the conditional sampling model. Consider two input distributions \(\mathcal{P}\) and \(\mathcal{Q}\) over domain \([n]\). For any subset \(S\) of the domain, we can generate a sample from the input distributions, conditioned on being in \(S\). In this model, existing results show that equivalence (\(\mathcal{P} = \mathcal{Q}\)) can property tested using \(O(\log\log n)\) samples (ignoring \(\varepsilon\) dependencies), a significant demonstration of the power of conditional sampling. But this is a sequential, and hence adaptive, algorithm. The best known bound for non-adaptive adaptive algorithms is \(O(\log^{12}n)\), while the known lower bound is \(\Omega(\log n)\). This paper shows that with a single round of adaptivity, one can get a \(\widetilde{O}(\log n)\)-query equivalence tester. This result opens up a rich source of problems for the conditional sampling model, wherein we look at the power of bounded adaptivity.

On the query complexity of testing local graph properties in the bounded-degree graph model by Oded Goldreich (arXiv). Consider property testing in bounded degree graph model. The input graph \(G = ([n], E)\) is represented as an adjacency list. In the dense graph setting, a vast majority of “interesting” properties are testable in time independent to graph size. For bounded-degree graphs, the story is much more complicated. A fundamental question is to characterize such “constant-time” testable properties. A local property (in this context) is one that is basically defined as subgraph freeness, with some augmentations by vertex degree. Previous work suggested that all local properties can be tested in time independent on graph size. This conjecture was subsequently refuted, but with a non-constructive argument that does not give an explicit query lower bound. This paper overcomes the non-constructive barrier, and gives a query lower bound of \(\Omega(\log^{(4)} n)\) for an explicit local property (\(\log^{(i)}\) is the iterated log). In a slightly weaker model, where the algorithm is not given the exact size of \(G\), the lower bound can be significantly improved to \(\Omega(\log n)\).

A basic lower bound for property testing by Eldar Fischer (arXiv). We end with a fundamental statement that you must have known. In any (discrete) setting, consider some non-trivial property \(\mathcal{P}\) with distance measured by Hamming distance. The setting that covers almost all property testing settings. Of course, any property tester requires \(\Omega(1/\varepsilon)\) queries. Intuitively, if a tester makes less \(O(1/\varepsilon)\) input queries, it might not “touch” the portion of the input that helps it in the distinguishing task. This basic fact has not been proven explicitly, and this note resolves that issue. The short paper gives a full self-contained proof of this fact, and does so without resorting to Yao’s minimax lemma. The proof is not difficult, but needs some care because the tester could be adaptive. Great read for students. (Further addendum: Nader Bshouty and Oded Goldreich pointed out that a previous note by Bshouty and Goldreich also proves such a lower bound. These results differ on the precise notation of a “non-trivial property”. Fischer’s definition is more direct and slightly stronger than the Bshouty-Goldreich definition. The Bshouty-Goldreich proof uses a construction of a “sufficiently separated” set of strings, which is different from the proof in this note. Again, both proofs are excellent reads for students to learn more about property testing. These results give a good perspective on crafting the right definitions.)

News for February 2024

February this year was slightly superlinear, with 29 days instead of the usual 28. As a result… 5 property testing papers! (Including one overlooked from January),

Testing Calibration in Subquadratic Time, by Lunjia Hu, Kevin Tian, and Chutong Yang (arXiv). The authors consider the question of model calibration, where a binary predictor is said to be calibrated if \(\mathbb{E}[ y\mid v=t ] = t\) for all \(t\), where \(y\) is the observed outcome and \(v\) is the prediction. This notion, central to algorithmic fairness, comes with a host of challenges: one of them being to assess whether a given predictor is indeed calibrated, and quantifying by how much it deviates from it. Following work by Błasiok, Gopalan, Hu, and Nakkiran which introduced a notion of distance to calibration, the paper defines the (property testing) task of calibration testing, with connections to distribution testing, and provides subquadratic-time algorithms (in the sample complexity) for the task. The authors also obtain analogous results for tolerant calibration testing, which they also introduce.

The role of shared randomness in quantum state certification with unentangled measurements, by Jayadev Acharya and Yuhan Liu (arXiv). In this paper (from January), the authors consider the following question, the quantum analogue of identity testing from the classical distribution testing world: what is the copy complexity (≈sample complexity) of certifying (≈testing) whether an unknown quantum state (≈quantum analogue of a probability distribution) is equal to a known, reference quantum state? And, crucially, what about doing this when our quantum hands are tied, i.e., without using entanglement — but possibly with adaptive measurements? This is not a new question, and we previously covered a couple papers on this in April 2020 and Feb 2021. What is new here is that the authors show it’s not about adaptivity! Mirroring what happens in the classical (distributed) case, the key here turns out to be shared randomness: that is, whether the measurements are made independently (in which case \(\Theta(d^2)\) copies are necessary and sufficient), or chosen randomly but jointly (in which case the copy complexity is \(\Theta(d^{3/2})\)).

Low Acceptance Agreement Tests via Bounded-Degree Symplectic HDXs, by Yotam Dikstein, Irit Dinur, and Alexander Lubotzky (ECCC) and Constant Degree Direct Product Testers with Small Soundness, by Mitali Bafna, Noam Lifshitz, Dor Minzer (ECCC). [Two independent works]

Let \(X\) be a (small) set of \(k\)-element subsets of \([n]\), and \(\{f_S\colon S\to \Sigma\}_{S\in X}\) a family of partial functions. Is there a way to “stitch together” all the functions \(f_S\) into a global one \(G\colon X \to \Sigma\)? A testing algorithm for this is called an agreement test, and the most natural goes as follows: pick \(S,T\in X\) at random (say, with fixed, small intersection), and accept if, and only if, \(f_{S}, f_T\) agree on \(S\cap T\). Does this work? In which parameter regime (i.e., how does the acceptance probability \(\varepsilon\) relate to the closeness-to-a-global-function-\(G\)? How large does \(X\) need to be? The two papers both show that the above agreement test works in the small soundness regime (small \(\varepsilon\)), for \(= O(n)\). Or, as the authors of the first of the two papers put it: “In words, we show that ε agreement implies global structure”

Efficient learning of quantum states prepared with few fermionic non-Gaussian gates, by Antonio Anna Mele and Yaroslav Herasymenko (arXiv). While most of the paper’s focus is on tomography (learning) of a specific class of quantum states, the authors also provide in Appendix A an algorithm for a property testing question: namely, testing the Gaussian dimension of a quantum state: specifically, tolerant testing of \(t\)-compressible \(n\)-qubit Gaussian states in trace distance (Theorem 48). I do not fully grasp what all this means, to be honest, so I’ll stop here.

News for January 2024

Welcome to the posting for first month of 2024! We hope you had a good start to this year. Last month, we had 3 papers. So, without further delay, let’s get started.

On locally-characterized expander graphs (a survey) by Oded Goldreich (ECCC) In this paper, written in an expository style, Goldreich surveys the result of Adler, Kohler and Peng which we covered in our posts on May 2021 and August 2020. The survey begins by reminding the reader what a locally-characterizable graph property is. A family of graphs \(\mathcal{G}\) is said to be characterized by a finite class \(\mathcal{F}\) of graphs if every graph \(G \in \mathcal{G}\) is \(F\)-free for all \(F \in \mathcal{G}\). Thanks to its connections with expressibility in first order logic, one would expect the a locally-characterizable graph property to admit testers depending only on the proximity parameter (in the bounded degree graph model). So, it was quite a surprise when Adler, Kohler and Peng showed that there are locally characterizable graph properties which provably require testers whose query complexity increases with the size of the graph. The main theorem of Adler, Kohler and Peng describes the locally characterizable property that is hard to test. This characterization asserts that you can get your hands on a finite class \(\mathcal{F}\) of graphs so that \(\mathcal{F}\)-freeness of a graph \(G\) means that the graph is a (bounded-degree) expander. One of the key ingredients used in this proof is the Zig-Zag construction of Reingold, Vadhan and Wigderson.

Fast Parallel Sampling under Isoperimetry by Nima Anari, Sinho Chewi, and Thuy-Duong Vuong (arXiv) The featured paper considers the task of sampling (in parallel) from a continuous distribution \(\pi\) supported over \(\mathbb{R}^d\). The main result in the paper shows that for distributions which satisfy a log-Sobolev inequality, you can use parallelized Langevin Algorithms and obtain samples from a distribution \(\pi’\) where \(\pi’\) and \(\pi\) are close in KL-divergence. As an application of their techniques, the authors show that their results can be used to do obtain RNC samples for directed Eulerian Tours and asymmetric Determinantal Point Processes.

Holey graphs: very large Betti numbers are testable by Dániel Szabó, Simon Apers (arXiv) This paper considers the problem of testing whether an input graph \(G\) has a very large \(k\)-th Betti Number (at least \((1-\delta) d_k\) where \(d_k\) denotes the number of \(k\)-cliques in \(G\) and \(\delta > 0\) is sufficiently small) in the dense graph model. As the title indicates, the result says that this property is testable for constant \(k\). Earlier in 2010, Elek investigated this question in the bounded degree model. Elek’s main result showed that with a number of queries \(q = q(\varepsilon\), you can obtain an estimate to the \(k\)-th Betti Number which is within an additive \(\pm \varepsilon n\) of the true \(k\)-th Betti Number. Let us contrast this result with the setting of dense graphs. The authors note that in the dense graph model, getting an estimate to the \(k\)-th Betti Number which is within an additive $\pm \varepsilon d_k$ of the true value needs \(\Omega(n)\) queries. This is the reason why the authors consider the formulation above.

News for December 2023

Happy New Year! This post covers the final property testing results for 2023. We have just two three papers, one on group testing, a new bound on sample-based testers, and a distribution testing result over higher dimensions.

On Testing Group Properties by Oded Goldreich and Laliv Tauber (ECCC). Group testing is a classic problem going back to the early days of property testing. We are given access to a “multiplication table” \(f: [n]^2 \to [n]\). Each element in \([n]\) is (supposedly) an element of a group, and \(f(i,j)\) is the product of elements \(i\) and \(j\). Our aim is to determine, in the property testing sense, whether \(f\) is the multiplication table of a group. The earliest result, from the seminal Spot-checkers paper, gave an \(\widetilde{O}(n^{3/2}/\varepsilon)\) time tester (where \(\varepsilon\) is the proximity parameter). This paper significantly improves that classic bound with a one-sided \(\widetilde{O}(n/\varepsilon)\) tester. Moreover, this tester can be adapted for testing of \(f\) is Abelian. The result is obtained by a series of testers, starting from extremely simple (yet time inefficient) to more complex versions. The best known lower bound is just \(\Omega(\log n)\), and that leaves a tantalizing (and wide open) gap to be reduced.

A Structural Theorem for Local Algorithms with Applications to Coding, Testing, and Verification by Marcel Dall’Agnol, Tom Gur, and Oded Lachish (arXiv). One of the powers of property testers is their choice of query. This is unlike the typical setting in learning, where one only gets access to random samples (or evaluations). Sample-based testers were defined to bridge this gap; so it is a property tester that only has access to (evaluations of) uniform random domain points. A sample-based tester simply checks if the sample is consistent with the property. It is natural to ask if the existence of a (vanilla) property tester implies the existence of a sample based tester. For the simplest setting, consider a \(q\)-query non-adaptive tester for some property \(\mathcal{P}\). One can visualize this as a collection of query sets of size \(q\) in a universe of size \(n\) (the domain). Naively, one might hope that with a \(q\)-way collision argument, a random sample of size \(O(n^{1-1/q})\) would contain a query set, yielding a sample based tester. Previous work showed that, for any property with a \(q\)-query non-adaptive tester, there is a sample-based tester with complexity \(O(n^{1-1/(q^2\log q)})\). Remarkably, this work gives such a bound for even adaptive testers (the best previous bound was \(O(n^{1-1/\exp(q)})\)). The result is placed in a broader framework of robust local algorithms, which subsume \(q\) query property testers, locally decodable codes (LDC), and MA proofs of proximity.

Testing Closeness of Multivariate Distributions via Ramsey Theory by Ilias Diakonikolas, Daniel M. Kane, and Sihan Liu (arXiv). (Missed from last month. -Ed) This paper considers distribution testing where the universe is \(\mathbb{R}^d\). The notion of closeness is called \(\mathcal{A}_k\) distance: we cover the universe with \(k\) axis-parallel rectangles, and “reduce” the distribution to a discrete universe of size \(k\). We then take TV-distance over these reduced distributions. When \(d=1\), the complexity of testing closeness was known to be \(\Theta(k^{4/5}/poly(\epsilon))\). For \(d > 1\), this paper gives the first non-trivial closeness testing result. The (optimal in \(k\)) bound achieved is \(\Theta(k^{6/7}/poly(\epsilon))\). Interestingly, there is a jump in the exponent on \(k\) from \(d=1\) to \(d=2\), but no jump for larger \(d\).

News for November 2023

November does not seem to have been a busy month for sublinear-time algorithms! As usual, please let us know if we missed any paper.

Nearly Optimal Bounds for Sample-Based Testing and Learning of k-Monotone Functions, by Hadley Black (arXiv). (This paper is from Oct, which we unfortunately missed. -Ed) Monotone functions are a common object of study in property testing. One of the big promises of property testing is that it is potentially easier than learning. Consider a monotone function \(f:\{0,1\}^n \to \{0,1\}\), and the problem of sampled-based monotonicity testing. This means that the tester only has access to uniform random samples. The classic Bshouty-Tamon Fourier spectrum result proves that that \(\exp(O(\sqrt{d}/\varepsilon))\) samples suffice to learn \(f\) up to error \(\varepsilon\). But could testing be significantly easier? The only non-trivial lower bound for sample-based testers was \(\Omega(\sqrt{2^d/\varepsilon})\) for \(\varepsilon \ll d^{-3/2}\). Until this paper. One of the main result is a lower bound of \(\exp(\Omega(\sqrt{d}/\varepsilon))\), which is optimal up to polynomial factors. Thus, with random samples, the learning and testing complexities are within polynomials of each other. Interestingly, sampled-based testing with one-sided error requires \(2^{\Omega(d)}\) queries, and is therefore harder than two-sided testing. That runs counter to most monotonicity testing results, wherein the one-sided and two-sided tester complexities are (roughly) the same. The paper also gives analogous results for the property of \(k\)-monotonicity, a generalization of monotonicity.

Testing Intersecting and Union-Closed Families, by Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, Rocco A. Servedio (arXiv). A function \(f:2^{[n]} \to \{0,1\}\) is monotone if \(f^{-1}(1)\) is a monotone set system. Testing monotonicity of functions of the form \(f:2^{[n]} \to \{0,1\}\) is a cornerstone of research in property testing and the work has resulted in novel insights on isoperimetry of the Boolean hypercube. Khot, Minzer and Safra (FOCS 2015; SICOMP 2018) gave a \(\tilde{\Theta}(\sqrt{n}/\epsilon^2)\)-query nonadaptive \(\epsilon\)-tester for monotonicity, which is complemented by a \(\tilde{\Omega}(n^{1/3})\) lower bound by Chen, Waingarten and Xie (STOC 2017). The current paper studies testing the properties of `intersectingness’ and union-closedness of the set system \(f^{-1}(1)\) defined by functions of the form \(f:2^{[n]} \to \{0,1\}\). They show that neither of these properties can have nonadaptive \(\epsilon\)-testers with \(\text{poly}(n,1/\epsilon)\) query complexity, which happens to be the case for monotonicity as discussed above. In addition to the strong lower bounds they exhibit, they also give nonadaptive one-sided error \(\epsilon\)-testers with query complexity \(\text{poly }(n^{\sqrt{n \log (1/\epsilon)}},1/\epsilon)\) for these properties.

Next, we make a leap to the world of quantum sublinear-time algorithms.

(Quantum) complexity of testing signed graph clusterability, by Kuo-Chin Chen, Simon Apers, and Min Hsieu Hsieh (arXiv). A signed graph is one where edges are either labeled positive or negative and such a graph is clusterable, if there is a way to partition the vertex set so that edges with endpoints in the same part are labeled positive and edges whose endpoints are in different parts are labeled negative. This paper proves two main results on property testing clusterability of signed graphs presented in the bounded degree model. First, they show that every classical tester for the problem requires \(\Omega(\sqrt{n})\) queries. Additionally, they also prove that there is a quantum algorithm for the same problem with query complexity \(\tilde{O}(n^{1/3})\), thereby implying that quantum algorithms are more powerful than their classical counterparts for the problem of clusterability testing.

Quantum speedups for linear programming via interior point methods, by Simon Apers and Sander Gibling (arXiv). This paper gives a quantum algorithm that, given a linear program over \(d\) variables containing \(n\) constraints, outputs a feasible solution with value at most \(\epsilon\) more than that of the optimal solution, and runs in time \(\tilde{O}(\sqrt{n} \cdot\text{ poly}\log(1/\epsilon))\) for constant \(d\), i.e., in time sublinear in the “height” of the program. Other quantum algorithms for solving linear systems in the same sense have running times that either depend on \(\text{poly}(1/\epsilon)\) or on the condition number of the linear system being solved.

News for October 2023

Last month was a little slower, with only (unless we missed some) three papers: two papers appearing, and one that was overlooked from the month before.

Stability of Homomorphisms, Coverings and Cocycles I: Equivalence, by Michael Chapmanand Alexander Lubotzky (arXiv). This paper considers three “stability” problems in topological property testing: namely, problems of the form “are objects almost-X close to X”, where X (here) is one of homorphisms, coverings of a cell complex, or 1-cocycles. The main result of the paper is that the three property testing problems are equivalent: namely, they admit testing proximity-oblivious testers (POTs) with similar rejection probability rates.

Testing Higher-order Clusterability on graphs, by Yifei Li, Donghua Yang, and Jianzhong Li (arXiv). The authors propose a new notion of graph clusterability, higher-order clusterability, meant to generalize the previously studied notions of clusterability; and proceed to provide testing algorithms for this notion.

Private Distribution Testing with Heterogeneous Constraints: Your Epsilon Might Not Be Mine, by Clément Canonne and Yucheng Sun (arXiv). This distribution testing paper (carry-over from the previous month) focuses on the extensively studied problem of closeness testing: given samples from two unknown distributions \(p\) and \(q\), decide whether \(p=q\) or if they are far. Now, add (differential) privacy: what if the two sets of samples, from \(p\) and \(q\) respectively, were sensitive data? And now, the focus of the paper… what if the two sets of samples were not equally sensitive? What are the trade-offs between the number of samples from \(p\) and the number from \(q\), then?

News for September 2023

Sorry for delay in getting this month’s post out. This time we had seven (EDIT) eight (LATER EDIT) nine papers. Thanks to our readers for pointing out a paper we missed. Do let us know if we missed any (EDIT) others. Alright, without further delay, let us look at this month’s spread.

Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas by Xi Chen, Anindya De, Yuhao Li, Shivam Nadimpalli, and Rocco A. Servedio (arXiv) Let us begin with a paper on property testing of boolean functions over the binary hypercube \(\{0,1\}^n\). Quoting from the abstract.

[This paper gives] the first superpolynomial (in fact, mildly exponential) lower bounds for tolerant testing of monotonicity, unateness, and juntas with a constant separation between yes and no cases.

Let us take a little dip to get a superficial feel for what techniques the paper uses for boolean monotonicity. The key lies in an adaptation of the lower bound techniques pioneered in the paper of Pallavoor-Raskhodnikova-Waingarten (PRW) which considers distributions over “yes functions” (which are \(\varepsilon/\sqrt{n}\)-close to monone) and “far functions” (which are $\eps$-far from being monotone). PRW split the \(n\) variables into \(n/2\) control variables and \(n/2\) action variables. PRW consider the subcubes indexed by bit-strings where the control variables are balanced. The function value is chosen carefully on these subcubes which ensures any tester that reliably distinguishes yes and no functions need to sample bit-strings from the same balanced subcube which differ in lots of action variables. As PRW argue, this event occurs with low probability if the allowed number of queries is small. The key insight of the featured paper is to modify the PRW construction by using random monotone DNF formulas due to Talagrand. If this sets up your appetite, go read the paper!

Testing Junta Truncation by William He and Shivam Nadimpalli (arXiv)

Let \(f \colon \{0,1\}^n \to \{0,1\}\) be a \(k\)-junta. You are given a set \(\mathcal{U}_{\text{yes}} = \bigl\{0,1\bigr\}^n\) and a set \(\mathcal{U}_{\text{no}} = \bigl\{x \in \{0,1\}^n \colon f(x) = 1 \bigr\}.\) Consider uniform distributions supported on both of these sets which we call \(\mathcal{D}_{\text{yes}}\) and \(\mathcal{D}_{\text{no}}\). Finally, we define a distribution \(\mathcal{D} = \begin{cases} \mathcal{D}_{\text{yes}} \text{ w.p. } 1/2 \\ \mathcal{D}_{\text{no}} \text{ w.p } 1/2 \end{cases}.\)

You are given \(t\) samples from the distribution \(D\). The task is to decide whether \(D\) is the yes distribution or the no distribution. The featured paper shows you can reliably solve this task with \(t \leq \min(2^k + \log{n \choose k }, 2^{k/2} \log^{1/2}{n \choose k})\) samples. The paper also supplements this result with a lower bound of \(t \geq \log{n \choose k}\) samples fewer than which cannot be used to reliably distinguish these two distributions. The results suggest that this “testing junta truncation” problem requires learning the set of relevant variables for the junta.

Longest Common Substring and Longest Palindromic Substring in \(\mathbf{\widetilde{O}(\sqrt n)}\) Time by Domenico Cantone, Simone Faro, Arianna Pavone, and Caterina Viola (arXiv) I paraphrase from the abstract of this paper. You know the longest common substring and longest palindromic substring as classic problems in computer science both of which can be solved in linear time using suffix trees. Recently, quantum algorithms were proposed for both of these problems in the query model both of which issue only \(o(n)\) quantum queries. The featured paper notes that this query model has a shortcoming namely when it comes to real life implementation on actual hardware. The current paper address this shortcoming by presenting \(o(n)\) quantum-query algorithms in the circuit model of computation.

Testing properties of distributions in the streaming model by Sampriti Roy and Yadu Vasudev (arXiv) Alright, now let us consider a different twist on distribution testing. Suppose you have a small memory. You obtain a bunch of samples to solve some standard distribution testing task but the twist is of course you cannot store all the samples. What can you say about how sample complexity trades off against space complexity? The featured paper studies this trade off in the standard access model and the conditional access model. One of the results of the paper asserts that in the conditional access model, you can do identity testing with \( \widetilde{O}\bigl(\frac{\log^4n}{\varepsilon^4}\bigr)\) samples while using only \(O\bigl(\frac{\log^2 n}{\varepsilon^2} \bigr)\) bits of memory.

Testing Spreading Behavior in Networks with Arbitrary Topologies by Augusto Modanese and Yuichi Yoshida (arXiv) We covered the problem of testing dynamic environments in this March 2014 post and that May 2021 post earlier. The goal here is to check whether a dynamically evolving system evolves according to some fixed rule or whether it evolves according to some fixed rule or whether the system is far from systems that evolve according to that fixed rule. The May 2021 post covered a paper which shows you can test dynamically evolving systems that evolve according to what is called the threshold rule. The featured paper considers rules motivated by some kind of models for infection spreading. One of the results in the paper presents one-sided and two-sided testers (with \(O(1/\varepsilon)\) query complexity) for testing a single step of evolution (on bounded degree graphs) with these rules.

A Tight Lower Bound of Ω(log n) for the Estimation of the Number of Defective Items by Nader Bshouty and Gergely Harcos (ECCC) The featured paper considers a problem in group testing. Let us quickly review the setup for group testing. You are given some ground set \(X\) of \(|X| = n\) items. Suppose the set of items in the set \(I \subseteq X\) is defective. The challenge is to devise a test which refers to some set \(Q \subseteq X\) where the test is said to be successful iff \(Q \cap I \neq \emptyset\). This paper presents lower bounds for non-adpative algorithms for group testing. And as the title says, if your algorithm wishes to estimate the number of defective items to within a constant factor, you better pay up \(\Omega(\log n)\) tests.

A tight lower bound on non-adaptive group testing estimation by Tsun-Ming Cheung, Hamed Hatami, and Anthony Ostuni (arXiv) As our readers pointed out, this paper is concurrent with the paper above and achieves the same lower bound. Indeed, this holds for both the one-sided and the two-sided variants. Furthermore, as this paper shows if one knows the set \(I\) satisfies \(L \leq |I| \leq U\) then you can show both one-sided and two-sided lower bounds of \(\Omega(U/L)\) non-adaptive queries if you want a constant approximation to \(|I|\).

On Testing Isomorphism to a Fixed Graph in the Bounded-Degree Graph Model by Oded Goldreich and Laliv Tauber (ECCC) This paper looks like a fantastic read for your students — especially when written in such an engaging style spanning (only) 18 highly readable pages. As the title indicates, this paper considers the challenge of testing isomorphism to a fixed graph in the bounded degree model. The main result of this paper asserts that for almost all \(d\)-regular graphs \(H\) on \(|V(H)| = n\) vertices, testing isomorphism to \(H\) can be done in about \(\approx \sqrt n\) queries. The paper also presents an almost matching (query) lower bound which also holds for almost all graphs \(H\).

Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning by Gunjan Kumar, Kuldeep S. Meel, and Yash Pote (arXiv) Let us consider the distribution testing setup with a twist: Suppose you are given some unknown distribution \(\mathcal{Q}\) supported over \(\{0,1\}^n\) and you want to sample from it conditioned on some predicate \(\mathcal{Q}\). The question is can you efficiently check whether these are legit samples (satisfying the predicate) which are taken from the distribution \(\mathcal{Q}\). Our October 2020 news covered a tolerant tester on this problem which involved some subset of the authors of the current paper. The featured paper considers what additional leverage you gain if you are given access to a sampling oracle which can sample from “conditioned subcubes.” In this model, you can query some subcube and after issuing a query, you will receive an element \(x\) from this subcube with probability proportional to original probability weight of \(x\). The paper provides a tolerant tester in this setup which makes at most \(\widetilde{O}(n^3/(\varepsilon_2 – \varepsilon_1)^5)\) queries to this sampling oracle.

News for August 2023

The month of August was… wow, intense! We listed no fewer than 13 property testing papers, which is great. Keep them coming!

If you like high-dimensional expanders (HDX) and agreement testing, this is your lucky day! We start with two (2!) exciting papers on HDX, and agreement testing in the low soundness regime:

Characterizing Direct Product Testing via Coboundary Expansion, by Mitali Bafna and Dor Minzer (ECCC). This paper focuses on direct product testing: namely, given query access to a function \(F\) defined on a \(d\)-dimensional simplicial complex, we consider testers which samples two faces, and check if \(F\) is consistent on their intersection. Such testers will pass with probability one when \(F\) is a direct product, so the question is what “soundness” (probability of rejecting if \(F\) is far from being one) one can achieve. And, central to this paper: can we use high-dimensional expanders to get such testers?
While the high-soundness regime is reasonably well understood, the low-soundness one… not so much. This paper makes progress on that front: the main result states that, to allow for direct product testing in this low soundness regime, “a high-dimensional spectral expander must possess a property that may be seen as a generalization of coboundary expansion.”

Agreement theorems for high dimensional expanders in the small soundness regime: the role of covers, by Yotam Dikstein and Irit Dinur (ECCC). Concerned with the same type of question, this paper provides another characterization of the agreement test properties of high-dimensional expanders in the low-soundness regime: specifically, in terms of the topological covers of the expander.

Rejoice: we’re not done with HDX!

New Codes on High Dimensional Expanders, by Irit Dinur, Siqi Liu, and Rachel Zhang (ECCC). This paper constructs a new family of locally testable codes (LTCs), based on simplicial (bounded-degree) high dimensional expanders, which, while they do not achieve the \(c^3\) Holy Grail (constant rate, distance, and testability), do satisfy a host of potentially useful properties: symmetric, with low-density parity-check matrices (LDPC), and some additional structure (multiplying two codewords gives a related codeword).

Now, onto quantum…

Space-bounded quantum state testing via space-efficient quantum singular value transformation, by François Le Gall, Yupan Liu, and Qisheng Wang (ECCC). Suppose two “computationally limited” (read: space-bounded) quantum devices are used to prepare quantum states \(\rho_0\) and \(\rho_1\), and claim they are identical: your goal is to check whether these two states are close, or very far, for some reasonable distance measure. How hard could that be, computationally? This is exactly a variant of quantum state testing (i.e., the “quantum generalization of tolerant closeness testing” in distribution testing jargon) which places restrictions on the “source” of the observations: this paper establishes several characterisations of the computational complexity of the task, for various distance measures: trace distance, Hilbert-Schmidt distance, quantum entropy difference, and quantum Jensen-Shannon divergence.

Quantum Lower Bounds by Sample-to-Query Lifting, by Qisheng Wang and Zhicheng Zhang (arXiv). There exist several methods to prove quantum query complexity lower bounds, such as the polynomial and the adversary methods. This paper brings a new kid in town: a “sample-to-query lifting” result, which lets one convert a query-based algorithm for a promise problem to obtain a sample-based quantum algorithm for a corresponding testing problem, with a quadratic blowup from query to sample complexity. That is, any sample complexity lower bound for that problem now implies a (quadratically weaker) query complexity lower bound for the original task! Thus, the result allows to “systematically derive quantum query lower bounds using corresponding sample lower bounds from quantum information theory.” The authors use this sample-to-query lifting to establish both new and previously known quantum query complexity lower bounds.

A little magic means a lot, by Andi Gu, Lorenzo Leone, Soumik Ghosh, Jens Eisert, Susanne Yelin, and Yihui Quek (arXiv). Alright, I have to say, I am completely out of my depth with this one. But the title is great, the abstract sounds interesting, and the paper contains a lower bound on the number of copies any tester for “non-stabilizerness” requires (Theorem 6).

… then interactive proofs…

Distribution-Free Proofs of Proximity, by Hugo Aaronson, Tom Gur, Ninad Rajgopal, and Ron Rothblum (ECCC). In distribution-free property testing, to model “real practical settings” the underlying distance measure is not Hamming, but one induces by an unknown probability distribution, PAC-learning style. In interactive proofs of proximity (IPP), the property testing algorithm can interact by an all-powerful, but all-very-much-untrusted prover, IP-style. Well, you probably have guessed by now: this paper introduces a new type of property testing, combining the two! And shows that any problem in NC admits a distribution-free interactive proof of proximity with strongly sublinear communication (\(\tilde{O}(\sqrt{n})\)), which (under some reasonable assumption) is near-optimal.

… and then, distribution testing!

New Lower Bounds for Testing Monotonicity and Log Concavity of Distributions, Yuqian Cheng, Daniel Kane, and Zhicheng Zheng (arXiv). Monotonicity is a pretty clean and natural property of probability distributions over the line: the probability mass function must be non-increasing. This definition straightforwardly generalizes to either dimensions, and, let’s be honest: given how simple this property is, surely, by now we must have tight bounds on the sample complexity of testing monotonicity, right? Turns out, yes, but no. The known upper and lower bounds are tight… as long as the distance parameter \(\varepsilon\) is not too small. But if it is, then… the bounds are lose. And same thing for a related property, log-concavity! This state of affairs was really annoying to some, and I am on of these “some.” Fortunately, this paper makes significant progress towards scratching that itch, by proving new lower bounds for both monotonicity and log-concavity (via a new lower bound technique, relying on moment matching but preserving the type of constraints monotonicity and log-concavity impose on the individual probabilities). Almost there!

Tight Lower Bound on Equivalence Testing in Conditional Sampling Model, by Diptarka Chakraborty, Sourav Chakraborty, and Gunjan Kumar (arXiv). In the usual sampling model for distribution testing, the algorithm gets i.i.d. samples from the unknown distribution(s). This is very natural, but often leads to very high sample complexities… a recent line of work, starting with Chakraborty, Fischer, Goldhirsh, and Matsliah (2011) and Canonne, Ron, and Servedio (2012), introduced the “conditional sampling model”, a more powerful oracle setting where the algorithm gets to condition on subsets of the domain of its choosing. And, lo and behold, with great (additional) power comes greatly lower sample complexity! But again, some itch to scratch: for one of the “flagship” distribution testing questions, closeness testing, there remained a quadratic gap between upper (\(\tilde{O}(\log\log n)\), Falahatgar, Jafarpour, Orlitsky, Pichapati, and Suresh ’15) and lower (\(\Omega(\sqrt{\log\log n})\), Acharya, Canonne, Kamath’15). Not anymore! For this question (and some related ones), this paper essentially closes the gap, proving an \(\tilde{\Omega}(\log\log n)\) lower bound, bypassing the limitations of previous lower bound techniques!

Support Testing in the Huge Object Model, by Tomer Adar, Eldar Fischer, and Amit Levi (arXiv). Another paper, another distribution testing model! In the Huge Object Model recently introduced by Goldreich and Ron, one gets i.i.d. samples from an unknown probability distribution over \(n\)-bit strings, but does not to actually get the samples themselves, which are too big: instead, the algorithm then gets query access to the individual samples (each being an \(n\)-bit string). This is a rather appealing model, and the authors focuses on the task of testing the support size of the distribution. As their results show, this turns out to be quite interesting: namely, the paper shows a separation between adaptive and non-adaptive algorithms, as well as a separation between number of samples and number of queries (to these samples).

We also have some Boolean function testing on the menu:

Linear isomorphism testing of Boolean functions with small approximate spectral norm, by Arijit Ghosh, Chandrima Kayal, Manaswi Paraashar, and Manmatha Roy (arXiv). Given two Boolean functions \(f,g\colon\mathbb{F}_2^n\to{-1,1}\), a standard property testing question is to test whether \(f,g\) are isomorphic. Now, one can instead ask whether they are linearly isomorphic: i.e., whether there exists some invertible matrix \(A\) with entries in \(\mathbb{F}_2\) such that \(f\circ A = g\). This is this question that the authors study, through two different lenses: first, focusing on the communication complexity of the question when Alice has \(f\) and Bob has \(g\); and on the (usual) query complexity. In both cases, their results are expressed in terms of the (approximate) spectral norm of the functions.

Adversarial Low Degree Testing, by Dor Minzer and Kai Zhe Zheng (arXiv). Consider an adversarial setting of property testing where, every time the tester makes a query, an adversary gets to “erase” a small number values of the function, say \(t\), to try to fool the tester. This model was recently introduced by Kalemaj, Raskhodnikova, and Varma: in this paper, the authors continue this line of study, and show that one can test in an erasure-resilient fashion whether a function has low degree (at most \(d\)) with \(\tilde{O}(\log^{O(d)}(t)/\varepsilon)\) queries.

And, to conclude, graph testing:

Testing Graph Properties with the Container Method, by Eric Blais and Cameron Seth (arXiv). Consider the \(\rho\)-CLIQUE property: does a graph on \(n\) vertices contains a clique of size at least \(\rho n\), or is it \(\varepsilon\)-far from it? This property has been extensively studied in the dense graph testing model, but lower and upper bound still exhibited a factor-\(\rho\) gap, particularly striking in the “small clique” regime (when \(\rho \ll 1\)). This paper, using the graph container method, nearly closes that gap, up to polylogarithmic factors: showing an \(\tilde{O}(\rho/\varepsilon)\)-query upper bound. The authors also establish an improved upper bound for another flagship testing question, \(k\)-COLORABILITY.

Oh, my. We forgot a paper last month (appeared end of June), which means… fourteen. Fourteen! (And, of course, apologies for overlooking this paper!)

Refining the Adaptivity Notion in the Huge Object Model, by Tomer Adar and Eldar Fischer (arXiv). For the huge object model in property testing, see one of the papers above. In this one, the authors focus on the notion of adaptivity in this model, in a fine-grained fashion: i.e., which queries depends on what the tester previously saw, with interesting subtleties about what that means in the huge object model. Of course, the key question is whether adaptivity allows for more efficient property testing… As they show, the answer is yes: the authors establish a hierarchy of separations.

News for July 2023

We’re now back to our regular cadence of 4+ monthly papers on property testing and sublinear time algorithms. July brings us a new sublinear PageRank computation and numerous results on our trending topics of distribution and monotonicity testing. Without further delay, let’s see the July spread.

Estimating Single-Node PageRank in \(\widetilde{O}(\min(d_t,\sqrt{m}))\) Time by Hanzhi Wang and Zhewei Wei (arXiv). PageRank is one of the most important quantities computed on graphs, especially in today’s networked world. Consider an undirected graph \(G = (V,E)\). The PageRank value of vertex \(t\) is the probability that a short random walk, starting from the uniform distribution, reaches the vertex \(t\). (Ok ok, I’m fudging details, but this is close enough to the truth.) The aim is to estimate this probability to within additive error \(O(1/n)\), where \(n\) is the number of vertices. A standard simulation would give an \(O(n)\) time algorithm; can we do sublinear in graph size? Previous work (which actually has implementations!) has led to \(O(\sqrt{n\cdot d_t})\) for undirected graphs [LBGS15] and (roughly) \(O(n^{2/3} d^{1/3}_{max})\) for all graphs [BPP18]. Here, \(d_t\) is the degree of vertex \(t\) and \(d_{max}\) is the maximum degree. This paper gets a bound of \(\widetilde{O}(\min(d_t,\sqrt{m}))\). A lower bound is still open for the fundamental problem! (A nice problem for any students reading…?)

Directed Poincare Inequalities and \(L_1\) Monotonicity Testing of Lipschitz Functions by Renato Ferreira Pinto Jr. (arXiv, ECCC). We all (or at least me) love monotonicity testing. The paper studies the continuous version, where \(f:[0,1]^n \to \mathbb{R}\). To have a reasonable notion of distance and testers, we will require functions to be differentiable and \(L\)-Lipschitz. We measure distance using \(\ell_1\) distance, so the distance between \(f,g\) is \(\int_{[0,1]^n} |f-g| d\nu\) (where we integrate over the uniform measure) [BRY14]. To access \(f\), we are also provided a “derivative oracle”: given a point \(x\) and a direction \(\vec{v}\), we can query the directional derivative along \(\vec{v}\) at \(x\). One of the key insights in (discrete, Boolean) monotonicity testing is the connection to directed isoperimetric inequalities. These inequalities are directed versions of classic inequalities that relate boundaries (or gradients) to volumes. For \(L\)-Lipschitz functions, the classic Poincare inequality states that \(E[\|\nabla f\|_1] \geq \textrm{var}(f)\), where \(\nabla f\) is the gradient and \(\textrm{var}(f)\) is (up to constant factors) the distance to being constant. This paper proves the directed version \(E[\|\nabla^- f\|_1] \geq \varepsilon(f)\). Roughly speaker, \(\nabla^-\) is the “negative gradient” (\(min(\nabla f, 0)\)) and \(\varepsilon(f)\) is the \(L_1\)-distance to monotonicity. This result directly yields a \(O(n/\varepsilon)\) query tester for monotonicity. The paper asks the tantalizing question: can we do better?

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers by Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan (arXiv). It is best to go backwards in discussing this paper, starting from the implications and going to the core results. The problem at hand is the classic one of junta testing. Given \(f:\{0,1\}^n \to \{0,1\}\), distinguish if \(f\) only depends on \(r\) variables (an \(r\)-junta) or is \(\varepsilon\)-far from having that property. This problem is well studied, has optimal (efficient) testers, and even has results in the distribution-free setting. In the latter setting, there exists some (unknown) distribution \(\mathcal{D}\) on the domain according to which distance is defined. The tester can access to queries according to \(\mathcal{D}\). The clarity of junta testing disappears when we consider tolerant testing: can we distinguish \(f\) is (say) \(1/4\) close to an \(r\)-junta from being \(1/3\)-far (where distances are measured according to \(\mathcal{D}\))? A remarkable consequence of this paper is that this tolerant testing version is unlikely to have a \(poly(n)\) time algorithm. (Note that the sample complexity might still be small.) The main tool is a composition theorem that gives reductions between low \(\varepsilon\) testers and constant \(\varepsilon\) testers. The details are intricate, but here’s the gist. Suppose we can construct composing functions \(g\) such that the distance to “junta-ness” of \(g \circ f\) is much larger than the distance of \(f\). Then a tester (that can only deal with large \(\varepsilon\)) on \(g \circ f\) can effectively property test \(f\). (Obviously, I’m greatly oversimplifying and mischaracterizing. So go read the paper!)

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination by Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, and Shyam Narayanan (arXiv). Consider the fundamental hypothesis testing problem of Gaussian testing. We wish to distinguish the \(d\)-dimensional, zero-mean, unit covariance Gaussian \(\mathcal{N}(0,I)\), from the “shifted” version \(\mathcal{N}(\mu,I)\), where \(\mu\) is a vector of length at least $latex  \alpha$. Existing results give a simple algorithm using \(\Theta(\sqrt{d}/\alpha^2)\) samples. Suppose there is an adversary who can corrupt samples. The adaptive adversary can look at the samples generated, and arbitrarily change an \(\varepsilon\) fraction of entries. The weaker, oblivious adversary can also arbitrarily change an \(\varepsilon\) fraction of entries, but cannot look at the samples generated. Can we perform Gaussian testing in this setting? A previous result gave an optimal bound for adaptive adversaries [N22]. This paper gives the optimal sample complexity bound for oblivious adversaries. The expression is somewhat complicated. But the punchline is that for many regimes of parameters \(d, \alpha, \varepsilon\), the oblivious adversary is strictly weaker. Meaning, there is a testing algorithm (against an oblivious adversary) whose sample complexity is strictly less than the optimal bound against adaptive adversaries. This comes as a surprise, because in typical distribution testing settings, these adversaries are basically equivalent.

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). A problem of generalizing from hypercube to hypergrids, an issue I have much obsessed over. Consider the problem of uniformity testing, where the domain is the hypergrid \([m]^n\). We wish to test if a distribution \(\mathcal{D}\) over the hypergrid is uniform. Unlike the standard distribution testing setting, the domain size is exponentially large. To aid us, we can perform conditional sampling: we can select any sub-hypergrid, and get samples from \(\mathcal{D}\) restricted to this sub-hypergrid. Previous work solved this problem for the hypercube domain (when \(m=2\)), by providing a tester with sample complexity \(O(\sqrt{n}/\varepsilon^2)\) [CCKLW22]. The previous work did not work for any other hypergrid (even \(m=3\)). This paper gives the first solution for uniformity testing on hypergrids with a tester of sample complexity \(O(poly(m)n/\varepsilon^2)\). The dependence on \(m\) has to be at least \(\sqrt{m}\), from existing results. One of the important tools is getting robust versions of classic isoperimetric inequalities over the hypergrid. An important open question is to determine the optimal dependence on \(\mathcal{m}\).

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, and Anthimos Vardis Kandiros (arXiv). Depending on your standpoint, one may or may not consider this a “sublinear time” paper (it does not show that testing \(\ll\) learning). But given the interest in distribution testing, I think it’s germane to our blog. We have a high-dimensional distribution \(\mathcal{D}\) over \((x_1, x_2, \ldots, x_n)\). Without any assumption, learning (or even identity testing) requires exponential in \(n\) many samples. This paper assumes that \((x_1, x_2, \ldots, x_n)\) is generated from a latent-tree Ising model. There is a tree where each node is associated with a number (think \(x_i\)), and the joint distribution satisfies some dependencies according to the edges. We only observe the values of the leaves; hence, the term “latent” model. The main result shows that identity testing can be done in with polynomial in \(n\) samples. The authors also prove that one can learn the distribution in (albeit larger) \(poly(n)\) samples.