News for July 2023

We’re now back to our regular cadence of 4+ monthly papers on property testing and sublinear time algorithms. July brings us a new sublinear PageRank computation and numerous results on our trending topics of distribution and monotonicity testing. Without further delay, let’s see the July spread.

Estimating Single-Node PageRank in $\widetilde{O}(\min(d_t,\sqrt{m}))$ Time by Hanzhi Wang and Zhewei Wei (arXiv). PageRank is one of the most important quantities computed on graphs, especially in today’s networked world. Consider an undirected graph $G = (V,E)$. The PageRank value of vertex $t$ is the probability that a short random walk, starting from the uniform distribution, reaches the vertex $t$. (Ok ok, I’m fudging details, but this is close enough to the truth.) The aim is to estimate this probability to within additive error $O(1/n)$, where $n$ is the number of vertices. A standard simulation would give an $O(n)$ time algorithm; can we do sublinear in graph size? Previous work (which actually has implementations!) has led to $O(\sqrt{n\cdot d_t})$ for undirected graphs [LBGS15] and (roughly) $O(n^{2/3} d^{1/3}_{max})$ for all graphs [BPP18]. Here, $d_t$ is the degree of vertex $t$ and $d_{max}$ is the maximum degree. This paper gets a bound of $\widetilde{O}(\min(d_t,\sqrt{m}))$. A lower bound is still open for the fundamental problem! (A nice problem for any students reading…?)
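To make the "standard simulation" baseline concrete, here is a minimal Monte Carlo sketch. It is not the algorithm of the paper; it only illustrates the (fudged) definition above. The stopping probability `alpha`, the adjacency-list representation, and the function names are all assumptions made for illustration, and the graph is assumed to have no isolated vertices.

```python
import random

def sample_walk_endpoint(adj, alpha, rng):
    """Run one random walk from a uniform start; return the vertex where it stops."""
    v = rng.randrange(len(adj))
    # At each step, stop with probability alpha; otherwise move to a random neighbor.
    while rng.random() > alpha:
        v = rng.choice(adj[v])
    return v

def estimate_pagerank(adj, t, alpha=0.2, num_walks=20000, seed=0):
    """Estimate the probability that a short random walk ends at vertex t."""
    rng = random.Random(seed)
    hits = sum(sample_walk_endpoint(adj, alpha, rng) == t for _ in range(num_walks))
    return hits / num_walks

# Hypothetical toy input: a 4-cycle, where symmetry forces every value to be 1/4.
cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]
estimate = estimate_pagerank(cycle, t=0)
```

Getting additive error $O(1/n)$ this way needs a number of walks that grows at least linearly with $n$, which is the cost the sublinear algorithms above avoid.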

Directed Poincare Inequalities and $L_1$ Monotonicity Testing of Lipschitz Functions by Renato Ferreira Pinto Jr. (arXiv, ECCC). We all (or at least I) love monotonicity testing. The paper studies the continuous version, where $f:[0,1]^n \to \mathbb{R}$. To have a reasonable notion of distance and testers, we will require functions to be differentiable and $L$-Lipschitz. We measure distance using $\ell_1$ distance, so the distance between $f,g$ is $\int_{[0,1]^n} |f-g| d\nu$ (where we integrate over the uniform measure) [BRY14]. To access $f$, we are also provided a “derivative oracle”: given a point $x$ and a direction $\vec{v}$, we can query the directional derivative along $\vec{v}$ at $x$. One of the key insights in (discrete, Boolean) monotonicity testing is the connection to directed isoperimetric inequalities. These inequalities are directed versions of classic inequalities that relate boundaries (or gradients) to volumes. For $L$-Lipschitz functions, the classic Poincare inequality states that $E[\|\nabla f\|_1] \geq \textrm{var}(f)$, where $\nabla f$ is the gradient and $\textrm{var}(f)$ is (up to constant factors) the distance to being constant. This paper proves the directed version $E[\|\nabla^- f\|_1] \geq \varepsilon(f)$. Roughly speaking, $\nabla^-$ is the “negative gradient” ($\min(\nabla f, 0)$) and $\varepsilon(f)$ is the $L_1$-distance to monotonicity. This result directly yields a $O(n/\varepsilon)$ query tester for monotonicity. The paper asks the tantalizing question: can we do better?
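As a sanity check on these definitions, here is a one-dimensional worked example. It is added for illustration and not taken from the paper; it reads “monotone” as nondecreasing and ignores constant factors, as the statement above does. Take $n = 1$ and the $1$-Lipschitz function $f(x) = -x$ on $[0,1]$.

$$\nabla f = -1, \qquad \nabla^- f = \min(\nabla f, 0) = -1, \qquad E[\|\nabla^- f\|_1] = \int_0^1 |-1| \, dx = 1.$$

Since $f$ is decreasing, a closest monotone function in $L_1$ is the constant $-1/2$ (the median value of $f$), so

$$\varepsilon(f) = \int_0^1 \left| -x + \tfrac{1}{2} \right| dx = \tfrac{1}{4} \leq 1 = E[\|\nabla^- f\|_1].$$

In contrast, for the increasing function $f(x) = x$ both sides are $0$: the negative gradient vanishes and the function is already monotone.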

A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers by Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan (arXiv). It is best to go backwards in discussing this paper, starting from the implications and going to the core results. The problem at hand is the classic one of junta testing. Given $f:\{0,1\}^n \to \{0,1\}$, distinguish if $f$ only depends on $r$ variables (an $r$-junta) or is $\varepsilon$-far from having that property. This problem is well studied, has optimal (efficient) testers, and even has results in the distribution-free setting. In the latter setting, there exists some (unknown) distribution $\mathcal{D}$ on the domain according to which distance is defined. The tester has access to queries according to $\mathcal{D}$. The clarity of junta testing disappears when we consider tolerant testing: can we distinguish $f$ being (say) $1/4$-close to an $r$-junta from being $1/3$-far (where distances are measured according to $\mathcal{D}$)? A remarkable consequence of this paper is that this tolerant testing version is unlikely to have a $poly(n)$ time algorithm. (Note that the sample complexity might still be small.) The main tool is a composition theorem that gives reductions between low $\varepsilon$ testers and constant $\varepsilon$ testers. The details are intricate, but here’s the gist. Suppose we can construct composing functions $g$ such that the distance to “junta-ness” of $g \circ f$ is much larger than the distance of $f$. Then a tester (that can only deal with large $\varepsilon$) on $g \circ f$ can effectively property test $f$. (Obviously, I’m greatly oversimplifying and mischaracterizing. So go read the paper!)

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination by Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, and Shyam Narayanan (arXiv). Consider the fundamental hypothesis testing problem of Gaussian testing. We wish to distinguish the $d$-dimensional, zero-mean, unit covariance Gaussian $\mathcal{N}(0,I)$, from the “shifted” version $\mathcal{N}(\mu,I)$, where $\mu$ is a vector of length at least $\alpha$. Existing results give a simple algorithm using $\Theta(\sqrt{d}/\alpha^2)$ samples. Suppose there is an adversary who can corrupt samples. The adaptive adversary can look at the samples generated, and arbitrarily change an $\varepsilon$ fraction of entries. The weaker, oblivious adversary can also arbitrarily change an $\varepsilon$ fraction of entries, but cannot look at the samples generated. Can we perform Gaussian testing in this setting? A previous result gave an optimal bound for adaptive adversaries [N22]. This paper gives the optimal sample complexity bound for oblivious adversaries. The expression is somewhat complicated. But the punchline is that for many regimes of parameters $d, \alpha, \varepsilon$, the oblivious adversary is strictly weaker. Meaning, there is a testing algorithm (against an oblivious adversary) whose sample complexity is strictly less than the optimal bound against adaptive adversaries. This comes as a surprise, because in typical distribution testing settings, these adversaries are basically equivalent.

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). A problem of generalizing from hypercube to hypergrids, an issue I have much obsessed over. Consider the problem of uniformity testing, where the domain is the hypergrid $[m]^n$. We wish to test if a distribution $\mathcal{D}$ over the hypergrid is uniform. Unlike the standard distribution testing setting, the domain size is exponentially large. To aid us, we can perform conditional sampling: we can select any sub-hypergrid, and get samples from $\mathcal{D}$ restricted to this sub-hypergrid. Previous work solved this problem for the hypercube domain (when $m=2$), by providing a tester with sample complexity $O(\sqrt{n}/\varepsilon^2)$ [CCKLW22]. The previous work did not work for any other hypergrid (even $m=3$). This paper gives the first solution for uniformity testing on hypergrids with a tester of sample complexity $O(poly(m)\sqrt{n}/\varepsilon^2)$. The dependence on $m$ has to be at least $\sqrt{m}$, from existing results. One of the important tools is getting robust versions of classic isoperimetric inequalities over the hypergrid. An important open question is to determine the optimal dependence on $m$.

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, and Anthimos Vardis Kandiros (arXiv). Depending on your standpoint, one may or may not consider this a “sublinear time” paper (it does not show that testing $\ll$ learning). But given the interest in distribution testing, I think it’s germane to our blog. We have a high-dimensional distribution $\mathcal{D}$ over $(x_1, x_2, \ldots, x_n)$. Without any assumption, learning (or even identity testing) requires exponentially many samples in $n$. This paper assumes that $(x_1, x_2, \ldots, x_n)$ is generated from a latent-tree Ising model. There is a tree where each node is associated with a number (think $x_i$), and the joint distribution satisfies some dependencies according to the edges. We only observe the values of the leaves; hence, the term “latent” model. The main result shows that identity testing can be done with polynomially many samples in $n$. The authors also prove that one can learn the distribution with a (larger, but still) $poly(n)$ number of samples.

News for June 2023

June was a somewhat quiet month for sublinear-time algorithms with only one paper coming to our attention.

Relaxed Local Correctability from Local Testing by Vinayak M. Kumar and Geoffrey Mon (arXiv). As the name indicates, the paper proposes a scheme to construct good relaxed locally correctable codes (RLCC) from good locally testable codes (LTC). A relaxed local decoder for an RLCC is a sublinear algorithm that, given query access to a corrupted word $w$ of length $n$ and an index $i \in [n]$ as input, either returns $c[i]$, where $c$ is the closest codeword to $w$, or returns a failure symbol $\perp$. In the regime of constant distance and constant rate, the prior best known lower and upper bounds on the query complexity of RLCCs are $\tilde{\Omega}(\sqrt{\log n})$ (Dall’Agnol, Gur, and Lachish; 2021) and $(\log n)^{O(\log \log \log n)}$ (Cohen and Yankovitz; 2022), respectively. Using their scheme to convert LTCs to RLCCs, this paper improves the state of the art by providing constant distance, constant rate RLCCs with query complexity $\log^{O(1)} n$.

News for May 2023

Apologies, dear readers for the delay in getting out this month’s post. This month we had ~~three papers — all on testing properties of graphs!~~ (EDIT: Updated later) four papers: three on property testing problems on graphs and the last one on testing convexity. One of the featured papers this month revisits the problem of testing the properties of directed graphs and comes back with a happy progress report. Alright, let’s dig in.

A Distributed Conductance Tester Without Global Information Collection by Tugkan Batu, Chhaya Trehan (arXiv) One of the classic questions in property testing considers the task of testing expansion. Here, you are interested in knowing whether the input graph has conductance at least $\alpha$ or it is far from having conductance at least $\alpha^2$. On a parallel track, we recall that thanks to the classic work of Parnas and Ron, we know there are connections between distributed algorithms and graph property testing. Meditating on these connections led to the emergence of distributed graph property testing as an active area of research. The featured paper considers the task of testing expansion in the distributed framework. The algorithm presented gives a distributed implementation of multiple random walks from all vertices, and controls the congestion of the implementation. In particular, this leads to a $O(\log n/\alpha^2)$ round expansion-tester. In a first attempt at such an implementation, you might note that you need to track how well short random walks mix when started from a bunch of randomly chosen vertices. This seems to require maintaining some global state/global aggregate information. One of the important features of their algorithm (as mentioned in the title) is that it does away with the need to maintain such global states. As a closing remark, I note the algorithm presented in this paper does not require the graph to be bounded degree.

Testing versus estimation of graph properties, revisited by Lior Gishboliner, Nick Kushnir, Asaf Shapira (arXiv) This paper considers the task of additively estimating the distance to a property $\mathcal{P}$ of a dense graph. Let me set up some context to view the results in the featured paper by summarizing what was known before. One of the early important results in this area is the original result of Fischer and Newman which shows that any testable graph property admits a $\pm \varepsilon$ distance approximation algorithm, which follows from the regularity lemma. However, the query complexity of the resulting algorithm is a Wowzer-style bound. Later, Hoppen et al., building upon tools from the classic work of Conlon and Fox, demonstrated that a tower-type bound of $twr(poly(1/\varepsilon))$ suffices for testable hereditary properties. Fiat and Ron introduced a decomposition machinery that allowed them to decompose a “complex” property into a collection of simpler properties. They used this decomposition to estimate distances to a vast collection of graph properties. They also asked if it was possible to find a transformation using which one can bypass the blowup in query complexity incurred by Fischer and Newman for some rich class of graph properties. The featured paper proves that for a hereditary graph property, you can in fact get algorithms where the query complexity for distance estimation only grows as $\exp(1/\varepsilon)$. Additionally, for every testable graph property, you can get distance estimators for that property whose query complexity only grows doubly exponentially in $1/\varepsilon$ (as opposed to the tower bound earlier).

An Optimal Separation between Two Property Testing Models for Bounded Degree Directed Graphs by Pan Peng, Yuyang Wang (arXiv) Unlike undirected graphs, directed graph properties have not received as much attention in the property testing community. In a classic work, Bender and Ron considered two natural models for studying property testing on directed graphs. The first model is one where you can only follow the “out” edges, the so-called unidirectional model. In the other model, called the bidirectional model, you are allowed to follow both the “out” edges and the “in” edges incident on a vertex. The featured paper considers directed graphs where the in-degree and the out-degree are both bounded, in each of the two models mentioned above. The graph is presented to you in the adjacency list format (tailored for the model you consider). The paper shows that even for the fundamental task of subgraph-freeness, the directed world has some interesting behavior with respect to the two models above. Let me showcase one of the catchy results from the paper which illustrates this separation nicely. Take a connected directed graph $H$ with $k$ source components. The paper shows that for sufficiently small $\varepsilon$, testing whether a directed graph $G$ is $H$-free or $\varepsilon$-far from being $H$-free requires $\Omega(n^{1-1/k})$ unidirectional queries. On the other hand, in the bidirectional model, this can be done with a mere $O_{\varepsilon, d, k}(1)$ number of queries.

(EDIT: Added later)

Testing Convexity of Discrete Sets in High Dimensions by Hadley Black, Eric Blais, Nathaniel Harms (arXiv) As the title suggests, this paper explores the problem of testing convexity. To understand the notion of convexity explored in the paper, consider the following setup: You call a set $S \subseteq \{-1, 0, 1\}^n$ convex if there exists a convex set $S' \subseteq \mathbb{R}^n$ such that $S = S' \cap \{-1,0,1\}^n$. And you call a set $S \subseteq \{-1,0,1\}^n$ far from being convex if for every convex $T \subseteq \{-1,0,1\}^n$, you have $|S \oplus T| \geq \varepsilon 3^n$. The paper shows that when you are allowed membership queries, you can test convexity non-adaptively with one-sided error by issuing $3^{O(\sqrt{n \log(1/\varepsilon)})}$ queries. Also, they prove an almost matching lower bound. Finally, with a lower bound of $3^{\Omega(n)}$ when confined to using sample-based testers, the authors provably demonstrate that membership queries indeed buy you some undeniable power for testing convexity in high dimensions.

News for April 2023

After an empty month, the engines of PTReview are roaring back to life with a fresh batch of papers for this month’s edition. In total, we have four papers that are sure to pique your interest. It’s been an action-packed month with a diverse range of topics covered in the featured papers. The first paper delves into new variations in distribution testing, while the second paper discusses optimal testers for Bayes Nets. The third paper focuses on optimal tolerant junta-testers, and the final paper presents a cool monotonicity tester over hypergrids.

Distribution Testing Under the Parity Trace by Renato Ferreira Pinto Jr and Nathaniel Harms (arXiv) The featured paper considers the classic setup in distribution testing with a twist. To explain the results, let me introduce the framework considered in this work. Consider distributions supported over $[n]$. Suppose you want to design distribution testers where, instead of obtaining samples in the usual way, the samples are first collected into an ordered list $S$, and only the least significant bit of each element of $S$ is made available to your distribution testing algorithm. This is called a parity trace. Note that with this access model, suddenly a bunch of standard tasks become non-trivial. To take an example from the paper, you can no longer distinguish between the uniform distribution supported on $\{1,2, \ldots, n\}$ and the uniform distribution supported on $\{n+1, n+2, \ldots, 2n\}$ in this access model. Nevertheless, the paper shows, you can still obtain testers which require only a sublinear number of accesses for testing uniformity in this model.
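The access model can be sketched in a few lines. This is an illustration under my reading of the description above (sort the samples, then reveal only parities), not code from the paper. The function name and the choice of an even `n` are assumptions; with `n` even, shifting every sample by `n` preserves both the sorted order and each parity, so the two traces coincide.

```python
import random

def parity_trace(samples):
    """Sort the samples, then reveal only the least significant bit of each."""
    return [x % 2 for x in sorted(samples)]

n = 10  # hypothetical domain parameter; even, so adding n preserves parity
rng = random.Random(1)

# Samples from the uniform distribution on {1, ..., n}.
low = [rng.randint(1, n) for _ in range(8)]
# The same samples shifted into {n+1, ..., 2n}.
high = [x + n for x in low]

trace_low = parity_trace(low)
trace_high = parity_trace(high)
```

Here `trace_low` and `trace_high` are identical lists of bits, which is the sense in which the tester cannot tell the two supports apart.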

Another cool feature of this big paper is an unexpected foray into the trace reconstruction literature from a property testing viewpoint. I wish I understood the formal connection better to describe a bit more about it. But for now, let me leave that as an appetizer which (hopefully) encourages you to take a look at the paper.

New Lower Bounds for Adaptive Tolerant Junta Testing by Xi Chen and Shyamal Patel (arXiv) If you are a regular here on the PTReview corner, you are probably no stranger to the tolerant junta testing problem. As a corollary to the main result, the paper in question proves a lower bound of $k^{\Omega(\log k)}$ queries on any adaptive algorithm that wishes to test whether the input function $f$ is $\varepsilon_1$-close to being a $k$-junta or whether it is $\varepsilon_2$-far $\left(\text{where } \varepsilon_2 \geq \varepsilon_1 + \displaystyle\frac{1}{poly(k)}\right)$. Indeed, another remarkable achievement of the paper is that it achieves a superpolynomial separation between the non-tolerant and the tolerant versions of a natural property of Boolean functions in the adaptive setting.

Near-Optimal Degree Testing for Bayes Nets by Vipul Arora, Arnab Bhattacharyya, Clément L. Canonne (our own!) and Joy Qiping Yang (arXiv) This paper continues a line of investigation which a subset of the authors were a part of (which we also covered in our News for April 2022). Let us remind ourselves of the setup. You are given a probability distribution $\mathcal{P}$ supported over the Boolean hypercube. Suppose $\mathcal{P}$ can be generated by a Bayesian Network. You may think of a Bayesian Network as a DAG where each vertex tosses a coin (with different heads probabilities). The question seeks to test whether $\mathcal{P}$ admits a sparse Bayesian Net (in the sense of each vertex having small in-degree). The main result of the paper gives an algorithm for this task which requires $\Theta(2^{n/2}/\varepsilon^2)$ samples. The paper also proves an almost matching lower bound.

A $d^{1/2+o(1)}$ Monotonicity Tester for Boolean Functions on $d$-Dimensional Hypergrids by Hadley Black, Deeparnab Chakrabarty and C. Seshadhri (again, our own!) (arXiv) Again, the problem of monotonicity testing of Boolean functions hardly requires any detailing to the regular readers of PTReview. As you can see in our News from November 2022, there were two concurrent papers mulling over this problem over the hypergrid domain. The featured paper is the result of a dedicated pursuit by the authors and the key result is what the title says. Namely, you can test monotonicity with a number of (non-adaptive, one-sided) queries that grows as $d^{1/2+o(1)}$ and has no dependence on $n$, the side length of the hypergrid $[n]^d$.

News for March 2023

I never thought this day would come.

For the first time in PTReview history, there is no paper to report. Nada. Zilch.

The calm before the storm…?

News for February 2023

Despite being a short month, February 2023 has witnessed a significant amount of activity under the sublinear “regime”. Let us know if we have missed anything!

Dynamic $(1 + \epsilon)$-Approximate Matching Size in Truly Sublinear Update Time by Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak (arXiv). This work throws light on connections between the dynamic and query models of computation and uses them for making advances on approximating the size of a maximum cardinality matching (MCM) in a general graph. In particular, as the main technical ingredient in obtaining an improved dynamic algorithm for maintaining an approximation to the size of MCM, the authors provide a $\pm \epsilon n$ approximation algorithm for estimating the size of MCM in a general $n$-vertex graph by making $n^{2 - \Omega_{\epsilon}(1)}$ adjacency queries. Prior to this result, the state of the art (Behnezhad, Roghani & Rubinstein; STOC’23) was a $n^{2 - \Omega(1)}$-query algorithm for the same problem with a multiplicative approximation guarantee of $1.5$ and an additive guarantee of $o(n)$.

Uniformity Testing over Hypergrids with Subcube Conditioning by Xi Chen and Cassandra Marcussen (arXiv). As the name indicates, the paper studies the fundamental problem of testing uniformity of distributions supported over hypergrids $[m]^n$. The tester that they present makes $O(\text{poly}(m)\sqrt{n}/\epsilon^2)$ queries to a conditional subcube sampling oracle, which, when given a subcube of $[m]^n$, returns a point sampled from the distribution conditioned on the point belonging to the subcube. The result is a generalization of the uniformity tester for distributions supported over the $n$-dimensional hypercube (Canonne, Chen, Kamath, Levi and Waingarten; SODA ’21).

Easy Testability for Posets by Panna Timea Fekete and Gabor Kun (arXiv). This paper deals with testing properties of directed graphs in the adjacency matrix model. The main characters of the story are posets, or directed acyclic graphs (DAGs) that are transitively closed. Given a family $\mathcal{F}$ of finite posets, let $\mathcal{P}_\mathcal{F}$ denote the set of all finite posets that do not contain any element of $\mathcal{F}$ as a subposet. The main result of the paper is an $\epsilon$-tester with query complexity $\text{poly}(1/\epsilon)$ for $\mathcal{P}_\mathcal{F}$. The authors obtain this result by proving a removal lemma for posets. The result is placed in the larger context of understanding what properties of graphs can be tested with query complexity that has a polynomial dependence on $1/\epsilon$ in the adjacency matrix model.

Compressibility-Aware Quantum Algorithms on Strings by Daniel Gibney and Sharma V. Thankachan (arXiv). Lastly, we have a paper on quantum string algorithms that run in sublinear time. In short, the authors present quantum algorithms with optimal running times for computing the Lempel-Ziv (LZ) encoding and Burrows-Wheeler Transform (BWT) of highly compressible strings. A main consequence of these results is a faster quantum algorithm for computing the longest common substring (LCS) of two strings when the concatenation of the strings is highly compressible. It is to be noted that sublinear-time algorithms do not exist for these problems in the classical model of computation. More details follow.

Factoring a string into disjoint substrings (factors) in a specific manner is the main step in the LZ compression algorithm. The smaller the number of factors, the more compressible the string is. This paper gives a quantum algorithm for the problem of computing the LZ factorization of a string in time $\tilde{O}(\sqrt{nz})$, where $z$ is the number of factors in the string. They also show that their algorithm is optimal. Using this algorithm, they obtain a fast algorithm for computing the BWT of an input string, as well as an algorithm running in time $\tilde{O}(\sqrt{nz})$ to compute the LCS of two strings, where $n$ is the length and $z$ is the number of factors in the concatenation of the two strings. When $z$ is $o(n^{1/3})$, this algorithm gives an improvement over the previous best quantum algorithm running in time $\tilde{O}(n^{2/3})$.
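To make the parameter $z$ concrete, here is a naive classical sketch of one common variant of the factorization (greedy LZ77 where a factor may overlap its earlier occurrence). This is an assumption on my part: the paper may use a slightly different variant, and this quadratic-time code has nothing to do with the quantum algorithm. The function name is hypothetical.

```python
def lz_factorize(s):
    """Greedy LZ77-style factorization.

    Each factor is either a single fresh character or the longest prefix of
    the remaining suffix that also occurs starting at an earlier position.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # no earlier match: emit one new character
        factors.append(s[i:i + length])
        i += length
    return factors

periodic = lz_factorize("abababab")   # highly compressible: few factors
distinct = lz_factorize("abcd")       # incompressible: one factor per character
```

On the periodic string the number of factors $z$ stays at 3 no matter how many times `ab` is repeated, which is the regime where a $\tilde{O}(\sqrt{nz})$ bound is far below $n$.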

News for January 2023

Welcome to the first batch of 2023. Looks like it’s going to be a good year, with 5 property testing or related papers (that I could find) already:

An efficient asymmetric removal lemma and its limitations, by Lior Gishboliner, Asaf Shapira, and Yuval Wigderson (arXiv). One of the jewels of graph property testing is the triangle removal lemma (and its many generalizations and variants), which relates the number of triangles in a dense graph to its distance from being triangle-free: namely, any graph $\varepsilon$-far from being triangle-free must have $\delta(\varepsilon)n^3$ triangles, where the density $\delta(\varepsilon)$ only depends on the distance (and not the size of the graph!). This immediately leads to constant-query testers (and even “proximity-oblivious” testers) for triangle-freeness (and, more generally, pattern-freeness). Unfortunately, the dependence on $\varepsilon$ is quite bad, essentially a tower-type function (and it is known no polynomial bound is possible). This work attempts to bypass this impossibility result by proving an asymmetric removal lemma, of the form “any graph $\varepsilon$-far from being triangle-free must have $\mathrm{poly}(\varepsilon)n^5$ 5-cycles” (and generalizations beyond triangles). This seems like a very interesting direction, with potential applications to property testing, and (who knows!) efficient testers for many properties hitherto only known to be (practically) testable for constant $\varepsilon$.
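The constant-query tester that follows from the (symmetric) removal lemma is simple enough to sketch: sample random vertex triples and reject if any of them spans a triangle. The code below is a hedged illustration in the adjacency matrix model; the function names and the `num_trials` parameter are assumptions, and the removal lemma is what says that roughly $1/\delta(\varepsilon)$ trials suffice.

```python
import random

def triangle_freeness_tester(adj_matrix, num_trials, seed=0):
    """One-sided tester: accept (True) unless a sampled triple forms a triangle."""
    rng = random.Random(seed)
    n = len(adj_matrix)
    for _ in range(num_trials):
        u, v, w = rng.sample(range(n), 3)  # three distinct vertices
        # Three adjacency-matrix queries per trial.
        if adj_matrix[u][v] and adj_matrix[v][w] and adj_matrix[u][w]:
            return False  # found a triangle: reject
    return True  # no triangle seen: accept

def complete_graph(n):
    return [[int(i != j) for j in range(n)] for i in range(n)]

def empty_graph(n):
    return [[0] * n for _ in range(n)]

accepts_empty = triangle_freeness_tester(empty_graph(6), num_trials=50)
accepts_complete = triangle_freeness_tester(complete_graph(6), num_trials=5)
```

A triangle-free graph is never rejected, so the error is one-sided. The price is in `num_trials`: a tower-type $\delta(\varepsilon)$ makes the required number of trials astronomically large.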

Related (more removal lemmata!), a different work on this topic:

The Minimum Degree Removal Lemma Thresholds, by Lior Gishboliner, Zhihan Jin, and Benny Sudakov (arXiv). As mentioned above, removal lemmata relate the distance $\varepsilon$ from being $H$-free (for a given subgraph $H$) to the density $\delta(\varepsilon)$ of occurrences of $H$ in the graph. Sadly, it is known that this density will be superpolynomial (in the distance) unless $H$ is bipartite… which, while technically still yielding testing algorithms (query complexity independent of the size of the graph!), yields very inefficient testers (very bad dependence on $\varepsilon$!). This paper studies one direction to bypass this sad state of affairs: under which additional assumption on the underlying graph (specifically, bounds on its minimum degree) can we obtain a polynomial bound on $\delta(\varepsilon)$? And a linear bound? The authors give a tight degree condition for $\delta(\varepsilon)$ to be polynomial when $H$ is an odd cycle, and their results for the linear-dependence case establish a separation between the two. Put differently: obtaining polynomial-query testers via removal lemmas is possible for a strictly larger class of graphs than linear-query ones!

And now, for something completely different: testing binary matrices!

A Note on Property Testing of the Binary Rank, by Nader H. Bshouty (arXiv). The binary rank of a matrix $M\in\{0,1\}^{n\times m}$ is the smallest $d$ such that there exist $A\in\{0,1\}^{n\times d}$ and $B\in\{0,1\}^{d\times m}$ with $M=AB$; this can also be seen as the minimal number of bipartite cliques needed to partition the edges of a bipartite graph represented by $M$. One can also define the relaxed notion of $s$-binary rank, if one enforces that each edge of the bipartite graph is covered by at most $s$ bipartite cliques. The property testing question is then to decide, given inputs $s,d,\varepsilon$, if $M$ has $s$-binary rank at most $d$, or is $\varepsilon$-far from it. The main result of this note is to give one-sided testers (one adaptive, and one non-adaptive) for $s$-binary rank with query complexity $\tilde{O}(2^d)$ (for constant $s$), improving on the previous algorithms by a factor $2^d$.
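For readers meeting binary rank for the first time, here is a brute-force sketch of the definition. It is only meant for tiny matrices (it enumerates all 0/1 factor pairs), it is not related to the testers in the note, and the function names are hypothetical. The product $AB$ is taken over the integers, so overlapping cliques would produce an entry larger than 1 and are ruled out automatically.

```python
from itertools import product

def mat_mul(A, B, n, d, m):
    """Integer product of an n x d matrix A and a d x m matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(m)]
            for i in range(n)]

def binary_rank(M):
    """Smallest d such that M = A B with A in {0,1}^(n x d), B in {0,1}^(d x m)."""
    n, m = len(M), len(M[0])
    for d in range(0, min(n, m) + 1):
        for a_bits in product((0, 1), repeat=n * d):
            A = [list(a_bits[i * d:(i + 1) * d]) for i in range(n)]
            for b_bits in product((0, 1), repeat=d * m):
                B = [list(b_bits[k * m:(k + 1) * m]) for k in range(d)]
                if mat_mul(A, B, n, d, m) == M:
                    return d
    return None  # unreachable: d = min(n, m) always works via an identity factor

# Two disjoint bipartite cliques: rows {0,1} x columns {0,1}, and row 2 x column 2.
M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
rank_M = binary_rank(M)
```

In the bipartite-graph view, each column of $A$ paired with the matching row of $B$ describes one bipartite clique, and the example needs exactly two of them.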

Into the quantum realm!

Testing quantum satisfiability, by Ashley Montanaro, Changpeng Shao, and Dominic Verdon (arXiv). Classically, one can study the property testing version of $k$-SAT, which asks to decide whether a given instance is satisfiable or far from being so. And people (namely, Alon and Shapira, in 2003) did! Quantumly, one can define an analogue of $k$-SAT, “quantum $k$-SAT”: and people (namely, Bravyi, in 2011) did! But what about property testing of quantum $k$-SAT? Well, now, people (namely, the authors of this paper) just did! They show (Corollary 1.10) that one can efficiently distinguish between (1) the quantum $k$-SAT instance being satisfiable, and (2) it being far from satisfiable by a product state. This, effectively, extends the result of Alon–Shapira’03 to the quantum realm.

And to conclude, a foray into reinforcement learning via distribution testing…

Lower Bounds for Learning in Revealing POMDPs, by Fan Chen, Huan Wang, Caiming Xiong, Song Mei, and Yu Bai (arXiv). Alright, I’m even more out of my depth than usual here, so I’ll just copy (part of) the abstract, for fear I don’t do justice to the authors’ work: “This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the revealing condition — A natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. […] Technically, our hard instance construction adapts techniques in distribution testing, which is new to the RL literature and may be of independent interest.” (Emphasis mine)

News for December 2022

Happy New Year! Apologies for the late post; I was stuck for far too long in holiday mode. We haven’t missed much. There were only two papers in the month of December, since, I’m assuming, many of us were entering holiday mode.

Testing in the bounded-degree graph model with degree bound two by Oded Goldreich and Laliv Tauber (ECCC). One of the great, central results in graph property testing is that all monotone properties are testable (with query complexity independent of graph size) on dense graphs. The sparse graph universe is far, far more complicated and interesting. Even for graphs with degree bound 3, natural graph properties can have anywhere from constant to linear (in $n$) query complexity. This note shows that when considering graphs with degree bound at most 2, the landscape is quite plain. The paper shows that all properties are testable with $poly(\varepsilon^{-1})$ queries. Any graph with degree at most 2 is a collection of paths and cycles. In $poly(\varepsilon^{-1})$ queries, one can approximately learn the graph. (After which the testing problem is trivial.) The paper gives a simple $O(\varepsilon^{-4})$ query algorithm, which is improved to the nearly optimal $\widetilde{O}(\varepsilon^{-2})$ bound.
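The structural fact doing the work here (degree at most 2 means paths and cycles) is easy to see in code. The sketch below reads the whole graph, so it is not the sublinear algorithm of the paper; it only illustrates why such graphs are simple to describe. The adjacency-list input and the function name are assumptions, and the graph is assumed simple (no loops or parallel edges).

```python
def paths_and_cycles(adj):
    """Split a graph of maximum degree 2 into components labeled 'path' or 'cycle'."""
    seen = set()
    components = []
    for start in range(len(adj)):
        if start in seen:
            continue
        # Standard depth-first search to collect the component of `start`.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        # With max degree 2, a component is a cycle exactly when every
        # vertex in it has degree 2; otherwise it is a path (possibly trivial).
        kind = "cycle" if all(len(adj[v]) == 2 for v in comp) else "path"
        components.append((kind, sorted(comp)))
    return components

# Hypothetical input: a triangle, a path on three vertices, and an isolated vertex.
adj = [[1, 2], [0, 2], [0, 1], [4], [3, 5], [4], []]
parts = paths_and_cycles(adj)
```

Up to isomorphism, such a graph is fully described by how many paths and cycles of each length it contains, which gives some intuition for why approximately learning the graph is feasible.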

On the power of nonstandard quantum oracles by Roozbeh Bassirian, Bill Fefferman, and Kunal Marwaha (arXiv). This paper is on the power of oracles in quantum computation. An important question in quantum complexity theory is whether $QCMA$ is a strict subset of $QMA$. The former consists of languages decided by Merlin-Arthur quantum protocols with a classical witness (the string that Merlin provides). The latter class allows Merlin to provide a quantum witness. This paper exhibits a property testing problem for which such a separation holds. The property is essentially graph non-expansion (does there exist a set of low conductance?). The input graph should be thought of as having even (bounded) degree, with “exponentially many” vertices. So it has $N = 2^n$ vertices. The graph is represented through a special “graph-coded” function. The paper shows that there is a $poly(n)$-sized quantum witness for non-expansion that can be verified in $poly(n)$ time, which includes queries to the graph-coded function. On the other hand, there is no classical $poly(n)$-sized witness that can be verified in $poly(n)$ queries to the graph-coded function. (Informally speaking, any $QCMA$ protocol needs exponentially many queries to the graph.)

A Pedagogical reference to kick off the New Year

Dear Readers,

Our own Clément Canonne has written a beautiful survey which is now available in FnT book format from now publishers. This appears to be a very promising read — especially for the Distribution Testers among you. Today’s post is a mere advertisement for this beautiful survey/book which is clearly the result of a dedicated pursuit.

Let me now dig into this survey a teeny tiny bit. One among the many cool features of this survey is that it uses one central example (testing goodness-of-fit) to give a unified treatment to the diverse tools and techniques used in distribution testing. Another plus for me is the historical notes section that accompanies every chapter. In particular, I really liked jumping into the informative history section at the end of Chapter 2, which has an almost story-like feel to it. If the above points do not catch your fancy, then please try opening the survey. You will be hard-pressed to find a book that is typeset in such an aesthetically pleasing way, with colored fonts to emphasize various parameters in several intricate proofs. Happy Reading!

News for November 2022

All the best to everyone for a Happy 2023. The holiday season is ripe with lots of papers to engage our readers. So, we have nine papers and we hope some of those will catch your fancy. As a new year treat, we also feature Gilmer’s constant lower bound on the union-closed sets problem of Frankl. On we go.

Sublinear Time Algorithms and Complexity of Approximate Maximum Matching by Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein (arXiv) This paper significantly advances our understanding of the maximum matching problem in the sublinear regime. Your goal is to estimate the size of the maximum matching and you may assume that you have query access to the adjacency list of your graph. Our posts from Dec 2021 and June 2022 reported some impressive progress on this problem. The upshot from these works essentially said that you can beat greedy matching and obtain a $\frac{1}{2} + \Omega(1)$ approximate maximum matching in sublinear time. Let me first go over the algorithmic results from the current paper. The paper shows the following two algorithmic results:

(1) An algorithm that runs in time $n^{2 - \Omega_{\varepsilon}(1)}$ and returns a $2/3 - \varepsilon$ approximation to maximum matching in general graphs, and

(2) An algorithm that runs in time $n^{2 - \Omega_{\varepsilon}(1)}$ and returns a $2/3 + \varepsilon$ approximation to maximum matching size in bipartite graphs.

The question remained: can we show a lower bound that grows superlinearly with $n$? The current work achieves this and shows that even on bipartite graphs, you must make at least $n^{1.2 - o(1)}$ queries to the adjacency list to get a $2/3 + \Omega(1)$ approximation. (An aside: a concurrent work by Bhattacharya-Kiss-Saranurak from December also obtains similar algorithmic results for approximating the maximum matching size in general graphs.)
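For readers wondering what the "greedy matching" baseline is: any maximal matching has at least half the size of a maximum matching, and a single greedy scan produces one. The Python sketch below shows that baseline only (it is not an algorithm from the featured paper, and it reads every edge); the function name and the edge-list input are assumptions for illustration.

```python
def greedy_maximal_matching(edges):
    """Scan edges in the given order; keep an edge if both endpoints are free."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path a-b-c-d, scanning the middle edge first leaves a matching of size 1, while the maximum matching has size 2. That is exactly the factor of $\frac{1}{2}$ these sublinear algorithms are trying to beat.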

Directed Isoperimetric Theorems for Boolean Functions on the Hypergrid and an $\widetilde{O}(n \sqrt d)$ Monotonicity Tester by Hadley Black, Deeparnab Chakrabarty, C. Seshadhri (arXiv) Boolean monotonicity testing is as classic as classic gets in property testing. Encouraged by the success of isoperimetric theorems over the hypercube domain and the monotonicity testers powered by these isoperimetries (over the hypercube), one may wish to obtain efficient monotonicity testers for the hypergrid $[n]^d$. Indeed, the same gang of authors as above showed in a previous work that a Margulis-style directed isoperimetry can be extended from the lowly hypercube to the hypergrid. This resulted in a tester with $\widetilde{O}(d^{5/6})$ queries. The more intricate task of proving a directed Talagrand-style isoperimetry that underlies the Khot-Minzer-Safra breakthrough was a challenge. Was. The featured work extends this isoperimetry from the hypercube to the hypergrid, and this gives a tester with query complexity $\widetilde{O}(n \sqrt d)$, which is an improvement over the $d^{5/6}$ bound for domains where $n$ is (say) some small constant. But as they say, when it rains, it pours. This brings us to a concurrent paper with the same result.

Improved Monotonicity Testers via Hypercube Embeddings by Mark Braverman, Subhash Khot, Guy Kindler, Dor Minzer (arXiv) Similar to the paper above, this paper also obtains monotonicity testers over the hypergrid domain, $[n]^d$, with $\widetilde{O}(n^3 \sqrt d)$ queries. This paper also presents monotonicity testers over the standard hypercube domain $\{0,1\}^d$ in the $p$-biased setting. In particular, their tester issues $\widetilde{O}(\sqrt d)$ queries to successfully test monotonicity on the $p$-biased cube. Coolly enough, this paper also proves directed Talagrand-style isoperimetric inequalities both over the hypergrid and the $p$-biased hypercube domains.
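As a point of reference for both hypergrid papers, here is a minimal Python sketch of the classic edge tester on the hypercube $\{0,1\}^d$, the simple baseline that isoperimetry-based testers improve upon. It is not the tester from either paper; the function name, the tuple encoding of points, and the `num_samples` parameter are assumptions for illustration.

```python
import random

def edge_tester(f, d, num_samples, rng=random):
    """Classic edge tester for f: {0,1}^d -> {0,1}.

    Samples random hypercube edges (a point with bit i = 0 and the same
    point with bit i = 1) and rejects if f decreases along a sampled edge.
    Returns True for "accept" and False for "reject".
    """
    for _ in range(num_samples):
        x = [rng.randint(0, 1) for _ in range(d)]
        i = rng.randrange(d)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            return False  # a violated edge certifies non-monotonicity
    return True
```

The tester has one-sided error: a monotone function is never rejected, since a rejection always comes with an explicit violated edge.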

Toeplitz Low-Rank Approximation with Sublinear Query Complexity by Michael Kapralov, Hannah Lawrence, Mikhail Makarov, Cameron Musco, Kshiteej Sheth (arXiv) Another intriguing paper for the holiday month. So, take a Toeplitz matrix. Did you know that any PSD Toeplitz matrix admits a (near-optimal in the Frobenius norm) low-rank approximation which is itself Toeplitz? This is a remarkable statement. The featured paper proves this result and uses it to get more algorithmic mileage. In particular, suppose you are given a $d \times d$ Toeplitz matrix $T$. Armed with the techniques from the paper, you get algorithms that return a Toeplitz matrix $\widetilde{T}$ with rank slightly bigger than $\mathrm{rank}(T)$ which is a very good approximation to $T$ in the Frobenius norm. Moreover, the algorithm only issues a number of queries sublinear in the size of $T$.

Sampling an Edge in Sublinear Time Exactly and Optimally by Talya Eden, Shyam Narayanan and Jakub Tětek (arXiv) Regular readers of PTReview are no strangers to the fundamental task of sampling a random edge from a graph which you can access via query access to its vertices. Of course, you don’t have direct access to the edges of this graph. This paper considers the task of sampling a truly uniform edge from the graph $G = (V,E)$ with $|V| = n, |E| = m$. In STOC 22, Tětek and Thorup presented an algorithm for a relaxation of this problem where you want an $\varepsilon$-approximately uniform edge. This algorithm runs in time $O\left(\frac{n}{\sqrt{m}} \cdot \log(1/\varepsilon) \right)$. The featured paper presents an algorithm that samples an honest-to-goodness uniform edge in expected time $O(n/\sqrt{m})$. This closes the problem as we already know a matching lower bound. Indeed, just consider a graph with $\Theta(\sqrt m)$ vertices which induce a clique, where all the remaining components are singletons. You need to sample $\Omega(n/\sqrt m)$ vertices before you see any edge.
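To see what "exactly uniform" means in this query model, here is a minimal Python sketch of the folklore rejection-sampling baseline. It is not the algorithm of the featured paper: it assumes a known upper bound `max_degree` on all degrees (a hypothetical parameter), and each trial succeeds with probability $2m/(n \cdot \texttt{max\_degree})$, which can be far worse than the $O(n/\sqrt{m})$ bound.

```python
import random

def sample_uniform_edge(adj, max_degree, rng=random):
    """Return a uniformly random directed edge (v, u) by rejection sampling.

    adj maps each vertex to its neighbor list; max_degree upper-bounds every
    degree. Each trial picks a uniform vertex v and a uniform slot i in
    [0, max_degree), and succeeds iff slot i holds a neighbor of v. Every
    directed edge is hit with probability 1 / (n * max_degree) per trial,
    so the accepted edge is exactly uniform.
    """
    vertices = list(adj)
    if not any(adj[v] for v in vertices):
        raise ValueError("graph has no edges")
    while True:
        v = rng.choice(vertices)        # one uniform vertex query
        i = rng.randrange(max_degree)   # one uniform neighbor slot
        if i < len(adj[v]):
            return v, adj[v][i]
```

Each undirected edge appears as two directed edges, so forgetting the orientation still gives a uniform undirected edge.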

Support Size Estimation: The Power of Conditioning by Diptarka Chakraborty, Gunjan Kumar, Kuldeep S. Meel (arXiv) This work considers the classic problem of support size estimation with a slight twist. You are given access to a stronger (conditioning based) sampling oracle. Let me highlight one of the results from this paper. So, you are given a distribution $D$ where $supp(D) \subseteq [n]$. You want to obtain an estimate to $|supp(D)|$ that lies within $|supp(D)| \pm \varepsilon n$ with high probability. Suppose you are also given access to the following sampling oracle. You may choose any subset $S \subseteq [n]$ and you may request a sample $x \sim D\vert_S$. An element $x \in S$ is returned with probability $D\vert_S(x) = D(x)/D(S)$ (for simplicity of this post, let us assume $D(S) > 0$). In addition, this oracle also reveals for you the value $D(x)$. The paper shows that the algorithmic task of obtaining a high probability estimate to the support size (to within $\pm \varepsilon n$) with this sampling oracle admits a lower bound of $\Omega(\log \log n)$ calls to the sampling oracle.
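The oracle described above is easy to simulate when the distribution is known explicitly, which can help in getting a feel for the model. The Python sketch below is only such a simulation; the dictionary representation of $D$ and the name `cond_sample` are assumptions for illustration, not anything from the paper.

```python
import random

def cond_sample(dist, subset, rng=random):
    """Simulate one call to the conditioning oracle.

    dist maps elements to probabilities D(x); subset is the query set S.
    Returns (x, D(x)) where x is drawn with probability D(x) / D(S).
    Assumes D(S) > 0, as in the post.
    """
    support = [x for x in subset if dist.get(x, 0.0) > 0]
    if not support:
        raise ValueError("D(S) must be positive")
    weights = [dist[x] for x in support]
    # random.choices normalizes the weights, which is exactly the division by D(S).
    x = rng.choices(support, weights=weights, k=1)[0]
    return x, dist[x]
```

Note that the oracle returns the unconditioned value $D(x)$ alongside the sample, which is the extra power the result is about.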

Computing (1+epsilon)-Approximate Degeneracy in Sublinear Time by Valerie King, Alex Thomo, Quinton Yong (arXiv) Degeneracy is one of the important graph parameters which is relevant to several problems in algorithmic graph theory. A graph $G = (V,E)$ is $\delta$-degenerate if all induced subgraphs of $G$ contain a vertex with degree at most $\delta$. The featured paper presents algorithms for a $(1 + \varepsilon)$-approximation to degeneracy of $G$ where you are given access to $G$ via its adjacency list.
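For comparison with the sublinear algorithms, the exact degeneracy can be computed by the standard peeling procedure: repeatedly delete a minimum-degree vertex and record the largest degree seen at deletion time. The Python sketch below implements that exact baseline, which reads the whole graph; it is not the paper's approximation algorithm, and the function name and input format are assumptions for illustration.

```python
def degeneracy(adj):
    """Exact degeneracy by repeatedly removing a minimum-degree vertex.

    adj maps each vertex to a collection of its neighbors (simple graph).
    The answer is the largest degree observed at removal time.
    """
    degree = {v: len(adj[v]) for v in adj}
    alive = set(adj)
    best = 0
    while alive:
        v = min(alive, key=degree.__getitem__)  # O(n) scan; fine for a sketch
        best = max(best, degree[v])
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
    return best
```

A tree has degeneracy 1 and the complete graph on $k$ vertices has degeneracy $k-1$, which matches the definition via induced subgraphs.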

Learning and Testing Latent-Tree Ising Models Efficiently by Davin Choo, Yuval Dagan, Constantinos Daskalakis, Anthimos Vardis Kandiros (arXiv) Ising models are emerging as a rich and fertile frontier for Property Testing and Learning Theory researchers (at least to the uninitiated ones like me). This paper considers latent-tree Ising models. These are Ising models that can only be observed at their leaf nodes. One of the results in this paper gives an algorithm for testing whether the leaf distributions attached to two latent-tree Ising models are close or far in the TV distance.

A constant lower bound for the union-closed sets conjecture by Justin Gilmer (arXiv) The union-closed sets conjecture of Frankl states that for any union-closed set system $\mathcal{F} \subseteq 2^{[n]}$, there is a mysterious element $i \in [n]$ that shows up in at least a $c = 1/2$ fraction of the sets in $\mathcal{F}$. Gilmer took a first swipe at this problem and gave a constant lower bound of $c = 0.01$. This has already been improved by at least four different groups to $\frac{3-\sqrt{5}}{2}$, a bound which is the limit of Gilmer’s method (which takes only 9 pages!).

The key lemma Gilmer proves is the following. Suppose you sample two sets: $A, B \sim \mathcal{D}_n$ (iid) from some distribution $\mathcal{D}_n$ over the subsets of $[n]$. Suppose for every index $i \in [n]$, it holds that the probability that the element $i$ shows up in the random set $A$ is at most $0.01$. Then you have $H(A \cup B) \geq 1.26 H(A)$. This is all you need to finish Gilmer’s proof (of $c = 0.01$). The remaining argument is as follows. Suppose, by way of contradiction, that no element shows up in at least a $0.01$ fraction of the sets in the union-closed family $\mathcal{F}$. Choose $A, B$ uniformly from $\mathcal{F}$. An application of the key lemma would then give $H(A \cup B) > H(A)$. But since $\mathcal{F}$ is union-closed, $A \cup B$ is itself a member of $\mathcal{F}$, so $H(A \cup B) \leq \log |\mathcal{F}| = H(A)$, which is a contradiction. The proof of the key lemma is also fairly slick and uses pretty simple information theoretic tools.
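If you want to poke at the conjecture on small examples, the following Python sketch closes a family under unions by brute force and reports the frequency of its most common element. This is only an experiment harness and has nothing to do with Gilmer's entropy argument; the helper names are assumptions for illustration, and the family is assumed to contain at least one non-empty set.

```python
from itertools import combinations

def union_closure(sets):
    """Close a family of sets under pairwise unions (brute force)."""
    family = {frozenset(s) for s in sets}
    changed = True
    while changed:
        changed = False
        # Snapshot the family so we can add new unions while iterating.
        for a, b in combinations(list(family), 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family

def max_element_frequency(family):
    """Largest fraction of sets in the family that contain a single element."""
    universe = set().union(*family)
    return max(sum(1 for s in family if x in s) for x in universe) / len(family)
```

Starting from the three singletons $\{1\}, \{2\}, \{3\}$, the closure is the family of all 7 non-empty subsets of $\{1,2,3\}$, and every element lies in 4 of them, comfortably above the conjectured $1/2$.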

Property Testing Review

The latest in property testing and sublinear time algorithms
