# News for January 2021

The first month of 2021 has brought with it 5 papers, covering graph testing, Boolean function testing, and distribution testing — as well as database theory. Let’s dive into it.

Random walks and forbidden minors III: $$\mathrm{poly}(d/\varepsilon)$$-time partition oracles for minor-free graph classes, by Akash Kumar, C. Seshadhri, and Andrew Stolman (arXiv). Minor-closed bounded-degree graphs have a very nice property: denoting by $$d$$ the degree bound and $$n$$ the number of edges, it is always possible to partition any such graph into components of constant size, $$O_\varepsilon(1)$$, just by removing a linear number of edges, merely $$\varepsilon dn$$ ($$\varepsilon$$ being a small parameter). This is a crucial result in many graph property testing algorithms, those which rely on something called a “partition oracle”: loosely speaking, a routine which makes “few” queries to the graph, and is able to indicate which component of an underlying partition any given vertex belongs to. But what is “few” here? The first partition oracles made $$d^{\mathrm{poly}(d,1/\varepsilon)}$$ queries to the graph to answer any such request. This later got significantly improved to $$d^{\mathrm{log}(d/\varepsilon)}$$. Using spectral graph theory techniques previously developed by the authors (hence the “III” in the title), this work settles the question, achieving partition oracles which as good as it gets: making only $$\mathrm{poly}(d,1/\varepsilon)$$ queries to the graph! This in turns has immediate consequences for graph property testing, which the paper details.

And since we are on the topic of oracles… more exciting news on that front:

Spectral Clustering Oracles in Sublinear Time, by Grzegorz Gluch, Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, Christian Sohler (arXiv). Given a graph $$G$$ and an integer $$k$$, one often want to partition the graph into components $$C_1,C_2,\dots,C_k$$, such that each $$C_i$$ is well-connected and has few edges to the other $$C_j$$’s. But can we get ultra efficient algorithms for such spectral clusterings? Specifically, can we design oracles for them: sublinear-time algorithms which provide implicit access to an underlying “good” spectral clustering $$\hat{C}_1,\hat{C}_2,\dots,\hat{C}_k$$, by returning on any query vertex $$v$$ the index of the cluster $$\hat{C}_i$$ to which $$v$$ belongs? This paper introduces the question, and answers it in the affirmative: in more detail, it provides a spectral clustering oracle which, for any $$\varepsilon>0$$, has preprocessing time $$2^{\mathrm{poly}(k/\varepsilon)}n^{1/2+O(\varepsilon)}$$, query time $$n^{1/2+O(\varepsilon)}$$, and space $$n^{1/2+O(\varepsilon)}$$; and provides access to a clustering with relative error $$O(\varepsilon\log k)$$ per cluster. The paper also allows tradeoffs between query time and space, and discusses applications to the Local Computation Algorithms (LCA) model.

Next stop, distribution testing…

The Sample Complexity of Robust Covariance Testing, by Ilias Diakonikolas and Daniel M. Kane (arXiv). Suppose you have i.i.d. samples from some high-dimensional Gaussian $$\mathcal{N}(0,\Sigma)$$ in $$d$$ dimensions, and want to test whether the unknown covariance matrix $$\Sigma$$ is the identity, versus $$\varepsilon$$-far from it (in Frobenius norm). Good news: we know how to do that, and $$\Theta(d/\varepsilon^2)$$ samples are necessary and sufficient. (To learn $$\Sigma$$, that’d be $$\Theta(d^2/\varepsilon^2)$$.) Bad news: you don’t have i.i.d. samples from some high-dimensional Gaussian $$\mathcal{N}(0,\Sigma)$$; what you have is i.i.d. samples from a noisy version of it, $$(1-\alpha)\mathcal{N}(0,\Sigma) + \alpha B$$, where $$B$$ is an arbitrary “bad” distribution (not necessarily Gaussian itself). You still want to test whether the covariance $$\Sigma$$ is the identity, but now you have that extra $$\alpha$$ fraction of noisy samples, and you need to do that testing robustly… The good news is, you can still do that by learning the covariance matrix $$\Sigma$$, robustly, with $$O(d^2/\varepsilon^2)$$ samples. The bad news is the main result of this paper: that’s also the best you can do. That is, $$\Omega(d^2)$$ samples are necessary: if you have to be robust to noise, testing is no longer easier than learning….

Onto Boolean functions!

Junta Distance Approximation with Sub-Exponential Queries, by Vishnu Iyer, Avishay Tal, and Michael Whitmeyer (ECCC). If you follow this blog, you may have seen over the past couple years a flurry of results about tolerant junta testing: “given query access to some Boolean function $$f\colon\{-1,1\}^n\to\{-1,1\}$$, how close is $$f$$ to only depending on $$k \ll n$$ of its variables?”
This paper contains several results on this problem, including an improved bicriteria tolerant testing algorithm: an efficient algorithm to distinguish between $$\varepsilon$$-close to $$k$$-junta and $$1.1\varepsilon$$-far from $$k’$$-junta making $$\mathrm{poly}(k,1/\varepsilon)$$ queries (and $$k’ = O(k/\varepsilon^2)$$). But the main result of the paper, and the one giving it its name, is for the non-relaxed version where $$k’=k$$: while all previous works had a query complexity $$2^{O(k)}$$, here the authors show how to break that exponential barrier, giving a fully tolerant testing algorithm with query complexity $$2^{\tilde{O}(\sqrt{k})}$$!

And finally, a foray into database theory:

Towards Approximate Query Enumeration with Sublinear Preprocessing Time, by Isolde Adler and Polly Fahey (arXiv). In this paper, the authors are concerned with the task of (approximate) query enumeration on databases, aiming for ultra efficient (i.e., sublinear-time) algorithms. Leveraging techniques from property testing (specifically, in the bounded-degree graph model), they show the following:
On input databases of bounded degree and bounded tree-width, every (fixed) first-order definable query can be enumerated approximately in time linear in the output size, after only a sublinear-time preprocessing phase.

That’s all for this month! If you noticed a paper we missed, please let us know in the comments.

# News for October 2020

Sorry for the delay in writing this monthly digest: we hope you didn’t spend the week frantically refreshing your browsers! We found four papers this month: let’s dive in.

A Structural Theorem for Local Algorithms with Applications to Coding, Testing, and Privacy, by Marcel Dall’Agnol, Tom Gur, and Oded Lachish (ECCC). This paper introduces the notion of “robust (local) algorithm,” an abstraction which encompasses may types of algorithms: e.g., property testing algorithms, locally decodable codes, etc. The main result of this work is that any (possibly adaptive) $$q$$-query robust local algorithm can be transformed into a non-adaptive, sample-based one making $$n^{1-1/(q^2\log q)}$$ queries (where $$n$$ is the input size). Here, “sample-based” means that the algorithm doesn’t get to make arbitrary queries, but just gets to observe randomly chosen coordinates of the input. As application of this result, the authors derive new upper and lower bounds for several of the types of local algorithms mentioned above, resolving open questions from previous works.

Testing Tail Weight of a Distribution Via Hazard Rate, by Maryam Aliakbarpour, Amartya Shankha Biswas, Kavya Ravichandran, Ronitt Rubinfeld (arXiv). The authors consider, from the point of view of distribution testing, the question of deciding whether data is heavy-tailed: as would, for instance, data following a power law. The paper first sets out to formalize the question, and discusses various possible definitional choices before setting on one; after which it provides a test, and analyzes its sample complexity as a function of various parameters (such as smoothness of the unknown distribution). The authors finally back these results with empirical evaluation of their algorithm.

On Testing of Samplers, by Kuldeep S. Meel, Yash Pote, Sourav Chakraborty (arXiv). Suppose you are given a (known) distribution $$p$$ over some domain $$\Omega$$, and want to sample from it conditioned on some predicate $$\varphi$$. Now someone comes to you with an algorithm which does exactly that, efficiently, and cheaply: great! But can you easily check that you’re not getting fooled, and that this sampler actually does what it claims? This paper provides this: an algorithm which accepts if the sampled distribution is $$\varepsilon$$-close to what it should (roughly, in a multiplicative, KL divergence sense), and rejects if it’s $$\varepsilon’$$-far (in total variation distance). The number of samples required is polynomial in $$\varepsilon’-\varepsilon$$, and depends on some characteristic of $$p$$ and $$\varphi$$, the “tilt” (ratio between max and min probability of the conditional distribution).

Finally, an omission from late September:
Sample optimal Quantum identity testing via Pauli Measurements, by Nengkun Yu (arXiv). The abstract is concise and clear enough to speak for itself: “In this paper, we show that $$\Theta(\textrm{poly}(n)\cdot\frac{4^n}{\varepsilon^2})$$ is the sample complexity of testing whether two $$n$$-qubit quantum states $$\rho$$ and $$\sigma$$ are identical or $$\varepsilon$$-far in trace distance using two-outcome Pauli measurements.”

Please let us know if you spotted a paper we missed!

# Welcome Akash Kumar!

Let’s welcome our latest editor, Akash Kumar. Akash will be taking the place of Gautam Kamath, who has decided to pass the torch on. Let’s also thank Gautam for all the help with PTReview.

# News for January 2017

2017 starts off rather slow for property testing. Though, we have an intriguing paper to report – an experimental analysis of a classic sublinear algorithm.

Evaluating a Sublinear-time Algorithm for the Minimum Spanning Tree Weight Problem, by Gabriele Santi and Leonardo De Laurentiis (arXiv). The Chazelle-Rubinfeld-Trevisan Minimum Spanning Tree algorithm is a classic in sublinear algorithms. This algorithms provides a $$(1+\varepsilon)$$ approximation to the MST in time independent of the number of vertices (although it does depend on the average degree). But how this compare with Prim’s algorithm on real instances, in a real (not theoretical) computer? This intriguing paper does a detailed experimental comparison. Having done experimental graph algorithms myself, I can attest to the various challenges: how to choose a test set of graphs? How to set error parameters? Can data structure optimization on the coding side beat asymptotic improvements? This paper does a series of experiments on synthetic graph generators (such as Erdős-Rényi, Barabási-Albert, Watts-Strogatz models). They do validate the basic CRT algorithm at scale, by showing that it is faster than Prim for graphs with more than a million edges. Their experiments suggest that the sublinear-time algorithm gives little benefits when $$\varepsilon \leq 0.2$$. The paper has many experiments for a variety of settings, and the authors do a comprehensive study of the various parameters. I’d definitely recommend to anyone interested in exploring how property testing might influence algorithms in the real world.

# Welcome Clément and Gautam!

Dear readers, please welcome the new additions to our moderator group, Clément Canonne and Gautam Kamath! It’s great that more property testing researchers are helping take PTReview further. PTReview is becoming really popular these days! (Or so we hope…)

# News for December 2015

Greetings from the exciting Workshop on Sublinear Algorithms at John Hopkins University! As this workshop and the upcoming SODA and ITCS conferences get 2016 to a roaring start, let us take one last look back at property testing news from last year. In December, one work in particular caught my eye:

Non-Local Probes Do Not Help with Graph Problems by Mika Göös, Juho Hirvonen, Reut Levi, Moti Medina, and Jukka Suomela (arXiv). A generalization of property testing that has recently seen some fascinating developments in the past few years is the local computation algorithms (LCA) model, in which the algorithm is asked to answer some local query (such as “what is the color of this vertex in some fixed, legal coloring of the graph?”) in sublinear-time. This paper relates the LCA model to message-passing models and in the process gives a powerful new tool for establishing lower bounds in LCAs.