Category Archives: Monthly digest

News for August 2021

This month saw four papers. One on group testing, another on distribution testing, yet another which makes progress on testing problems on decision trees and the last one on graph property testing. Without further ado, let’s dive in.

Group Testing with Non-identical Infection Probabilities by Mustafa Doger, Sennur Ulukus (arXiv) Consider the classic group testing problem. Here the setup is the following. You are given a bunch of individuals from a population \(\mathcal{P}\). You have an infection vector which records the infection status of each individual in the population where the \(i\)-th individual is infected with probability \(p_i\). You want to recover all the infected individuals. You are allowed to group individuals together and you can test the entire group in a single shot. If the group tests negative, you are happy all the tested individuals are off the hook. Otherwise, if the group tests positive, you need more tests for further classification. This paper proposes a greedy way to build pools of individuals you would test. The pools are built adaptively: as in future pools are built using the knowledge of how the preceding tests fared. The key result in the paper upperbounds the number of tests performed in terms of the entropy of the infection vector.

Uniformity Testing in the Shuffle Model: Simpler, Better, Faster by Clément L. Canonne, Hongyi Lyu (arXiv) Differentially private distribution testing as a research area has been gathering momentum steadily over the last few years. If you read our last month’s post, you might recall there are a wide variety of models of DP each corresponding to a different “threat model”. The most stringent among the most explored models is the “local model”, the least stringent being the “central model” and there is an intermediate threat model, the so called “shuffle model“. This paper simplifies the analysis of uniformity testing algorithm under the shuffle model and presents an algorithm with sample complexity \(O(k^{3/4})\) for testing uniformity over a support of size \([k]\).

On Learning and Testing Decision Tree by Nader H. Bshouty, Catherine A. Haddad-Zaknoon (arXiv) In our December 2020 post, we covered a result of Blanc et al., which proves the following: Suppose you are given a boolean function \(f\) and the property \(\mathcal{P}\) of size-\(s\) decision trees. The result of Blanc et al gives you a function \(g \in \mathcal{P}\) with \(dist(f,g) = O(dist (f, \mathcal{P}))\) where \(g \in \) is guaranteed to have decision tree complexity \(s^{O(\log^2 s)}\). This result implies a bi-criteria tester for the following property: is \(f \in \mathcal{P}\) or is \(f\) \(\varepsilon\)-far from having decision tree complexity \(\phi(s) = s^{O(\log^3 s)}\). The current paper improves this result by presenting a property tester with \(\phi(s) = s^{O(\log^2 s)}\).

The complexity of testing all properties of planar graphs, and the role of isomorphism by Sabyasachi Basu, Akash Kumar, C. Seshadhri (arXiv) (Disclaimer: I am one of the authors of this paper). This paper presents a result that I, in my biased opinion, find interesting. So, here is the setup. You are given a bounded degree planar graph. And I cook up some God-forsaken property and ask you to test it. Turns out, no matter how devilishly I cooked up the property, you can test in with \(\exp(O(\varepsilon^{-2}))\) queries. The nice happenstance is that you also have a matching lower bound of \(\exp(\Omega(\varepsilon^{-2}))\) queries! And interestingly, this lower bound is witnessed by the very natural property of testing isomorphism to a fixed graph which means that isomorphism is the hardest property of planar graphs.

New for July 2021

This month saw three papers appear online, together covering a rather broad range of topics: testing of regular languages, distribution testing under differential privacy, and local testability from high-dimensional expanders. Let’s dive in!

Property Testing of Regular Languages with Applications to Streaming Property Testing of Visibly Pushdown Languages, by Gabriel Bathie and Tatiana Starikovskaya (paper). Let \(L\in \Sigma^\ast\) be a regular language recognized by an automation with \(m\) states and \(k\) connected components: given as input a word \(u\in \Sigma^n\), what is the query complexity to test membership to \(L\) in Hamming distance? Edit distance? Or, more generally, weighted edit distance, where each letter of the word \(u\) comes with a weight? In this paper, the authors focus on non-adaptive, one-sided errors testing algorithms, for which they show an upper bound of \(q=O(k m \log(m/\varepsilon)/\varepsilon)\) queries (with running time \(O(m^2 q)\)), which they complement by a query complexity lower bound of \(\Omega(\log(1/\varepsilon)/\epsilon)\), thus matching the upper bound for languages recognized by constant-size automata. The guarantee for the upper bound is with respected to weighted edit distance, and thus implies the same upper bound for testing with respect to Hamming distance.
To conclude, the authors use an existing connection to streaming property testing to obtain new algorithms for property testing of visibly pushdown languages (VPL) in the streaming model, along with a new lower bound in that model.

High dimensional expansion implies amplified local testability, by Tali Kaufman and Izhar Oppenheim (arXiv). This paper sets out to show that codes that arise from high-dimensional expanders are locally testable (membership to the code can be tested using very few queries). To do so, the authors define a new notion of high-dimensional expanding system (HDE system), as well as that of amplified local testability, a stronger notion than local testability; they then prove that a code based on a HDE system satisfies this stronger notion. Moreover, they show that many well-known families of codes are, in fact, HDE system codes, and therefore satisfy this stronger notion of local testability as well.

Finally, a survey on differential privacy, with a foray into distribution testing:

Differential Privacy in the Shuffle Model: A Survey of Separations, by Albert Cheu (arXiv). If you are familiar with differential privacy (DP), you may recall that there are several notions of DP, each meant to address a different “threat model” (depending on whom you trust with your data). Shuffle DP is one of them, intermediate between “central” DP and the more stringent “local” DP. Long story short: with shuffle DP, the tradeoff between privacy and accuracy can be strictly in-between what’s achievable in central and local DP, and that’s the case for one of the usual suspects of distribution testing, uniformity testing (“I want to test if the data uniformly distributed, but now, with privacy of that data in mind”). The survey discusses what is known about this in Sections 3.3 and 6, and what the implications are; but there are quite a few questions left unanswered… Long story short: a very good introduction to shuffle privacy, and to open problems in that area!

News for June 2021

A quieter month after last month’s bonanza. One (applied!) paper on distribution testing, a paper on tolerant distribution testing, and a compendium of open problems. (Ed: alas, I missed the paper on tolerant distribution testing, authored by one of our editors. Sorry, Clément!)

Learning-based Support Estimation in Sublinear Time by Talya Eden, Piotr Indyk, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, and Tal Wagner (arXiv). A classic problem in distribution testing is that of estimating the support size \(n\) of an unknown distribution \(\mathcal{D}\). (Assume that all elements in the support have probability at least \(1/n\).) A fundamental result of Valiant-Valiant (2011) proves that the sample complexity of this problem is \(\Theta(n/\log n)\). A line of work has emerged in trying to reduce this complexity, with additional sources of information. Canonne-Rubinfeld (2014) showed that, if one can query the exact probabilities of elements, then the complexity can be made independent of \(n\). This paper studies a robust version of this assumption: suppose, we can get constant factor approximations to the probabilities. Then, the main result is that we can get a query complexity of \(n^{1-1/\log(\varepsilon^{-1})} \ll n/\log n\) (where the constant \(\varepsilon\) denotes the additive approximation to the support size). This paper also does empirical experiments to show that the new algorithm is indeed better in practice. Moreover, it shows that existing methods degraded rapidly with poorer probability estimates, while the new algorithm maintains its accuracy even with such estimates.

The Price of Tolerance in Distribution Testing by Clément L. Canonne, Ayush Jain, Gautam Kamath, and Jerry Li (arXiv). While we have seen many results in distribution testing, the subject of tolerance is one that hasn’t received as much attention. Consider the problem of testing if unknown distribution \(\mathcal{p}\) (over domain \([n]\)) is the same as known distribution \(\mathcal{q}\). We wish to distinguish \(\varepsilon_1\)-close from \(\varepsilon_2\)-far, under total variation distance. When \(\varepsilon_1\) is zero, this is the standard property testing setting, and classic results yield \(\Theta(\sqrt{n})\) sample complexity. If \(\varepsilon_1 = \varepsilon_2/2\), then we are looking for a constant factor approximation to the distance. And the complexity is \(\Theta(n/\log n)\). Surprisingly, nothing was known in better. Until this paper, that is. The main result gives a complete characterization of sample complexity (up to log factors), for all values of \(\varepsilon_1, \varepsilon_2\). Remarkably, the sample complexity has an additive term \((n/\log n) \cdot (\varepsilon_1/\varepsilon^2_2)\). Thus, when \(\varepsilon_1 > \sqrt{\varepsilon_2}\), the sample complexity is \(\Theta(n/\log n)\). When \(\varepsilon_1\) is smaller, the main result gives a smooth dependence on the sample complexity. One the main challenges is that existing results use two very different techniques for the property testing vs constant-factor approximation regimes. The former uses simpler \(\ell_2\)-statistics (e.g. collision counting), while the latter is based on polynomial approximations (estimating moments). The upper bound in this paper shows that simpler statistics based on just the first two moments suffice to getting results for all regimes of \(\varepsilon_1, \varepsilon_2\).

Open Problems in Property Testing of Graphs by Oded Goldreich (ECCC). As the title clearly states, this is a survey covering a number of open problems in graph property testing. The broad division is based on the query model: dense graphs, bounded degree graphs, and general graphs. A reader will see statements of various classic open problems, such as the complexity of testing triangle freeness for dense graphs and characterizing properties that can be tested in \(poly(\varepsilon^{-1})\) queries. Arguably, there are more open problems (and fewer results) for testing in bounded degree graphs, where we lack broad characterizations of testable properties. An important, though less famous (?), open problem is that of the complexity of testing isomorphism. It would appear that the setting of general graphs, where we know the least, may be the next frontier for graph property testing. A problem that really caught my eye: can we transform testers that work for bounded degree graphs into those that work for bounded arboricity graphs? The latter is a generalization of bounded degree that has appeared in a number of recent results on sublinear graph algorithms.

News for May 2021

We hope you are all staying safe. With massive vaccination programs across the globe we hope you and your loved ones are getting back to what used to be normal. With that out of the way, let us circle back to Property Testing. This month was less sleepy as compared to the two preceding months and we saw six papers in total (two of them explore problems in quantum property testing). Without further ado, let us take a deeper dive.

GSF-locality is not sufficient for proximity-oblivious testing, by Isolde Adler, Noleen Kohler, Pan Peng (arXiv) The notion of proximity oblivious testers was made explicit in the seminal work of Goldreich and Ron in 2009 [GR09]. A proximity oblivious tester for a graph property is a constant query tester that rejects a graph with probability that monotonically increases with distance to the property. (Edit: Correction) A property is called proximity oblivious testable (or PO testable) if it has a one sided proximity oblivious tester. [GR09] gave a characterization of which properties \(\Pi\) are PO testable in the bounded degree model if and only if it is a “local” property of some kind which satisfies a certain non propagation condition. [GR09] conjectured that all such “local” properties satisfy this non propagation condition. This paper refutes the above conjecture from [GR09].

Coming up next. More action on triangle freeness.

Testing Triangle Freeness in the General Model in Graphs with Arboricity \(O(\sqrt n)\), by Reut Levi (arXiv) PTReview readers are likely to be aware that triangle freeness has been a rich source of problems for developing new sublinear time algorithms. This paper considers the classic problem of testing triangle freeness in general graphs. In the dense case, algorithms with running time depending only on \(\varepsilon\) are known thanks to the work of Alon, Fischer, Krivelevich and Szegedy. In the bounded degree case, Goldreich and Ron gave testers with query complexity \(O(1/\varepsilon)\). This paper explores the problem in general graph case and proves an upper bound of \(O(\Gamma/d_{avg} + \Gamma)\) where \(\Gamma\) is the arboricity of the graph. The author also shows that this upperbound is tight for graphs with arboricity at most \(O(\sqrt n)\). Curiously enough, the algorithm does not take arboricity of the graph as an input and yet \(\Gamma\) (the arboricity) shows up in the upper and lower bounds.

Testing Dynamic Environments: Back to Basics, by Yonatan Nakar and Dana Ron (arXiv) Goldreich and Ron introduced the problem of testing “dynamic environments” in 2014. Here is the setup for this problem. You are given an environment that evolves according to a local rule. Your goal is to query some of the states in the system at some point of time and determine if the system is evolving according to some fixed rule or is far from it. In this paper, the authors consider environments defined by elementary cellular automata which evolve according to threshold rules as one of the first steps towards understanding what makes a dynamic environment tested efficiently. The main result proves the following: if your local rules satisfy some conditions, you can use a meta algorithm with query complexity \(poly(1/\varepsilon)\) which is non adaptive and has one sided error. And all the threshold rules indeed satisfy these conditions which means they can be tested efficiently.

Identity testing under label mismatch, by Clement Canonne and Karl Wimmer (arXiv) This paper considers a classic problem distribution testing with the following twist. Let \(q\) denote a distribution supported on \([n]\). You are given access to samples from another distribution \(p\) where \(p = q \circ \pi\) where \(\pi\) is some unknown permutation. Thus, I relabel the data and I give you access to samples from the relabeled dataset. Under this promise, note that identity testing becomes a trivial problem if \(q\) is known to be uniform over \([n]\). The authors develop algorithms for testing and tolerant testing of distributions under this additional promise of \(p\) being a permutation of some known distribution \(q\). The main result shows as exponential gap between the sample complexity of testing and tolerant testing under this promise. In particular, identity testing under the promise of permutation has sample complexity \(\Theta(\log^2 n)\) whereas tolerant identity testing under this promise has sample complexity \(\Theta(n^{1-o(1)})\).

Testing symmetry on quantum computers, by Margarite L. LaBorde and Mark M. Wilde (arXiv) This paper develops algorithms which test symmetries of a quantum states and changes generated by quantum circuits. These tests additionally also quantify how symmetric these states (or channels) are. For testing what are called “Bose states” the paper presents efficient algorithms. The tests for other kinds of symmetry presented in the paper rely on some aid from a quantum prover.

Quantum proofs of proximity, by Marcel Dall’Agnol, Tom Gur, Subhayan Roy Moulik, Justin Thaler (ECCC) The sublinear time (quantum) computation model has been gathering momentum steadily over the past several years. This paper seeks to understand the power of \({\sf QMA}\) proofs of proximity for property testing (recall \({\sf QMA}\) is the quantum analogue of \({\sf NP}\)). On the algorithmic front, the paper develops sufficient conditions for properties to admit efficient \({\sf QMA}\) proofs of proximity. On the complexity front, the paper demonstrates a property which admits an efficient \({\sf QMA}\) proof but does not admit a \({\sf MA}\) or an interactive proof of proximity.

News for April 2021

A somewhat “sublinear” month of April, as far as property testing is concerned, with only one paper. (We may have missed some; if so, please let us know in the comments!)

Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma, by Sepehr Assadi and Vishvajeet N (arXiv). This paper establishes space vs. pass trade-offs lower bounds for streaming algorithms, for a variety of graph tasks: that is, of the sort “any \(m\)-pass-streaming algorithm for task \(\mathcal{T}\) must use memory at least \(f(m)\).” The tasks considered include graph property estimation (size of the maximum matching, of the max cut, of the weight of the MST) and property testing for sparse graphs (connectivity, bipartiteness, and cycle-freeness). The authors obtained exponentially improved lower bounds for those, via reductions to a relatively standard problem, (noisy) gap cycle counting, for which they establish their main lower bound. As a key component of their proof, they prove a general direct product result (XOR lemma) for the streaming setting, showing that the advantage for solving the XOR of \(\ell\) copies of a streaming predicate \(f\) decreases exponentially with \(\ell\).

News for March 2021

A somewhat quieter month by recent standards. Three Two papers: graph property testing and quantum distribution testing. (Ed: The distribution testing paper was a revision of a paper we already covered in Sept 2020.)

Robust Self-Ordering versus Local Self-Ordering by Oded Goldreich (ECCC). In Nov 2020, we covered a paper that uses a tool called self-ordered graphs, that transferred bit string function lower bounds to graph property testing. Consider a labeled graph. A graph is self-ordered if its automorphism group only contains the identity element (it has no non-trivial isomorphisms). A graph is robustly self-ordered, if every permutation of the vertices leads to a (labeled) graph that is sufficiently “far” according to edit distance. Given a self-ordered graph \(G\), a local self-ordering procedure is the following. Given access to a copy \(G’\) of \(G\) and a vertex \(v \in V(G’)\), this procedure determines the (unique) vertex in \(V(G)\) that corresponds to \(v\) with sublinear queries to \(G\). In other words, it can locally “label” the graph. Intuitively, one would think that more robustly self-ordered graphs will be easier to locally label. This paper studies the relation between robust and local self-ordering. Curiously, this paper refutes the above intuition for bounded-degree graphs, and (weakly) confirms it for dense graphs. Roughly speaking, there are bounded degree graphs that are highly robustly self-ordered, for which any local self-ordering procedure requires \(\omega(\sqrt{n})\) queries. Moreover, there are bounded degree graphs with \(O(\log n)\)-query local self-ordering procedures, yet are not robustly self-ordered even for weak parameters. For dense graphs, the existence of fast non-adaptive local self-ordering procedures implies robust self-ordering.

Testing identity of collections of quantum states: sample complexity analysis by Marco Fanizza, Raffaele Salvia, and Vittorio Giovannetti (arXiv). This paper takes identity testing to the quantum setting. One should think of a \(d\)-dimensional quantum state as a \(d \times d\) density matrix (with some special properties). To learn the state entirely up to error \(\varepsilon\) would require \(O(\varepsilon^{-2} d^2)\) samples/measurements. A recent result of Badescu-O’Donnell-Wright proves that identity testing to a known state can be done significantly faster using \(O(\varepsilon^{-2} d)\) measurements. This paper takes this result a step further by consider a set of \(N\) quantum states. A “sample” is like a classical sample, where one gets a sample from a distribution of quantum states. The YES (“uniform”) case is when all the states are identical. The NO (“far from uniform”) case is when they are “far” from being the same state. This paper proves that \(O(\varepsilon^{-2}\sqrt{N}d)\) samples suffices for distinguishing these cases.

News for February 2021

We got quite some action last month. We saw five papers. A lot of action in graph world and some action in quantum property testing which we hope you will find appetizing. Also included is a result on sampling uniformly random graphlets.

Testing Hamiltonicity (and other problems) in Minor-Free Graphs, by Reut Levi and Nadav Shoshan (arXiv). Graph Property Testing has been explored pretty well for dense graphs (and reasonably well for bounded degree graphs). However, testing properties in the general case still remains an elusive goal. This paper makes contributions in this direction and as a first result it gives an algorithm for testing Hamiltonicity in minor free graphs (with two sided error) with running time \(poly(1/\varepsilon)\). Let me begin by pointing out that Hamiltonicity is an irksome property to test in the following senses.

  • It is neither monotone nor additive. So the partition oracle based algorithms do not immediately imply a tester (with running time depending only on \(\varepsilon\) for Hamiltonicity. This annoyance bugs you even in the bounded degree case.
  • Czumaj and Sohler characterized what graph properties are testable with one-sided error in general planar graphs. In particular, they show a property of general planar graphs is testable iff this property can be reduced to testing for a finite family of finite forbidden subgraphs. Again, Hamiltonicity does not budge to this result.
  • There are (concurrent) results by Goldreich and Adler-Kohler which show that with one-sided error, Hamiltonicity cannot be tested with \(o(n)\) queries.

The paper shows that distance to Hamiltonicity can be exactly captured in terms of a certain combinatorial parameter. Thereafter, the paper tries to estimate this parameter after cleaning up the graph a little. This allows them to estimate the distance to Hamiltonicity and thus also implies a tolerant tester (restricted to mino-free graphs).

Testing properties of signed graphs, by Florian Adriaens, Simon Apers (arXiv). Suppose I give you a graph \(G=(V,E)\) where all edges come with a label: which is either “positive” or “negative”. Such signed graphs are used to model various scientific phenomena. Eg, you can use these to model interactions between individuals in social networks into two categories like friendly or antagonistic.

This paper considers property testing problems on signed graphs. The notion of farness from the property extends naturally to these graphs (both in the dense graph model and the bounded degree model). The paper contains explores three problems in both of these models: signed triangle freeness, balance and clusterability. Below I will zoom into the tester for clusterability in the bounded degree setting developed In the paper. A signed graph is considered clusterable if you can partition the vertex set into some number of components such that the edges within any component are all positive and the edges running across components are all negative.

The paper exploits a forbidden subgraph characterization of clusterability which shows that any cycle with exactly one negative edge is a certificate of non-clusterability of \(G\). The tester runs multiple random walks from a handful of start vertices to search for these “bad cycles” by building up on ideas in the seminal work of Goldreich and Ron for testing bipariteness. The authors put all of these ideas together and give a \(\widetilde{O}(\sqrt n)\) time one-sided tester for clusterability in signed graphs.

Local Access to Random Walks, by Amartya Shankha Biswas, Edward Pyne, Ronitt Rubinfeld (arXiv). Suppose I give you a gigantic graph (with bounded degree) which does not fit in your main memory and I want you to solve some computational problem which requires you to solve longish random walks of length \(t\). And lots of them. It would be convenient to not spend \(\Omega(t)\) units of time performing every single walk. Perhaps it would work just as well for you to have an oracle which provides query access to a \(Position(G,s,t)\) oracle which returns the position of a walk from \(s\) at time \(t\) of your choice. Of course, you would want the sequence of vertices returned to behave consistently with some actual random walk sampled from the distribution of random walks starting at \(s\). Question is: Can I build you this primitive? This paper answers this question in affirmative  and shows that for graphs with spectral gap \(\Delta\), this can be achieved with running time \(\widetilde{O}(\sqrt n/\Delta)\) per query. And you get the guarantee that the joint distribution of the vertices you return at queried times is \(1/poly(n)\) close to the uniform distribution over such walks in \(\ell_1\).  Thus, for a random \(d\)-regular graph, you get running times of the order \(\widetilde{O}(\sqrt n)\) per query. The authors also show tightness of this result by showing to get subconstant error in \(\ell_1\), you necessarily need \(\Omega(\sqrt n/\log n)\) queries in expectation.

Efficient and near-optimal algorithms for sampling connected subgraphs, by Marco Bressan (arXiv). As the title suggests, this paper considers efficient algorithms for sampling a uniformly random \(k\)-graphlet from a given graph \(G\) (for \(k \geq 3\)). Recall, a \(k\)-graphlet refers to a collection of \(k\)-vertices which induce a connected graph in \(G\). The algorithm considered in the paper is pretty simple. You just define a Markov Chain \(\mathcal{G}_k\) with all \(k\)-graphlets as its state space. Two states in \(\mathcal{G}_k\) are adjacent iff their intersection is a \((k-1)\)-graphlet. To obtain a uniformly random sample, a classical idea is to just run this Markov Chain and obtain an \(\varepsilon\)-uniform sample. However, the gap between upper and lower bounds on the mixing time of this walk is of the order \(\rho^{k-1}\) where \(\rho = \Delta/\delta\) (that is the ratio of maximum and minimum degrees to the power \(k-1\)). The paper closes this gap up to logarithmic factors and shows that the mixing time of the walk is at most \(t_{mix}(G) \rho^{k-1} \log(n/\varepsilon)\). It also proves an almost matching lower bound. Further, the paper also presents an algorithm with event better running time to return an almost uniform \(k\)-graphlet. This exploits a previous observation: sampling a uniformly random \(k\)-graphlet is equivalent to sampling a uniformly random edge in \(\mathcal{G}_{k-1}\). The paper then proves a lemma which upperbounds the relaxation time of walks in \(\mathcal{G}_k\) to walks in \(\mathcal{G}_{k-1}\). And then you upperbound the mixing time in terms of the relaxation time to get an improved expected running time of the order \(O(t_{mix}(G) \cdot \rho^{k-2} \cdot \log(n/\varepsilon)\).

Toward Instance-Optimal State Certification With Incoherent Measurements, by Sitan Chen, Jerry Li, Ryan O’Donnell (arXiv). The problem of quantum state certification has gathered interest over the last few years. Here is the setup: you are given a quantum state \(\sigma \in \mathbb{C}^{d \times d}\) and you are also given \(N\) copies of an unknown state \(\rho\). You want to distinguish between the following two cases: Does \(\rho = \sigma\) or is \(\sigma\) at least \(\varepsilon\)-far from \(\rho\) in trace norm? Badescu et al showed in a recent work that if entangled measurements are allowed, you can do this with a mere \(O(d/\varepsilon^2)\) copies of \(\rho\). But using entangled states comes with its own share of problems. On the other hand if you disallow entanglement, as Bubeck et al show, you need \(\Omega(d^{3/2}/\varepsilon^2)\) measurements. This paper asks: for which states \(\sigma\) can you improve upon this bound. The work takes inspirations from a la “instance optimal” bounds for identity testing. Authors show a fairly general result which (yet again) confirms that the quantum world is indeed weird. In particular, the main result of the paper implies that the copy complexity of (the quantum analog of) identity testing in the quantum world (with non-adaptive queries) grows as \(\Theta(d^{1/2}/\varepsilon^2)\). That is, the number of quantum measurements you need increases with \(d\) (which is the stark opposite of the behavior you get in the classical world).

News for December 2020

Happy 2021! We now cover the last papers of 2020: a (smaller) spread with graph property testing, distribution property testing, and circuit complexity.

Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques: Sampling Cliques is Harder Than Counting by Talya Eden, Dana Ron, and Will Rosenbaum (arXiv). In many settings, sampling and counting are tightly coupled; the classic example being permanent estimation and sampling random bipartite matchings.This paper investigates the problem of sampling uniform random cliques, and unearths an interesting separation of approximately uniform sampling from approximate counting. Let’s consider the fundamental problem of edge and triangle counting, in graphs of constant arboricity. This class, which contains all minor-closed families, has deep connections to clique counting; clique counting can be done in linear time for bounded arboricity graphs.) The input connected graph \(G\) has \(n\) vertices, \(m\) edges, and \(t\) triangles. We assume the standard query model, with uniform random vertices, neighbor, and degree queries. Our aim is to get \((1+\varepsilon)\)-approximations for \(m\) and \(t\) (in general, any \(k\)-clique count). It is known from previous work that edge estimation can be done in \(O(1)\) time and triangle estimation can be done in \(O(n/t)\) time (ignoring log and \(\varepsilon\) dependencies). Moreover, these bounds is optimal. Can one get similar bounds for approximately sampling a uniform random edge/triangle? Previous work of Eden-Rosenbaum achieve such a bound for sampling edges. Surprisingly, this paper shows that triangle sampling is fundamentally harder, and requires \(\Omega(n^{3/4}/\sqrt{t})\) queries. The main result gives a precise and optimal bound for uniform sampling \(k\)-cliques in graphs of arboricity \(\alpha\).

Testing and reconstruction via decision trees by Guy Blanc, Jane Lange, and Li-Yang Tan (arXiv). A property tester is an (sublinear) algorithm that distinguishes an object with a property from those that are far. Reconstruction is a much stronger sublinear algorithm, which actually provides query access to the closest object. Suppose we have query access to an \(n\)-variate Boolean function \(f\), and the property \(\mathcal{P}\) of interest is size-\(s\) decision trees. A reconstructor would give query access to a function \(g \in \mathcal{P}\), such that \(dist(f,g) = O(OPT_f)\), where \(OPT_f\) is the distance of \(f\) to \(\mathcal{P}\). Observe that a reconstructor automatically gives a distance estimator (just estimate \(dist(f,g)\) by random sampling). One of the main results of this paper is a (weaker, bicriteria) reconstructor, where \(g\) is guaranteed to have decision-tree complexity \(s^{O(\log^2s)}\). Each query to \(g\) is computed in \(poly(\log s, 1/\varepsilon)\cdot n\) time. This reconstructor leads to a number of testing results, and reconstructors for other properties like low Fourier degree. Another interesting result proven: if there exists an efficient reconstructor that gets \(g\) of size \(s\), then there exists an efficient proper learner for decision trees (a big open problem).

Testing Product Distributions: A Closer Look by Arnab Bhattacharyya, Sutanu Gayen, Saravanan Kandasamy, and N. V. Vinodchandran (arXiv). This paper is on distribution testing, where all distributions involved are promised to be product distributions. Let \(P, Q\) be distributions over \(\Sigma^n\), where \(\Sigma\) is some (small) finite set. The problems studied are identity testing (\(Q\) is known/fixed, and one gets samples from unknown \(P\)) and closeness testing (both unknown). It is known from previous work that these problems can basically be solved with \(O(n|\Sigma|\varepsilon^{-2})\) samples. Note that the domain size is exponential in \(n\); thus, these bounds give an exponential improvement over identity/closeness testing in the vanilla (non-product) setting. Tolerant testing is of special significance here. Because the domain is so large, it makes sense to allow for some distance even in the “yes” case. The main result of this paper is to completely map out the complexities for identity/closeness testing where we wish to distinguish \(d_1(P,Q) < \varepsilon_1\) from \(d_2(P,Q) > \varepsilon_2\), where \(d_1, d_2\) are potentially different distance measures. For example, the paper considers total variation (TV), KL, Hellinger, and \(\chi^2\) distances. An important improvement achieved is for identity testing where \(d_1\) is Hellinger and \(d_2\) is TV distance: for certain values of \(\varepsilon_1, \varepsilon_2\), one can get a tester of optimal sample complexity \(O(\sqrt{n|\Sigma|})\). Previous bounds only gave a \(O(n|\Sigma|)\) sample complexity. There are a number of new lower bounds for different combinations of \(d_1, d_2\) distance pairs.

News for November 2020

Highlights in property testing this month, include developments across the board. We have a nice potpourri of results which will catch your fancy. From a result which shows the tightness of quadratic gap between query complexities of adaptive vs non-adaptive testers for dense graph properties, to isoperimetric Talagrand-like inequalities for real valued functions over the hypercube, to results which consider face off between quantum computing and distribution testing and more. Looks like the festive season came early for the PTReview readers.

Non-adaptive vs Adaptive Queries in the Dense Graph Testing Model (by Oded Goldreich and Avi Wigderson) (ECCC) It has been long established in the dense graph property testing literature that the query complexity of adaptive testers and non-adaptive testers are only a quadratic factor apart. This work asks: is this gap necessarily tight and answers in the affirmative. The main conceptual tool used in the paper is the notion of robustly self-ordered graphs and local self-ordering procedures for these graphs (which was developed by the same authors and was covered in our September Post). These results use these tools to port lower bounds on property testing problems for bit strings to graph properties.

Direct Sum and Partitionability Testing over General Groups (by Andrej Bogdanov and Gautam Prakriya) (ECCC) In their celebrated work, Blum, Luby and Rubinfeld gave a four query test to determine whether a function \(f \colon \mathbb{F}_2^n \to \mathbb{F}_2\) is affine. This paper considers a generalization of the notion of affinity to functions \(f \colon \{0,1\}^n \to G\) where \(G\) is an Abelian group. The generalization sough asks: does \(f\) have the form \(f = \sum x_ig_i + g_0\) for group elements \(g_0, g_1, \cdots, g_n \in G\). In this setting, the BLR analysis does not apply as the domain and range are note equipped with the same group structure. This work presents a four query test for the above problem. In fact, the test is more general and can be used to decide whether a function \(f \colon D_1 \times D_2 \ldots \times D_n \to G\) from a product domain \(D_1 \times D_2 \ldots \times D_n\) to an Abelian group $G$ is a direct sum if it has the form \(f(x_1, x_2, \cdots, x_n) = \sum f_i(x_i)\) which resolves a conjecture of Dinur and Golubev. The work also presents results for testing whether a function \(f\) defined from a product domain to an Abelian group \(G\) is a direct product.

Isoperimetric Inequalities for Real-Valued Functions with Applications to Monotonicity Testing (by Hadley Black, Iden Kalemaj and Sofya Raskhodnikova) (ArXiv) This paper generalizes the celebrated isoperimetric inequalities of boolean functions over the hypercube to the setting of real valued functions. This generalized isoperimetry is then put to work to obtain a monotonicity tester for \(f \colon \{0,1\}^n \to \mathbb{R}\). The tester makes \(\min(\tilde{O}(r \sqrt n), O(n))\) non-adaptive queries (where \(r\) is the size of image of \(f\)) and has one-sided error. The authors generalize the KMS inequality by a Boolean decomposition theorem which allows them to represent any real valued function over the cube as a collection of Boolean functions over the cube which capture \(f\)’s distance to montonicity as well the structure of violations to monotonicity in \(f\).

Erasure-Resilient Sublinear-Time Graph Algorithms (by Amit Levi, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova and Nithin Varma) (ArXiv) In a previous work, a subset of the authors explored how to equip property testers with the erasure-resilience for function properties. With graph properties, a different picture emerges. In the dense graph model, where you have query access to the adjacency matrix, the situation is still fine: adjacency matrices are functional representations of graphs: therefore, if you have a black belt in making property testers erasure resilient for function properties, you would be able to test properties of dense graphs too. However, if I give you query access to the graph through an adjacency list, the picture changes. These are non-functional representations of graphs and therefore need new conceptual tools. This paper begins the study of erasure resilience in the adjacency list model. It focuses on two computational tasks: testing connectivity and estimating average degree. Let me showcase the results in the paper on testing connectivity. It is shown that you encounter a threshold phenomena in testing connectivity. As long as the fraction of erasures is small, you get testers which run in time independent of the size of the graph. But when the fraction of erasures exceeds a certain cutoff, it is shown that the tester needs a number of queries linear in the size of the adjacency list.

Expander Random Walks: A Fourier Analytic Approach (by Gil Cohen, Noam Peri and Amnon Ta-Shma) (ECCC) While this is a not exactly a property testing paper, how can you not report a paper which proves a significant strengthening of the classic CLT for Markov Chains (alright, for random walks on expanders) with respect to the TV distance? Let us now unpack the above a little. So, consider the following setup: Suppose you have a \(d\)-regular expander \(G\). Now, imagine running the following process \(\textsf{(P)}\):

1) Label half the vertices in \(G\) as \(+1\) and the other half as \(-1\) arbitrarily.
2) Take a \(t-1\) step random walk which involves \(t\) vertices: say \(v_1, v_2, \ldots v_t\).
3) Finally, return the boolean label of all of the visited vertices. This gives a bit-vector \(x \in \{-1,1\}^t\).

Now, let us ask: Which class of functions get fooled by this process?

The main result of the paper shows that this process fools

1) Symmetric functions
2) The class \(AC^0\)
3) Functions \(f\) which are computable by low width read once branching programs.

In more detail, let \(f\) be a symmetric function and let \(G\) be a \(\lambda\)-spectral expander. For the process \(\textsf{(P)}\) defined above, it follows by the spectral expansion of \(G\), that \(discrepancy(f) = |E_{x \sim \textsf{(P)}} [f(x)] – E_{x \sim U_t} [f(x)]|\) is small.

Now note that the total variation distance is precisely equal to the maximum discrepancy you get over all symmetric functions \(f\). This leads to the connection that allows the authors to upgrade CLT for expander random walks as they are able to show a CLT for Markov Chains with respect to the TV distance (as opposed to the CLT with respect to the Kolmogorov Distance which is what was known from the work of Kipnis and Vadhan since 1986).

New Sublinear Algorithms and Lower Bounds for LIS Estimation (by Ilan Newman and Nithin Varma) (ArXiv) As the title suggests, this paper considers the task of estimating the length of the longest increasing sequence in an array. In particular, it gives a non-adaptive algorithm which estimates the LIS length within an additive error of \(\pm \varepsilon n\) in \(\tilde{O}\left(\frac{r}{\varepsilon^3}\right)\) queries (where \(r\) is the number of distinct elements in \(A\)). On the lower bound side, the paper presents a lower bound of \((\log n)^{\Omega(1/\varepsilon)}\) non-adaptive queries. The lower bound construction uses ideas from lower bound of Ben-Eliezer, Canonne, Letzter and Waingarten on testing monotone pattern freeness.

Stability and testability: equations in permutations (by Oren Becker, Alexander Lubotzky, AND Jonathan Mosheiff) (ArXiv) Consider the following question: Suppose I ask if two permuations \(A,B \in Sym(n)\). Suppose I want to decide whether these permutations commute or whether they are far from permutations which commute. Can this question be answered in time independent of \(n\)? The authors answer this question in the affirmative. They show a simple Sample and Substitute procedure which essentially does the following: take samples \(x_1, x_2, cdots x_k \sim [n]\). And accept iff \(ABx_i = BAx_i\) for each \(i\). The authors call this particular set of equations/constraints (checking whether \(AB = BA\)) stable: as it admits a Sample and Substitute style tester with query complexity independent of \([n]\). This allows the authors to consider the group theoretic notion of stability as a property testing concept and allows them to examine the notion of stability from a computational standpoint.

StoqMA meets distribution testing (by Yupan Liu) (ArXiv) This paper shows a containment result. A certain complexity class \(\textsf{eStoqMA}\) is shown to be a subset of \(\textsf{StoqMA}\). Let us informally define what these classes are. A language \(L \in \textsf{StoqMA}\) if there exists a verifier (which is a reversible quantum circuit) such that on input a string \(x\) such that for every \(x \in L\) there exists a witness quantum state which makes the verifier accept with probability at least \(2/3\). And in case \(x \not\in L\), for every quantum state the verifier rejects \(x\) with probability at least \(2/3\). The paper characterizes this class from a distribution testing point of view. And this connection allows the author to conclude that \(\textsf{eStoqMA} = \textsf{MA}\). Here, \(\textsf{eStoqMA}\) is a sub-class of \(\textsf{StoqMA}\) where all YES instances have an easy witness.

(Edit: Added later) We missed the following two papers. Thanks to our readers for pointing it out.

Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu (by Arnab Bhattacharya, Sutanu Gayen, Eric Price and N.V. Vinodchandran) (ArXiv) Graphical models are a convenient way to describe high dimensional data in terms of its dependence structure. An important question in this area concerns learning/inferring the underlying graphical model from iid samples. This paper considers the problem when the underlying graph is in fact a tree. Thus, the challenge is to learn a tree structured distribution. The main result is that given sample access to a tree structured distribution, you can recover an \(\varepsilon\) approximate tree with good probability in \(O(n/\varepsilon)\).

Relaxed Locally Correctable Codes with Improved Parameters (by Vahid R. Asadi and Igor Shinkar) (ECCC) In their work on Robust PCPs of Proximity, Ben-Sasson et al. asked whether it is possible to obtain a \(q\)-query (Relaxed) Locally Decodable Code whose block length is strictly smaller than the best known bounds on block lengths for Locally Decodable Codes. Recall with relaxed LDCs you are allowed to abort outputting the \(i^{th}\) bit of the message should you detect that the received word is not a valid codeword. This paper makes progress on this problem and shows that you can actually construct an \(O(q)\)-query relaxed LDCs which encode a message of length \(k\) using \(O(k^{1 + 1/q})\) queries. This matches some of the known lower bounds for constant query LDCs which thus makes progress towards understanding the gap between LDCs and relaxed LDCs in the regime \(q = O(1)\).

News for September 2020

Apologies dear readers for the late posting. The beginning of the school year is always frenzied, and the pandemic has only added to that frenzy. We have an exciting September, with four papers on graph property testing, one two papers on distribution testing, and one paper that connects both topics.

(Ed: we normally scan through ECCC and arXiv, but are happy to post about papers that appear elsewhere. Thanks to the reader who pointed out a relevant COLT 2020 paper.)

Estimation of Graph Isomorphism Distance in the Query World by Sourav Chakraborty, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen (ECCC). Graph isomorphism is about as fundamental as it gets, and this papers studies approximating the graph isomorphism distance for dense graphs. There is a known graph \(G_k\) (with \(n\) vertices). The algorithm is given query access to an input graph \(G_u\) and needs to approximate the number of edge inserts/deletes required to make the graphs isomorphic. This is the tolerant testing version; the property testing version is known to be doable in \(\widetilde{O}(\sqrt{n})\) queries (Matsliah-Fischer). The main insight of this paper is to relate the tolerant testing complexity to a distribution testing problem. Consider distributions over the \(\{0,1\}^n\) defined by multisets of \(n\) hypercube points. Our aim is to estimate the earthmover distance between a known distribution and an unknown distribution. Interestingly, the query model is different: one can sample the underlying multisets without replacement. It turns out that the optimal complexity of this problem is (upto polylog factors) is the same as the optimal complexity of tolerant testing of graph isomorphism. A direct corollary is that the isomorphism distance can be approximated upto additive \(\epsilon n^2\) using \(\widetilde{O}(n)\) samples. This equivalence also gives an alternate proof for lower bounds for property testing graph isomorphism.

Robustly Self-Ordered Graphs: Constructions and Applications to Property Testing by Oded Goldreich and Avi Wigderson (ECCC). Let’s start from the application. The aim is to prove the following property testing lower bounds for the bounded-degree graph setting: an exponential separation between tolerant and vanilla testing, and finding an efficiently decidable property (in polynomial time) that cannot be property tested in sublinear time. For binary strings, results of this form are known. Can these be “ported” to the bounded-degree graph world? Can we construct graphs such that adjacency queries reduce to bit queries in strings? Naturally, one can simply represent the adjacency list as a string and treat graph queries as bit queries. But the problem is that of isomorphisms: different bit strings could represent the same graph and therefore, the different bit strings must have the same status with respect to the underlying property. The key insight in this paper is to introduce robustly self-ordered graphs, as a tool to port bit string property testing lower bounds to bounded-degree graphs. Such graphs essentially have a unique (identity) automorphism, even after a few edge insert/deletes. The actual definition is more technical, but that is the essence. The main result is an explicit construction of such graphs, from which the lower bound can be ported directly through a convenient lemma.

Modifying a Graph’s Degree Sequence and the Testablity of Degree Sequence Properties by Lior Gishboliner (arXiv). A sequence of numbers \(D = (d_1, d_2, \ldots, d_n)\) is graphic if there exists an undirected graph on \(n\) vertices whose degrees are precisely the numbers of the sequence. Graphical sequences have been characterized by classic results of Erdös-Gállai and Havel-Hakimi. This paper first proves the following theorem. Suppose a graphic sequence \(D’\) has \(l_1\)-distance at most \(\delta\) from the degree sequence \(D\) of a graph \(G\). Then, there exists a graph \(G’\) with degree sequence \(D’\) such that the (dense graph) distance between \(G\) and \(G’\) is \(O(\sqrt{\delta})\). This theorem is used to prove an interesting property testing result. Let \(\mathcal{D}\) be a subset of graphic sequences that are closed under permutation. Let \(\mathcal{G}\) be the set of graphs that have a degree sequence in \(\mathcal{D}\). Then \(\mathcal{G}\) can be tested in \(poly(1/\epsilon)\) queries.

Sampling an Edge Uniformly in Sublinear Time by Jakub Têtek (arXiv). In the general model for sublinear algorithms on graphs, an important choice is whether one allows uniform random edge queries. A natural question is whether such queries can simulated efficiently, using only random vertex, degree, and neighbor queries. This problem appears somewhat implicitly in previous sublinear subgraph counting algorithms, and Eden-Ron-Rosenbaum study it explicitly. They prove that one can sample from an \(\epsilon\)-approximate uniform distribution (over edges) using \(O(n/\sqrt{\epsilon m})\) samples. The problem of sampling from exactly the uniform distribution is left open. Until this paper. The main result shows that by modifying the Eden-Ron-Rosenbaum algorithm parameters, one can generate edge samples from an \(\epsilon\)-approximate uniform distribution using \(O((n/\sqrt{m})\log \epsilon^{-1})\) samples. The exact uniform distribution is achieved by setting \(\epsilon = 1/n\), to get a sample complexity of \(O((n\log n)/\sqrt{m})\).

Faster Property Testers in a Variation of the Bounded Degree Model by Isolde Adler and Polly Fahey (arXiv). The setting of bounded-degree graph property testing naturally extends to bounded-degree relational databases, which can be thought of as “directed” hypergraphs. This is an interesting new direction of research, that combines property testing with database theory (see Adler-Harwath and Chen-Yoshida). One of the main contributions of this work is to consider another notion of distance: edge and vertex inserts/deletes. This is a natural extension, and we can now compare distances between graphs/databases with different numbers of vertices. The main result is that, under this notion of distance, a large class of properties can be tested in constant running time on databases with bounded degree and treewidth. Specifically, any property expressible in Counting Monadic Second-Order Logic (CMSO) can be tested in constant time. Previous results by Alder-Harwath showed that such properties can be tested (under the standard distance notion) in constant queries, but polylogarithmic time.

Optimal Testing of Discrete Distributions with High Probability by Ilias Diakonikolas, Themis Gouleakis, Daniel M. Kane, John Peebles, and Eric Price (arXiv, ECCC). The focus of this paper is distribution testing in the “high probability” regime, where we wish the error of the tester to be \(< \delta\). Typically, most results just get an error of at most \(1/3\), from which standard probabilistic boosting would tack on an extra \(O(\log 1/\delta)\) factor. In standard TCS settings, one doesn’t focus on optimizing this dependence, but in statistics, there is significant focus on the optimal sample complexity. And indeed, for practical applications, it is crucial to have sharp bounds on the right number of samples required for hypothesis testing. The paper also argues that getting the optimal sample complexity requires new algorithms, even for uniformity testing. There are optimal results given for closeness and independence testing. The optimal sample complexity only pays a multiplicative factor of \(\log^{1/3} (1/\delta)\) or \(\log^{1/2}(1/\delta)\) over the optimal bound for constant error (with other additive terms depending on \(\log(1/\delta)\)).

Bessel Smoothing and Multi-Distribution Property Estimation by Yi Hao and Ping Li (COLT 2020). Let us consider some standard (tolerant) distribution testing questions, phrases as approximation algorithms. Given sample access to two distributions \(p\) and \(q\) over \([n]\), we may wish to estimate the \(l_1\)-distance, \(l_2\)-distance, relative entropy, etc. between these distributions. One can phrases this problem abstractly as estimating \(\sum_{i \in [n]} f(p_i, q_i)\), where \(f\) is some explicit function. This papers shows that for any 1-Lipschitz function \(f\) that satisfies some “regularity” property, the sum \(\sum_{i \in [n]} f(p_i, q_i)\) can be \(\epsilon\)-approximated with \(O(\epsilon^{-3}n/\sqrt{\log n})\) samples (apologies to the authors to replacing their \(k\) with the more familiar \(n\) for our readers). Thus, we can get sublinear sampling complexity for a very general class of estimation problems. Moreover, this was actually the simplest setting consider in the paper. One can deal with such functions of \(d\) distributions, not just two distributions. One of the corollaries of the theorems is a sublinear tolerant tester for the property of being a mixture of distributions.