News for April 2018

Three Four papers for April: a new take on linearity testing, results on sublinear algorithms with advice, histogram testing, and distributed inference problems. (Edit: Sorry Clément, for missing your paper on distributed inference!)

Testing Linearity against Non-Signaling Strategies, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). This paper gives a new model for property testing, through the notion of non-signaling strategies. The exact definitions are quite subtle, but here’s a condensed view. For \(S \subseteq \{0,1\}^n\), let an \(S\)-partial function be one that is only defined on \(S\). Fix a “consistency” parameter \(k\). Think of the “input” as a collection of distributions, \(\{\mathcal{F}_S\}\), where each \(|S| \leq k\) and \(\mathcal{F}_S\) is a distribution of \(S\)-partial functions. We have a local consistency requirement: \(\{\mathcal{F}_S\}\) and \(\{\mathcal{F}_T\}\) must agree (as distributions) on restrictions to \(S \cap T\). In some sense, if we only view pairs of these distributions of partial functions, it appears as if they come from a single distributions of total functions. Let us focus on the classic linearity tester of Blum-Luby-Rubinfeld in this setting. We pick random \(x, y, x+y \in \{0,1\}^n\) as before, and query these values on a function \(f \sim {\mathcal{F}_{x,y,x+y}}\). The main question addressed is what one can say about \(\{\mathcal{F}_S\}\), if this linearity test passes with high probability. Intuitively (but technically incorrect), the main result is that \(\{\mathcal{F}_S\}\) is approximated by a “quasi-distribution” of linear functions.

An Exponential Separation Between MA and AM Proofs of Proximity, by Tom Gur, Yang P. Liu, and Ron D. Rothblum (ECCC). This result follows a line of work on understanding sublinear algorithms in proof systems. Think of the standard property testing setting. There is a property \(\mathcal{P}\) of \(n\)-bit strings, an input \(x \in \{0,1\}^n\), and a proximity parameter \(\epsilon > 0\). We add a proof \(\Pi\) that the tester (or the verifier) is allowed to use, and we define soundness and completeness in the usual sense of Arthur-Merlin protocols. For a \(\mathbb{MA}\)-proof of proximity, the proof \(\Pi\) can only depend on the string \(x\). In a \(\mathbb{AM}\)-proof of proximity, the proof can additionally depend on the random coins of the tester (which determine the indices of \(x\) queried). Classic complexity results can be used to show that the latter subsume the former, and this paper gives a strong separation. Namely, there is a property \(\mathcal{P}\) where any \(\mathbb{MA}\)-proof of proximity protocol (or tester) requires \(\Omega(n^{1/4})\)-queries of the input \(x\), but there exists an \(\mathbb{AM}\)-proof of proximity protocol making \(O(\log n)\) queries. Moreover, this property is quite natural; it is simply the encoding of permutations.

Testing Identity of Multidimensional Histograms, by Ilias Diakonikolas, Daniel M. Kane, and John Peebles (arXiv). A distribution over \([0,1]^d\) is a \(k\)-histogram if the domain can be partitioned into \(k\) axis-aligned cuboids where the probability density function is constant. Recent results show that such histograms can be learned in \(k \log^{O(d)}k\) samples (ignoring dependencies on accuracy/error parameters). Can we do any better for identity testing? This paper gives an affirmative answer. Given a known \(k\)-histogram \(p\), one can test (in \(\ell_1\)-distance) whether an unknown \(k\)-histogram \(q\) is equal to \(p\) in (essentially) \(\sqrt{k} \log^{O(d)}(dk)\) samples. There is a nearly matching lower bound, when \(k = \exp(d)\).

Distributed Simulation and Distributed Inference, by Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi (arXiv   ECCC). This papers introduces a model of distributed simulation, which generalizes distribution testing and distributed density estimation. Consider some unknown distribution \(\mathcal{D}\) with support \([k]\), and a “referee” who wishes to generate a single sample from \(\mathcal{D}\) (alternately, she may wish to determine if \(\mathcal{D}\) has some desired property). The referee can communicate with “players”, each of whom can generate a single independent sample from \(\mathcal{D}\). The catch is that each player can communicate at most \(\ell\) < \(log_2k\) bits (otherwise, the player can simply communicate the sampled element). How many players are needed for the referee to generate a single sample? The paper first proves that this task is basically impossible with a (worst-case) finite number of players, but can be done with expected \(O(k/2^\ell)\) players (and this is optimal). This can plugged into standard distribution testing results, to get inference results in this distributed, low-communication setting. For example, the paper shows that identity testing can be done with \(O(k^{3/2}/2^\ell)\) players.

News for March 2018

March has been a relatively slow month for property testing, with 3 works appearing online. (If you notice we missed some, please leave a comment pointing it out)

Edge correlations in random regular hypergraphs and applications to subgraph testing, by Alberto Espuny Díaz, Felix Joos, Daniela Kühn, and Deryk Osthus (arXiv). While testing subgraph-freness in the dense graph model is now well-understood, after a series of works culminating in a complete characterization of the testing problems which admit constant-query testers, the corresponding question for hypergraphs is far from resolved. In this paper, the authors develop new techniques for the study of study of random regular hypergraphs, which imply new testing results for subhypergraph-freeness testing, improving on the state-of-the-art for some parameter regimes (e.g., when the input graph satisfies some average-degree condition).

Back from hypergraphs to graphs, we also have:
The Subgraph Testing Model, by Oded Goldreich and Dana Ron (ECCC). Here, the authors introduce a new model for property testing of graphs, where the goal is to test if an unknown subgraph \(F\) of an explicitly given graph \(G=(V,E)\) satisfies the desired property. The testing algorithm is provided access to \(F\) via membership queries, i.e., through query access to the indicator function \(\mathbf{1}_F\colon E \to \{0,1\}\). (In some very hazy sense, this is reminiscent of the active learning or testing frameworks, where one gets more or less free access to unlabeled data but pays to see their label.) As a sample of the results obtained, the paper establishes that this new model and the bounded-degree graph model are incomparable: there exist properties easier to test in one model than the other, and vice-versa — and some properties equally easy to test in both.

And finally, to drive home the point that “models matter a lot,” we have our third paper:
Every set in P is strongly testable under a suitable encoding, by Irit Dinur, Oded Goldreich, and Tom Gur (ECCC). It is known that the choice of representation of the objects has a large impact in property testing: for instance, the complexity of testing a given property can change drastically between the dense and bounded-degree graph models. This work provides another example of such a strong dependence on the representation: while membership to some sets in \(P\) is known to be hard to test, the authors here prove that, for every set \(S\in P\), there exists a (polynomial-time, invertible) encoding \(E_S\colon \{0,1\}^\ast\to \{0,1\}^\ast\) such that testing membership to \(S\) under this encoding is easy. (They actually show even stronger a statement: namely, that under this encoding the set admits a “proximity-oblivious tester,” that is a constant-query testing algorithm which rejects with probability function of the distance to \(S\).)

Finally, on a non-property testing note: Edith Cohen, Vitaly Feldman, Omer Reingold, and Ronitt Rubinfeld recently wrote a pledge for inclusiveness in the TCS community, available here: https://www.gopetition.com/petitions/a-pledge-for-inclusiveness-in-toc.html
If you haven’t seen it already, we encourage you to read it.

Update: Fixed a mistake in the overview of the second paper; as pointed out by Oded in the comments, the main comparison was between the new model and the bounded-degree graph model, not the dense graph one.

News for February 2018

February had a flurry of conference deadlines, and they seem to have produced six papers for us to enjoy, including three on estimating symmetric properties of distributions.

Locally Private Hypothesis Testing, by Or Sheffet (arXiv). We now have a very mature understanding of the sample complexity of distributional identity testing — given samples from a distribution \(p\), is it equal to, or far from, some model hypothesis \(q\)? Recently,  several papers have studied this problem under the additional constraint of differential privacy. This paper strengthens the privacy constraint to local privacy, where each sample is locally noised before being provided to the testing algorithm.

Distribution-free Junta Testing, by Xi Chen, Zhengyang Liu, Rocco A. Servedio, Ying Sheng, and Jinyu Xie (arXiv). Testing whether a function is a \(k\)-junta is very well understood — when done with respect to the uniform distribution. In particular, the adaptive complexity of this problem is \(\tilde \Theta(k)\), while the non-adaptive complexity is \(\tilde \Theta(k^{3/2})\). This paper studies the more challenging task of distribution-free testing, where the distance between functions is measured with respect to some unknown distribution. The authors show that, while the adaptive complexity of this problem is still polynomial (at \(\tilde O(k^2)\)), the non-adaptive complexity becomes exponential: \(2^{\Omega(k/3)}\). In other words, there’s a qualitative gap between the adaptive and non-adaptive complexity, which does not appear when testing with respect to the uniform distribution.

The Vertex Sample Complexity of Free Energy is Polynomial, by Vishesh Jain, Frederic Koehler, and Elchanan Mossel (arXiv). This paper studies the classic question of estimating (the logarithm of) the partition function of a Markov Random Field, a highly-studied topic in theoretical computer science and statistical physics. As the title suggests, the authors show that the vertex sample complexity of this quantity is polynomial. In other words, randomly subsampling a \(\mathrm{poly}(1/\varepsilon)\)-size graph and computing its free energy gives a good approximation to the free energy of the overall graph. This is in contrast to more general graph properties, for the vertex sample complexity is super-exponential in \(1/\varepsilon\).

Entropy Rate Estimation for Markov Chains with Large State Space, by Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, and Tiancheng Yu (arXiv). Entropy estimation is now quite well-understood when one observes independent samples from a discrete distribution — we can get by with a barely-sublinear sample complexity, saving a logarithmic factor compared to the support size. This paper shows that these savings can also be enjoyed in the case where we observe a sample path of observations from a Markov chain.

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance, by Yanjun Han, Jiantao Jiao, and Tsachy Weissman (arXiv). Speaking more generally of the above problem: there has been significant work into estimating symmetric properties of distributions, i.e., those which do not change when the distribution is permuted. One natural method for estimating such properties is to estimate the sorted distribution, then apply the plug-in estimator for the quantity of interest. The authors give an improved estimator for the sorted distribution, improving on the results of Valiant and Valiant.

INSPECTRE: Privately Estimating the Unseen, by Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang (arXiv). One final work in this area — this paper studies the estimation of symmetric distribution properties (including entropy, support size, and support coverage), but this time while maintaining differentially privacy of the sample. By using estimators for these tasks with low sensitivity, one can additionally obtain privacy for a low or no additional cost over the non-private sample complexity.

News for January 2018

And now, for the first papers of 2018! It’s a slow start with only four papers (or technical three “standard property testing” papers, and one non-standard paper).

Adaptive Boolean Monotonicity Testing in Total Influence Time, by Deeparnab Chakrabarty and C. Seshadhri (arXiv ECCC). The problem of testing monotonicity of Boolean functions \(f:\{0,1\}^n \to \{0,1\}\) has seen a lot of progress recently. After the breakthrough results of Khot-Minzer-Safra giving a \(\widetilde{O}(\sqrt{n})\) non-adaptive tester, Blais-Belovs proved the first polynomial lower bound for adaptive testers, recently improved to \(O(n^{1/3})\) by Chen, Waingarten, and Xi. The burning question: does adaptivity help? This result shows gives an adaptive tester that runs in \(O(\mathbf{I}(f))\), the total influence of \(f\). Thus, we can beat these lower bounds (and the non-adaptive complexity) for low influence functions.

Adaptive Lower Bound for Testing Monotonicity on the Line, by Aleksandrs Belovs (arXiv). More monotonicity testing! But this time on functions \(f:[n] \to [r]\). Classic results on property testing show that monotonicity can be tested in \(O(\varepsilon^{-1}\log n)\) time. A recent extension of these ideas by Pallavoor-Raskhodnikova-Varma replace the \(\log n\) with \(\log r\), an improvement for small ranges. This paper proves an almost matching lower bound of \((\log r)/(\log\log r)\). The main construction can be used to give a substantially simpler proof of an \(\Omega(d\log n)\) lower bound for monotonicity testing on hypergrids \(f:[n]^d \to \mathbb{N}\). The primary contribution is giving explicit lower bound constructions and avoiding Ramsey-theoretical arguments previously used for monotonicity lower bounds.

Earthmover Resilience and Testing in Ordered Structures, by Omri Ben-Eliezer and Eldar Fischer (arXiv). While there has been much progress on understanding the constant-time testability of graphs, the picture is not so clear for ordered structures (such as strings/matrices). There are a number of roadblocks (unlike the graph setting): there are no canonical testers for, say, string properties, there are testable properties that are not tolerant testable, and Szemeredi-type regular partitions may not exist for such properties. The main contribution of this paper is to find a natural, useful condition on ordered properties such that the above roadblocks disappear hold, and thus we have strong testability results. The paper introduces the notion of Earthmover Resilient properties (ER). Basically, a graph properties is a property of symmetric matrices that is invariant under permutation of base elements (rows/columns). An ER property is one that is invariant under mild perturbations of the base elements. The natural special cases of ER properties are those over strings and matrices, and it is includes all graph properties as well as image properties studied in this context. There are a number of characterization results. Most interestingly, for ER properties of images (binary matrices) and edge-colored ordered graphs, the following are equivalent: existence of canonical testers, tolerant testability, and regular reducibility.

Nondeterminisic Sublinear Time Has Measure 0 in P, by John Hitchcock and Adewale Sekoni (arXiv). Not your usual property testing paper, but on sublinear (non-deterministic) time nonetheless. Consider the complexity class of \(NTIME(n^\delta)\), for \(\delta < 1\). This paper shows that this complexity class is a "negligible" fraction of \(P\). (The analogous result was known for \(\alpha < 1/11\) by Cai-Sivakumar-Strauss.) This requires a technical concept of measure for languages and complexity classes. While I don’t claim to understand the details, the math boils down to understanding the following process. Consider some language \(\mathcal{L}\) and a martingale betting process that repeatedly tries to guess the membership of strings \(x_1, x_2, \ldots\) in a well-defined order. If one can define such a betting process with a limited computational resource that has unbounded gains, then \(\mathcal{L}\) has measure 0 with respect to that (limited) resource.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up on that trend! (And, of course, if we missed any, please point it out in the comments.The blame would be on our selves from the past year.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of a following flavor: two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell,k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell,k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, it implies the existence of a removal lemma with a
tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain constant-time and query algorithm for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e. the eigenvalues of its Laplacian, given random query access to the nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv) This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e. the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the sequence observed. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open problem 84), and one which would have direct implications to tolerant testing and estimation of symmetric properties of distributions.

News for November 2017

A quiet month, with only two papers. Perhaps the calm before the storm? Please let us know in the comments if something slipped under our radar.

Agreement tests on graphs and hypergraphs, by Irit Dinur, Yuval Filmus, and Prahladh Harsha (ECCC). This work looks at agreement tests and agreement theorems, which argue that if one checks if a number of local functions agree, then there exists a global function which agrees with most of them. This work extends previous work on direct product testing to local functions of higher degree, which corresponds to agreement tests on graphs and hypergraphs.

Testing Conditional Independence of Discrete Distributions, by Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv). This paper focuses on testing whether a bivariate discrete distribution has independent marginals, conditioned on the value of a tertiary discrete random variable. More precisely, given realizations of \((X, Y, Z)\), test if \(X \perp Y \mid Z\). Unconditional independence testing (corresponding to the case when \(Z\) is constant) has been extensively studied by the community, with tight upper and lower bounds showing that the sample complexity has two regimes, depending on the tradeoff between the support size and the accuracy desired. This paper shows gives upper and lower bounds for this more general problem, showing a rich landscape depending on the relative value of the parameters.

New Book: Introduction to Property Testing, by Oded Goldreich

A tad less than one year and a half ago, we announced on this blog a then-upcoming book on property testing, by Oded Goldreich. Now, just in time for the Holiday season, or for perusing on the beach if you live in the south hemisphere, the book is out!

Covering a broad swath of topics in property testing, Introduction to Property Testing, by Oded Goldreich (published by Cambridge University Press) is, quoting Amazon:

An extensive and authoritative introduction to property testing, the study of super-fast algorithms for the structural analysis of large quantities of data in order to determine global properties. This book can be used both as a reference book and a textbook, and includes numerous exercises.

See the book’s website for more resources and the (detailed) table of contents.

News for October 2017

After the flood of papers over the past few months, it’s reduced to “merely” six papers this October. And only two papers on distribution testing 🙂 (Update: Gautam pointed out another paper that also does distribution testing under Wasserstein distance.)

Proofs of Proximity for Distribution Testing, by Alessandro Cheisa and Tom Gur (ECCC). Take 1 on distribution testing. As usual, we have a distribution \(\mathcal{D}\) over \([n]\), a property \(\mathbf{P}\) of distributions, and a proximity parameter \(\varepsilon > 0\). But our tester is now aided by a purported proof (or a prover). If \(\mathcal{D} \in \mathbf{P}\), there must exist a proof/prover strategy that makes the tester accept. If \(\mathcal{D}\) is \(\varepsilon\)-far from \(\mathbf{P}\), for any proof/prover strategy, the tester must reject with high probability. This paper studies a number of settings: deterministic vs randomized testers, proof (a la \(\mathbb{NP}\) or \(\mathbb{MA}\)) vs provers (a la \(\mathbb{IP}\)). There are a number of very intriguing results, so here are some highlights. For every property, there is a near-linear proof that allows for \(O(\sqrt{n})\) testers. For every property, either the proof length or the tester (sample) complexity is at least \(\Omega(\sqrt{s})\), where \(s\) is the optimal complexity of a vanilla tester. But there exist prover strategies that can beat this lower bound.

Wasserstein Identity Testing, by Shichuan Deng, Wenzheng Li, and Xuan Wu (arXiv). For Take 2 on distribution testing, this paper considers the problem on continuous distributions. In all results on distribution testing, the sample complexity depends on the support size. This breaks down in this setting, so this paper focuses on identity testing according to Wasserstein distance (as opposed to \(\ell_p\)-norms). A previous paper of Do Ba-Nguyen-Nguyen-Rubinfeld also considers the same problem, where the domain is assumed to be \([\Delta]^d\). In this paper, we assume that the domain \(X\) is endowed with a metric space, to allow for the definition of Wasserstein/earthmover distance between distributions. The final result is technical, depends on the size of nets in \(X\), and is shown to be optimal for testing equality with a known distribution \(p\). The paper also gives an instance optimal for (almost) all \(p\), a la Valiant-Valiant for the discrete case.

Improved Bounds for Testing Forbidden Order Patterns, by Omri Ben-Elizer and Clément Canonne (arXiv). Any function from \(f:D \to \mathbb{R}\), where \(D\) is a total order, can be thought to induce a permutation, based on the order of function values. Consider \(f:[n] \to \mathbb{R}\) and permutation \(\pi \in S_k\). The property of \(\pi\)-free permutations is the set of \(f\) such that no restriction of \(f\) (to a subdomain of size \(k\)) induces \(\pi\). Newman et al proved that this property can be tested non-adaptively in (ignoring \(\varepsilon\)) \(O(n^{1-1/k})\) samples. Furthermore, for non-monotone \(\pi\), there is a non-adaptive \(\Omega(\sqrt{n})\) lower bound. This paper has a number of results that shed significant light on the non-adaptive complexity. The upper bound is improved to \(O(n^{1-1/(k-1)})\), and there exist a class of permutations (for every \(k\)) for which this is the optimal complexity. Furthermore, for a random \(\pi\), the paper shows a lower bound of \(\Omega(n^{1-1/(k-3)})\). There is an intriguing conjecture on the optimal complexity for any \(k\), that has an intricate dependence on the structure of \(k\). On the adaptive side, there is an interesting hierarchy for testing \((1,3,2)\)-freeness, depending on the number of rounds of adaptivity. There is an \(O(n^{1/(r+1)})\) tester with \(r\)-rounds of adaptivity, and any such tester requires \(O(n^{1/2^{r+3}})\) queries.

A \(o(d)\cdot \mathrm{poly}\log n\) Monotonicity Tester for Boolean Functions
over the Hypergrid \([n]^d\)
, by Hadley Black, Deeparnab Chakrabarty, and C. Seshadhri (arXiv). Monotonicity testing has seen much progress in the past few years. This paper focuses on monotonicity testing over the hypergrid, of functions \(f:[n]^d \to \{0,1\}\). For \(n=2\) (the hypercube), the result of Khot-Minzer-Safra gives a \(\widetilde{\sqrt{d}}\) time (non-adaptive) tester. Previously, the best known tester for general \(n\) took \(O(d\log d)\) queries, by Berman-Raskhodnikova-Yaroslavtsev. This paper breaks the barrier of \(d\) queries for the hypergrid, by giving a \(O(d^{5/6}\log n)\) time tester. The main technical ingredient is a new isoperimetric inequality for the directed “augmented” hypergrid, where pairs differing on one coordinate by a power of 2 are also connected. The intriguing open question that remains is the existence of testers with query complexity sublinear in \(d\) and independent of \(n\).

Provable and practical approximations for the degree distribution using sublinear graph samples, by Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri (arXiv). The degree distribution of a simple, undirected graph \(G\) is the sequence \(\{n_d\}\), where \(n_d\) is the number of vertices of degree \(d\). The properties of the degree distribution have played a significant role in network science and the mathematical study of real-world graphs. It is often the first quantity computed in a “data analysis” of a massive graph. This paper addresses the problem of getting (bicriteria) estimates for all values of the degree distribution, in sublinear time. The main result gives sublinear time algorithms for computing degree distributions with sufficiently “fat” tails, as measured by certain fatness indices. For the important special case, when the degree distribution is a power law (\(n_d \propto 1/d^{\gamma}\)), this result yields strictly sublinear algorithms. Interestingly, the main result involves the h-index of the degree sequence, inspired by bibliographic metrics. This problem is practically important, and the paper demonstrates the quality of the sublinear algorithm in practice.

On the Complexity of Sampling Vertices Uniformly from a Graph, by Flavio Chierichetti and Shahrzad Haddadan (arXiv). This paper isn’t your traditional property testing paper, but is very relevant to those of us interested in graph property testing. One common query model in graph property testing is access to uniform random vertices. In practice though (think of a graph like the Facebook friendship network or the web graph), this is quite challenging. We typically have access to a few seed vertices, and we can “crawl” the graph. A natural approach is to perform a random walk (in the hope of mixing quickly) to generate random vertices. We might attempt rejection sampling or Metropolis-Hastings on top of that to get uniform random vertices. A recent result of Chierichetti et al give an algorithm for this problem using \(O(t_{mix} d_{avg})\) samples, where \(t_{mix}\) is the mixing time (of the random walk on the graph) and \(d_{avg}\) is the average degree. This paper proves that this bound is optimal, for most reasonable choices of these parameters.

FOCS 2017: Workshop on Distribution Testing

As it may have transpired by now, FOCS 2017 (a.k.a. the 58th Annual Symposium on Foundations of Computer Science) is around the corner.

I am not going to go into the details of the talks on or related to property testing at the conference itself,\({}^{(\ast)}\) but instead focus on (one of) the workshops.

This FOCS will include 3 workshops,\({}^{(\dagger)}\) on Saturday 14th (gird your loins — they last the full day!):

  • Linear Sketching as a Tool for Everything
  • Frontiers in Distribution Testing
  • Hardness Escalation in Communication Complexity and Query Complexity

The second,  Frontiers in Distribution Testing, is specifically about recent advances, new questions, and novel directions in property testing of distributions. Co-organized by Gautam Kamath and me, its official objective is to “catch the community up in recent developments, and highlight some of the most interesting frontiers in distribution testing.”

It’ll feature a stellar lineup of speakers\({}^{(\ddagger)}\) (namely, Ilias Diakonikolas, Jiantao Jiao, Alon Orlitsky, Constantinos Daskalakis, Ryan O’Donnell, Ronitt Rubinfeld, and Tom Gur), as well as an open problem session — and, very possibly, pictures of eggs. The program is available, with all relevant details, on the workshop webpage.

If you are around next Saturday, consider attending — and submit your favorite open problems!

\({}^{(\ast)}\) Hint: look at Sessions 7A and 10B.
\({}^{(\dagger)}\) Thanks to the FOCS workshop chairs James Lee and Aleksander Mądry.
\({}^{(\ddagger)}\) A lineup of stellar speakers.

News for September 2017

Summer came and went, and Fall begins at full throttle with a (metric) ton of papers. Eight that we counted — if any was missed, please mention it in the comments!

Efficient Removal without Efficient Regularity, by Lior Gishboliner, Asaf Shapira (arXiv). Obtaining efficient removal lemmata for graphs pattern (such as triangle, to name the most famous), that is removal results with bounds on the number of copies of the pattern that is not mind-blowingly huge like a tower of \(\varepsilon\), is a classic and longstanding problem. This work makes significant progress for the last remaining case, i.e. for the pattern \(C_4\): providing bounds that are merely exponential in \(\varepsilon\).

Local decoding and testing of polynomials over grids, by Srikanth Srinivasan, Madhu Sudan (arXiv, ECCC). In this work, the authors study the local decodability and local testability of error-correcting codes corresponding to low-degree polynomials on the grid \(\{0,1\}^n\) (over a field \(\mathbb{F}\supseteq \{0,1\}\)). Obtaining both positive and negative results on these, a consequence of their results is a separation between local testability and local decodability for a natural family of codes.

Lower Bounds for Approximating Graph Parameters via Communication Complexity, by Talya Eden and Will Rosenbaum (arXiv). This paper establishes an analogue of the framework of Blais, Brody, and Matulef (2012), which enabled one to obtain property testing lower bounds by a reduction from communication complexity, for the setting of graph parameter estimation. The authors then leverage this technique to give new and simpler proofs of lower bounds for several such estimation tasks.

A Note on Property Testing Sum of Squares and Multivariate Polynomial Interpolation, by Aaron Potechin and Liu Yang (arXiv). The authors introduce and study the question of testing “sum-of-square-ness,” i.e. the property of a degree-\(d\)-polynomial being a sum of squares. Specifically, they show that one-sided sample-based testers cannot do much better than the trivial approach, that is that they require sample complexity \(n^{\Omega(d)}\) — while learning the polynomial can be done with \(n^{O(d)}\) samples.

Sharp Bounds for Generalized Uniformity Testing, by Ilias Diakonikolas, Daniel Kane, and Alistair Stewart (arXiv, ECCC). Remember the post from last month, which included a paper on “Generalized Uniformity Testing”? Well, this paper more or less settles the question, establishing tight bounds on the sample complexity of testing whether an (unknown) probability distribution over an (unknown) discrete domain is uniform on its support, or far from every uniform distribution. Specifically, the authors significantly strengthen the previous upper bound, by getting the right dependence on \(\varepsilon\) for all regimes; and complement it by a matching worst-case lower bound.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price  (ECCC). Usually, in property testing we do not care too much about the error probability \(\delta\): if one can achieve \(1/3\), then simple repetition can bring it down to \(\delta\) at the mild price of a \(\log(1/\delta)\) factor in the query/sample complexity. Is that necessary, though? This paper shows that for uniformity and identity testing of distributions, the answer is “no”: for some regimes, this repetition trick is strictly suboptimal, as one can pay instead only a multiplicative \(\sqrt{\log(1/\delta)}\). And quite interestingly, this improvement is achieved with the simplest algorithm one can think of: by considering the empirical distribution obtained from the samples.

A Family of Dictatorship Tests with Perfect Completeness for 2-to-2 Label Cover, by Joshua Brakensiek and Venkatesan Guruswami (ECCC). While I tried to paraphrase the original abstract, but my attempts only succeeded in making it less clear; and, for fear of botching the job, decided to instead quote said abstract: “[the authors] give a family of dictatorship tests with perfect completeness [that is, one-sided] and low-soundness for 2-to-2 constraints. The associated 2-to-2 conjecture has been the basis of some previous inapproximability results with perfect completeness. However, evidence towards the conjecture in the form of integrality gaps even against weak semidefinite programs has been elusive. [Their] result provides some indication of the expressiveness and non-triviality of 2-to-2 constraints, given the close connections between dictatorship tests and satisfiability and approximability of CSPs.”

A polynomial bound for the arithmetic \(k\)-cycle removal lemma in vector spaces, by Jacob Fox, László Miklós Lovász, and Lisa Sauermann (arXiv). And back to removal lemmata! This work proves a generalization of Green’s arithmetic \(k\)-cycle removal lemma, which held for any \(k\geq 3\) and abelian group \(G\); however, the bounds in this lemma were quite large — i.e., tower-ype. Here, the authors establish an efficient lemma (with polynomial bounds) for the case of the group \(\mathbb{F}_p^n\) (where \(p\geq 2\) is any fixed prime, and \(k\geq 3\)).

Update (10/04): Finally, a paper we covered last summerThe Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal, went under signficant changes. Now titled Testing Sparsity over Known and Unknown Bases, it now includes (in addition to the previous results) a testing algorithm for sparsity with regard to a specific basis: given a matrix \(A \in \mathbb{R}^{d \times m}\) and unknown input vector \(y \in \mathbb{R}^d\), does \(y\) equal \(Ax\) for some \(k\)-sparse vector \(x\), or is it far from all such representations?

Update (10/5): we missed a recent paper of Benjamin Fish, Lev Reyzin, and Benjamin Rubinstein on Sublinear-Time Adaptive Data Analysis (arXiv). While not directly falling into the umbrella of property testing, this work considers sublinear-time algorithms for adaptive data analysis — similar in goal and spirit to property testing.