Category Archives: Monthly digest

News for September 2018

Last month was slower than usual, with only one property testing paper*

Property testing and expansion in cubical complexes, by David Garber, Uzi Vishne (arXiv). Consider the question of testing if an arbitrary function \(f\colon V\times V \to\{-1,1\}\) is of the form \(f(x,y) = h(x)h(y)\) for some \(h\colon V\to\{-1,1\}\). An intuitive one-sided test, shown to work by Lubotzky and Kaufman (2014), is to pick uniformly random \(x,y,z\in V\) and check that \(f(x,y)f(y,z)f(z,x)=1\). This paper considers the high-dimensional generalization of testing the property that a function\(f\colon V\times V \times V\times V \to\{-1,1\}\) is of the form \(f(w,x,y,z) = \alpha\cdot h(w,x)h(y,x)h(y,z) h(w,z)\), for some \(h\colon V\times V\to\{-1,1\}\) and sign \(\alpha\in\{-1,1\}\). The authors derive necessary and sufficient conditions for testability of this property, by formulating it in the language of incidence geometry and exploiting this connection.

* If we missed any, please let us know in the comments.

News for August 2018

Three papers this month close out Summer 2018.

Test without Trust: Optimal Locally Private Distribution Testing, by Jayadev Acharya, Clément L. Canonne, Cody Freitag, and Himanshu Tyagi (arXiv). This work studies distribution testing in the local privacy model. While private distribution testing has recently been studied, requiring that the algorithm’s output is differentially private with respect to the input dataset, local privacy has this requirement for each individual datapoint. The authors prove optimal upper and lower bounds for identity and independence testing, using a novel public-coin protocol named RAPTOR which can outperform any private-key protocol.

Testing Graph Clusterability: Algorithms and Lower Bounds, by Ashish Chiplunkar, Michael Kapralov, Sanjeev Khanna, Aida Mousavifar, and Yuval Peres (arXiv). This paper studies the problem of testing whether a graph is \(k\)-clusterable (based on the conductance of each cluster), or if it is far from all such graphs — this is a generalization of the classical problem of testing whether a graph is an expansion. It manages to solve this problem under weaker assumptions than previously considered. Technically, prior work embedded a subset of the graph into Euclidean space and clustered based on distances between vertices. This work uses richer geometric structure, including angles between the points, in order to obtain stronger results.

Near log-convexity of measured heat in (discrete) time and consequences, by Mert Saglam (ECCC). Glancing at the title, it might not be clear how this paper relates to property testing. The primary problem of study is the quantity \(m_t = uS^tv\), where \(u, v\) are positive unit vectors and \(S\) is a symmetric substochastic matrix. This quantity can be viewed as a measurement of the heat measured at vector \(v\), after letting the initial configuration of \(u\) evolve according to \(S\) for \(t\) time steps. The author proves an inequality which roughly states \(m_{t+2} \geq t^{1 – \varepsilon} m_t^{1 + 2/t}\), which can be used as a type of log-convexity statement. Surprisingly, this leads to lower bounds for the communication complexity of the \(k\)-Hamming problem, which in turns leads to optimal lower bounds for the complexity of testing \(k\)-linearity and \(k\)-juntas.

News for July 2018

Three Four papers for July. New sublinear graph algorithms, distribution testing under new models, and sublinear matrix algorithms. Onward ho…
(Sorry Amit, for missing your paper on sublinear matrix algorithms.)

Metric Sublinear Algorithms via Linear Sampling, by Hossein Esfandiari and Michael Mitzenmacher (arXiv). Consider a weighted clique \(G = (V,E)\) where \(V\) is a set of points in a metric space and edge weights are metric distances. In this setting, sublinear algorithms are those that make \(o(n^2)\) edge queries. This paper studies problems like densest subgraph and maxcut in this setting. The key method is a sparsifying algorithm that achieves the following. (I paraphrase their language.) Consider a positive parameter \(\alpha\), and let \(w(e)\) denote the weight of edge \(e\). The aim is to get a subgraph \(H\) that contains every edge \(e\) (in \(G\)) with independent probability \(\min(w(e)/\alpha, 1)\). Furthermore, this subgraph should be obtained in time linear in the number of edges in \(H\) (hence the title of the paper). For problems like 1/2-approximating the densest subgraph and PTASes for maxcut, the results show that for a carefully chosen \(\alpha\), approximate solutions on \(H\) give solutions of comparable quality on \(G\). These results cleanly generalize to settings where edge weights satisfy triangle inequality with some multiplicative penalty.

Sublinear Algorithms for (\(\Delta\) + 1) Vertex Coloring, by Sepehr Assadi, Yu Chen, and Sanjeev Khanna (arXiv). Arguably, the first thing you learn about vertex coloring is that a graph with maximum degree \(\Delta\) admits a \((\Delta+1)\)-coloring, that can be found in linear time. But what about sublinear time/space? I like this! You take a simple classical fact, throw in sublinear constraints, and it opens up a rich theory. This paper shows a non-adaptive \(O(n^{3/2})\)-time algorithm for this problem, and gives a nearly matching lower bound. There are also results for streaming and parallel computation, but let’s focus on the sublinear result. It is remarkable that there is no loss in colors in going to sublinear time. (In contrast, the papers shows an \(\Omega(n^2)\) lower bound for constructing a maximal matching.) The main technical tool is a list coloring result, where each vertex is given a list of colors and much choose its own from that list. Obviously, if each list is \([\Delta + 1]\), such a coloring is possible. The paper proves that even if each list is an independent \(O(\log n)\)-sized sample of \([\Delta+1]\), a valid coloring is still possible. The final algorithm is pretty involved, and uses this meta-algorithm as a building block.

Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing, by Gautam Kamath and Christos Tzamos (ECCC). The standard model for distribution testing is access to samples from the unknown distribution \(\mathcal{D}\) with support \([n]\). This has attracted much attention with a rich set of results, and the complexity classic problems of uniformity, identity, and equivalence are well understood. But there are alternate models, such as the model of conditional samples (Chakraborty-Fischer-Goldhirsh-Matsliah ’13 and Canonne-Ron-Servedio ’14). For any subset \(S \subseteq [n]\), we can get a random sample from \(\mathcal{D}\) restricted to \(S\). This adds an algorithmic dimension to distribution testing. This paper studies the power of non-adaptive conditional (NACOND) queries. The main result is that uniformity, identity, and equivalence are testable with \(\mathrm{poly}(\log n)\) queries. (There are existing \(\Omega(\log n)\) lower bounds for all these problems.) The heart of these algorithms is a procedure ANACONDA that tries to find a set \(S\) where some element has a high probability, relative to the mass of \(S\).

Sublinear-Time Quadratic Minimization via Spectral Decomposition of Matrices, by Amit Levi and Yuichi Yoshida (arXiv). When it comes to fundamental problems, it’s hard to beat quadratic minimization. Given a matrix \(A \in \mathbb{R}^{n\times n}\), we wish to find \(v \in \mathbb{R}^n\) that minimizes \(v^TAv\). (This is basically a singular value/vector problem.) One may have additional terms in the objective, depending on \(v^Tv\) or \(v^Tb\) (for fixed vector \(b\)). This paper gives sublinear algorithms for this problem. A natural approach is to simply subsample \(k\) rows and columns to get submatrix \(B\), solve the problem for \(B\), and hope for the best. This idea has a rich history from seminal work of Frieze-Kannan. Recently, Hayashi-Yoshida show that constant \(k\) (only depending on error parameters) suffice for getting a non-trivial approximation for this problem. Unfortunately, the solution error depends on the \(\ell_\infty\)-norm of the solution. This paper shows that for polylogarithmic \(k\), one can get an error depending on the \(\ell_2\)-norm of the solution. This is a significant improvement, especially for sparse solution vectors. The main technical workhorse is a new matrix decomposition theorem, that shows how any matrix can be written as a sum of a few block matrix, and a low-norm “error” matrix. Admirably, the paper shows a number of experiments,
showing the effectiveness of this technique for eigenvalue computations. It’s very nice to see how ideas from sublinear algorithms might have a practical impact.

News for June 2018

The summer gets off to a flying start, with three property testing papers, spanning differential privacy, distribution testing, and juntas in Gaussian space!

On closeness to \(k\)-wise uniformity, by Ryan O’Donnell and Yu Zhao (arXiv)
In this paper, the authors consider the following structural question about probability distributions over the Boolean hypercube \(\{-1,1\}^n\): ” what is the relation between total variation distance \(\delta\) to \(k\)-wise independence, and bound \(\varepsilon\) on the Fourier coefficients of the distribution on degrees up to \(k\)?”

While this question might seem a bit esoteric at first glance, it has direct and natural applications to derandomization, and of course to distribution testing (namely, to test \(k\)-wise independence and its generalization, \((\varepsilon, k)\)-wise independence of distributions over the hypercube).

The main contribution here is to improve (by a \((\log n)^{O(k)}\) factor) the bounds on \(\delta (n,k,\varepsilon)\) over the previous work by Alon et al. [AAK+07], making them either tight (for \(k\) even) or near-tight. To do so, the authors introduce a new hammer to the game, using linear programming duality in the proof of both their upper and lower bounds.

Property Testing for Differential Privacy, by Anna Gilbert and Audra McMillan (arXiv)
Differential privacy, as introduced by Dwork et al., needs no introduction. Property testing, especially on this website, needs even less. What about a combination of the two? Namely, given black-box access to an algorithm claiming to perform a differentially private computation, how to test whether this is indeed the case?

Introducing and considering this quite natural question for the first time, this work shows, roughly speaking, that testing differential privacy is hard. Specifically, they show that for many notions of differential privacy (pure, approximate, and their distributional counterparts), testing is either impossible or possible but not with a sublinear number of queries (even when the tester is provided with side information about the black-box). In other terms, as the authors put it: trusting the privacy of an algorithm “requires compromise by either the verifier or algorithm owner” (and, in the latter case, even then it’s not a simple matter).

Is your data low-dimensional?, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv)
(Well, is it?) To state it upfront, I am biased here, as it is a problem I was very eager to see investigated to begin with. To recap, the question is as follows: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on “few” (i.e., \(k \ll n\)) variables?”

This is the continuous, Gaussian version of the (quite famous) junta testing problem, which has gathered significant attention over the past years (the Gaussian version has, to the best of my knowledge, never been investigated). Now, the above formulation has a major flaw: specifically, it is uninteresting. In Gaussian space*, who cares about the particular basis I expressed my input vector in? So a more relevant question, and that that the authors tackle, is the more robust and natural one: “given query access to some unknown Boolean-valued function \(f\colon \mathbb{R}^n \to \{-1,1\}\) over the high-dimensional space \(\mathbb{R}^n\) endowed with the Gaussian measure, how can one check whether \(f\) only depends on a low-dimensional linear combination of the variables?” Or, put differently, does all the relevant information for \(f\) live in a low-dimensional subspace?

De, Mossel, and Neeman show how can do this, non-adaptively, with a query complexity independent of the dimension \(n\) (hurray!), but instead polynomial in \(k\), the distance parameter \(\varepsilon\), and the surface area \(s\) of \(f\). And since this last parameter may seem quite arbitrary, they also proceed to show that a polynomial dependence in this \(s\) is indeed required.

*”In Gaussian space, no one can hear you change basis?”

News for May 2018

Six papers for May, including new models, hierarchy theorems, separation results, resolution of conjectures, and a lot more fun stuff. A lot of things to read this month!

Lower Bounds for Tolerant Junta and Unateness Testing via Rejection Sampling of Graphs, by Amit Levi and Erik Waingarten (ECCC). This paper proves a number of new lower bounds for tolerant testing of Boolean functions, including non-adaptive \(k\)-junta testing and adaptive and non-adaptive unateness testing. Combined with upper bounds for these and related problems, these results establishes separation between the complexity of tolerant and non-tolerant testing for natural properties of Boolean functions, which have so far been elusive. As a technical tool, the authors introduce a new model for testing graph properties, termed the rejection sampling model. In this model, the algorithm queries a subset \(L\) of the vertices, and the oracle will sample an edge uniformly at random and output the intersection of the edge endpoints with the query set \(L\). The cost of an algorithm is measured as the sum of the query sizes. In order to prove the above lower bounds (in the standard model), they show a non-adaptive lower bound for testing bipartiteness (in their new model).

Hierarchy Theorems for Testing Properties in Size-Oblivious Query Complexity, by Oded Goldreich (ECCC). This work proves a hierarchy theorem for properties which are independent of the size of the object, and depend only on the proximity parameter \(\varepsilon\). Roughly, for essentially every function \(q : (0,1] \rightarrow \mathbb{N}\), there exists a property for which the query complexity is \(\Theta(q(\varepsilon))\). Such results are proven for Boolean functions, dense graphs, and bounded-degree graphs. This complements hierarchy theorems by Goldreich, Krivelevich, Newman, and Rozenberg, which give a hierarchy which depends on the object size.

Finding forbidden minors in sublinear time: a \(O(n^{1/2+o(1)})\)-query one-sided tester for minor closed properties on bounded degree graphs, by Akash Kumar, C. Seshadhri, and Andrew Stolman (ECCC). At the core of this paper is a sublinear algorithm for the following problem: given a graph which is \(\varepsilon\)-far from being \(H\)-minor free, find an \(H\)-minor in the graph. The authors provide a (roughly) \(O(\sqrt{n})\) time algorithm for such a task. As a concrete example, given a graph which is far from being planar, one can efficiently find an instance of a \(K_{3,3}\) or \(K_5\) minor. Using the graph minor theorem, this implies analogous results for any minor-closed property, nearly resolving a conjecture of Benjamini, Schramm and Shapira.

Learning and Testing Causal Models with Interventions, by Jayadev Acharya, Arnab Bhattacharyya, Constantinos Daskalakis, and Saravanan Kandasamy (arXiv). This paper considers the problem of learning and testing on causal Bayesian networks. Bayesian networks are a type of graphical model defined on a DAG, where each node has a distribution defined based on the value of its parents. A causal Bayesian network further allows “interventions,” where one may set nodes to have certain values. This paper gives efficient algorithms for learning and testing the distribution of these models, with \(O(\log n)\) interventions and \(\tilde O(n/\varepsilon^2)\) samples per intervention

Property Testing of Planarity in the CONGEST model, by Reut Levi, Moti Medina, and Dana Ron (arXiv). It is known that, in the CONGEST model of distributed computation, deciding whether a graph is planar requires a linear number of rounds. This paper considers the natural property testing relaxation, where we wish to determine whether a graph is planar, or \(\varepsilon\)-far from being planar. The authors show that this relaxation allows one to bypass this linear lower bound, obtaining a \(O(\log n \cdot \mathrm{poly(1/\varepsilon))}\) algorithm, complemented by an \(\Omega(\log n)\) lower bound.

Flexible models for testing graph properties, by Oded Goldreich (ECCC). Usually when testing graph properties, we assume that the vertex set is \([n]\), implying that we can randomly sample nodes from the graph. However, this assumes that the tester knows the value of \(n\), the number of nodes. This note suggests more “flexible” models, in which the number of nodes may be unknown, and we are only given random sampling access. While possible definitions are suggested, this note contains few results, leaving the area ripe for investigation of the power of these models.

 

News for April 2018

Three Four papers for April: a new take on linearity testing, results on sublinear algorithms with advice, histogram testing, and distributed inference problems. (Edit: Sorry Clément, for missing your paper on distributed inference!)

Testing Linearity against Non-Signaling Strategies, by Alessandro Chiesa, Peter Manohar, and Igor Shinkar (ECCC). This paper gives a new model for property testing, through the notion of non-signaling strategies. The exact definitions are quite subtle, but here’s a condensed view. For \(S \subseteq \{0,1\}^n\), let an \(S\)-partial function be one that is only defined on \(S\). Fix a “consistency” parameter \(k\). Think of the “input” as a collection of distributions, \(\{\mathcal{F}_S\}\), where each \(|S| \leq k\) and \(\mathcal{F}_S\) is a distribution of \(S\)-partial functions. We have a local consistency requirement: \(\{\mathcal{F}_S\}\) and \(\{\mathcal{F}_T\}\) must agree (as distributions) on restrictions to \(S \cap T\). In some sense, if we only view pairs of these distributions of partial functions, it appears as if they come from a single distributions of total functions. Let us focus on the classic linearity tester of Blum-Luby-Rubinfeld in this setting. We pick random \(x, y, x+y \in \{0,1\}^n\) as before, and query these values on a function \(f \sim {\mathcal{F}_{x,y,x+y}}\). The main question addressed is what one can say about \(\{\mathcal{F}_S\}\), if this linearity test passes with high probability. Intuitively (but technically incorrect), the main result is that \(\{\mathcal{F}_S\}\) is approximated by a “quasi-distribution” of linear functions.

An Exponential Separation Between MA and AM Proofs of Proximity, by Tom Gur, Yang P. Liu, and Ron D. Rothblum (ECCC). This result follows a line of work on understanding sublinear algorithms in proof systems. Think of the standard property testing setting. There is a property \(\mathcal{P}\) of \(n\)-bit strings, an input \(x \in \{0,1\}^n\), and a proximity parameter \(\epsilon > 0\). We add a proof \(\Pi\) that the tester (or the verifier) is allowed to use, and we define soundness and completeness in the usual sense of Arthur-Merlin protocols. For a \(\mathbb{MA}\)-proof of proximity, the proof \(\Pi\) can only depend on the string \(x\). In a \(\mathbb{AM}\)-proof of proximity, the proof can additionally depend on the random coins of the tester (which determine the indices of \(x\) queried). Classic complexity results can be used to show that the latter subsume the former, and this paper gives a strong separation. Namely, there is a property \(\mathcal{P}\) where any \(\mathbb{MA}\)-proof of proximity protocol (or tester) requires \(\Omega(n^{1/4})\)-queries of the input \(x\), but there exists an \(\mathbb{AM}\)-proof of proximity protocol making \(O(\log n)\) queries. Moreover, this property is quite natural; it is simply the encoding of permutations.

Testing Identity of Multidimensional Histograms, by Ilias Diakonikolas, Daniel M. Kane, and John Peebles (arXiv). A distribution over \([0,1]^d\) is a \(k\)-histogram if the domain can be partitioned into \(k\) axis-aligned cuboids where the probability density function is constant. Recent results show that such histograms can be learned in \(k \log^{O(d)}k\) samples (ignoring dependencies on accuracy/error parameters). Can we do any better for identity testing? This paper gives an affirmative answer. Given a known \(k\)-histogram \(p\), one can test (in \(\ell_1\)-distance) whether an unknown \(k\)-histogram \(q\) is equal to \(p\) in (essentially) \(\sqrt{k} \log^{O(d)}(dk)\) samples. There is a nearly matching lower bound, when \(k = \exp(d)\).

Distributed Simulation and Distributed Inference, by Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi (arXiv   ECCC). This papers introduces a model of distributed simulation, which generalizes distribution testing and distributed density estimation. Consider some unknown distribution \(\mathcal{D}\) with support \([k]\), and a “referee” who wishes to generate a single sample from \(\mathcal{D}\) (alternately, she may wish to determine if \(\mathcal{D}\) has some desired property). The referee can communicate with “players”, each of whom can generate a single independent sample from \(\mathcal{D}\). The catch is that each player can communicate at most \(\ell\) < \(log_2k\) bits (otherwise, the player can simply communicate the sampled element). How many players are needed for the referee to generate a single sample? The paper first proves that this task is basically impossible with a (worst-case) finite number of players, but can be done with expected \(O(k/2^\ell)\) players (and this is optimal). This can plugged into standard distribution testing results, to get inference results in this distributed, low-communication setting. For example, the paper shows that identity testing can be done with \(O(k^{3/2}/2^\ell)\) players.

News for March 2018

March has been a relatively slow month for property testing, with 3 works appearing online. (If you notice we missed some, please leave a comment pointing it out)

Edge correlations in random regular hypergraphs and applications to subgraph testing, by Alberto Espuny Díaz, Felix Joos, Daniela Kühn, and Deryk Osthus (arXiv). While testing subgraph-freness in the dense graph model is now well-understood, after a series of works culminating in a complete characterization of the testing problems which admit constant-query testers, the corresponding question for hypergraphs is far from resolved. In this paper, the authors develop new techniques for the study of study of random regular hypergraphs, which imply new testing results for subhypergraph-freeness testing, improving on the state-of-the-art for some parameter regimes (e.g., when the input graph satisfies some average-degree condition).

Back from hypergraphs to graphs, we also have:
The Subgraph Testing Model, by Oded Goldreich and Dana Ron (ECCC). Here, the authors introduce a new model for property testing of graphs, where the goal is to test if an unknown subgraph \(F\) of an explicitly given graph \(G=(V,E)\) satisfies the desired property. The testing algorithm is provided access to \(F\) via membership queries, i.e., through query access to the indicator function \(\mathbf{1}_F\colon E \to \{0,1\}\). (In some very hazy sense, this is reminiscent of the active learning or testing frameworks, where one gets more or less free access to unlabeled data but pays to see their label.) As a sample of the results obtained, the paper establishes that this new model and the bounded-degree graph model are incomparable: there exist properties easier to test in one model than the other, and vice-versa — and some properties equally easy to test in both.

And finally, to drive home the point that “models matter a lot,” we have our third paper:
Every set in P is strongly testable under a suitable encoding, by Irit Dinur, Oded Goldreich, and Tom Gur (ECCC). It is known that the choice of representation of the objects has a large impact in property testing: for instance, the complexity of testing a given property can change drastically between the dense and bounded-degree graph models. This work provides another example of such a strong dependence on the representation: while membership to some sets in \(P\) is known to be hard to test, the authors here prove that, for every set \(S\in P\), there exists a (polynomial-time, invertible) encoding \(E_S\colon \{0,1\}^\ast\to \{0,1\}^\ast\) such that testing membership to \(S\) under this encoding is easy. (They actually show even stronger a statement: namely, that under this encoding the set admits a “proximity-oblivious tester,” that is a constant-query testing algorithm which rejects with probability function of the distance to \(S\).)

Finally, on a non-property testing note: Edith Cohen, Vitaly Feldman, Omer Reingold, and Ronitt Rubinfeld recently wrote a pledge for inclusiveness in the TCS community, available here: https://www.gopetition.com/petitions/a-pledge-for-inclusiveness-in-toc.html
If you haven’t seen it already, we encourage you to read it.

Update: Fixed a mistake in the overview of the second paper; as pointed out by Oded in the comments, the main comparison was between the new model and the bounded-degree graph model, not the dense graph one.

News for February 2018

February had a flurry of conference deadlines, and they seem to have produced six papers for us to enjoy, including three on estimating symmetric properties of distributions.

Locally Private Hypothesis Testing, by Or Sheffet (arXiv). We now have a very mature understanding of the sample complexity of distributional identity testing — given samples from a distribution \(p\), is it equal to, or far from, some model hypothesis \(q\)? Recently,  several papers have studied this problem under the additional constraint of differential privacy. This paper strengthens the privacy constraint to local privacy, where each sample is locally noised before being provided to the testing algorithm.

Distribution-free Junta Testing, by Xi Chen, Zhengyang Liu, Rocco A. Servedio, Ying Sheng, and Jinyu Xie (arXiv). Testing whether a function is a \(k\)-junta is very well understood — when done with respect to the uniform distribution. In particular, the adaptive complexity of this problem is \(\tilde \Theta(k)\), while the non-adaptive complexity is \(\tilde \Theta(k^{3/2})\). This paper studies the more challenging task of distribution-free testing, where the distance between functions is measured with respect to some unknown distribution. The authors show that, while the adaptive complexity of this problem is still polynomial (at \(\tilde O(k^2)\)), the non-adaptive complexity becomes exponential: \(2^{\Omega(k/3)}\). In other words, there’s a qualitative gap between the adaptive and non-adaptive complexity, which does not appear when testing with respect to the uniform distribution.

The Vertex Sample Complexity of Free Energy is Polynomial, by Vishesh Jain, Frederic Koehler, and Elchanan Mossel (arXiv). This paper studies the classic question of estimating (the logarithm of) the partition function of a Markov Random Field, a highly-studied topic in theoretical computer science and statistical physics. As the title suggests, the authors show that the vertex sample complexity of this quantity is polynomial. In other words, randomly subsampling a \(\mathrm{poly}(1/\varepsilon)\)-size graph and computing its free energy gives a good approximation to the free energy of the overall graph. This is in contrast to more general graph properties, for the vertex sample complexity is super-exponential in \(1/\varepsilon\).

Entropy Rate Estimation for Markov Chains with Large State Space, by Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, and Tiancheng Yu (arXiv). Entropy estimation is now quite well-understood when one observes independent samples from a discrete distribution — we can get by with a barely-sublinear sample complexity, saving a logarithmic factor compared to the support size. This paper shows that these savings can also be enjoyed in the case where we observe a sample path of observations from a Markov chain.

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance, by Yanjun Han, Jiantao Jiao, and Tsachy Weissman (arXiv). Speaking more generally of the above problem: there has been significant work into estimating symmetric properties of distributions, i.e., those which do not change when the distribution is permuted. One natural method for estimating such properties is to estimate the sorted distribution, then apply the plug-in estimator for the quantity of interest. The authors give an improved estimator for the sorted distribution, improving on the results of Valiant and Valiant.

INSPECTRE: Privately Estimating the Unseen, by Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang (arXiv). One final work in this area — this paper studies the estimation of symmetric distribution properties (including entropy, support size, and support coverage), but this time while maintaining differentially privacy of the sample. By using estimators for these tasks with low sensitivity, one can additionally obtain privacy for a low or no additional cost over the non-private sample complexity.

News for January 2018

And now, for the first papers of 2018! It’s a slow start with only four papers (or technical three “standard property testing” papers, and one non-standard paper).

Adaptive Boolean Monotonicity Testing in Total Influence Time, by Deeparnab Chakrabarty and C. Seshadhri (arXiv ECCC). The problem of testing monotonicity of Boolean functions \(f:\{0,1\}^n \to \{0,1\}\) has seen a lot of progress recently. After the breakthrough results of Khot-Minzer-Safra giving a \(\widetilde{O}(\sqrt{n})\) non-adaptive tester, Blais-Belovs proved the first polynomial lower bound for adaptive testers, recently improved to \(O(n^{1/3})\) by Chen, Waingarten, and Xi. The burning question: does adaptivity help? This result shows gives an adaptive tester that runs in \(O(\mathbf{I}(f))\), the total influence of \(f\). Thus, we can beat these lower bounds (and the non-adaptive complexity) for low influence functions.

Adaptive Lower Bound for Testing Monotonicity on the Line, by Aleksandrs Belovs (arXiv). More monotonicity testing! But this time on functions \(f:[n] \to [r]\). Classic results on property testing show that monotonicity can be tested in \(O(\varepsilon^{-1}\log n)\) time. A recent extension of these ideas by Pallavoor-Raskhodnikova-Varma replace the \(\log n\) with \(\log r\), an improvement for small ranges. This paper proves an almost matching lower bound of \((\log r)/(\log\log r)\). The main construction can be used to give a substantially simpler proof of an \(\Omega(d\log n)\) lower bound for monotonicity testing on hypergrids \(f:[n]^d \to \mathbb{N}\). The primary contribution is giving explicit lower bound constructions and avoiding Ramsey-theoretical arguments previously used for monotonicity lower bounds.

Earthmover Resilience and Testing in Ordered Structures, by Omri Ben-Eliezer and Eldar Fischer (arXiv). While there has been much progress on understanding the constant-time testability of graphs, the picture is not so clear for ordered structures (such as strings/matrices). There are a number of roadblocks (unlike the graph setting): there are no canonical testers for, say, string properties, there are testable properties that are not tolerant testable, and Szemeredi-type regular partitions may not exist for such properties. The main contribution of this paper is to find a natural, useful condition on ordered properties such that the above roadblocks disappear hold, and thus we have strong testability results. The paper introduces the notion of Earthmover Resilient properties (ER). Basically, a graph properties is a property of symmetric matrices that is invariant under permutation of base elements (rows/columns). An ER property is one that is invariant under mild perturbations of the base elements. The natural special cases of ER properties are those over strings and matrices, and it is includes all graph properties as well as image properties studied in this context. There are a number of characterization results. Most interestingly, for ER properties of images (binary matrices) and edge-colored ordered graphs, the following are equivalent: existence of canonical testers, tolerant testability, and regular reducibility.

Nondeterminisic Sublinear Time Has Measure 0 in P, by John Hitchcock and Adewale Sekoni (arXiv). Not your usual property testing paper, but on sublinear (non-deterministic) time nonetheless. Consider the complexity class of \(NTIME(n^\delta)\), for \(\delta < 1\). This paper shows that this complexity class is a "negligible" fraction of \(P\). (The analogous result was known for \(\alpha < 1/11\) by Cai-Sivakumar-Strauss.) This requires a technical concept of measure for languages and complexity classes. While I don’t claim to understand the details, the math boils down to understanding the following process. Consider some language \(\mathcal{L}\) and a martingale betting process that repeatedly tries to guess the membership of strings \(x_1, x_2, \ldots\) in a well-defined order. If one can define such a betting process with a limited computational resource that has unbounded gains, then \(\mathcal{L}\) has measure 0 with respect to that (limited) resource.

News for December 2017

December 2017 concluded the year in style, with seven property testing papers spanning quite the range. Let’s hope 2018 will keep up on that trend! (And, of course, if we missed any, please point it out in the comments.The blame would be on our selves from the past year.)

We begin with graphs:

High Dimensional Expanders, by Alexander Lubotzky (arXiv). This paper surveys the recent developments in studying high dimensional expander graphs, a recent generalization of expanders which has become quite active in the past years and has intimate connections to property testing.

Generalized Turán problems for even cycles, by Dániel Gerbner, Ervin Győri, Abhishek Methuku, and Máté Vizer (arXiv).
A Generalized Turán Problem and its Applications, by Lior Gishboliner and Asaf Shapira (arXiv).
In these two independent works, the authors study questions of a following flavor: two subgraphs (patterns) \(H,H’\), what is the maximum number of copies of \(H\) which can exist in a graph \(G\) promised to be \(H’\)-free? They consider the case where the said patterns are cycles on \(\ell,k\) vertices respectively, and obtain asymptotic bounds on the above quantity (the two papers obtain somewhat incomparable bounds, and the first focuses on the case where both \(\ell,k\) are even). These estimates, in turn, have applications to graph removal lemmata, as discussed in the second work (Section 1.2): specifically, it implies the existence of a removal lemma with a
tight super-polynomial bound, a question which was previously open.

Approximating the Spectrum of a Graph, by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (arXiv). The authors obtain constant-time and query algorithm for the task of approximating (in \(\ell_1\) norm) the spectrum of a graph \(G\), i.e. the eigenvalues of its Laplacian, given random query access to the nodes of \(G\) and to the neighbors of any given node. They also study the applications of this result to property testing in the bounded-degree model, showing that a large class of spectral properties of high-girth graphs is testable.

Then, we go quantum:

Schur-Weyl Duality for the Clifford Group with Applications: Property Testing, a Robust Hudson Theorem, and de Finetti Representations, by David Gross, Sepehr Nezami, and Michael Walter (arXiv).
Introducing and studying a duality theory for the Clifford group, the authors are able (among other results) to resolve an open question in quantum property testing, establishing a constant-query tester (indeed, making 6 queries) for testing whether an unknown quantum state is a stabilizer state. The previous best upper bound was linear in the number of qubits, as it proceeded by learning the state (“testing by learning”).

Quantum Lower Bound for a Tripartite Version of the Hidden Shift Problem, by Aleksandrs Belovs (arXiv). This work introduces and studies a generalization of (both) the hidden shift and 3-sum problems, and shows an \(\Omega(n^{1/3})\) lower bound on its quantum query complexity. The author also considers a property testing version of the problem, for which he proves a similar lower bound—interestingly, this polynomial lower bound is shown using the adversary method, evading the “property testing barrier” which states that (a restricted version of) this method cannot yield better than a constant-query lower bound.

And to conclude, a distribution testing paper:

Approximate Profile Maximum Likelihood, by Dmitri S. Pavlichin, Jiantao Jiao, and Tsachy Weissman (arXiv) This paper proposes an efficient (linear-time) algorithm to approximate the profile maximum likelihood of a sequence of i.i.d. samples from an unknown distribution, i.e. the probability distribution which, ignoring the labels of the samples and keeping only the collision counts, maximizes the likelihood of the sequence observed. This provides a candidate solution to an open problem suggested by Orlitsky in a FOCS’17 workshop (see also Open problem 84), and one which would have direct implications to tolerant testing and estimation of symmetric properties of distributions.