News for February 2015

The following three very interesting papers came out in the month of February.

A polynomial regularity lemma for semi-algebraic hypergraphs and its applications in geometry and property testing by Jacob Fox, Janos Pach and Andrew Suk (arXiv). Testability of graph or hyper-graph properties is all about regularity. Alon and Shapira used regularity lemma for graphs to prove that every hereditary property of graphs is testable. Generalization of this result for hyper-graphs has been obtain recently. The upper bound on the query complexity for testing hereditary properties for graphs (or hyper-graphs) is a function that is independent of the size of the input but dependent on the promise parameter. The query complexity as a function of the promise parameter, ε, is the same as the function that appears in the regularity lemma (where ε is the approximation parameter), which is usually a tower of ε. In this paper it has been shown that for special hereditary graph properties (and for other generalized objects) the constant as a function of the approximation parameter is polynomial. Thus for these special classes of hereditary properties (for example H-freeness, where H is k-uniform) the query complexity for testing is a polynomial in the promise parameter.  This is an important step towards understanding the dependence of the query complexity for testing hereditary graph (and hyper-graph) properties on the promise parameter.

Proofs of Proximity for Context-Free Languages and Read-Once Branching Programs by Oded Goldreich, Tom Gur and Ron Rothblum (ECCC). In a recent paper of Gur and Rothblum, they studied the property testing model where the tester/verifier is given free access to a short proof in addition to the query access to the input. A related model called, interactive proof of proximity, by proposed by Rothblum, Vadhan and Wigderson where the verifier is allowed to interact with an all powerful prover.  There is a tradeoff between the size of the short proof or the amount of interaction and the amount of queries made to the input. In both these models the verifier is more powerful than in the traditional property testing setting  So for properties that are not testable in the standard property testing setting, understanding the query complexity in the stronger models is a natural question. In this paper context-free languages and small read-once branching programs has been studied.

Sample Complexity for Winner Prediction in Elections by Arnab Bhattacharyya and Palash Dey (arXiv). Number of random samples necessary to predict the winner in an election, where there are a number of contestants, is not only an interesting theoretical problem but also has lots of real life applications. Different voting rules may have different complexity in terms of number of samples required. In this paper a number of different standard voting rules have been considered and bounds for number of samples required has been presented. It is possibly the first time sampling has been analyzed in the context of social choice theory. This is indeed a area where a lot more theoretical works are expected to follow.

Sublinear Algorithms Day at MIT on April 10

An announcement from Gautam that will be of interest to readers who will be in (or are looking for a good reason to go to!) New England next month:

On Friday, April 10th, MIT will be hosting the second Sublinear Algorithms Day. This event will bring together researchers in the northeast for a day of interaction and discussion.

Sublinear Day will feature talks by five experts in the areas of sublinear and streaming algorithms: Costis Daskalakis, Robert Krauthgamer, Jelani Nelson, Shubhangi Saraf, and Paul Valiant — each giving a 45-minute presentation on the hot and latest developments in their fields.

Additionally, for the first time this year, we will have a poster session! We strongly encourage young researchers (particularly students and postdocs) to present work related to sublinear algorithms. Abstract submission details are available here.

So what are you waiting for? Registration is available here, and we hope to see you at the event!

Contact with any questions.

Gautam Kamath

Open problem for February 2015

Today’s post by Clément Canonne.

Following the Boolean monotonicity testing bonanza, here’s an open problem. In short, does adaptivity help for monotonicity testing of Boolean functions?

Problem: Consider the problem of monotonicity testing for Boolean functions on the hypercube. Given oracle access to \(f\colon \{0,1\}^n \to \{0,1\}\), we wish to decide if \(f\) is (i) monotone vs. (ii) \(\epsilon\)-far from monotone (in Hamming distance). For either the one-sided or two-sided version of the problem, what is the exact status of adaptive testers?

State of the art:
Fischer et al. [FLN+02] showed one-sided non-adaptive testers require \(\sqrt{n}\) queries. This implies an \(\Omega(\log n)\) lower bound for one-sided adaptive testers.
Chen et al. [CDST15] proved that two-sided non-adaptive testers require (essentially) \(\Omega(\sqrt{n}/\epsilon^2)\) queries. This implies an \(\Omega(\log n)\) lower bound for 2-sided adaptive testers.
Khot et al. [KMS15] recently gave a one-sided non-adaptive tester making \(\tilde{O}(\sqrt{n}/\epsilon^2)\) queries. The story is essentially complete for non-adaptive testing.

Comments: As of now, it is not clear whether adaptivity can help. Berman et al. [BRY14] showed the benefit of adaptivity for Boolean monotonicity testing over the domain \([n]^2\) (switch the \(2\) and the \(n\) from the hypercube). A gap provably exists between adaptive and non-adaptive testers: \(O(1/\epsilon)\) vs. \(\Omega(\log(1/\epsilon)/\epsilon)\).


[FLN+02] E. Fischer, E. Lehman, I. Newman, S. Raskhodnikova, R. Rubinfeld, and A. Samorodnitsky. Monotonicity testing over general poset domains. Symposium on Theory of Computing, 2002

[BRY14] P. Berman, S. Raskhodnikova, and G. Yaroslavtsev. \(L_p\) testing. Symposium on Theory of Computing, 2014

[CDST15] X. Chen, A. De, R. Servedio, L.-Y. Tang. Boolean function monotonicity testing requires (almost) \(n^{1/2}\) non-adaptive queries. Symposium on Theory of Computing, 2015

[KMS15] S. Khot, D. Minzer, and S. Safra. On monotonicity testing and Boolean Isoperimetric type theorems. ECCC, 2015

News for January 2015

We ended our last post discussing results from 2014 by discussing the monotonicity testing conjecture that \(O(\sqrt{n})\) queries suffice to test whether a Boolean function is monotone.  As it turns out, a proof of this conjecture was just around the corner…

On Monotonicity Testing and Boolean Isoperimetric type Theorems by Subhash Khot, Dor Minzer, and Muli Safra (ECCC). This remarkable paper shows that, indeed, we can test whether \(f : \{0,1\}^n \to \{0,1\}\) is monotone with \(O(\sqrt{n})\) queries using a natural tester. The authors establish this result by building on the connection between monotonicity testers and directed analogues of isoperimetric inequalities on the Boolean hypercube first established by Chakrabarty and Seshadhri. Specifically, the authors establish a directed analogue of a classic inequality of Talagrand and combine it with a number of other interesting innovations to analyze their monotonicity tester.

Big Data on the Rise: Testing monotonicity of distributions by Clément Canonne (arXiv). Another setting in which monotonicity testing has received a lot of attention recently is in the setting of testing properties of distributions. A distribution \(D\) on \(\{1,2,\ldots,n\}\) is monotone (non-increasing) if its probability mass function is non-increasing: \(D(1) \ge D(2) \ge \cdots \ge D(n)\). When we must test whether a function is monotone by drawing samples from the distribution, \(\tilde{\Theta}(\sqrt{n})\) samples are both necessary and sufficient. This work shows that, in contrast, testers with stronger query access (such as conditional sampling) to the distribution can test monotonicity much more efficiently.

News for December 2014

Happy new year! I just looked back at our archives, and saw that PTReview has been on from July 2013. I’m happy (and maybe mildly surprised) that it’s still going strong. And so the chronicling of \(o(n)\) continues…

Much coolness we have to report: permutation testing, linearity testing, distribution testing, and monotonicity testing. Without further ado:

Large permutations and parameter testing by Roman Glebov, Carlos Hoppen, Tereza Klimosova, Yoshiharu Kohayakawa, Daniel Kral, and Hong Liu (arXiv). Just as typical dense graph testing involves checking properties on a random, constant-sized induced subgraph, we can look at permutation testers that test properties of permutations by sampling sub-permutations. The theory of dense graph testing is closely tied to the Szemeredi regularity lemma, the notion of graph limits, and the behavior of subgraph densities. (Check out this survey by Borgs et al.) An analogous theory for permutations has been built by a subset of the authors in a series of papers (survey). There is a notion of permutation properties/parameters that are forcible, meaning that two permutations that have similar frequencies of (a fixed, finite set of) subpermutations have close values of the parameter. This seems naturally connected to testing. Indeed, a permutation parameter that is forcible can be approximated in constant-time by simply approximating the frequencies of the subpermutations. This paper shows that converse is false: there is an constant-time approximable property that is not forcible.

A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness by Sheela Devadas and Ronitt Rubinfeld (arXiv). The venerable property of linearity testing needs no introduction, and has enough history to fill up this month’s post (and the next and the next). But what if the domain is n-bit integers and we care about running time in terms of bit operations? Suppose we wish to check if a program supposed to compute \(f(x) = b\cdot x\) (for fixed, known \(b\)). Inputs like \(2^k\) are easy to evaluate by the checker, and could be used to get faster checkers. This paper gives testers for linearity, multivariate linear functions, and multilinear functions, where computation time is always linear in sample complexity.

\(\ell_p\) Testing and Learning of Discrete Distributions by Bo Waggoner (arXiv). Distribution testing needs little introduction, and we have seen much progress over our time in PTReview. Let’s start with the basics. The seminal work of Batu et al. and later Paninksi showed that the sample complexity of testing uniformity of a distribution \(\mathcal{D}\) over universe \([n]\) is \(\Theta(\sqrt{n}/\epsilon^2)\). Meaning, the sample complexity of checking \(\|\mathcal{D} – \mathcal{U}\|_1 > \epsilon\) (where \(\mathcal{U}\) is uniform on \([n]\)) is \(\Theta(\sqrt{n}/\epsilon^2)\). But what if we had \(\ll \sqrt{n}/\epsilon^2\) samples? From prior work, nothing could be inferred. (Update: As Ilias Diakonikolas pointed out to me, the \(\ell_2\) results were previously known, both by the Batu et al. work and a recent paper by Chan et al. that settles the \(\ell_2\) question.) This paper changes that, and says that we can still infer something about other \(\ell_p\) norms, \(\|\mathcal{D} – \mathcal{U}\|_p\). What impressed me about this paper is the detailed understand of the interplay between \(n, \epsilon, p\). For example, the sample complexity of uniformity testing over \(\ell_p\)-norm for any \(p > 1\) is independent of \(n\). There are many, many results in this paper with an intriguing threshold phenomenon at \(p=2\). For distribution testing in practice, I would think that this result would be of much significance.

New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco A. Servedio, and Li-Yang Tan (arXiv). Ah yes, Boolean monotonicity testing. Consider the standard coordinate wise partial order on \(\{0,1\}^n\), given by \(\prec\). A function \(f:\{0,1\}^n \rightarrow \{0,1\}\) is monotone if \(\forall x \prec y, f(x) \leq f(y)\). The complexity of property testing (Boolean) monotonicity is one of those tantalizing, simple-to-state questions that is still quite open. I’ll spare you the full story and the epsilons, but here’s the status. The best upper bound is a non-adaptive, one-sided \(O(n^{7/8})\) tester by Chakrabarty and Seshadhri. The best lower bound is a non-adaptive, one-sided lower bound of \(\Omega(\sqrt{n})\) by Fischer et al. This implies an \(\Omega(\log\log n)\) lower bound for general testers. This paper changes all of this. The authors prove a \(\Omega(n^{1/5})\) lower bound for two-sided non-adaptive testers, leading to an exponentially better \(\Omega(\log n)\) lower bound for general testers. The main insight is to focus on monotone vs non-monotone families of linear threshold functions, and show that procedures making few (non-adaptive) queries cannot distinguish between such families. The main hammer is to use recent Central Limit Theorems. As an added bonus, this paper improves the upper bound for monotonicity testing to \(O(n^{5/6})\), with a modified tester (and better analysis) of Chakrabarty and Seshadhri. But can the lower bound be improved?

Boolean function monotonicity testing requires (almost) \(n^{1/2}\) non-adaptive queries by Xi Chen, Anindya De, Rocco A. Servedio, and Li-Yang Tan (arXiv). Yes, the lower bound can be improved. This paper gets the non-adaptive two-sided bound to (almost) \(\Omega(\sqrt{n})\), matching the one-sided Fischer et al bound. The paper proves improved Central Limit Theorems, tailored for this application. The authors, and I with them, believe that this is the true complexity. At that intriguing note, we end 2014!

News for November 2014

With the submission deadline for many conferences in November it is no surprise that we have good a number of nice articles related to property testing submitted in ArXiv and ECCC this month. Here they are:

Sunflowers and Testing Triangle-Freeness of Functions by Ishay Haviv and Ning Xie (arXiv). Testing triangle freeness is one of the central problems in property testing. The upper bound on the query complexity is obtained using the regularity lemma. The upper bound is constant that comes from the regularity lemma. The lower bound (in terms of the distance parameter, ε) is a polynomial in ε. In another line of recent works different variations of sunflower lemma has been studied.  Some important problems in complexity theory and algorithms, like matrix multiplication, has been connected to different variations of the sunflower lemma. In this work a new approach, using variations of sunflower lemma, is given to obtain lower bounds for triangle freeness. This work gives a new lower bound technique. It also reconfirms the need to understand the combinatorial problem of sunflower lemma and its variations.

Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts by Madhav Jha, C. Seshadhri and Ali Pinar (arXiv). Counting or approximately counting the number of occurrences of a particular subgraph in a given graph is a very important problem – not only as a mathematically interesting problem but also from the application point of view. The hard part is to design an algorithm that is provably good as well as empirically satisfactory. There is a huge line of research on this topic that solves this problem for a particular subgraph.  In this paper a 3-path sampling based algorithm is presented that can approximate the number of occurrences of any 4-vertex subgraph. Not only is the correctness and the running time of the algorithm is proved mathematically but the speed of the algorithm has been tested empirically also. This work should motivate us to design sub-linear algorithms that are both provably good as well as can be implementable.

Forrelation: A Problem that Optimally Separates Quantum from Classical Computing by Scott Aaronson and Andris Ambainis (arXiv). Understanding the power of quantum query over classical queries is well studied subject and has a long line of interesting results. A long standing open problem posed by Buhrman et al is related to the problem of what is the largest separation between classical query complexity and quantum query complexity.  In this paper it is shown that the Forrelation problem, where one needs to decide whether one Boolean function is highly correlated with the Fourier transform of a second function, can be solved using 1 quantum query where as classically one needs Ω(√n/log n) number of queries. They also prove this is almost tight. This improves the previous know gap of 1 vs. Θ(n^{1/4}).

Query complexity in expectation by Jedrzej Kaniewski, Troy Lee and Ronald de Wolf (arXiv). In query complexity usually we consider the number of queries required for computing a particular given function f(x) exactly when x has to be queried. But if we are required to only output a value whose expected value (expectation taken over random coin tosses of the algorithm) is f(x), then what happens to the query complexity.  This natural question has been address in this paper for the first time. They prove that in this model both the randomized and quantum query complexity is equal to two natural definition of degree for the function f. This is very natural model of computing and seems like a very interesting direction to pursue. it should yield some nice results and applications in near future.

A Chasm Between Identity and Equivalence Testing with Conditional Queries by Jayadev Acharya, Clément L. Canonne and Gautam Kamath (arXiv). Recently a number of works have studied the power of conditional sampling. Conditional sampling in general gives lot more power compared to standard sampling model. For example testing identity of an unknown distribution, with support size n, can be done using constant number of conditional sampling, while in the standard sampling model Θ(√n) number of samples are required. For a related problem of testing equivalence of two unknown distributions, in the standard sampling model Θ(n^{2/3}) number of queries are required while only polylog(n) conditional samples are sufficient.  It was not known if this bound for conditional samples is tight for this problem of testing equivalence of unknown distribution. In this paper a lower bound of Ω(√(loglog n)) is proved.

News for October 2014

Halloween came early for the property testing community this year, with lots of treats throughout the month of October.

There was FOCS 2014, with a number of relevant contributions including On Learning and Testing Dynamic Environments by Oded Goldreich and Dana Ron (discussed previously here), New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco Servedio, and Li-Yang Tan and An Automatic Inequality Prover and Instance Optimal Identity Testing by Greg and Paul Valiant.

There were also two workshops on the first day of FOCS with very close connections to property testing. The first, Sparse Fourier Transform: Theory and Applications, was centered around recent developments in sublinear-time algorithms for computing the Fourier transform of signals that have a sparse spectrum (or are close to it). The second, Higher-order Fourier Analysis, discussed recent applications of this area of research in computer science; one of which is in testing algebraic properties of functions. The slides for the talks at these workshops are available online.

Another treat (or a promise of future treats) came in the form of the list of accepted papers for ITCS 2015. As we can see from the list, there will be many presentations at the conference on property testing topics, and we can look forward to lots of interesting reading when these papers are posted online.

Finally, we have this month’s contributions:

Gowers Norm, Function Limits, and Parameter Estimation by Yuichi Yoshida. (arXiv) There has been a lot of recent progress on the problem of characterizing the set of algebraic properties of functions that can be tested with a constant number of queries. This line of work has been notable not just for its success, but also for the connections it has established between property testing and other areas of mathematics (such as higher-order Fourier analysis, for example). In this work, this problem (and generalizations of it) is revisited from a different angle, by considering a new notion of function limits. As the results in the paper show, this notion leads to alternative proofs of the constant-query testability of many classic algebraic properties of functions, and promises to offer a fruitful new line of inquiry.

Testing Identity of Structured Distributions by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. (arXiv) The most fundamental problem in distribution testing is that of testing identity: given sample access to some unknown distribution q, is it identical (or far from) some known distribution p? This problem has been studied extensively and is well-understood: \(\Theta(\sqrt{n})\) queries are both necessary and sufficient to test identity of distributions with support size \(n\). The current paper considers a natural follow-up question: if our distributions p and q are not arbitrary distributions but instead come from some (potentially large but structured) class of distributions, can we test identity more efficiently in this setting? The results of the paper give an emphatic affirmative answer to this question, showing that for many natural classes of distributions, identity can be tested quite efficiently.

Testing Poisson Binomial Distributions by Jayadev Acharya and Constantinos Daskalakis. (arXiv) This paper considers another fundamental problem in distribution testing: how many samples must we draw from some unknown distribution q on \([n]\) to test whether it is a sum of Bernoulli distributions or not? Such distributions are known as Poisson Binomial distributions, and the current paper gives tight upper and lower bounds of \(\Theta(n^{1/4})\) samples for this testing task.

News for September 2014

It is unfortunate that in the month of september we had very few papers in theory that appeared in ECCC or Arxiv. And of them I could not find any paper that falls in the category of property testing / query complexity or sublinear algorithms.  It is possible I have overlooked some papers. Please do point out any paper that you may think relevant that appeared in the month of September.


New for August 2014

We’ve got a nice collection of papers this August, ranging from graph property testing, locally testable codes, to new lower bound approaches.

Deconstructions of Reductions from Communication Complexity to Property Testing using Generalized Parity Decision Trees by Roei Tell (ECCC). Blais, Brody, and Matulef (BBM) introduced the beautiful and powerful method of using communication complexity to derive property testing lower bounds. This result is all about understanding this approach is greater depth, and really trying to get at the crux of the reductions. The BBM approach uses rather heavy hammers (communication complexity lower bounds), so could we get away with weaker hammers to get property testing lower bounds? The key insight is that less powerful models can be reduced to existing property testing problems, and these are equivalent to the BBM reductions. While this (as of now) does not lead to newer bounds, it does shed significant light on existing bounds. The weaker model is essentially that of linear decision trees. Such reductions were also discovered by Brushundi, Chakraborty, and Kulkarni, proving \(\Omega(k)\) lower bounds for testing \(k\)-linearity.

Locally Correctable and Testable Codes Approaching the Singleton Bound by Or Meir (ECCC). Locally Testable Codes (LTCs) are codes that have a constant (or polylogarithmic) time tester. It is known that such codes with minor caveats cannot have a constant rate (a number of papers on this topic: Katz and Trevisan, Dinur and Kaufman, Ben-Sasson and Viderman). But when the query complexity can be \(n^\beta\) (where \(n\) is the codeword size), the rate can be arbitrarily close to \(1\) (Viderman, and more refs in the paper). This result takes this fact to the very extreme. The Singleton Bound is a lower bound between the tradeoff between rate and relative distance of a code. Amazingly, this lower bound tradeoff can be achieved by an LTC (and also Locally Correctable Code) testable with \(n^\beta\) queries!

Complexity of Nondeterministic Graph Parameter Testing by Marek Karpinksi and Ronald Mark&#243 (arXiv). It is easiest to think of nondeterminstic graph testing with the example of testing bipartiteness for dense graphs. Suppose we wish to test \(G\) for bipartiteness. As a “certificate”, we are provided a candidate partition \(V = V_1 \cup V_2\). There is a “simple” constant-time tester that can property testing if \(G\) is bipartite with respect to this partition. (Just sample two uniform random vertices, and check if the edge between them respects the bipartition.) So we say that bipartiteness is non-determinstically testable: if \(G\) is far from being bipartite, no certificate can convince the simple tester. The full definition is somewhat complex, but it is basically the same idea. (In this context, “testablility” always refers to constant time testable.) Lovász and Vestergombi proved the striking result that any non-deterministically testable property is also deterministically testable, but their proof did not give an explicit bound on the query complexity. Gishboliner and Shapira fixed that, and gave a tower-type bound for the query complexity. So if the query complexity of the underlying “simple” tester is \(q\), then the non-deterministic property has a tester whose query complexity is a tower of height \(q\). This results goes much further, and shows the final query complexity is at most triply exponential in \(q\). This is quite significant, since it is uses a weaker regularity approach than previous results.

News for July 2014

In this month’s post, we review papers that raise two questions that I find particularly interesting. First, can sublinear algorithms find applications in efficient approximation of CSPs? And second, can property testing be extended to give meaningful results in settings where Hamming distance is not the “right” measure of distance between functions?

Going for Speed: Sublinear Algorithms for Dense r-CSPs by Grigory Yaroslavtsev (arXiv). In their seminal paper, Goldreich, Goldwasser, and Ron considered the MAX-CUT problem. They showed that both the testing variant of the problem (where the algorithm must distinguish between graphs with a cut of size \(s\) from those with maximum cut size at most \(s – \epsilon N^2\)) and the approximation variant (where the algorithm must find a cut that has at most \(\epsilon N^2\) fewer edges than the largest cut in the graph) can be solved in sublinear time. This result inspired the very successful line of research on testing properties of graphs. We can also view this result as motivation for studying the approximability of general CSPs: given some constraint satisfaction problem, can we find an assignment that satisfies almost as many constraints as the maximally satisfying assignment in sublinear time? This current paper begins the exploration of this question with an improved bound on the query complexity of the MAX-CUT problem and with the first general result for approximating CSPs of arity greater than 2 (i.e., where the constraints are on more than 2 variables).

Lp testing by Piotr BermanSofya Raskhodnikova, and Grigory Yaroslavtsev (preprint). Almost all of the work in testing properties of functions is based on the Hamming distance: for a given property P, a tester is required to accept functions that have the property P and to reject the functions that differ from every function with the property P on an \(\epsilon\) fraction of the inputs. This notion of distance is appropriate for algebraic properties of functions and for properties of boolean functions, but for other properties of real-valued functions, \(L_p\) distances are sometimes more appropriate. This paper provides the first systematic study of property testing under these distances. An important high-level message from this work is that the testing problems can be very different under these distances than under Hamming distance. For example, the classic problem of \(\epsilon\)-testing whether \(f : [n] \to \mathbb{R}\) is monotone requires \(\Theta(\frac{\log n}{\epsilon})\) queries in the Hamming distance setting, but only \(\Theta(\frac1{\epsilon^p})\) queries in the \(L_p\) distance setting!