# News for December 2014

Happy new year! I just looked back at our archives, and saw that PTReview has been running since July 2013. I’m happy (and maybe mildly surprised) that it’s still going strong. And so the chronicling of $$o(n)$$ continues…

Much coolness we have to report: permutation testing, linearity testing, distribution testing, and monotonicity testing. Without further ado:

Large permutations and parameter testing by Roman Glebov, Carlos Hoppen, Tereza Klimosova, Yoshiharu Kohayakawa, Daniel Kral, and Hong Liu (arXiv). Just as typical dense graph testing involves checking properties on a random, constant-sized induced subgraph, we can look at permutation testers that test properties of permutations by sampling sub-permutations. The theory of dense graph testing is closely tied to the Szemerédi regularity lemma, the notion of graph limits, and the behavior of subgraph densities. (Check out this survey by Borgs et al.) An analogous theory for permutations has been built by a subset of the authors in a series of papers (survey). There is a notion of permutation properties/parameters that are forcible, meaning that two permutations that have similar frequencies of (a fixed, finite set of) subpermutations have close values of the parameter. This seems naturally connected to testing. Indeed, a permutation parameter that is forcible can be approximated in constant time by simply approximating the frequencies of the subpermutations. This paper shows that the converse is false: there is a constant-time approximable parameter that is not forcible.
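
To make the "approximate the subpermutation frequencies" step concrete, here is a minimal sketch in Python. It is only an illustration of sampling sub-permutations, not code from the paper; the function names `pattern_of` and `estimate_pattern_frequencies` and the parameters `k` and `num_samples` are made up for this example.

```python
import random
from collections import Counter

def pattern_of(values):
    """Return the relative order of `values` as a tuple of ranks,
    e.g. (30, 10, 20) has pattern (2, 0, 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)

def estimate_pattern_frequencies(perm, k, num_samples, rng=None):
    """Estimate how often each length-k pattern occurs in `perm`
    by sampling k positions uniformly at random (hypothetical helper)."""
    rng = rng or random.Random()
    counts = Counter()
    n = len(perm)
    for _ in range(num_samples):
        # Pick k distinct positions and read them in left-to-right order.
        positions = sorted(rng.sample(range(n), k))
        counts[pattern_of([perm[p] for p in positions])] += 1
    return {pat: c / num_samples for pat, c in counts.items()}
```

The number of samples needed for a given accuracy depends only on `k` and the accuracy, not on the length of the permutation, which is what makes this a constant-time procedure.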

A Self-Tester for Linear Functions over the Integers with an Elementary Proof of Correctness by Sheela Devadas and Ronitt Rubinfeld (arXiv). The venerable property of linearity testing needs no introduction, and has enough history to fill up this month’s post (and the next and the next). But what if the domain is n-bit integers and we care about running time in terms of bit operations? Suppose we wish to check a program that is supposed to compute $$f(x) = b\cdot x$$ (for fixed, known $$b$$). Inputs like $$2^k$$ are easy for the checker to evaluate, and could be used to get faster checkers. This paper gives testers for linearity, multivariate linear functions, and multilinear functions, where computation time is always linear in sample complexity.

$$\ell_p$$ Testing and Learning of Discrete Distributions by Bo Waggoner (arXiv). Distribution testing needs little introduction, and we have seen much progress over our time in PTReview. Let’s start with the basics. The seminal work of Batu et al. and later Paninski showed that the sample complexity of testing uniformity of a distribution $$\mathcal{D}$$ over universe $$[n]$$ is $$\Theta(\sqrt{n}/\epsilon^2)$$. Meaning, the sample complexity of checking $$\|\mathcal{D} - \mathcal{U}\|_1 > \epsilon$$ (where $$\mathcal{U}$$ is uniform on $$[n]$$) is $$\Theta(\sqrt{n}/\epsilon^2)$$. But what if we had $$\ll \sqrt{n}/\epsilon^2$$ samples? From prior work, nothing could be inferred. (Update: As Ilias Diakonikolas pointed out to me, the $$\ell_2$$ results were previously known, both by the Batu et al. work and a recent paper by Chan et al. that settles the $$\ell_2$$ question.) This paper changes that, and says that we can still infer something about other $$\ell_p$$ norms, $$\|\mathcal{D} - \mathcal{U}\|_p$$. What impressed me about this paper is the detailed understanding of the interplay between $$n, \epsilon, p$$. For example, the sample complexity of uniformity testing over $$\ell_p$$-norm for any $$p > 1$$ is independent of $$n$$. There are many, many results in this paper with an intriguing threshold phenomenon at $$p=2$$. For distribution testing in practice, I would think that this result would be of much significance.
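
As a small illustration of the $$\ell_2$$ case mentioned in the update, here is a sketch of the classic collision statistic in Python. This is not the algorithm from the paper; the function names are made up for this example. It relies on the identity $$\|\mathcal{D} - \mathcal{U}\|_2^2 = \|\mathcal{D}\|_2^2 - 1/n$$ and on the fact that the fraction of colliding sample pairs is an unbiased estimate of $$\|\mathcal{D}\|_2^2$$.

```python
from collections import Counter
from math import comb

def collision_rate(samples):
    """Fraction of sample pairs that collide; an unbiased estimate of
    the squared l2 norm of the sampled distribution."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    colliding_pairs = sum(comb(c, 2) for c in counts.values())
    return colliding_pairs / comb(m, 2)

def estimate_l2_distance_sq_to_uniform(samples, n):
    """Estimate ||D - U||_2^2 for a distribution D over a universe of size n,
    using ||D - U||_2^2 = ||D||_2^2 - 1/n (hypothetical helper)."""
    return collision_rate(samples) - 1.0 / n
```

The estimate can come out slightly negative on a small sample, since it is unbiased rather than clipped at zero; a tester would compare it against a threshold that depends on $$\epsilon$$.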

New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco A. Servedio, and Li-Yang Tan (arXiv). Ah yes, Boolean monotonicity testing. Consider the standard coordinatewise partial order on $$\{0,1\}^n$$, given by $$\prec$$. A function $$f:\{0,1\}^n \rightarrow \{0,1\}$$ is monotone if $$\forall x \prec y, f(x) \leq f(y)$$. The complexity of property testing (Boolean) monotonicity is one of those tantalizing, simple-to-state questions that is still quite open. I’ll spare you the full story and the epsilons, but here’s the status. The best upper bound is a non-adaptive, one-sided $$O(n^{7/8})$$ tester by Chakrabarty and Seshadhri. The best lower bound is a non-adaptive, one-sided lower bound of $$\Omega(\sqrt{n})$$ by Fischer et al. This implies an $$\Omega(\log\log n)$$ lower bound for general testers. This paper changes all of this. The authors prove an $$\Omega(n^{1/5})$$ lower bound for two-sided non-adaptive testers, leading to an exponentially better $$\Omega(\log n)$$ lower bound for general testers. The main insight is to focus on monotone vs non-monotone families of linear threshold functions, and show that procedures making few (non-adaptive) queries cannot distinguish between such families. The main hammer is the use of recent Central Limit Theorems. As an added bonus, this paper improves the upper bound for monotonicity testing to $$O(n^{5/6})$$, with a modified version (and better analysis) of the Chakrabarty and Seshadhri tester. But can the lower bound be improved?

Boolean function monotonicity testing requires (almost) $$n^{1/2}$$ non-adaptive queries by Xi Chen, Anindya De, Rocco A. Servedio, and Li-Yang Tan (arXiv). Yes, the lower bound can be improved. This paper gets the non-adaptive two-sided bound to (almost) $$\Omega(\sqrt{n})$$, matching the one-sided Fischer et al. bound. The paper proves improved Central Limit Theorems, tailored for this application. The authors, and I with them, believe that this is the true complexity. On that intriguing note, we end 2014!

# News for November 2014

With the submission deadlines for many conferences falling in November, it is no surprise that a good number of nice articles related to property testing were submitted to arXiv and ECCC this month. Here they are:

Sunflowers and Testing Triangle-Freeness of Functions by Ishay Haviv and Ning Xie (arXiv). Testing triangle-freeness is one of the central problems in property testing. The upper bound on the query complexity is a constant (depending on ε) that comes from the regularity lemma. The lower bound (in terms of the distance parameter, ε) is a polynomial in 1/ε. In another line of recent work, different variations of the sunflower lemma have been studied. Some important problems in complexity theory and algorithms, like matrix multiplication, have been connected to variations of the sunflower lemma. This work gives a new approach, using variations of the sunflower lemma, to obtaining lower bounds for testing triangle-freeness. It also reconfirms the need to understand the combinatorial problem behind the sunflower lemma and its variations.

Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts by Madhav Jha, C. Seshadhri and Ali Pinar (arXiv). Counting or approximately counting the occurrences of a particular subgraph in a given graph is a very important problem, not only as a mathematically interesting question but also from the application point of view. The hard part is to design an algorithm that is provably good as well as empirically satisfactory. There is a huge line of research on this topic, much of it solving the problem for one particular subgraph. In this paper a 3-path sampling based algorithm is presented that can approximate the number of occurrences of any 4-vertex subgraph. Not only are the correctness and running time of the algorithm proved mathematically, but its speed has also been tested empirically. This work should motivate us to design sublinear algorithms that are both provably good and implementable.

Forrelation: A Problem that Optimally Separates Quantum from Classical Computing by Scott Aaronson and Andris Ambainis (arXiv). Understanding the power of quantum queries over classical queries is a well-studied subject with a long line of interesting results. A long-standing open problem posed by Buhrman et al. asks for the largest possible separation between classical and quantum query complexity. In this paper it is shown that the Forrelation problem, where one needs to decide whether one Boolean function is highly correlated with the Fourier transform of a second function, can be solved using 1 quantum query, whereas classically one needs Ω(√n/log n) queries. They also prove this is almost tight. This improves the previously known gap of 1 vs. Θ(n^{1/4}).

Query complexity in expectation by Jedrzej Kaniewski, Troy Lee and Ronald de Wolf (arXiv). In query complexity we usually consider the number of queries to the input x required to compute a given function f(x) exactly. But what happens to the query complexity if we are only required to output a value whose expectation (taken over the random coin tosses of the algorithm) is f(x)? This natural question is addressed in this paper for the first time. They prove that in this model the randomized and quantum query complexities are equal to two natural definitions of degree for the function f. This is a very natural model of computing and seems like a very interesting direction to pursue. It should yield some nice results and applications in the near future.

A Chasm Between Identity and Equivalence Testing with Conditional Queries by Jayadev Acharya, Clément L. Canonne and Gautam Kamath (arXiv). Recently a number of works have studied the power of conditional sampling. Conditional sampling in general gives a lot more power than the standard sampling model. For example, testing identity of an unknown distribution with support size n can be done using a constant number of conditional samples, while in the standard sampling model Θ(√n) samples are required. For the related problem of testing equivalence of two unknown distributions, Θ(n^{2/3}) samples are required in the standard sampling model, while only polylog(n) conditional samples are sufficient. It was not known whether this bound for conditional samples is tight for testing equivalence of unknown distributions. In this paper a lower bound of Ω(√(loglog n)) is proved.

# News for October 2014

Halloween came early for the property testing community this year, with lots of treats throughout the month of October.

There was FOCS 2014, with a number of relevant contributions including On Learning and Testing Dynamic Environments by Oded Goldreich and Dana Ron (discussed previously here), New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco Servedio, and Li-Yang Tan and An Automatic Inequality Prover and Instance Optimal Identity Testing by Greg and Paul Valiant.

There were also two workshops on the first day of FOCS with very close connections to property testing. The first, Sparse Fourier Transform: Theory and Applications, was centered around recent developments in sublinear-time algorithms for computing the Fourier transform of signals that have a sparse spectrum (or are close to it). The second, Higher-order Fourier Analysis, discussed recent applications of this area of research in computer science; one of which is in testing algebraic properties of functions. The slides for the talks at these workshops are available online.

Another treat (or a promise of future treats) came in the form of the list of accepted papers for ITCS 2015. As we can see from the list, there will be many presentations at the conference on property testing topics, and we can look forward to lots of interesting reading when these papers are posted online.

Finally, we have this month’s contributions:

Gowers Norm, Function Limits, and Parameter Estimation by Yuichi Yoshida. (arXiv) There has been a lot of recent progress on the problem of characterizing the set of algebraic properties of functions that can be tested with a constant number of queries. This line of work has been notable not just for its success, but also for the connections it has established between property testing and other areas of mathematics (such as higher-order Fourier analysis, for example). In this work, this problem (and generalizations of it) is revisited from a different angle, by considering a new notion of function limits. As the results in the paper show, this notion leads to alternative proofs of the constant-query testability of many classic algebraic properties of functions, and promises to offer a fruitful new line of inquiry.

Testing Identity of Structured Distributions by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. (arXiv) The most fundamental problem in distribution testing is that of testing identity: given sample access to some unknown distribution q, is it identical to (or far from) some known distribution p? This problem has been studied extensively and is well-understood: $$\Theta(\sqrt{n})$$ samples are both necessary and sufficient to test identity of distributions with support size $$n$$. The current paper considers a natural follow-up question: if our distributions p and q are not arbitrary distributions but instead come from some (potentially large but structured) class of distributions, can we test identity more efficiently in this setting? The results of the paper give an emphatic affirmative answer to this question, showing that for many natural classes of distributions, identity can be tested quite efficiently.

Testing Poisson Binomial Distributions by Jayadev Acharya and Constantinos Daskalakis. (arXiv) This paper considers another fundamental problem in distribution testing: how many samples must we draw from some unknown distribution q on $$[n]$$ to test whether it is a sum of Bernoulli distributions or not? Such distributions are known as Poisson Binomial distributions, and the current paper gives tight upper and lower bounds of $$\Theta(n^{1/4})$$ samples for this testing task.

# News for September 2014

Unfortunately, very few theory papers appeared on ECCC or arXiv in the month of September, and among them I could not find any that falls in the category of property testing, query complexity, or sublinear algorithms. It is possible I have overlooked some papers. Please do point out any paper from September that you think is relevant.

# News for August 2014

We’ve got a nice collection of papers this August, ranging from graph property testing and locally testable codes to new lower bound approaches.

Deconstructions of Reductions from Communication Complexity to Property Testing using Generalized Parity Decision Trees by Roei Tell (ECCC). Blais, Brody, and Matulef (BBM) introduced the beautiful and powerful method of using communication complexity to derive property testing lower bounds. This result is all about understanding this approach in greater depth, and really trying to get at the crux of the reductions. The BBM approach uses rather heavy hammers (communication complexity lower bounds), so could we get away with weaker hammers to get property testing lower bounds? The key insight is that less powerful models can be reduced to existing property testing problems, and these are equivalent to the BBM reductions. While this (as of now) does not lead to newer bounds, it does shed significant light on existing bounds. The weaker model is essentially that of linear decision trees. Such reductions were also discovered by Bhrushundi, Chakraborty, and Kulkarni, proving $$\Omega(k)$$ lower bounds for testing $$k$$-linearity.

Locally Correctable and Testable Codes Approaching the Singleton Bound by Or Meir (ECCC). Locally Testable Codes (LTCs) are codes that have a constant (or polylogarithmic) time tester. It is known that such codes, with minor caveats, cannot have a constant rate (a number of papers on this topic: Katz and Trevisan, Dinur and Kaufman, Ben-Sasson and Viderman). But when the query complexity can be $$n^\beta$$ (where $$n$$ is the codeword size), the rate can be arbitrarily close to $$1$$ (Viderman, and more refs in the paper). This result takes this fact to the very extreme. The Singleton Bound is a bound on the tradeoff between the rate and the relative distance of a code. Amazingly, this tradeoff can be approached by an LTC (and also a Locally Correctable Code) testable with $$n^\beta$$ queries!

Complexity of Nondeterministic Graph Parameter Testing by Marek Karpinski and Roland Markó (arXiv). It is easiest to think of nondeterministic graph testing with the example of testing bipartiteness for dense graphs. Suppose we wish to test $$G$$ for bipartiteness. As a “certificate”, we are provided a candidate partition $$V = V_1 \cup V_2$$. There is a “simple” constant-time tester that can test whether $$G$$ is bipartite with respect to this partition. (Just sample two uniform random vertices, and check if the edge between them respects the bipartition.) So we say that bipartiteness is nondeterministically testable: if $$G$$ is far from being bipartite, no certificate can convince the simple tester. The full definition is somewhat complex, but it is basically the same idea. (In this context, “testability” always refers to constant-time testability.) Lovász and Vesztergombi proved the striking result that any nondeterministically testable property is also deterministically testable, but their proof did not give an explicit bound on the query complexity. Gishboliner and Shapira fixed that, and gave a tower-type bound for the query complexity. So if the query complexity of the underlying “simple” tester is $$q$$, then the nondeterministic property has a tester whose query complexity is a tower of height $$q$$. This result goes much further, and shows that the final query complexity is at most triply exponential in $$q$$. This is quite significant, since it uses a weaker regularity approach than previous results.
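
The “simple” tester for bipartiteness with a certificate can be sketched in a few lines of Python. This is only an illustration of the parenthetical description above, assuming the graph is given as an adjacency matrix `adj` and the certificate as a list `side` of 0/1 labels; the function names and the repetition count `num_samples` are made up for this example.

```python
import random

def respects_partition(adj, side, u, v):
    """False only when u and v are adjacent and lie on the same side."""
    return not (adj[u][v] and side[u] == side[v])

def simple_certificate_tester(adj, side, num_samples, rng=None):
    """Repeat the two-vertex check num_samples times; reject on the first
    edge found inside one side of the claimed bipartition."""
    rng = rng or random.Random()
    n = len(adj)
    for _ in range(num_samples):
        u, v = rng.sample(range(n), 2)  # two distinct uniform random vertices
        if not respects_partition(adj, side, u, v):
            return False
    return True
```

If the certificate is a valid bipartition the tester always accepts, so all of the error is on the side of graphs that are far from bipartite, where every candidate partition leaves many violating pairs.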

# News for July 2014

In this month’s post, we review papers that raise two questions that I find particularly interesting. First, can sublinear algorithms find applications in efficient approximation of CSPs? And second, can property testing be extended to give meaningful results in settings where Hamming distance is not the “right” measure of distance between functions?

Going for Speed: Sublinear Algorithms for Dense r-CSPs by Grigory Yaroslavtsev (arXiv). In their seminal paper, Goldreich, Goldwasser, and Ron considered the MAX-CUT problem. They showed that both the testing variant of the problem (where the algorithm must distinguish between graphs with a cut of size $$s$$ from those with maximum cut size at most $$s - \epsilon N^2$$) and the approximation variant (where the algorithm must find a cut that has at most $$\epsilon N^2$$ fewer edges than the largest cut in the graph) can be solved in sublinear time. This result inspired the very successful line of research on testing properties of graphs. We can also view this result as motivation for studying the approximability of general CSPs: given some constraint satisfaction problem, can we find an assignment that satisfies almost as many constraints as the maximally satisfying assignment in sublinear time? This current paper begins the exploration of this question with an improved bound on the query complexity of the MAX-CUT problem and with the first general result for approximating CSPs of arity greater than 2 (i.e., where the constraints are on more than 2 variables).

Lp testing by Piotr Berman, Sofya Raskhodnikova, and Grigory Yaroslavtsev (preprint). Almost all of the work in testing properties of functions is based on the Hamming distance: for a given property P, a tester is required to accept functions that have the property P and to reject the functions that differ from every function with the property P on an $$\epsilon$$ fraction of the inputs. This notion of distance is appropriate for algebraic properties of functions and for properties of Boolean functions, but for other properties of real-valued functions, $$L_p$$ distances are sometimes more appropriate. This paper provides the first systematic study of property testing under these distances. An important high-level message from this work is that the testing problems can be very different under these distances than under Hamming distance. For example, the classic problem of $$\epsilon$$-testing whether $$f : [n] \to \mathbb{R}$$ is monotone requires $$\Theta(\frac{\log n}{\epsilon})$$ queries in the Hamming distance setting, but only $$\Theta(\frac1{\epsilon^p})$$ queries in the $$L_p$$ distance setting!
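
To see how different the two notions of distance can be, here is a small Python sketch for functions $$f : [n] \to \mathbb{R}$$ given as lists. It is an illustration only, not code from the paper, and the function names are made up. The Hamming distance to monotonicity is computed exactly from the longest nondecreasing subsequence; for $$L_1$$ the sketch assumes values in $$[0,1]$$ and a distance normalized by $$n$$, and it only gives an upper bound, obtained by comparing $$f$$ with its running maximum (which is a monotone function).

```python
from bisect import bisect_right

def hamming_distance_to_monotone(f):
    """Fraction of values that must change to make f nondecreasing,
    i.e. 1 - (longest nondecreasing subsequence length) / n."""
    tails = []  # tails[i] = smallest possible last value of a subsequence of length i + 1
    for x in f:
        pos = bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return 1 - len(tails) / len(f)

def l1_distance_upper_bound(f):
    """Upper bound on the normalized L1 distance to monotonicity:
    the average gap between f and its running maximum."""
    running_max = float("-inf")
    total = 0.0
    for x in f:
        running_max = max(running_max, x)
        total += running_max - x
    return total / len(f)
```

For a function that alternates between 0.501 and 0.5, half of the values must change to make it monotone, yet the $$L_1$$ bound is about 0.0005: many tiny violations make a function Hamming-far from monotone while leaving it very close in $$L_1$$.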

# News for June 2014

Unfortunately the month of June was a very dry month for property testing. We could not find any papers on property testing (in ECCC or ArXiv) during the month of June. Hopefully we will have lots of papers in the month of July.

# News for May 2014

May seems to be a slow month for property testing, with only one paper on locally testable codes. Nonetheless, we have unearthed a paper from April (missed despite our best efforts) on a related topic of hidden set approximations.

The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling by Dana Ron and Gilad Tsur (arXiv). Consider a known universe $$U$$ and an unknown subset $$S$$. Our aim is to approximately determine $$|S|$$. We can perform subset queries: for a query subset $$T$$, an oracle returns whether $$T \cap S \neq \emptyset$$ or (a more powerful model) the oracle returns a uniform random element in $$T \cap S$$. This paper gives a detailed study of this problem under various settings. There is a fascinating array of upper and lower bounds, including situations where $$U$$ is ordered and $$T$$ must be an interval, the difference between adaptivity and non-adaptivity, arbitrary subset queries, and much more. The topic seems to be quite rich, and it should yield some nice sets of problems to study further. It appears that some lower bounds are quite open. Students, pay attention!

Limitations on Testable Affine-Invariant Codes in the High-Rate Regime by Venkatesan Guruswami, Madhu Sudan, Ameya Velingker, and Carol Wang (ECCC). Locally Testable Codes (LTCs) are codes that (typically) have a constant time tester. Interestingly, recent work on testing codes using a linear number of queries (some small constant fraction of the code length) has connections to the Small Set Expansion (SSE) problem, which in turn is related to the famous Unique Games Conjecture (UGC). A paper by Barak et al. gives a construction of a small set expander (a graph where all small sets have high conductance) where the Laplacian has many small eigenvalues (so it’s “difficult” to find a low conductance cut). This construction uses high-rate LTCs that are linear time testable. In some sense, the better the rate, the better construction one gets. This paper asks how far this idea can go. So the idea is to construct the best rate linear-time testable LTC. Reed-Muller codes fall in this category, but their rate is quite far from optimal. Recent work by Guo et al. gives a slightly better construction, but this is far from the (near) optimal rate achieved by BCH codes (which are not known to be testable). All in all, this paper shows that for affine-invariant codes, the Guo et al. construction is essentially the best testable high-rate code one can get. I have obviously done no justice to the depth of the connections, the intricacies of the parameters, or the overall coolness of this subject. So go read the paper!

# News for April 2014

As winter finally releases its grip here on the east coast, we celebrate with forests, with bounded-derivative properties, and with hypergraphs. (I believe these are the traditional ingredients for the beginning-of-spring celebrations in New England, but I could be wrong.)

Testing Forest-Isomorphism in the Adjacency List Model by Mitsuru Kusumoto and Yuichi Yoshida (arXiv). Which properties of sparse graphs can we test efficiently? To a large extent, this fundamental question remains open. The natural setting in which to study this question is the adjacency list query model. In this model, a testing algorithm can select any vertex $$v$$ and index $$i$$; it then receives the identity $$w$$ of the $$i$$th neighbor of $$v$$. Kusumoto and Yoshida show that in this model, we can test if two forests (i.e., collections of trees) on $$n$$ vertices are identical up to relabeling of the vertices with polylog($$n$$) queries. They also show that, remarkably, this tester can be used to test any property of forests in the adjacency list model with polylog($$n$$) queries.

Property Testing on Product Distributions: Optimal Testers for Bounded Derivative Properties by Deeparnab Chakrabarty, Kashyap Dixit, Madhav Jha, and C. Seshadhri (arXiv). Two basic properties of a function $$f : [n]^d \to \mathbb{R}$$ on the hypergrid are monotonicity and the Lipschitz property. These two properties are special cases of a more general class of properties called bounded-derivative properties. In this paper, the authors give optimal bounds on the number of queries required to test these properties over every product distribution on the hypergrid. These results are obtained by deriving new dimension-reduction results and, most interestingly, by establishing and exploiting a strong connection between bounded-derivative properties and binary search trees.

Any Monotone Property of 3-uniform Hypergraphs is Weakly Evasive by Raghav Kulkarni, Youming Qiao, and Xiaoming Sun (ECCC). What happens if we remove the distance promise in the definition of property testing? Or, equivalently, how efficient can testers be if they must distinguish objects with some property $$P$$ from every object that does not have the property $$P$$? The answer to this question seems completely obvious: no non-trivial property can be tested in this setting with a sublinear number of queries. We can easily verify that this answer is indeed correct for our favorite properties… but is it really true for every non-trivial property? As it turns out, this question is far from trivial, and in fact leads to the famous evasiveness conjectures and to the celebrated results of Rivest and Vuillemin and of Kahn, Saks, and Sturtevant. The current paper combines and extends ideas from both of these papers to show that deterministic testers for any monotone non-trivial property of 3-uniform hypergraphs indeed have linear query complexity.