Author Archives: Eric Blais

News for December 2015

Greetings from the exciting Workshop on Sublinear Algorithms at John Hopkins University! As this workshop and the upcoming SODA and ITCS conferences get 2016 to a roaring start, let us take one last look back at property testing news from last year. In December, one work in particular caught my eye:

 Non-Local Probes Do Not Help with Graph Problems by Mika Göös, Juho Hirvonen, Reut Levi, Moti Medina, and Jukka Suomela (arXiv). A generalization of property testing that has recently seen some fascinating developments in the past few years is the local computation algorithms (LCA) model, in which the algorithm is asked to answer some local query (such as “what is the color of this vertex in some fixed, legal coloring of the graph?”) in sublinear-time. This paper relates the LCA model to message-passing models and in the process gives a powerful new tool for establishing lower bounds in LCAs.



News for August

This month saw more development on testing properties of distributions and a result with intriguing connections to property testing. And for readers who may have missed it, Clément Canonne and Gautam Kamath wrote an engaging survey of some recent work on testing properties of distribution here.

Optimal algorithms and lower bounds for testing closeness of structured distributions by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). One of the fundamental results in testing properties of distributions is that if we want to estimate the (\(L_1\)) distance between two unknown distributions on a domain of size \(n\) up to some constant additive factor,  we need to draw \(O(n^{2/3})\) samples from these distributions, and this sample complexity is tight in general. But what if we consider the same problem in the setting where we are promised that the distributions come from some (known) class of distribution? This paper shows that, for many natural classes of distributions, we can obtain much more efficient algorithms for testing the closeness of distributions in this setting.

Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions by Parikshit Gopalan, Noam Nisan, Rocco A. Servedio, Kunal Talwar, and Avi Wigderson (arXiv). The (maximum) sensitivity of a Boolean function \(f : \{0,1\}^n \to \{0,1\}\) is the size of the largest set \(S \subseteq [n]\) such that there is an input \(x \in \{0,1\}^n\) for which \(f(x) = f(y)\) for every input \(y\) obtained by flipping the value of the \(i\)th coordinate of \(x\) for some \(i \in S\). One of the results in this paper shows that functions with low sensitivity can be locally self-corrected: given query access to a function \(r : \{0,1\}^n \to \{0,1\}\) that is close to a function \(f\) with low sensitivity (think of \(r\) as a corrupted version of \(f\)), there is an algorithm that for any input $x \in \{0,1\}^n$, queries \(r\) on a few inputs and outputs with high probability the value \(f(x)\). This notion of local self-correction is of course closely related to locally-decodable codes; it is also one of the fundamental techniques used to obtain many results in property testing as well (see for example Chapter 3 of Dana Ron’s survey). Can this result, or the techniques used to obtain it, also yield new results in property testing?


News for April 2015

April was a busy time in property testing with 9 (!) papers posted online this month. So let’s jump right in.

Testing properties of graphs

Approximately Counting Triangles in Sublinear Time by Talya Eden, Amit Levi, and Dana Ron (ECCC). Can we estimate the number of triangles in a sparse graph in sublinear time? When we can only query a vertex to learn its degree or its i-th neighbor, then the answer is no. But this paper shows that if we can also ask whether a pair of vertices are connected by an edge, then the answer is yes!

Testing Cluster Structure of Graphs by Artur Czumaj, Pan Peng, and Christian Sohler (arXiv). Another fundamental problem in the analysis of graphs is that of determining if the vertices of a graph can be partitioned into a small number of good clusters. This paper shows that the appropriate formalization of this problem can be solved in sublinear-time, and that in fact \(O(\sqrt{n})\) queries suffice to test whether a graph with \(n\) vertices can be partitioned into \(k = O(1)\) clusters.

Testing properties of distributions

A Survey on Distribution Testing: Your Data is Big. But is it Blue? by Clément Canonne (ECCC). As we have seen on this blog, there has been a lot of recent development in testing properties of distributions. This latest survey provides an up-to-date introduction to this research.

Faster Algorithms for Testing under Conditional Sampling by Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, and Ananda Theertha Suresh (arXiv). One of the recent developments in distribution testing has been the discovery that natural conditional sampling models offer a surprising amount of algorithmic power. This paper gives yet more evidence of this fact, notably showing that testing the equivalence of two distributions with support size \(n\) can be done with roughly \(O(\log \log n)\) queries—a doubly-exponential improvement over the best possible algorithm for the same problem in the standard sampling model!

Sampling Correctors by Clément Canonne, Themis Gouleakis, and Ronitt Rubinfeld (arXiv). This paper initiates the study of a research direction related, but not exactly equivalent to distribution testing: given that we assume that the true distribution has some structural property (like monotonicity, for example), but that the samples we draw from this distribution may contain errors, can we somehow use these noisy samples to generate “corrected” samples from the original underlying distribution?

Locally-testable codes

High rate locally-correctable and locally-testable codes with sub-polynomial query complexity by Swastik Kopparty, Or Meir, Noga Ron-Zewi, and Shubhangi Saraf (ECCC). The central question in locally-testable error-correcting codes is to determine the tradeoff (if any!) that we must have between the number of queries require to test if an input is a codeword, the rate of the code, and the distance between the codewords. This paper shows that there are codes with constant rate and distance that can also be tested and decoded with sub-polynomial query complexity \(\exp(\tilde{O}(\sqrt{\log n}))\).

Robust testing of lifted codes with applications to low-degree testing by Alan Guo, Elad Haramaty, and Madhu Sudan (ECCC). Typically, we require a local test to reject inputs that are far from codewords with a reasonably large probability (that is: we want the testers to have good soundness properties). A stronger requirement might be to want the local view of a tester on those inputs to be far from the local view of any codeword with good probability. Testers that satisfy this property are called robust, and have many useful properties. This paper shows that natural testers of low-degree polynomials and their generalizations known as lifted codes are indeed robust testers.

General property testing results

On Being Far from Far and on Dual Problems in Property Testing by Roei Tell (ECCC). If I can efficiently test that an input has property \(\Pi\), can I also efficiently test if an input is far from having the property \(\Pi\)? At first glance, it seems that both these problems (that is, the direct and dual version of the problem of testing \(\Pi\)) are equivalent, but this is not always the case. This paper explores this subtle question and provides initial results for dual problems in testing properties of functions, graphs, and distributions.

Trading query complexity for sample-based testing and multi-testing scalability by Eldar Fischer, Oded Lachish, Yadu Vasudev (arXiv). What properties of combinatorial objects can we test in sublinear time when we have only sample access to it (instead of full query access)? This paper provides a general and powerful result showing that every property that can be tested non-adaptively with a constant number of queries can also be tested with a sublinear number of samples.


Sublinear Algorithms Day at MIT on April 10

An announcement from Gautam that will be of interest to readers who will be in (or are looking for a good reason to go to!) New England next month:

On Friday, April 10th, MIT will be hosting the second Sublinear Algorithms Day. This event will bring together researchers in the northeast for a day of interaction and discussion.

Sublinear Day will feature talks by five experts in the areas of sublinear and streaming algorithms: Costis Daskalakis, Robert Krauthgamer, Jelani Nelson, Shubhangi Saraf, and Paul Valiant — each giving a 45-minute presentation on the hot and latest developments in their fields.

Additionally, for the first time this year, we will have a poster session! We strongly encourage young researchers (particularly students and postdocs) to present work related to sublinear algorithms. Abstract submission details are available here.

So what are you waiting for? Registration is available here, and we hope to see you at the event!

Contact with any questions.

Gautam Kamath

News for January 2015

We ended our last post discussing results from 2014 by discussing the monotonicity testing conjecture that \(O(\sqrt{n})\) queries suffice to test whether a Boolean function is monotone.  As it turns out, a proof of this conjecture was just around the corner…

On Monotonicity Testing and Boolean Isoperimetric type Theorems by Subhash Khot, Dor Minzer, and Muli Safra (ECCC). This remarkable paper shows that, indeed, we can test whether \(f : \{0,1\}^n \to \{0,1\}\) is monotone with \(O(\sqrt{n})\) queries using a natural tester. The authors establish this result by building on the connection between monotonicity testers and directed analogues of isoperimetric inequalities on the Boolean hypercube first established by Chakrabarty and Seshadhri. Specifically, the authors establish a directed analogue of a classic inequality of Talagrand and combine it with a number of other interesting innovations to analyze their monotonicity tester.

Big Data on the Rise: Testing monotonicity of distributions by Clément Canonne (arXiv). Another setting in which monotonicity testing has received a lot of attention recently is in the setting of testing properties of distributions. A distribution \(D\) on \(\{1,2,\ldots,n\}\) is monotone (non-increasing) if its probability mass function is non-increasing: \(D(1) \ge D(2) \ge \cdots \ge D(n)\). When we must test whether a function is monotone by drawing samples from the distribution, \(\tilde{\Theta}(\sqrt{n})\) samples are both necessary and sufficient. This work shows that, in contrast, testers with stronger query access (such as conditional sampling) to the distribution can test monotonicity much more efficiently.

News for October 2014

Halloween came early for the property testing community this year, with lots of treats throughout the month of October.

There was FOCS 2014, with a number of relevant contributions including On Learning and Testing Dynamic Environments by Oded Goldreich and Dana Ron (discussed previously here), New algorithms and lower bounds for monotonicity testing by Xi Chen, Rocco Servedio, and Li-Yang Tan and An Automatic Inequality Prover and Instance Optimal Identity Testing by Greg and Paul Valiant.

There were also two workshops on the first day of FOCS with very close connections to property testing. The first, Sparse Fourier Transform: Theory and Applications, was centered around recent developments in sublinear-time algorithms for computing the Fourier transform of signals that have a sparse spectrum (or are close to it). The second, Higher-order Fourier Analysis, discussed recent applications of this area of research in computer science; one of which is in testing algebraic properties of functions. The slides for the talks at these workshops are available online.

Another treat (or a promise of future treats) came in the form of the list of accepted papers for ITCS 2015. As we can see from the list, there will be many presentations at the conference on property testing topics, and we can look forward to lots of interesting reading when these papers are posted online.

Finally, we have this month’s contributions:

Gowers Norm, Function Limits, and Parameter Estimation by Yuichi Yoshida. (arXiv) There has been a lot of recent progress on the problem of characterizing the set of algebraic properties of functions that can be tested with a constant number of queries. This line of work has been notable not just for its success, but also for the connections it has established between property testing and other areas of mathematics (such as higher-order Fourier analysis, for example). In this work, this problem (and generalizations of it) is revisited from a different angle, by considering a new notion of function limits. As the results in the paper show, this notion leads to alternative proofs of the constant-query testability of many classic algebraic properties of functions, and promises to offer a fruitful new line of inquiry.

Testing Identity of Structured Distributions by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. (arXiv) The most fundamental problem in distribution testing is that of testing identity: given sample access to some unknown distribution q, is it identical (or far from) some known distribution p? This problem has been studied extensively and is well-understood: \(\Theta(\sqrt{n})\) queries are both necessary and sufficient to test identity of distributions with support size \(n\). The current paper considers a natural follow-up question: if our distributions p and q are not arbitrary distributions but instead come from some (potentially large but structured) class of distributions, can we test identity more efficiently in this setting? The results of the paper give an emphatic affirmative answer to this question, showing that for many natural classes of distributions, identity can be tested quite efficiently.

Testing Poisson Binomial Distributions by Jayadev Acharya and Constantinos Daskalakis. (arXiv) This paper considers another fundamental problem in distribution testing: how many samples must we draw from some unknown distribution q on \([n]\) to test whether it is a sum of Bernoulli distributions or not? Such distributions are known as Poisson Binomial distributions, and the current paper gives tight upper and lower bounds of \(\Theta(n^{1/4})\) samples for this testing task.

News for July 2014

In this month’s post, we review papers that raise two questions that I find particularly interesting. First, can sublinear algorithms find applications in efficient approximation of CSPs? And second, can property testing be extended to give meaningful results in settings where Hamming distance is not the “right” measure of distance between functions?

Going for Speed: Sublinear Algorithms for Dense r-CSPs by Grigory Yaroslavtsev (arXiv). In their seminal paper, Goldreich, Goldwasser, and Ron considered the MAX-CUT problem. They showed that both the testing variant of the problem (where the algorithm must distinguish between graphs with a cut of size \(s\) from those with maximum cut size at most \(s – \epsilon N^2\)) and the approximation variant (where the algorithm must find a cut that has at most \(\epsilon N^2\) fewer edges than the largest cut in the graph) can be solved in sublinear time. This result inspired the very successful line of research on testing properties of graphs. We can also view this result as motivation for studying the approximability of general CSPs: given some constraint satisfaction problem, can we find an assignment that satisfies almost as many constraints as the maximally satisfying assignment in sublinear time? This current paper begins the exploration of this question with an improved bound on the query complexity of the MAX-CUT problem and with the first general result for approximating CSPs of arity greater than 2 (i.e., where the constraints are on more than 2 variables).

Lp testing by Piotr BermanSofya Raskhodnikova, and Grigory Yaroslavtsev (preprint). Almost all of the work in testing properties of functions is based on the Hamming distance: for a given property P, a tester is required to accept functions that have the property P and to reject the functions that differ from every function with the property P on an \(\epsilon\) fraction of the inputs. This notion of distance is appropriate for algebraic properties of functions and for properties of boolean functions, but for other properties of real-valued functions, \(L_p\) distances are sometimes more appropriate. This paper provides the first systematic study of property testing under these distances. An important high-level message from this work is that the testing problems can be very different under these distances than under Hamming distance. For example, the classic problem of \(\epsilon\)-testing whether \(f : [n] \to \mathbb{R}\) is monotone requires \(\Theta(\frac{\log n}{\epsilon})\) queries in the Hamming distance setting, but only \(\Theta(\frac1{\epsilon^p})\) queries in the \(L_p\) distance setting!

News for April 2014

As winter finally releases its grip here on the east coast, we celebrate with forests, with bounded-derivative properties, and with hypergraphs. (I believe these are the traditional ingredients for the beginning-of-spring celebrations in New England, but I could be wrong.)

Testing Forest-Isomorphism in the Adjacency List Model by Mitsuru Kusumoto and Yuichi Yoshida (arXiv). Which properties of sparse graphs can we test efficiently? To a large extent, this fundamental question remains open. The natural setting in which to study this question is the adjacency list query model. In this model, a testing algorithm can select any vertex \(v\) and index \(i\); it then receives the identity \(w\) of the \(i\)th neighbor of \(v\). Kusumoto and Yoshida show that in this model, we can test if two forests (i.e., collections of trees) on \(n\) vertices are identical up to relabeling of the vertices with polylog(\(n\)) queries. They also show that, remarkably, this tester can be used to test any property of forests in the adjacency list model with polylog(\(n\)) queries.

Property Testing on Product Distributions: Optimal Testers for Bounded Derivative Properties by Deeparnab Chakrabarty, Kashyap Dixit, Madhav Jha, and C. Seshadhri (arXiv). Two basic properties of a function \(f : [n]^d \to \mathbb{R}\) on the hypergrid are monotonicity and the Lipschitz property. These two properties are special cases of a more general class of properties called bounded-derivative properties. In this paper, the authors give optimal bounds on the number of queries required to test these properties over every product distribution on the hypergrid. These results are obtained by deriving new dimension-reduction results and, most interestingly, by establishing and exploiting a strong connection between bounded-derivative properties and binary search trees.

Any Monotone Property of 3-uniform Hypergraphs is Weakly Evasive by Raghav Kulkarni, Youming Qiao, and Xiaoming Sun (ECCC). What happens if we remove the distance promise in the definition of property testing? Or, equivalently, how efficient can testers be if they must distinguish objects with some property \(P\) from every object that does not have the property \(P\)? The answer to this question seems completely obvious: no non-trivial property can be tested in this setting with a sublinear number of queries. We can easily verify that this answer is indeed correct for our favorite properties… but is it really true for every non-trivial property? As it turns out, this question is far from trivial, and in fact leads to the famous evasiveness conjectures and to the celebrated results of Rivest and Vuillemin and of Kahn, Saks, and Sturvevant. The current paper combines and extends ideas from both of these papers to show that deterministic testers for any monotone non-trivial property of 3-uniform hypergraphs indeed have linear query complexity.

News for January 2014

As we already covered in earlier posts, quite a few notable property testing papers were presented at the SODA and ITCS conferences this month. In addition, an exciting new set of results on a variant of the linearity testing problem was also posted on ECCC.

Direct Sum Testing by Roee David, Irit Dinur, Elazar Goldenberg, Guy Kindler, and Igor Shinkar. One of the true gems of property testing is the linearity testing: as Blum, Luby, and Rubinfeld first showed, we can test whether a boolean function is linear with a (very natural) 3-query test. A fascinating variant of this problem that is still largely open to this day is the following: is linearity still testable even when we consider functions defined on a subset of the hypercube? This paper considers the problem when the subsets in question are layers of the hypercube, a problem that is closely connected to PCP constructions and direct sum operations on multi-player games. The main result of the paper shows that we can also test linearity for functions defined on layers of the hypercube with a natural 3-query tester.

News for October 2013

We saw two new property testing papers in October. During the month, we also saw a great presentation on estimating the distance to testable affine-invariant properties (discussed here) at FOCS, and we saw some intriguing property testing papers in the list of accepted papers for ITCS 2014.

Survey on Quantum Property Testing by Ashley Montanaro and Ronald de Wolf (arxiv). We already mentioned this survey here, but it is worth discussing it again. With a clear presentation of fundamental results in the area, this survey is a great reference for any researcher in property testing or in quantum complexity that wants to learn about the intersection of the two fields. And with 12 open questions presented throughout the text, this is also the ideal starting point for any researcher who wants to jump into the area.

Property Testing Bounds for Linear and Quadratic Functions via Parity Decision Trees by Abhishek Bhrushundi, Sourav Chakraborty, and Raghav Kulkarni (ECCC). The authors observe that the problem of testing a property \(P\) of linear or quadratic boolean functions is equivalent to determining the parity decision tree complexity of a function \(f\) determined by \(P\). This connection is used to obtain strong lower bounds on the query complexity required to test \(k\)-linearity, bent functions, and other related properties. It also provides further motivation for the the study of parity decision trees and the related problems on the Fourier structure of boolean functions.