News for August 2013

August saw a couple of conferences: RANDOM-APPROX and a seminar at the Simon’s institute. Here’s a report on the new papers we saw in August, including much progress on distribution testing.

On sample based testers by Oded Goldreich and Dana Ron (ECCC).The usual setting for property testing is beloved query model, where the tester gets to make any query it pleases. A much weaker setting would be to simply get random (labeled) samples of the domain. These are called sample based testers, and are more akin to setting in learning theory. Sampled based testers were also discussed in the seminal Goldreich, Goldwasser, and Ron paper. Since then, there have been variants, such as distribution-free testing and active testing. In this paper, it is shown that constant-query proximity-oblivious testers imply the existence of sublinear (polynomial dependence in size) sample based testers.

Instance-by-instance optimal identity testing by Gregory Valiant and Paul Valiant (ECCC). The problem of distribution testing is a problem that we all love. Given a known discrete distribution \(p\), we wish to test equality (with \(p\)) of an unknown distribution \(q\). How many independent samples are required of \(q\)? This paper gives an optimal algorithm for each distribution \(p\), subsuming much past work. The analysis involves a characterization of Hölder and norm-monotonicity type inequalities. Definitely on my list to read!

Optimal algorithms for testing closeness of discrete distributions by Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant (Arxiv). Now consider the setting where both \(p\) and \(q\) are unknown (over a support of size \(n\)), and we wish to test equality. (We define “far” in terms of variation distance.) A (by-now) classic paper of Batu et al. gives an \(O(n^{2/3}\log n/\varepsilon^{8/3})\) and Valiant proved an \(\Omega(n^{2/3})\) lower bound. This paper completely resolves this problem problem with an algorithm and matching lower bound of \(\max(n^{2/3}/\varepsilon^{4/3}, n^{1/2}/\varepsilon^2)\).

Aug 21-23: RANDOM-APPROX 2013

RANDOM-APPROX 2013 saw a bunch of property testing talks.

An optimal lower bound for monotonicity testing over hypergrids (Deeparnab Chakrabarty and C. Seshadhri). I talked about a lower bound for monotonicity testing over hypergrids. Given a function \(f:[n]^d \rightarrow \mathbb{N}\), we wish to test if \(f\) is monotone with respect to the coordinate-wise partial order. We proved a adaptive, two-sided \(\Omega(\varepsilon^{-1} d\log n)\) lower bound, matching recent upper bounds.

Testing Membership in Counter Automaton Languages (Yonatan Goldhirsh and Michael Viderman). In a seminal result, Alon, Krivelevich, Newman, and Szegedy proved that membership in regular languages is testable in constant queries. Can we extend this to richer languages? Lachish and Newman proved that membership in languages given by a pushdown automaton with a single stack symbol (a counter automaton) is not testable with constant queries. Yonathan talked about a weaker version of counter automata, for which membership is testable in constant queries.

Tight lower bounds for testing linear isomorphism (Elena Grigorescu, Karl Wimmer and Ning Xie). Understanding testability of functions that exhibit symmetries (like linear invariances) is an important program in property testing. A function \(f : \{0, 1\}^n \rightarrow \{0, 1\}\) is linear isomorphic to a \(g\) if \(f = g \circ A\), for non-singular transformation \(A\). Ning talked about a lower bound of \(\Omega(n^2)\) for testing if a function is linear isomorphic to the inner product function. (This is tight, as a simple algorithm shows.) In other words, consider the property of functions linear isomorphic to the inner product. This property is not testable in constant queries, suggesting that symmetries themselves do not suffice for (constant) testability.

Local reconstructors and tolerant testers for connectivity and diameter (Andrea Campagna, Alan Guo and Ronitt Rubinfeld). Local reconstructors are like property testers on steroids. Suppose we have access to some function \(f\) that does not satisfy some property \(\mathcal{P}\). A local reconstructor provides oracle access to a function \(g \in \mathcal{P}\) that is (approximately optimally) close to \(f\), and the reconstructor requires only sublinear queries to \(f\) to output a value of \(g\). Alan talked about reconstructors for \(k\)-connectivity in the general sparse-graph model. This also leads to new tolerant testers. Pretty intricate and neat algorithms, IMHO.

A local computation approximation scheme to maximum matching (Yishay Mansour and Shai Vardi). If you put local reconstructors on steroids, you get a local computation algorithm. Consider a graph \(G\), where want to “locally” find a maximum matching. We want a sublinear time procedure that given an edge of \(G\) outputs whether it is part of the matching or not. All these answers must be globally consistent and represent the final matching. Shai talked about a local \(\mathop{poly}(\log n)\) algorithm for finding \((1-\epsilon)\)-approximate matchings. Another impressive local algorithm.

Absolutely Sound Testing of Lifted Codes (Noga Ron-Zewi, Elad Haramaty and Madhu Sudan). Elad talked about more progress on the testing of affine-invariant properties. Previous work define a “lifting operation” for an affine-invariant base code (defined as say a function \(f:\mathbb{F}^t \rightarrow \mathbb{F}\). This is a codeword in many more variables (say a function \(f:\mathbb{F}^n \rightarrow \mathbb{F}\)), whose restriction to any t-dimensional affine subspace belongs to the base code. This paper gives absolutely sound testers for the lift of any affine-invariant base code. This work is a nice way of interpreting (and generalizing) the ideas behind low-degree testing.

News for July 2013

Helly-Type Theorems in Property Testing by Sourav Chakraborty, Rameshwar Pratap, Sasanka Roy, and Shubhangi Saraf (arXiv). Helly’s Theorem states that if every \(d+1\) sets in a family \(F\) of convex sets in \(\mathbb{R}^d\) have non-empty intersection, then the intersection of the whole family \(F\) is also non-empty. This paper establishes a new connection between Helly’s theorem and clustering problems. Specifically, extensions of Helly’s theorem are used to analyze algorithms that test whether a set of points can be partitioned into a small number of clusters or not.

On active and passive testing by Noga Alon, Rani Hod, and Amit Weinstein (arXiv). In the standard property testing setting, the tester is able to query any input it chooses. This model is unrealistic in some settings, where instead it is more appropriate to consider models where the tester can only choose queries from a restricted set of the inputs (active testing) or where the tester only sees bits sampled at random from the input (passive testing). This paper gives new upper and lower bounds on the number of queries required to test many natural properties of boolean functions—including juntas, partially symmetric functions, and low degree polynomials—in these two settings.

Explicit Maximally Recoverable Codes with Locality by Parikshit Gopalan, Cheng Huang, Bob Jenkins, and Sergey Yekhanin (ECCC). The central goal in the study of locally decodable codes is to maintain the rate and error tolerance while adding the condition that we can recover any bit of the original input with a small number of queries to the codeword. This paper considers and makes progress on an interesting relaxation of this problem: here the goal is to maintain local decoding when a small number of errors are introduced, while allowing more queries to be made to decode input bits when more errors occur.

News for June 2013

In June, we had a property testing workshop at Haifa where a lot of recent PT work was discussed. Check out the post on that. June also saw some interesting developments on testing affine-invariant properties, on testing sub-properties, on local computation algorithms, and on PCP constructions.

TESTING AFFINE-INVARIANT PROPERTIES

Over the last few years, there has been much progress in determining which affine-invariant properties of boolean functions can be tested with a constant number of queries. In Estimating the distance from testable affine-invariant properties (arXivECCC), Hamed Hatami and Shachar Lovett show that for every such property, we can not only test it but also estimate the distance to the property with a constant number of queries.

The function linear isomorphism testing problem asks: How many queries do we need to test if a given (unknown) function is equivalent, up to a linear transformation of the input space, to some (known) function \(f\)? The answer to this question depends on the choice of the function \(f\). Elena Grigorescu, Karl Wimmer, and Ning Xie, in Tight lower bounds for testing linear isomorphism (ECCC) and Abhishek Bhrushundi, in the concurrent and independent paper On testing bent functions (ECCC), show that the query complexity for testing linear isomorphism is maximized when \(f\) is the Inner Product function. Interestingly, the proofs of this result (and other more general results) are obtained using completely different methods: Elena, Karl, and Ning prove their lower bounds using the communication complexity method, while Abhishek’s proof is obtained by studying the parity decision tree complexity of boolean functions.

TESTING SUB-PROPERTIES

One counter-intuitive aspect of property testing is that the query complexity for testing \(P\) does not in general imply anything about the query complexity for testing a sub-property \(P’ \subseteq P\). For example, while we can test halfspaces (aka, linear threshold functions) with a constant number of queries, testing the subclass of signed majorities is known to require \(\Omega(\log n)\) queries, and in fact the best-known algorithm for this task is a non-adaptive tester that requires \(O(\sqrt{n})\) queries. In Exponentially improved algorithms and lower bounds for testing signed majorities (ECCC), Dana Ron and Rocco Servedio dramatically improve both the upper and lower bounds: they show that non-adaptive algorithms for testing signed majorities require \(\mathrm{poly}(n)\) queries and that signed majorities can be tested by an adaptive algorithm that requires only \(\mathrm{polylog}(n)\) queries.

Another phenomenon that appears in some natural properties is that while a property \(P\) requires many queries to test, it can be partitioned into (slightly) smaller properties \(P’\) that can each be tested with a constant number of queries. It is natural to ask whether this is a universal phenomenon. In Some properties are not even partially testable (arXiv, ECCC), Eldar Fischer, Yonatan Goldhirsh, and Oded Lachish show that it is not: they show that there are properties \(P\) for which every (large enough) sub-property of \(P\) requires a large number of queries to test.

LOCAL COMPUTATION AND PCP CONSTRUCTIONS

A notion that is very closely related to property testing is that of local computation
algorithms: algorithms that, as in the property testing setting, aim to compute the solution of a problem in sublinear time by querying as few bits of the input as possible. In A Local Computation Approximation Scheme to Maximum Matching (arXiv), Yishay Mansour and Shai Vardi give a new local computation algorithm for obtaining a \((1-\epsilon)\)-approximation to the maximum matching in bounded-degree graphs.

The notion of Probabilistically Checkable Proofs (PCPs) is also closely related to property testing, where now the input to the tester is a string \(x\) and a purported proof that \(x\) satisfies some property \(P\); the tester must verify the correctness of the proof while examining as few bits of \(x\) and of the proof as possible. A long-standing open problem in this area is to understand the best possible trade-offs between the query complexity and the length of proofs for PCP constructions. In Constant rate PCPs for circuit-SAT with sublinear query complexity (ECCC), Eli Ben-Sasson, Yohay Kaplan, Swastik Kopparty, Or Meir, and Henning Stichtenoth give a verifier for a special case of PCPs that obtains a sublinear query complexity with proofs that have length only linear in the size of the input.

 

June 17-18, 2013: Property Testing and Sublinear Time Algorithms, at Haifa

To kick off PTReview, we have a (belated!) digest of talks from the Property Testing and Sublinear Time Algorithms workshop held at Haifa, Israel. Eldar Fischer and Ilan Newman hosted a bunch of Property Testing folks in beautiful Haifa, generously funded by the European Research Council under the European Union’s Seventh Framework Programme.

Without further ado, here’s what us at the workshop heard about (with minor editorializing on my part).

Sofya Raskhodnikova (Pennsylvania State University) on Testing Privacy: Sofya talked about algorithms for testing privacy. This is a continuation of work she initiated on connections between property testing and differential privacy. This involves a sort of “average-case” notion of differential privacy and requires property testing for the Lipschitz property for non-uniform distributions. Paper details here.

Gilad Tsur (Weizmann Institute of Science) on Approximate Image and Template Matching: Gilad has a very nice result on using property testing based sampling algorithms to find template matches in images. This is a real, practical vision result (with pretty pictures). I really like this because it takes theoretical ideas to the practical realm, and actually gives an implemented, usable algorithm. Not just putting a story about big data in the introduction and then ignoring it. The theoretical details here, and the practical results here.

Elad Haramati (Technion) on Optimal Testing of Multivariate Polynomials over Small Prime Fields: This is on the classic problem of testing of a function is close to a low-degree polynomial. The natural test is to restrict the function to a random low-dimensional subspace and run a trivial check of whether this restriction is low-degree. Elad talked about an optimal analysis of this tester. Paper here.

Mario Szegedy (Rutgers University) on On the Max-cut Problem: Suppose we had an incidence stream of a graph where we see all edges incident to a vertex together. The aim is to approximate the Max-cut. When a vertex is seen, the algorithm must commit to which side of the cut it is in. Mario discussed an (albeit exponential time) algorithm that can beat the trivial 0.5-approximation. He referred to the general problem as reverse property testing. The algorithm sees (say) 99% of the graph and can do something trivial (say random) here. But it has to do something clever on the remaining 1% to beat the trivial 0.5-approximation. So can we infer something about the 1% from the 99% (as opposed to property testing, where we infer something about the 99% from a random 1%)?

Reut Levi (Tel-Aviv University) on A Quasi-polynomial Time Partition Oracle for Graphs with an Excluded Minor: One of the important tools for property testing on bounded degree graphs is an efficient partition oracle. This is a constant-time local procedure that implicitly partitions the graph into constant sized connected pieces with few edges crossing pieces. A direct consequence of such a procedure is a constant time property testing for minor closed properties. Reut talked about an more efficient partition oracle that runs in time quasipolynomial in \(1/\varepsilon\), a significant improvement over previous work. Paper here.

Asaf Shapira (Tel-Aviv University) on Deterministic vs Non-deterministic Graph Property Testing: A (dense) graph property is non-deterministically testable if there exists a “certificate” whose correctness can be verified in constant time by random sampling. It was earlier shown by Lovasz and Vesztergombi that a non-deterministically testable property is also testable to deterministically testable. Asaf talked about a proof that gives a quantifiable bound on the query complexity. Paper here.

Yonatan Goldhirsh (Technion) on Some Properties are not Even Partially Testable: Consider a property P and a sub-property P’. P’ is called partially testable if inputs in P’ can be distinguished from inputs that are \(\varepsilon\)-far from P. Some natural non-testable properties can be partitioned into a small number of partially testable sub-properties. Is this always true? Yonatan showed us this is not the case, and this requires interesting information theoretic arguments. Paper here.

Arie Matsliah (Google) on On the Power of Conditional Samples in Distribution Testing: The problem of testing distributions is a well-studied problem in property testing. Arie talked about a neat twist on this problem. Suppose the distribution support is known in advance. In addition to simply getting a random sample, we can also get a conditional random sample on any subset of the support. In this model, uniformity testing can be done in constant time (which is known to be impossible in the standard setting). A nice collections of results have been discovered in this framework. Paper here.

Eric Blais (Massachusetts Institute of Technology) on Communication Complexity and Property Testing: beyond the Boolean Hypercube: Blais, Brody, and Matulef had a beautiful paper that uses communication complexity to prove property testing lower bounds. All those bounds were restricted to the Boolean hypercube domain. Eric talked about taking this method to hypergrids, and this requires finding and exploiting the right basis of functions for hypergrids. This leads to the first lower bounds for hypergrid properties. Paper here.

Christian Sohler (Technische Universität Dortmund) on Almost Optimal Canonical Property Testers for Satisfiability: A general class of properties is given by \((k, d)\)-Function-SAT. It can be thought of as a generalized hypergraph coloring problem. Christian’s result is about an almost optimal property tester for dense instances of \((k,d)\)-function-SAT. This is a classic well-studied problem and subsumes many classic graph properties. Paper details here.

Ning Xie (Florida International University) on Testing the Degree Sequence of a Graph: Understanding degree distributions on graphs is a large research area, because of its connections to social networks and the ubiquitous power law/Zipf distributions. Given a fixed degree sequence \(\vec{s}\) and an input graph \(G\), is \(G\) close to any graph with degree sequence? An independent question of interest is: if the degree sequence of \(G\) is close to \(\vec{s}\), is it also close to a graph \(H\) with degree sequence \(\vec{s}\). Ning showed that this not true in general, but does hold for power law degree sequences.

C. Seshadhri (Sandia National Laboratories) on Monotonicity Testing and Alternating Paths: I talked about monotonicity testing of functions on the Boolean hypercube. The main result is that the canonical “edge tester” for monotonicity is optimal. An interesting technique of alternating paths is used in the proof, which generalizes to other properties and hypergrid domains. Paper here.

Ronitt Rubinfeld (Massachusetts Institute of Technology) on Local Decompression: Suppose you zipped 100 files together (through say, gzip) and got a nice compressed representation. Now, you only wanted to read the 50th file. Currently, you would decompress the entire file and read the 50th file (and waste a lot of memory on storing the stuff you don’t want). Can you selectively “locally” decompress a contiguous section of the original input? Awesome question! (Why didn’t I think of that?) Ronitt talked about a locally decompressible version of the famous Lempel-Ziv compression. The aim is to add this new feature without losing much of the LZ compression. This was an actual implemented result that worked on a computer, and I thought that was uber-cool. My favorite talk of the workshop. Paper here.

Sourav Chakraborty (Chennai Mathematical Institute) on Testing of Boolean Function Isomorphism: Two Boolean functions are isomorphic if they are the same up to relabeling of inputs. Determining if two functions are isomorphic is a fundamental property testing problem (or really, area of problems) with a rich host of results. Sourav gave a great survey of the result and a list of tantalizing open questions. In full generality, this problem is hard, but for special subclasses, it is possible. The most prominent open problem is about characterizing functions for which isomorphism is testable.

Dana Ron (Tel-Aviv University) on Exponentially Improved Algorithms and Lower Bounds for Testing Signed Majorities: A signed majority \(f:\{-1,1\}^n \to \{-1,1\}\) is a function of the form \(f(x) = sign(\sum_i \sigma_i x_i)\), where \(\sigma_i \in \{-1,1\}\). Dana talked about a \(\mathop{poly}(\log n)\) adaptive tester and \(n^{\Omega(1)}\) lower bound for non-adaptive testers. These are both exponential improvements on previous results. I personally liked the algorithmics: most property testers are usually “obvious”, and it’s the analysis that is hard. Here, the algorithm itself was quite non-trivial, and used many tools from previous work. Paper here.

A double feature of Oded Goldreich (Weizmann Institute of Science), On the Communication Complexity Methodology and on Multiple Input Problems: The first part of Oded’s talk was a (re)presentation on the communication complexity method introduced by Blais, Brody, Matulef. Oded argued that there was a cleaner (but equivalent) reductions from communication complexity to property testing. Eric Blais appeared to concur, though it is not clear whether out of politeness or conviction. (Ok, ok, I think Eric actually did agree.) Paper here.

The next part was on direct-sum and direct-product problems in property testing. For example: given multiple inputs, we want to accept if all inputs have the property, and reject if one of them is far from the property. How does this compare with the vanilla problem with one input? Oded talked about this and other variants. Paper here.