# Aug. 26-30: Real Analysis in Testing, Learning and Inapproximability at the Simons Institute

The new Simons Institute for the Theory of Computing opened its doors this August, and kicked things off with a workshop on real analysis and its applications in property testing, learning theory, and inapproximability. Among many great talks (see here for the full schedule) were the following presentations on property testing and related topics.

Ryan O’Donnell on Testing Surface Area. Given a subset $$S \subseteq [0,1]^2$$ of the unit square, can we efficiently test whether the set has small perimeter? Buffon’s needle suggests a natural test: drop a (small) needle at random in the square and see if exactly one of its endpoints lands in the set $$S$$. Ryan showed how elegant noise sensitivity arguments establish the correctness of this test and its higher-dimensional analogues.
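
To make the needle test concrete, here is a minimal Python sketch. It is only an illustration of the idea described above, not the tester from the talk: the function name `needle_crossing_rate`, the membership oracle `in_set`, and the choice to resample needles that leave the square are all assumptions made for this example, and no acceptance threshold is given because the summary does not state one.

```python
import math
import random


def needle_crossing_rate(in_set, needle_len, trials, rng):
    """Estimate how often a random needle has exactly one endpoint in S.

    in_set(x, y) is a hypothetical membership oracle for S inside [0,1]^2.
    """
    crossings = 0
    done = 0
    while done < trials:
        # Random start point and random direction.
        x, y = rng.random(), rng.random()
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x2 = x + needle_len * math.cos(theta)
        y2 = y + needle_len * math.sin(theta)
        # Simplifying assumption: resample needles that leave the square.
        if not (0.0 <= x2 <= 1.0 and 0.0 <= y2 <= 1.0):
            continue
        done += 1
        # The needle "crosses the boundary" if exactly one endpoint is in S.
        if in_set(x, y) != in_set(x2, y2):
            crossings += 1
    return crossings / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Left half of the square: a set with a short boundary.
    print(needle_crossing_rate(lambda x, y: x < 0.5, 0.01, 20000, rng))
    # Twenty vertical stripes: a set with a much longer boundary.
    print(needle_crossing_rate(lambda x, y: int(x * 20) % 2 == 0, 0.01, 20000, rng))
```

A set with a longer boundary should produce a visibly higher crossing rate, which is the quantity a tester along these lines would compare against a threshold.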

I gave a talk on The Analysis of Partially Symmetric Functions. I gave a brief introduction to partially symmetric functions, tools from the analysis of boolean functions that help us understand them, and their role in theoretical computer science, with a particular emphasis on the function isomorphism testing conjecture.

Hamed Hatami on Testing Affine-Invariant Properties of Algebraic Functions. Hamed gave a fascinating survey of the recent progress on testing affine-invariant properties. He showed how the tools that have been used to analyze the testability of properties of graphs can be combined with tools from higher-order Fourier analysis of boolean functions to analyze the testability of algebraic properties of functions.

Parikshit Gopalan on Locally Testable Codes and Cayley Graphs. Despite a considerable amount of interest and attention, many fundamental questions regarding locally testable codes remain open. Parikshit’s talk gave a very nice connection between LTCs and some Cayley graphs which enables us to view LTCs from a different angle and will hopefully lead to further progress on these open problems.

Nati Linial on Local Combinatorics, Or: What to do with all those gigantic graphs?. Let’s say that you want to keep a snapshot of a huge graph $$G$$, but only have a very small amount of space available. What can you do? One natural solution is to keep a local $$k$$-profile of the graph: for each graph $$H$$ on $$k$$ vertices, count the number of embedded copies of $$H$$ in $$G$$. Nati’s talk showed that the study of the connections between local $$k$$-profiles of graphs and global properties of graphs leads to very nice results as well as a wealth of intriguing open problems.
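
As a small illustration of a local profile, the sketch below computes the case $$k = 3$$ by brute force. It assumes that "embedded copies" means induced subgraphs and that counts are normalized to fractions; both are assumptions of this example, as is the function name `three_profile`. For three vertices the isomorphism type is determined by the number of edges, so the profile has four entries.

```python
from itertools import combinations


def three_profile(n, edges):
    """Fractions of vertex triples inducing 0, 1, 2, or 3 edges.

    Vertices are 0..n-1 with n >= 3; edges is an iterable of pairs.
    Brute force over all triples, so only suitable for small graphs.
    """
    adjacent = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(range(n), 3):
        # Count the edges induced on this triple.
        m = sum(1 for pair in combinations(triple, 2) if frozenset(pair) in adjacent)
        counts[m] += 1
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(three_profile(4, four_cycle))
```

For a gigantic graph one would sample triples at random rather than enumerate them, which is what makes the profile a cheap snapshot.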

Many other interesting workshops on real analysis in computer science and on the theoretical foundations of big data analysis will be held throughout the semester. See the respective pages for all the details, and for videos of the talks.

# News for August 2013

August saw a couple of conferences: RANDOM-APPROX and a seminar at the Simons Institute. Here’s a report on the new papers we saw in August, including much progress on distribution testing.

On sample based testers by Oded Goldreich and Dana Ron (ECCC). The usual setting for property testing is the beloved query model, where the tester gets to make any query it pleases. A much weaker setting would be to simply get random (labeled) samples of the domain. Testers in this setting are called sample-based testers, and are more akin to the setting in learning theory. Sample-based testers were also discussed in the seminal Goldreich, Goldwasser, and Ron paper. Since then, there have been variants, such as distribution-free testing and active testing. In this paper, it is shown that constant-query proximity-oblivious testers imply the existence of sublinear (polynomial dependence in size) sample-based testers.

Instance-by-instance optimal identity testing by Gregory Valiant and Paul Valiant (ECCC). The problem of distribution testing is a problem that we all love. Given a known discrete distribution $$p$$, we wish to test equality (with $$p$$) of an unknown distribution $$q$$. How many independent samples are required of $$q$$? This paper gives an optimal algorithm for each distribution $$p$$, subsuming much past work. The analysis involves a characterization of Hölder and norm-monotonicity type inequalities. Definitely on my list to read!

Optimal algorithms for testing closeness of discrete distributions by Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant (arXiv). Now consider the setting where both $$p$$ and $$q$$ are unknown (over a support of size $$n$$), and we wish to test equality. (We define “far” in terms of variation distance.) A (by-now) classic paper of Batu et al. gives an $$O(n^{2/3}\log n/\varepsilon^{8/3})$$ upper bound, and Valiant proved an $$\Omega(n^{2/3})$$ lower bound. This paper completely resolves this problem with an algorithm and matching lower bound of $$\max(n^{2/3}/\varepsilon^{4/3}, n^{1/2}/\varepsilon^2)$$.
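
To see which term of the bound $$\max(n^{2/3}/\varepsilon^{4/3}, n^{1/2}/\varepsilon^2)$$ dominates, one can compare the two terms directly. This is a short worked step added for illustration, not something taken from the paper's presentation:

$$\frac{n^{2/3}}{\varepsilon^{4/3}} \ge \frac{n^{1/2}}{\varepsilon^{2}} \iff \varepsilon^{2/3} \ge n^{-1/6} \iff \varepsilon \ge n^{-1/4}.$$

So for constant $$\varepsilon$$ the $$n^{2/3}$$ term governs, and the $$n^{1/2}/\varepsilon^2$$ term takes over only once $$\varepsilon$$ drops below $$n^{-1/4}$$.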

# Aug 21-23: RANDOM-APPROX 2013

RANDOM-APPROX 2013 saw a bunch of property testing talks.

An optimal lower bound for monotonicity testing over hypergrids (Deeparnab Chakrabarty and C. Seshadhri). I talked about a lower bound for monotonicity testing over hypergrids. Given a function $$f:[n]^d \rightarrow \mathbb{N}$$, we wish to test if $$f$$ is monotone with respect to the coordinate-wise partial order. We proved an adaptive, two-sided $$\Omega(\varepsilon^{-1} d\log n)$$ lower bound, matching recent upper bounds.
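
For readers new to the problem, the following Python sketch shows what a violation of monotonicity on the hypergrid looks like and how a naive tester might hunt for one. It is not the tester behind the matching upper bounds: it simply samples pairs of points that differ in one coordinate, and the name `find_violation` and the 0-based domain $$\{0,\dots,n-1\}^d$$ are assumptions of this example.

```python
import random


def find_violation(f, n, d, queries, rng):
    """Search for a pair x <= y (coordinate-wise) with f(x) > f(y).

    Samples pairs on a random axis-parallel line of {0..n-1}^d.
    Returns the violating pair, or None if none was found.
    """
    for _ in range(queries):
        x = [rng.randrange(n) for _ in range(d)]
        i = rng.randrange(d)
        y = list(x)
        y[i] = rng.randrange(n)  # second point on the same line
        # Order the two points so that lo <= hi in the partial order.
        lo, hi = (x, y) if x[i] <= y[i] else (y, x)
        if f(tuple(lo)) > f(tuple(hi)):
            return tuple(lo), tuple(hi)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(find_violation(lambda x: sum(x), 10, 3, 200, rng))
    print(find_violation(lambda x: -sum(x), 10, 3, 200, rng))
```

A tester of this shape never rejects a monotone function, since it rejects only after exhibiting a violating pair.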

Testing Membership in Counter Automaton Languages (Yonatan Goldhirsh and Michael Viderman). In a seminal result, Alon, Krivelevich, Newman, and Szegedy proved that membership in regular languages is testable in constant queries. Can we extend this to richer languages? Lachish and Newman proved that membership in languages given by a pushdown automaton with a single stack symbol (a counter automaton) is not testable with constant queries. Yonatan talked about a weaker version of counter automata, for which membership is testable in constant queries.

Tight lower bounds for testing linear isomorphism (Elena Grigorescu, Karl Wimmer and Ning Xie). Understanding testability of functions that exhibit symmetries (like linear invariances) is an important program in property testing. A function $$f : \{0, 1\}^n \rightarrow \{0, 1\}$$ is linear isomorphic to a function $$g$$ if $$f = g \circ A$$ for some non-singular linear transformation $$A$$. Ning talked about a lower bound of $$\Omega(n^2)$$ for testing if a function is linear isomorphic to the inner product function. (This is tight, as a simple algorithm shows.) In other words, consider the property of functions linear isomorphic to the inner product. This property is not testable in constant queries, suggesting that symmetries themselves do not suffice for (constant) testability.
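
The definition $$f = g \circ A$$ can be checked by brute force for tiny $$n$$, as in the Python sketch below. It only illustrates the definition and is unrelated to the query-efficient algorithms or the lower-bound argument. The helper names are hypothetical, and the convention $$\mathrm{IP}(x) = \sum_{i < n/2} x_i x_{i+n/2} \bmod 2$$ for the inner product function is an assumption of this example.

```python
from itertools import product


def inner_product(x):
    """IP(x) = sum over i < n/2 of x[i] * x[i + n/2], mod 2 (n even)."""
    half = len(x) // 2
    return sum(x[i] & x[i + half] for i in range(half)) % 2


def apply_linear(A, x):
    """Multiply the 0/1 matrix A (list of rows) by the vector x over GF(2)."""
    return tuple(sum(a & b for a, b in zip(row, x)) % 2 for row in A)


def is_isomorphic_via(f, g, A, n):
    """Check f(x) == g(Ax) on all 2^n inputs. A is assumed non-singular."""
    return all(f(x) == g(apply_linear(A, x)) for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    # An invertible (upper triangular) matrix over GF(2).
    A = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    f = lambda x: inner_product(apply_linear(A, x))
    print(is_isomorphic_via(f, inner_product, A, 4))
```

Testing the property means deciding whether some such $$A$$ exists while reading only a few values of $$f$$, which is what makes the problem hard.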

Local reconstructors and tolerant testers for connectivity and diameter (Andrea Campagna, Alan Guo and Ronitt Rubinfeld). Local reconstructors are like property testers on steroids. Suppose we have access to some function $$f$$ that does not satisfy some property $$\mathcal{P}$$. A local reconstructor provides oracle access to a function $$g \in \mathcal{P}$$ that is (approximately optimally) close to $$f$$, and the reconstructor requires only sublinear queries to $$f$$ to output a value of $$g$$. Alan talked about reconstructors for $$k$$-connectivity in the general sparse-graph model. This also leads to new tolerant testers. Pretty intricate and neat algorithms, IMHO.

A local computation approximation scheme to maximum matching (Yishay Mansour and Shai Vardi). If you put local reconstructors on steroids, you get a local computation algorithm. Consider a graph $$G$$, where we want to “locally” find a maximum matching. We want a sublinear time procedure that given an edge of $$G$$ outputs whether it is part of the matching or not. All these answers must be globally consistent and represent the final matching. Shai talked about a local $$\mathrm{poly}(\log n)$$ algorithm for finding $$(1-\epsilon)$$-approximate matchings. Another impressive local algorithm.

Absolutely Sound Testing of Lifted Codes (Noga Ron-Zewi, Elad Haramaty and Madhu Sudan). Elad talked about more progress on the testing of affine-invariant properties. Previous work defines a “lifting operation” for an affine-invariant base code (whose codewords are, say, functions $$f:\mathbb{F}^t \rightarrow \mathbb{F}$$). A codeword of the lift is a function in many more variables (say $$f:\mathbb{F}^n \rightarrow \mathbb{F}$$) whose restriction to any $$t$$-dimensional affine subspace belongs to the base code. This paper gives absolutely sound testers for the lift of any affine-invariant base code. This work is a nice way of interpreting (and generalizing) the ideas behind low-degree testing.

# News for July 2013

Helly-Type Theorems in Property Testing by Sourav Chakraborty, Rameshwar Pratap, Sasanka Roy, and Shubhangi Saraf (arXiv). Helly’s Theorem states that if every $$d+1$$ sets in a family $$F$$ of convex sets in $$\mathbb{R}^d$$ have non-empty intersection, then the intersection of the whole family $$F$$ is also non-empty. This paper establishes a new connection between Helly’s theorem and clustering problems. Specifically, extensions of Helly’s theorem are used to analyze algorithms that test whether a set of points can be partitioned into a small number of clusters or not.
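
Helly's theorem is easiest to see in dimension $$d = 1$$, where convex sets are intervals and $$d + 1 = 2$$: if every two intervals in a finite family of closed intervals intersect, then all of them share a point. The Python sketch below checks both conditions for a small family. It illustrates the classical theorem only, not the clustering testers from the paper, and the function names are made up for this example.

```python
from itertools import combinations


def pairwise_intersect(intervals):
    """True if every two closed intervals (lo, hi) share a point."""
    return all(
        max(a[0], b[0]) <= min(a[1], b[1]) for a, b in combinations(intervals, 2)
    )


def common_point(intervals):
    """Return a point lying in every closed interval, or None if none exists."""
    lo = max(left for left, _ in intervals)
    hi = min(right for _, right in intervals)
    return lo if lo <= hi else None


if __name__ == "__main__":
    family = [(0, 3), (2, 5), (1, 4)]
    print(pairwise_intersect(family), common_point(family))
```

In one dimension the common point can be read off directly as the largest left endpoint, which is the kind of local-to-global structure that Helly-type arguments exploit.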

On active and passive testing by Noga Alon, Rani Hod, and Amit Weinstein (arXiv). In the standard property testing setting, the tester is able to query any input it chooses. This model is unrealistic in some settings, where instead it is more appropriate to consider models where the tester can only choose queries from a restricted set of the inputs (active testing) or where the tester only sees bits sampled at random from the input (passive testing). This paper gives new upper and lower bounds on the number of queries required to test many natural properties of boolean functions—including juntas, partially symmetric functions, and low degree polynomials—in these two settings.

Explicit Maximally Recoverable Codes with Locality by Parikshit Gopalan, Cheng Huang, Bob Jenkins, and Sergey Yekhanin (ECCC). The central goal in the study of locally decodable codes is to maintain the rate and error tolerance while adding the condition that we can recover any bit of the original input with a small number of queries to the codeword. This paper considers and makes progress on an interesting relaxation of this problem: here the goal is to maintain local decoding when a small number of errors are introduced, while allowing more queries to be made to decode input bits when more errors occur.

# News for June 2013

In June, we had a property testing workshop at Haifa where a lot of recent PT work was discussed. Check out the post on that. June also saw some interesting developments on testing affine-invariant properties, on testing sub-properties, on local computation algorithms, and on PCP constructions.

TESTING AFFINE-INVARIANT PROPERTIES

Over the last few years, there has been much progress in determining which affine-invariant properties of boolean functions can be tested with a constant number of queries. In Estimating the distance from testable affine-invariant properties (arXiv, ECCC), Hamed Hatami and Shachar Lovett show that for every such property, we can not only test it but also estimate the distance to the property with a constant number of queries.

The function linear isomorphism testing problem asks: How many queries do we need to test if a given (unknown) function is equivalent, up to a linear transformation of the input space, to some (known) function $$f$$? The answer to this question depends on the choice of the function $$f$$. Elena Grigorescu, Karl Wimmer, and Ning Xie, in Tight lower bounds for testing linear isomorphism (ECCC) and Abhishek Bhrushundi, in the concurrent and independent paper On testing bent functions (ECCC), show that the query complexity for testing linear isomorphism is maximized when $$f$$ is the Inner Product function. Interestingly, the proofs of this result (and other more general results) are obtained using completely different methods: Elena, Karl, and Ning prove their lower bounds using the communication complexity method, while Abhishek’s proof is obtained by studying the parity decision tree complexity of boolean functions.

TESTING SUB-PROPERTIES

One counter-intuitive aspect of property testing is that the query complexity for testing $$P$$ does not in general imply anything about the query complexity for testing a sub-property $$P' \subseteq P$$. For example, while we can test halfspaces (aka, linear threshold functions) with a constant number of queries, testing the subclass of signed majorities was known to require $$\Omega(\log n)$$ queries, and the best previously known algorithm for this task was a non-adaptive tester that requires $$O(\sqrt{n})$$ queries. In Exponentially improved algorithms and lower bounds for testing signed majorities (ECCC), Dana Ron and Rocco Servedio dramatically improve both the upper and lower bounds: they show that non-adaptive algorithms for testing signed majorities require $$\mathrm{poly}(n)$$ queries and that signed majorities can be tested by an adaptive algorithm that requires only $$\mathrm{polylog}(n)$$ queries.

Another phenomenon that appears in some natural properties is that while a property $$P$$ requires many queries to test, it can be partitioned into (slightly) smaller properties $$P'$$ that can each be tested with a constant number of queries. It is natural to ask whether this is a universal phenomenon. In Some properties are not even partially testable (arXiv, ECCC), Eldar Fischer, Yonatan Goldhirsh, and Oded Lachish show that it is not: they show that there are properties $$P$$ for which every (large enough) sub-property of $$P$$ requires a large number of queries to test.

LOCAL COMPUTATION AND PCP CONSTRUCTIONS

A notion that is very closely related to property testing is that of local computation algorithms: algorithms that, as in the property testing setting, aim to compute the solution of a problem in sublinear time by querying as few bits of the input as possible. In A Local Computation Approximation Scheme to Maximum Matching (arXiv), Yishay Mansour and Shai Vardi give a new local computation algorithm for obtaining a $$(1-\epsilon)$$-approximation to the maximum matching in bounded-degree graphs.

The notion of Probabilistically Checkable Proofs (PCPs) is also closely related to property testing, where now the input to the tester is a string $$x$$ and a purported proof that $$x$$ satisfies some property $$P$$; the tester must verify the correctness of the proof while examining as few bits of $$x$$ and of the proof as possible. A long-standing open problem in this area is to understand the best possible trade-offs between the query complexity and the length of proofs for PCP constructions. In Constant rate PCPs for circuit-SAT with sublinear query complexity (ECCC), Eli Ben-Sasson, Yohay Kaplan, Swastik Kopparty, Or Meir, and Henning Stichtenoth give a verifier for a special case of PCPs that obtains a sublinear query complexity with proofs that have length only linear in the size of the input.