# News for September 2014

It is unfortunate that in the month of september we had very few papers in theory that appeared in ECCC or Arxiv. And of them I could not find any paper that falls in the category of property testing / query complexity or sublinear algorithms.  It is possible I have overlooked some papers. Please do point out any paper that you may think relevant that appeared in the month of September.

# New for August 2014

We’ve got a nice collection of papers this August, ranging from graph property testing, locally testable codes, to new lower bound approaches.

Deconstructions of Reductions from Communication Complexity to Property Testing using Generalized Parity Decision Trees by Roei Tell (ECCC). Blais, Brody, and Matulef (BBM) introduced the beautiful and powerful method of using communication complexity to derive property testing lower bounds. This result is all about understanding this approach is greater depth, and really trying to get at the crux of the reductions. The BBM approach uses rather heavy hammers (communication complexity lower bounds), so could we get away with weaker hammers to get property testing lower bounds? The key insight is that less powerful models can be reduced to existing property testing problems, and these are equivalent to the BBM reductions. While this (as of now) does not lead to newer bounds, it does shed significant light on existing bounds. The weaker model is essentially that of linear decision trees. Such reductions were also discovered by Brushundi, Chakraborty, and Kulkarni, proving $$\Omega(k)$$ lower bounds for testing $$k$$-linearity.

Locally Correctable and Testable Codes Approaching the Singleton Bound by Or Meir (ECCC). Locally Testable Codes (LTCs) are codes that have a constant (or polylogarithmic) time tester. It is known that such codes with minor caveats cannot have a constant rate (a number of papers on this topic: Katz and Trevisan, Dinur and Kaufman, Ben-Sasson and Viderman). But when the query complexity can be $$n^\beta$$ (where $$n$$ is the codeword size), the rate can be arbitrarily close to $$1$$ (Viderman, and more refs in the paper). This result takes this fact to the very extreme. The Singleton Bound is a lower bound between the tradeoff between rate and relative distance of a code. Amazingly, this lower bound tradeoff can be achieved by an LTC (and also Locally Correctable Code) testable with $$n^\beta$$ queries!

Complexity of Nondeterministic Graph Parameter Testing by Marek Karpinksi and Ronald Markó (arXiv). It is easiest to think of nondeterminstic graph testing with the example of testing bipartiteness for dense graphs. Suppose we wish to test $$G$$ for bipartiteness. As a “certificate”, we are provided a candidate partition $$V = V_1 \cup V_2$$. There is a “simple” constant-time tester that can property testing if $$G$$ is bipartite with respect to this partition. (Just sample two uniform random vertices, and check if the edge between them respects the bipartition.) So we say that bipartiteness is non-determinstically testable: if $$G$$ is far from being bipartite, no certificate can convince the simple tester. The full definition is somewhat complex, but it is basically the same idea. (In this context, “testablility” always refers to constant time testable.) Lovász and Vestergombi proved the striking result that any non-deterministically testable property is also deterministically testable, but their proof did not give an explicit bound on the query complexity. Gishboliner and Shapira fixed that, and gave a tower-type bound for the query complexity. So if the query complexity of the underlying “simple” tester is $$q$$, then the non-deterministic property has a tester whose query complexity is a tower of height $$q$$. This results goes much further, and shows the final query complexity is at most triply exponential in $$q$$. This is quite significant, since it is uses a weaker regularity approach than previous results.

# News for July 2014

In this month’s post, we review papers that raise two questions that I find particularly interesting. First, can sublinear algorithms find applications in efficient approximation of CSPs? And second, can property testing be extended to give meaningful results in settings where Hamming distance is not the “right” measure of distance between functions?

Going for Speed: Sublinear Algorithms for Dense r-CSPs by Grigory Yaroslavtsev (arXiv). In their seminal paper, Goldreich, Goldwasser, and Ron considered the MAX-CUT problem. They showed that both the testing variant of the problem (where the algorithm must distinguish between graphs with a cut of size $$s$$ from those with maximum cut size at most $$s – \epsilon N^2$$) and the approximation variant (where the algorithm must find a cut that has at most $$\epsilon N^2$$ fewer edges than the largest cut in the graph) can be solved in sublinear time. This result inspired the very successful line of research on testing properties of graphs. We can also view this result as motivation for studying the approximability of general CSPs: given some constraint satisfaction problem, can we find an assignment that satisfies almost as many constraints as the maximally satisfying assignment in sublinear time? This current paper begins the exploration of this question with an improved bound on the query complexity of the MAX-CUT problem and with the first general result for approximating CSPs of arity greater than 2 (i.e., where the constraints are on more than 2 variables).

Lp testing by Piotr BermanSofya Raskhodnikova, and Grigory Yaroslavtsev (preprint). Almost all of the work in testing properties of functions is based on the Hamming distance: for a given property P, a tester is required to accept functions that have the property P and to reject the functions that differ from every function with the property P on an $$\epsilon$$ fraction of the inputs. This notion of distance is appropriate for algebraic properties of functions and for properties of boolean functions, but for other properties of real-valued functions, $$L_p$$ distances are sometimes more appropriate. This paper provides the first systematic study of property testing under these distances. An important high-level message from this work is that the testing problems can be very different under these distances than under Hamming distance. For example, the classic problem of $$\epsilon$$-testing whether $$f : [n] \to \mathbb{R}$$ is monotone requires $$\Theta(\frac{\log n}{\epsilon})$$ queries in the Hamming distance setting, but only $$\Theta(\frac1{\epsilon^p})$$ queries in the $$L_p$$ distance setting!

# News for June 2014

Unfortunately the month of June was a very dry month for property testing. We could not find any papers on property testing (in ECCC or ArXiv) during the month of June. Hopefully we will have lots of papers in the month of July.

# News for May 2014

May seems to be a slow month for property testing, with only one paper on locally testable codes. Nonetheless, we have unearthed a paper from April (missed despite our best efforts) on a related topic of hidden set approximations.

The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling by Dana Ron and Gilad Tsur (arXiv). Consider a known universe $$U$$ and an unknown subset $$S$$. Our aim is to approximately determine $$|S|$$. We can perform subset queries: for a query subset $$T$$, an oracle returns whether $$T \cap S \neq \emptyset$$ or (a more powerful model) the oracle returns a uniform random element in $$T \cap S$$. This paper gives a detailed study of this problem under various settings. There is a fascinating array of upper and lower bounds, including situations where $$U$$ is ordered and $$T$$ must be an interval, the difference between adaptivity and non-adaptivity, arbitrary subset queries, and much more. The topic seems to be quite rich, and it should yield some nice sets of problems to study further. It appears that some lower bounds are quite open. Students, pay attention!

Limitations on Testable Affine-Invariant Codes in the High-Rate Regime by Venkatesan Guruswami, Madhu Sudan, Ameya Velingker, and Carol Wang (ECCC). Locally Testable Codes (LTCs) are codes that (typically) have a constant time tester. Interestingly, recent work on testing codes using a linear number of queries (some small constant fraction of the code length) has connections to the Small Set Expansion (SSE) problem, which in turn is related to the famous Unique Games Conjecture (UGC). A paper by Barak et al. gives a construction of a small set expander (a graph where all small sets have high conductance) where the Laplacian has many small eigenvalues (so it’s “difficult” to find a low conductance cut). This construction uses high-rate LTCs that are linear time testable. In some sense, the better the rate, the better construction one gets. This paper asks how far this idea can go. So the idea is to construct the best rate linear-time testable LTC. Reed-Muller codes fall in this category, but their rate is quite far from optimal. Recent work by Guo et al. gives a slightly better construction, but this is far from the (near) optimal rate achieved by BCH codes (which are not known to be testable). All in all, this paper shows that for affine-invariant codes, the Guo et al. construction is essentially the best testable high-rate code one can get. I have obviously done no justice to the depth of the connections, the intricacies of the parameters, or the overall coolness of this subject. So go read the paper!

# News for April 2014

As winter finally releases its grip here on the east coast, we celebrate with forests, with bounded-derivative properties, and with hypergraphs. (I believe these are the traditional ingredients for the beginning-of-spring celebrations in New England, but I could be wrong.)

Testing Forest-Isomorphism in the Adjacency List Model by Mitsuru Kusumoto and Yuichi Yoshida (arXiv). Which properties of sparse graphs can we test efficiently? To a large extent, this fundamental question remains open. The natural setting in which to study this question is the adjacency list query model. In this model, a testing algorithm can select any vertex $$v$$ and index $$i$$; it then receives the identity $$w$$ of the $$i$$th neighbor of $$v$$. Kusumoto and Yoshida show that in this model, we can test if two forests (i.e., collections of trees) on $$n$$ vertices are identical up to relabeling of the vertices with polylog($$n$$) queries. They also show that, remarkably, this tester can be used to test any property of forests in the adjacency list model with polylog($$n$$) queries.

Property Testing on Product Distributions: Optimal Testers for Bounded Derivative Properties by Deeparnab Chakrabarty, Kashyap Dixit, Madhav Jha, and C. Seshadhri (arXiv). Two basic properties of a function $$f : [n]^d \to \mathbb{R}$$ on the hypergrid are monotonicity and the Lipschitz property. These two properties are special cases of a more general class of properties called bounded-derivative properties. In this paper, the authors give optimal bounds on the number of queries required to test these properties over every product distribution on the hypergrid. These results are obtained by deriving new dimension-reduction results and, most interestingly, by establishing and exploiting a strong connection between bounded-derivative properties and binary search trees.

Any Monotone Property of 3-uniform Hypergraphs is Weakly Evasive by Raghav Kulkarni, Youming Qiao, and Xiaoming Sun (ECCC). What happens if we remove the distance promise in the definition of property testing? Or, equivalently, how efficient can testers be if they must distinguish objects with some property $$P$$ from every object that does not have the property $$P$$? The answer to this question seems completely obvious: no non-trivial property can be tested in this setting with a sublinear number of queries. We can easily verify that this answer is indeed correct for our favorite properties… but is it really true for every non-trivial property? As it turns out, this question is far from trivial, and in fact leads to the famous evasiveness conjectures and to the celebrated results of Rivest and Vuillemin and of Kahn, Saks, and Sturvevant. The current paper combines and extends ideas from both of these papers to show that deterministic testers for any monotone non-trivial property of 3-uniform hypergraphs indeed have linear query complexity.

# News for March 2014

On Learning and Testing Dynamic Environments by Oded Goldreich and Dana Ron (ECCC). Given a dynamic environment that is evolving according to a local rule, the goal is to query some of the states in the system at some point of time and determine if the system is evolving according to some fixed rule or is far from it. A harder problem is to learn the evolution of the environment. They give some upper and some lower bounds for various versions of these problems. In particular they show that using sub-linear queries a lot can be achieved even in this setting. This is the first of its kind result and possibly many more results in this area will follow soon.

# News for February 2014

So what did we see in February 2014?

Local Algorithms for Sparse Spanning Graphs by Reut Levi, Dana Ron, and Ronitt Rubinfeld (arXiv). Given a graph $$G = (V,E)$$, the aim is the find a small sparse spanning subgraph $$G’$$ in the local algorithm setting. So, given any edge $$(u,v) \in E$$, we must determine in sublinear time whether this edge exists in $$G’$$. The main result is an $$\Omega(|V|)$$ lower bound, and a matching algorithm when $$G$$ is either an expander or hyperfinite.

A Characterization of Locally Testable Affine-Invariant Properties via Decomposition Theorems by Yuichi Yoshida (arXiv). This result gives a complete characterization of (two-sided) locally testable affine-invariant properties. These are based on existing decomposition theorems for functions into structured and pseudorandom parts. This result is analogous to that of Alon, Fischer, Newman, and Shapira (link) on characterizations for testability over dense graphs.

Testing probability distributions underlying aggregated data by Clément Canonne and Ronitt Rubinfeld (arXiv). There has been much recent work on distribution testing, and this work discusses stronger models where better algorithms can be obtained. Given a distribution $$D$$ over $$[n]$$, we can get samples from this distribution, and also query the probability $$D(i)$$ of any $$i \in [n]$$. With these “dual” queries, one gets distribution testers for a variety of problems surpassing previous lower bounds. Interestingly, this is stronger than the conditional model introduced recently (here and here).

Strong Locally Testable Codes with Relaxed Local Decoders by Oded Goldreich, Tom Gur, and Ilan Komargodski (ECCC). For a locally testable code (LTC), the property of being a codeword is efficiently (constant time) testable. A locally decodable code (LDC) is a code from which individual bits of the original message can be recovered in constant time. A relaxed local decoder is allowed to declare failure for recovered a small fraction of message bits. This allows for bypassing the difficult (and prominent) open problem of constructing small LDCs. This result gives a construction of near-linear LTCs with a relaxed local decoder. This result leads to a property that is hard to test but easy to verify (in the MA proofs of proximity sense, refer to last month’s open problem).

# Open problem for February 2014: Better approximations for the distance to monotonicity

Slightly delayed, Feb’s open problem is by one of the three possible “yours truly”s, Sesh. Looking for more reader participation, hint, hint.

Basic setting: Consider $$f:\{0,1\}^n \rightarrow R$$, where $$R$$ is some ordered range. There is a natural coordinate-wise partial order denoted by $$\prec$$. The function is monotone if for all $$x \prec y$$, $$f(x) \leq f(y)$$. The distance to monotonicity, $$\epsilon_f$$ is the minimum fraction of values that must be changed to make $$f$$ monotone. This is an old problem in property testing.

Open problem: Is there an efficient constant factor approximation algorithm for $$\epsilon_f$$? In other words, is there a $$poly(n/\epsilon_f)$$ time procedure that can output a value $$\epsilon’ = \Theta(\epsilon_f)$$?

State of the art: Existing monotonicity testers give an $$O(n)$$-approximation for $$\epsilon_f$$, so there is much much room for improvement. (I’d be happy to see a $$O(\log n)$$-approximation.) Basically, it is known that the number of edges of $$\{0,1\}^n$$ that violate monotonicity is at least $$\epsilon_f 2^{n-1}$$ [GGLRS00], [CS13]. A simple exercise (given below) shows that there are at most $$n\epsilon_f 2^n$$ violated edges. So just estimate the number of violated edges for an $$O(n)$$-approximation.
(Consider $$S \subseteq \{0,1\}^n$$ such that modifying $$f$$ on $$S$$ makes it monotone. Every violated edge must have an endpoint in $$S$$.)

Related work: Fattal and Ron [FR10] is a great place to look at various related problems, especially for hypergrid domains.

References

[CS13] D. Chakrabarty and C. Seshadhri. Optimal Bounds for Monotonicity and Lipschitz Testing over Hypercubes and Hypergrids. Symposium on Theory of Computing, 2013.

[FR10] S. Fattal and D. Ron. Approximating the distance to monotonicity in high dimensions . ACM Transactions on Algorithms, 2010.

[GGL+00] O. Goldreich, S. Goldwasser, E. Lehman, D. Ron, and A. Samorodnitsky. Testing Monotonicity . Combinatorica, 2000.