Alas! A dry month, with no property testing or sublinear papers. Maybe next month…

# Sublinear Algorithms Workshop at Johns Hopkins University

*(Posting an announcement for a workshop on sublinear algorithms.)*

We are organizing a Sublinear Algorithms workshop that will take place at Johns Hopkins University, January 7-9, 2016. The workshop will bring together researchers interested in sublinear algorithms, including sublinear-time algorithms (e.g., property testing and distribution testing), sublinear-space algorithms (e.g., sketching and streaming) and sublinear measurements (e.g., sparse recovery and compressive sensing).

The workshop will be held right before SODA’16, which starts on January 10 in Arlington, VA (about 50 miles from JHU).

Participation in this workshop is open to all, with free registration. In addition to 20+ excellent invited talks, the program will include short contributed talks by graduating students and postdocs, as well as a poster session. To participate in the contributed talk session and/or the poster session, apply by December 1.

For further details and registration, please visit

http://www.cs.jhu.edu/~vova/

Best,

Vladimir Braverman, Johns Hopkins University

Piotr Indyk, MIT

Robert Krauthgamer, Weizmann Institute of Science

Sofya Raskhodnikova, Pennsylvania State University

# News for September 2015

We have a couple of papers this month.

**Are Few Bins Enough: Testing Histogram Distributions** by Clément Canonne (ECCC). Testing whether a distribution is uniform is a very important, well-studied, and almost completely understood problem. A generalization of this question asks whether a distribution on the set *{1,…,n}* is a *k*-histogram, that is, whether it can be represented as a piecewise constant function over at most *k* contiguous intervals. This important generalization has been much less understood, and a huge gap existed between the upper and lower bounds on its query complexity. This paper almost closes the gap by improving both the upper and lower bounds.
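To make the definition concrete, here is a small illustrative checker (ours, not from the paper): a distribution on *{1,…,n}* is a *k*-histogram exactly when its probability vector has at most *k* maximal runs of equal consecutive values.

```python
def is_k_histogram(p, k, tol=1e-12):
    """Return True iff the probability vector p (over {1,...,n}) is
    piecewise constant over at most k contiguous intervals."""
    # Each maximal run of (approximately) equal consecutive values
    # corresponds to one constant interval of the histogram.
    intervals = 1
    for a, b in zip(p, p[1:]):
        if abs(a - b) > tol:
            intervals += 1
    return intervals <= k

# A 2-histogram on a domain of size 6: constant on {1,2,3} and {4,5,6}.
p = [0.1, 0.1, 0.1, 0.7 / 3, 0.7 / 3, 0.7 / 3]
```

Of course, a tester only gets samples from the distribution, not its probability vector; this sketch just pins down the property itself.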

**Testing Properties of Functions on Finite Groups** by Kenta Oono and Yuichi Yoshida (arXiv). Testing algebraic properties of functions has taken a central place in property testing right from its inception, and many different kinds of functions have been studied in the literature. The authors study functions that map elements of a finite group to square matrices over the complex field with Frobenius norm 1, and prove that various properties of these functions are testable. Some of the properties considered are important from the point of view of representation theory, and this is the first paper making the connection between property testing and representation theory. With representation theory taking a prominent role in computer science lately, this paper is expected to be the first of a long list of related results.

# News for August

This month saw more development on testing properties of distributions and a result with intriguing connections to property testing. And for readers who may have missed it, Clément Canonne and Gautam Kamath wrote an engaging survey of some recent work on testing properties of distributions.

**Optimal algorithms and lower bounds for testing closeness of structured distributions** by Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin (arXiv). One of the fundamental results in testing properties of distributions is that estimating the (\(L_1\)) distance between two unknown distributions on a domain of size \(n\) up to some constant additive factor requires drawing \(O(n^{2/3})\) samples from these distributions, and this sample complexity is tight in general. But what if we consider the same problem when we are promised that the distributions come from some (known) class of distributions? This paper shows that, for many natural classes, we can obtain much more efficient algorithms for testing closeness of distributions in this setting.

**Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions** by Parikshit Gopalan, Noam Nisan, Rocco A. Servedio, Kunal Talwar, and Avi Wigderson (arXiv). The (maximum) *sensitivity* of a Boolean function \(f : \{0,1\}^n \to \{0,1\}\) is the size of the largest set \(S \subseteq [n]\) such that there is an input \(x \in \{0,1\}^n\) for which \(f(x) \neq f(y)\) for every input \(y\) obtained by flipping the value of the \(i\)th coordinate of \(x\) for some \(i \in S\). One of the results in this paper shows that functions with low sensitivity can be locally self-corrected: given query access to a function \(r : \{0,1\}^n \to \{0,1\}\) that is close to a function \(f\) with low sensitivity (think of \(r\) as a *corrupted* version of \(f\)), there is an algorithm that, for any input \(x \in \{0,1\}^n\), queries \(r\) on a few inputs and outputs the value \(f(x)\) with high probability. This notion of local self-correction is of course closely related to locally-decodable codes; it is also one of the fundamental techniques used to obtain many results in property testing (see for example Chapter 3 of Dana Ron’s survey). Can this result, or the techniques used to obtain it, also yield new results in property testing?
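The definition of maximum sensitivity is easy to check by brute force on small examples; this sketch is illustrative and unrelated to the paper's algorithms.

```python
from itertools import product

def max_sensitivity(f, n):
    """Brute-force maximum sensitivity of f: {0,1}^n -> {0,1}: the
    largest, over all inputs x, number of coordinates i whose flip
    changes the value of f at x."""
    best = 0
    for x in product((0, 1), repeat=n):
        # Count coordinates i such that flipping x_i changes f's value.
        s = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(n))
        best = max(best, s)
    return best

parity = lambda x: sum(x) % 2            # every flip changes the value
majority = lambda x: int(sum(x) >= 2)    # on 3 bits, at most 2 flips matter
```

Parity on \(n\) bits has the maximum possible sensitivity \(n\), while "smooth" functions in the paper's sense sit at the other extreme.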

# News for July 2015

The summer month of July brings to us exciting work on distribution testing, scale-free graph testing, personal PageRank estimation, and (hold your breath!) cake cutting. Let’s hear more…

Avid followers of PTReview need little introduction to distribution testing, since we’ve seen much work on this topic over the past year. We have two concurrent results that really push the state-of-the-art of distribution testing.

**Testing Shape Restrictions of Discrete Distributions** by Clément Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld (arxiv). Most (sublinear) distribution testing results work for structured classes, like monotone, k-modal, Poisson, multinomial, etc. distributions. This paper gives an overarching framework that unifies these concepts in terms of *succinctness*. Consider a class \(\mathbb{C}\) of distributions over \([n]\). This class is succinct if for any distribution \(\mathcal{D} \in \mathbb{C}\), one can partition \([n]\) into a “small” number of intervals such that \(\mathcal{D}\) is “approximately constant” within each interval. The main punchline is that any such class has a sublinear-sample distribution tester. Amazingly, the generic tester given is optimal (up to polylog factors and the distance parameter)! Much previous work is subsumed as a corollary of this main theorem, and the paper also gives a lower bound framework to get matching lower bounds.

**Optimal Testing for Properties of Distributions** by Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath (arxiv). This paper concurrently gets many of the results of the previous paper using completely different techniques. The focus is on monotone, product, and log-concave distributions, where the paper gives completely optimal testers (both in terms of the domain size and the distance parameter). The key insight is to focus on the \(\chi^2\)-distance between distributions, though distribution testing typically uses the total-variation (or \(\ell_1\)) distance. The tester is also robust/tolerant, in that yes instances include distributions close to the class considered.

**Bidirectional PageRank Estimation: From Average-Case to Worst-Case** by Peter Lofgren, Siddhartha Banerjee, and Ashish Goel (arxiv). Not your typical sublinear algorithms paper, but that never stopped us. Personalized PageRank (PPR) is a fundamental similarity notion between vertices in a graph, and is (sort of) the probability that a short random walk from \(u\) ends at \(v\). Naturally, estimating the PPR value from \(u\) to \(v\) can be done in time \(O(1/p)\), where \(p\) is the PPR value. (Simply take many random walks from \(u\), and see how often you hit \(v\).) Can you do sublinear in this value? The answer is yes, when the graph is undirected and has bounded degree: one can do this in \(O(1/\sqrt{p})\) time, a significant improvement. It basically uses bidirectional search, by doing forward walks from \(u\) and backward walks from \(v\). These ideas were developed in earlier work of Lofgren et al., based on true-blooded sublinear graph algorithms for expander reconstruction by Kale, Peres, and Seshadhri, which are in turn based on truer-blooded property testing questions by Goldreich and Ron on expansion testing. The best thing: these algorithms work in practice!
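The naive \(O(1/p)\) estimator described above ("take many random walks from \(u\), see how often you hit \(v\)") is a few lines of code. This sketch is illustrative only; the termination probability `alpha` and the walk count are arbitrary choices, not the paper's parameters.

```python
import random

def ppr_estimate(adj, u, v, alpha=0.2, walks=20000, rng=random):
    """Naive Monte Carlo estimate of the personalized PageRank ppr(u, v):
    run alpha-terminating random walks from u and return the fraction
    that end at v. Needs on the order of 1/p walks when p = ppr(u, v)."""
    hits = 0
    for _ in range(walks):
        node = u
        while rng.random() > alpha:       # continue with probability 1 - alpha
            node = rng.choice(adj[node])  # step to a uniformly random neighbor
        hits += node == v                 # the walk terminated at `node`
    return hits / walks

# Tiny undirected example: a triangle on vertices 0, 1, 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

The bidirectional idea in the paper replaces the pure forward simulation with forward walks from \(u\) combined with backward exploration from \(v\), cutting the cost to \(O(1/\sqrt{p})\).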

**Every Property is Testable on a Natural Class of Scale-Free Multigraphs – ver. 2** by Hiro Ito (arxiv). Alas, PTReview missed ver. 1, which appeared in April. One of the weaknesses of property testing on sparse graphs is the restriction to bounded degree. This paper considers arbitrary sparse graphs, and focuses on graphs with a power-law degree distribution. There are classic results of Barabási and Albert showing the significance of such graphs (also called scale-free networks) in the real world. The main result is that special classes of scale-free graphs defined by clustering properties are hyperfinite. This means that one can remove a small fraction of edges to split the graph into constant-sized pieces. An important result of Newman and Sohler showed (in the bounded-degree case) that any property defined by a subset of hyperfinite graphs is testable. This paper leverages that result to show that properties defined by clustered classes of scale-free graphs are also testable.

**How to solve the cake-cutting problem in sublinear time – ver. 2** by Hiro Ito and Takahiro Ueda (arxiv). Again, ver. 1 appeared in April and was missed by us. Apologies! Cake cutting is a classic resource allocation problem. Consider a cake, as the unit interval \([0,1]\). There are \(n\) players, each of whom has a different utility for the cake, specified by a continuous real function on \([0,1]\). We wish to give each player a slice (subinterval of \([0,1]\)) such that each has utility at least \(1/n\). An old result of Even and Paz gives an \(O(n \log n)\) divide-and-conquer algorithm. What can be said in sublinear time? This paper shows that one can give fair assignments to \(o(n)\) players in \(o(n)\) time, such that the assignment can be extended to almost all remaining players.
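For intuition, here is a hedged sketch of the Even–Paz divide-and-conquer protocol for \(n\) a power of two, with each player's utility represented by its CDF on \([0,1]\). The representation and helper names are our own illustration, not from either paper.

```python
def cut(cdf, a, b, iters=60):
    """Binary-search the point x in [a, b] at which the player values
    [a, x] exactly half as much as [a, b]."""
    target = (cdf(a) + cdf(b)) / 2
    lo, hi = a, b
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < target else (lo, mid)
    return (lo + hi) / 2

def even_paz(players, a=0.0, b=1.0):
    """Even-Paz divide and conquer (n a power of two): every player
    marks its halfway point of [a, b]; the half of the players with the
    smaller marks recurse on the left piece, the rest on the right.
    Each player ends up with a slice worth at least 1/n to them."""
    if len(players) == 1:
        return {players[0][0]: (a, b)}
    marks = sorted(players, key=lambda pl: cut(pl[1], a, b))
    k = len(marks) // 2
    m = cut(marks[k - 1][1], a, b)   # the k-th smallest halfway mark
    return {**even_paz(marks[:k], a, m), **even_paz(marks[k:], m, b)}

# Four players with uniform utility (cdf(x) = x) split [0,1] into quarters.
alloc = even_paz([("p%d" % i, lambda x: x) for i in range(4)])
```

Each of the \(\log n\) recursion levels touches all remaining players, giving the \(O(n \log n)\) bound; the sublinear result above must instead settle for serving only \(o(n)\) players explicitly.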

# News for June 2015

This month we have a number of results related to query complexity, though not directly to property testing.

**Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search** by Grey Ballard, Tamara G. Kolda, Ali Pinar and C. Seshadhri (arXiv). For a long time we have been hoping to use our tools and techniques to solve problems that are useful in the real world. One such problem, with real-life applications, is the following: given two sets of vectors, find the \(t\) pairs of vectors with the highest dot product. The authors use clever sampling techniques to design algorithms that beat the state of the art. They not only give theoretical guarantees but also validate their results empirically. Bridging the gap between theory and practice is extremely important and, at the same time, a very challenging job. We hope more work will be done in this direction.
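For concreteness, here is the exact brute-force baseline for the MAD search problem (our illustration, not the paper's sampling algorithm), which methods like diamond sampling aim to beat.

```python
from heapq import nlargest

def top_t_dot_products(A, B, t):
    """Exact MAD-search baseline: return the t index pairs (i, j) with
    the largest dot products <A[i], B[j]>, by scoring all |A|*|B| pairs.
    Sampling methods avoid this exhaustive O(|A|*|B|*d) cost."""
    dot = lambda x, y: sum(p * q for p, q in zip(x, y))
    scored = ((dot(u, w), i, j) for i, u in enumerate(A) for j, w in enumerate(B))
    return [(i, j) for _, i, j in nlargest(t, scored)]

A = [[1, 0], [0, 2]]
B = [[3, 0], [0, 1]]
```

With millions of vectors per side, scoring every pair is hopeless, which is exactly where sampling-based estimates of the large dot products become attractive.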

**Sub-linear Upper Bounds on Fourier dimension of Boolean Functions in terms of Fourier sparsity** by Swagato Sanyal (arXiv). Understanding the relationship between the Fourier dimension of a Boolean function and its Fourier sparsity is an important problem. In this paper a better bound on the Fourier dimension in terms of sparsity is obtained. The main technique is to use the fact that the Fourier dimension equals the non-adaptive parity decision tree complexity, and then to bound the parity decision tree complexity in terms of sparsity.

**Relationship between Deterministic Query Complexity, Randomized Query Complexity and Quantum Query Complexity**. In the world of query complexity, understanding the exact relationship between the various models of computation is the main problem. It is known that deterministic, randomized, and quantum query complexity are all polynomially related, but the exact polynomial relations between them are not known. Last month there was a sudden burst of activity in this area, with three papers addressing this problem coming out in a span of two weeks. In the papers **Separations in Query Complexity Based on Pointer Functions** by Andris Ambainis, Kaspars Balodis, Aleksandrs Belovs, Troy Lee, Miklos Santha and Juris Smotrovs (arXiv) and **Towards Better Separation between Deterministic and Randomized Query Complexity** by Sagnik Mukhopadhyay and Swagato Sanyal (arXiv), it is proved that there is a quadratic separation between randomized and deterministic query complexity, and that this separation is tight up to logarithmic factors. Very soon after, in **A Super-Grover Separation Between Randomized and Quantum Query Complexities** by Shalev Ben-David (arXiv), it was proved that the separation between quantum and randomized query complexity is super-quadratic. With these three results, our knowledge of query complexity is slightly clearer.

# News for May 2015

Given the outstanding performance of last month, it is not surprising (though disappointing) that this month has not matched up. We have only three papers on property testing this month. Please do let us know if we have missed anything. Hopefully next month will be more exciting.

**A simpler sublinear algorithm for approximating the triangle count** by C. Seshadhri (arXiv). Approximating the number of triangles in a graph is a very important problem that also has real-life applications. Using a model (possibly more relevant for real-life applications) in which the queries are more powerful than those allowed in the sparse and dense graph models, Eden, Levi, and Ron gave a sublinear algorithm for approximating the number of triangles in an undirected graph. Seshadhri gives an alternate algorithm with similar complexity but a much simpler analysis.

**Using higher-order Fourier analysis over general fields** by Arnab Bhattacharyya and Abhishek Bhowmick (arXiv). The holy grail of property testing is to characterize the properties that are testable. Many classes of properties have been proved to be testable, and for most results in this direction the main tool has been Fourier analysis. But classical Fourier analysis has its limitations. Recently, higher-order Fourier analysis has been used as a tool in many areas of theory. In this paper, higher-order Fourier analysis is used to prove that a large class of properties is testable.

**Streaming Property Testing of Visibly Pushdown Languages** by Nathanaël François, Frédéric Magniez, Michel de Rougemont, and Olivier Serre (arXiv). Streaming property testing deals with the problem of distinguishing whether a stream of data satisfies a property or is “far” from satisfying it. The goal is to minimize the amount of memory used (the number of queries made is not relevant in this model). This paper gives a streaming property tester for visibly pushdown languages using polylogarithmic space.

# News for April 2015

April was a busy time in property testing with 9 (!) papers posted online this month. So let’s jump right in.

*Testing properties of graphs*

**Approximately Counting Triangles in Sublinear Time** by Talya Eden, Amit Levi, and Dana Ron (ECCC). Can we estimate the number of triangles in a sparse graph in sublinear time? When we can only query a vertex to learn its degree or its *i*-th neighbor, the answer is no. But this paper shows that if we can also ask whether a pair of vertices is connected by an edge, then the answer is yes!

**Testing Cluster Structure of Graphs** by Artur Czumaj, Pan Peng, and Christian Sohler (arXiv). Another fundamental problem in the analysis of graphs is that of determining if the vertices of a graph can be partitioned into a small number of good clusters. This paper shows that the appropriate formalization of this problem can be solved in sublinear-time, and that in fact \(O(\sqrt{n})\) queries suffice to test whether a graph with \(n\) vertices can be partitioned into \(k = O(1)\) clusters.

*Testing properties of distributions*

**A Survey on Distribution Testing: Your Data is Big. But is it Blue?** by Clément Canonne (ECCC). As we have seen on this blog, there has been a lot of recent development in testing properties of distributions. This latest survey provides an up-to-date introduction to this research.

**Faster Algorithms for Testing under Conditional Sampling** by Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, and Ananda Theertha Suresh (arXiv). One of the recent developments in distribution testing has been the discovery that natural conditional sampling models offer a surprising amount of algorithmic power. This paper gives yet more evidence of this fact, notably showing that testing the equivalence of two distributions with support size \(n\) can be done with roughly \(O(\log \log n)\) queries—a doubly-exponential improvement over the best possible algorithm for the same problem in the standard sampling model!
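To illustrate the conditional sampling model itself (the oracle, not the paper's testers): the tester may specify a query set \(S\) and receive a sample drawn from the unknown distribution conditioned on landing in \(S\).

```python
import random

def cond_sampler(p, rng=random):
    """Build a conditional sampling oracle for a distribution p over
    {0, ..., n-1}: given a query set S, it returns one sample drawn
    from p conditioned on landing in S. This per-query conditioning is
    the extra power the model grants over plain sampling."""
    def oracle(S):
        S = sorted(S)
        return rng.choices(S, weights=[p[i] for i in S])[0]
    return oracle

# Uniform over {0,1,2,3}; conditioning on {0,1} yields a fair coin on them.
oracle = cond_sampler([0.25] * 4, rng=random.Random(0))
```

Pairwise conditional queries like this (with \(|S| = 2\)) are exactly what lets testers compare probabilities of individual elements directly, driving the exponential savings mentioned above.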

**Sampling Correctors** by Clément Canonne, Themis Gouleakis, and Ronitt Rubinfeld (arXiv). This paper initiates the study of a research direction related, but not equivalent, to distribution testing: assuming that the true distribution has some structural property (like monotonicity, for example), but that the samples we draw from it may contain errors, can we somehow use these noisy samples to generate “corrected” samples from the original underlying distribution?

*Locally-testable codes*

**High rate locally-correctable and locally-testable codes with sub-polynomial query complexity** by Swastik Kopparty, Or Meir, Noga Ron-Zewi, and Shubhangi Saraf (ECCC). The central question in locally-testable error-correcting codes is to determine the tradeoff (if any!) that we must have between the number of queries required to test if an input is a codeword, the rate of the code, and the distance between codewords. This paper shows that there are codes with constant rate and distance that can also be tested and decoded with sub-polynomial query complexity \(\exp(\tilde{O}(\sqrt{\log n}))\).

**Robust testing of lifted codes with applications to low-degree testing** by Alan Guo, Elad Haramaty, and Madhu Sudan (ECCC). Typically, we require a local test to reject inputs that are far from codewords with a reasonably large probability (that is: we want the testers to have good *soundness* properties). A stronger requirement might be to want the local view of a tester on those inputs to be *far* from the local view of any codeword with good probability. Testers that satisfy this property are called *robust*, and have many useful properties. This paper shows that natural testers of low-degree polynomials and their generalizations known as lifted codes are indeed robust testers.

*General property testing results*

**On Being Far from Far and on Dual Problems in Property Testing** by Roei Tell (ECCC). If I can efficiently test that an input has property \(\Pi\), can I also efficiently test if an input is far from having the property \(\Pi\)? At first glance, it seems that both these problems (that is, the direct and *dual* version of the problem of testing \(\Pi\)) are equivalent, but this is not always the case. This paper explores this subtle question and provides initial results for dual problems in testing properties of functions, graphs, and distributions.

**Trading query complexity for sample-based testing and multi-testing scalability** by Eldar Fischer, Oded Lachish, Yadu Vasudev (arXiv). What properties of combinatorial objects can we test in sublinear time when we have only sample access to it (instead of full query access)? This paper provides a general and powerful result showing that *every* property that can be tested non-adaptively with a constant number of queries can also be tested with a sublinear number of samples.

# News for March 2015

This month’s post will liberally link to previous posts, because it turns out we have written enough in the past to motivate this month’s work. Our news this March has more monotonicity testing, non-deterministic testing, and image testing. Let me test your patience no more…

**On the Complexity of Nondeterministically Testable Hypergraph Parameters** by Marek Karpinski and Ronald Markó (arXiv). The notion of nondeterministic graph property testing was introduced by Lovász and Vesztergombi and further developed by Gishboliner and Shapira, and by the authors. Our August 2014 news post explains these results, rather (ahem) beautifully, so check that out. This result generalizes nondeterministic property testing to hypergraphs, and proves its equivalence to deterministic (standard) property testing. Unlike their previous result, this does not give an explicit function for the complexity of a deterministic tester for a specified nondeterministically testable property. (We only know that it is “some” constant.) This is analogous to the original work of Lovász and Vesztergombi, which was non-constructive. These non-constructive results use limit theorems for dense (hyper)graphs. This certainly leaves open the question of getting explicit complexity bounds.

**Quantum Algorithm for Monotonicity Testing on the Hypercube** by Aleksandrs Belovs and (PTReview’s very own) Eric Blais (arXiv). Much has been said about Boolean monotonicity testing in our reports, and you can refresh everything by checking out the open problem for Feb 2015. This result gives an \(O(n^{1/4}\epsilon^{-1/2})\)-query tester in the quantum testing model (check Ashley Montanaro’s survey on PTReview). Standard quantum amplification of the classic edge tester and of the recent Khot-Minzer-Safra monotonicity tester yields \(O(\sqrt{n/\epsilon})\) and \(O(n^{1/4}/\epsilon)\) testers, respectively. The point of this result is to get the best dependence on both \(n\) and \(\epsilon\). For \(\epsilon = 1/\sqrt{n}\), the new quantum algorithm gives a cubic speedup over existing classical testers. From a quantum computing perspective, there are few problems with polynomial super-quadratic speedup, and Boolean monotonicity testing now enters that list.
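The classic edge tester mentioned above fits in a few lines. This classical (non-quantum) sketch is illustrative only; the query budget is an arbitrary choice, not a calibrated one.

```python
import random

def edge_tester(f, n, queries=1000, rng=random):
    """Classic edge tester for monotonicity of f: {0,1}^n -> {0,1}.
    Sample uniformly random hypercube edges (x with bit i = 0 versus
    bit i = 1) and reject if any sampled edge is violated, i.e. f drops
    from 1 to 0 as the bit goes from 0 to 1. Monotone f always passes."""
    for _ in range(queries):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(lo) > f(hi):   # violated edge: reject
            return False
    return True             # no violation seen: accept

threshold = lambda x: int(sum(x) >= 2)   # monotone
flipped = lambda x: 1 - x[0]             # decreasing in x[0]: far from monotone
```

Quantum amplitude amplification applied to exactly this kind of violation search is what yields the \(O(\sqrt{n/\epsilon})\) bound quoted above.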

**Constant-Time Testing and Learning of Image Properties** by Piotr Berman, Meiram Murzabulatov, and Sofya Raskhodnikova (arXiv). The notion of testing image properties was first introduced by Raskhodnikova and further developed by Tsur and Ron. There have been some really interesting results on applying these methods in practice for computer vision. This paper studies different query models for the property tester. Typically, the input is a grid of (black and white) pixels, and we wish to determine if the black region is connected, convex, etc. Previous testers were allowed to query any pixel of their choice and could be adaptive. This paper focuses on restricted sample-based models, where the tester only sees uniform random pixels, and variants thereof. Interestingly, this limited power suffices to test and even tolerantly test the properties of being a half-plane, connectedness, and convexity. There are numerous results for the different models and for tolerant vs. standard testers. Having richer models would likely have more impact in practice, where query models may not be feasible. Maybe this result will lead to more work in computer vision!

# News for February 2015

The following three very interesting papers came out in the month of February.

**A polynomial regularity lemma for semi-algebraic hypergraphs and its applications in geometry and property testing** by Jacob Fox, Janos Pach and Andrew Suk (arXiv). Testability of graph or hypergraph properties is all about regularity. Alon and Shapira used the regularity lemma for graphs to prove that every hereditary property of graphs is testable; a generalization of this result to hypergraphs was obtained recently. The upper bound on the query complexity for testing hereditary properties of graphs (or hypergraphs) is a function that is independent of the size of the input but depends on the proximity parameter ε. The query complexity as a function of ε is the same as the function that appears in the regularity lemma (where ε is the approximation parameter), which is usually a tower function in \(1/\varepsilon\). This paper shows that for special hereditary properties of graphs (and of other generalized objects) the dependence on the approximation parameter is polynomial. Thus for these special classes of hereditary properties (for example H-freeness, where H is k-uniform) the query complexity for testing is polynomial in the proximity parameter. This is an important step towards understanding how the query complexity for testing hereditary graph (and hypergraph) properties depends on the proximity parameter.

**Proofs of Proximity for Context-Free Languages and Read-Once Branching Programs** by Oded Goldreich, Tom Gur and Ron Rothblum (ECCC). In a recent paper, Gur and Rothblum studied a property testing model where the tester/verifier is given free access to a short proof in addition to query access to the input. A related model, called interactive proofs of proximity, was proposed by Rothblum, Vadhan and Wigderson, where the verifier is allowed to interact with an all-powerful prover. There is a tradeoff between the size of the short proof (or the amount of interaction) and the number of queries made to the input. In both these models the verifier is more powerful than in the traditional property testing setting, so for properties that are not testable in the standard setting, understanding the query complexity in these stronger models is a natural question. This paper studies context-free languages and small read-once branching programs in these models.

**Sample Complexity for Winner Prediction in Elections** by Arnab Bhattacharyya and Palash Dey (arXiv). The number of random samples necessary to predict the winner of an election with several contestants is not only an interesting theoretical problem but also has many real-life applications. Different voting rules may have different complexity in terms of the number of samples required. This paper considers a number of standard voting rules and presents bounds on the number of samples required for each. It is possibly the first time sampling has been analyzed in the context of social choice theory, and this is an area where a lot more theoretical work is expected to follow.
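For the simplest such rule, plurality, the sampling approach is straightforward. The sketch below is our illustration, not the paper's algorithm, and the sample size is an arbitrary choice.

```python
import random
from collections import Counter

def predict_plurality_winner(votes, samples, rng=random):
    """Predict the plurality winner from a small random poll: draw
    `samples` ballots uniformly at random (with replacement) and output
    the candidate seen most often. When the top two candidates are
    separated by a margin, O(1/margin^2) samples suffice w.h.p.
    (a standard Chernoff-bound argument)."""
    poll = (rng.choice(votes) for _ in range(samples))
    return Counter(poll).most_common(1)[0][0]

# A 70/30 two-candidate election: a 200-ballot poll almost surely finds "A".
votes = ["A"] * 700 + ["B"] * 300
```

The interest of the paper is that other voting rules (which aggregate full rankings rather than single top choices) can need very different sample complexities than this plurality baseline.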