# News for September 2016

September has just concluded, and with it the rise of Fall in property testing: no less than 5 papers* this month.

Removal Lemmas for Matrices, by Noga Alon and Omri Ben-Eliezer (arXiv). The graph removal lemma, and its many variants and generalizations, is a cornerstone in graph property testing (among others), and is a fundamental ingredient in many graph testing results. In its simplest form, it states that  for any fixed $$h$$-vertex pattern $$H$$ and $$\varepsilon > 0$$,  there is some constant $$\delta=\delta(\varepsilon)$$ such that any graph $$G$$ on $$n$$ vertices containing at most $$\delta n^h$$ copies of $$H$$ can be “fixed” (made $$H$$-free) by removing at most $$\varepsilon n^2$$ edges.
The authors introduce and establish several analogues of this result, in the context of matrices. These matrix removal lemmas have direct implications in property testing (discussed in Section 1.2); for instance, they imply constant-query testing of $$F$$-freeness of (binary) matrices, where $$F$$ is any family of binary matrices closed under row permutations.

Improving and extending the testing of distributions for shape-restricted properties, by Eldar Fischer, Oded Lachish, and Yadu Vasudev (arXiv). Testing membership to a class (family) is a very natural question in distribution testing, and one which has received significant attention lately. In this paper, the authors build on, improve, and generalize the techniques of Canonne, Diakonikolas, Gouleakis, and Rubinfeld (2016) to obtain a generic, one-size-fits-all algorithm for this question. They also consider the same question in the “conditional oracle” setting, for which their ideas yield significantly more sample-efficient algorithms. At the core of their approach is a better and simpler learning procedure for families of distributions that enjoy some decomposability property.

The Bradley―Terry condition is $$L_1$$–testable, by Agelos Georgakopoulos and  Konstantinos Tyros (arXiv). An incursion in probability models: say that the directed weighted graph $$((V,E),w)$$ on $$n$$ vertices, with $$w\colon E_n\to[0,1]$$ such that $$w(x,y)=w(y,x)$$ for all $$(x,y)\in V^2$$, satisfies the Bradley—Terry condition if there exists an assignment of probabilities $$(a_x)_{x\in V}$$ such that

$$\qquad\displaystyle \frac{w(x,y)}{w(y,x)} = \frac{a_x}{a_y}$$

for all $$(x,y)\in V^2$$. This condition captures whether such a graph corresponds to a Bradley—Terry tournament.
This paper considers this characterization from a property testing point of view: that is, it addresses the task of testing, in the $$L_1$$-sense of Berman, Raskhodnikova, and Yaroslavtsev (2012), if a $$((V,E),w)$$ satisfies the Bradley—Terry condition (or is far from any such graph).

On SZK and PP, by Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan (arXiv, ECCC). Another (unexpected) connection! This work proves query and communication complexity separations between two complexity classes, $$\textsf{NISZK}$$ (non-interactive statistical zero knowledge) and $$\textsf{UPP}$$ (a generalization of $$\textsf{PP}$$ with unbounded coin flips, and thus arbitrary gap in the error probability). While I probably cannot even begin to list all that goes over my head in this paper, the authors show in Section 2.3.2 some implications of their results for property testing, for instance in distribution testing. In more detail, their results yield some lower bounds on the sample complexity of (tolerant) property testers with success probability arbitrarily close to $$1/2$$ (as opposed to the “usual” setting of $$1/2+\Omega(1)$$).

Quantum conditional query complexity, by Imdad S. B. Sardharwalla, Sergii Strelchuk, and Richard Jozsa (arXiv). Finally, our last paper of the list deals with both quantum algorithms and distribution testing, by introducing a new distribution testing model in a quantum setting, the quantum conditional oracle. This model, the quantum analogue of the conditional query oracle of Chakraborty et al. and Canonne et al., allows the (quantum) testing algorithm to get samples from the probability distribution to test, conditioning on chosen subsets of the domain.
In this setting, they obtain speedups over both (i)  quantum “standard” oracle and (ii)  “classical” conditional query testing for many distribution testing problems; and also consider applications to quantum testing of Boolean functions, namely balancedness.

* As usual, in case we missed or misrepresented any, please signal it in the comments.

# News for August 2016

This summer is a very exciting one for property testing, with seven (!) papers uploaded during the month of August. In particular, there have been a number of results on testing unateness.

An $$\tilde O(n)$$ Queries Adaptive Tester for Unateness, by Subhash Khot and Igor Shinkar (arXivECCC). A Boolean function is unate if, for each $$i \in [n]$$, it is monotone non-increasing or monotone non-decreasing in coordinate $$i$$. This is a generalization of monotonicity, one of the most studied problems in the property testing literature. This paper gives an adaptive algorithm for testing if a Boolean function is unate. The algorithm requires $$\tilde O(n)$$ queries, improving on the $$O(n^{1.5})$$ query tester of Goldreich et al. from 2000.

A $$\tilde O(n)$$ Non-Adaptive Tester for Unateness, by Deeparnab Chakrabarty and C. Seshadhri (arXivECCC). The unateness testing continues! This paper also gives a $$\tilde O(n)$$ algorithm for this problem, but improves upon the previous work by making non-adaptive queries. The algorithm and analysis are very clean and elegant — with a theorem by Chakrabarty et al. in place, they take up just a single page!

Testing Unateness of Real-Valued Functions, by Roksana Baleshzar, Meiram Murzabulatov, Ramesh Krishnan S. Pallavoor, and Sofya Raskhodnikova (arXiv). The final paper this month on testing unateness, this work studies real-valued functions on the hypergrid $$[n]^d$$ (as opposed to Boolean functions on the hypercube). They give an adaptive algorithm requiring $$O\left(\frac{d \log (\max (d,n))}{\varepsilon}\right)$$ queries, generalizing the above result by Khot and Shinkar, which works only for Boolean functions on the hypercube and requires $$O\left(\frac{d \log d}{\varepsilon}\right)$$ queries. Additionally, they prove lower bounds for this setting (of $$\Omega(\min(d, |R|^2))$$, where $$R$$ is the range of the function) and the Boolean hypercube setting (of $$\Omega(\sqrt{d}/\varepsilon)$$). The latter lower bound leaves open a tantalizing question: does a $$O(\sqrt{d})$$ query algorithm exist? In other words, is testing unateness no harder than testing monotonicity?

Testing k-Monotonicity, by Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, Karl Wimmer (arXivECCC). And now, a different generalization of monotonicity! This paper studies testing $$k$$-monotonicity, where a Boolean function over a poset $$\mathcal{D}$$ is $$k$$-monotone if it alternates between the values $$0$$ and $$1$$ at most $$k$$ times on any ascending chain in $$\mathcal{D}$$. Classical monotonicity is then $$1$$-monotone in this definition. On the hypercube domain, the authors prove a separation between testing monotonicity and testing $$k$$-monotonicity and a separation between testing and learning. They also present a tolerant test for $$k$$-monotonicity over the hypergrid $$[n]^d$$ with complexity independent of $$n$$.

The Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal (arXiv). The dictionary learning problem has enjoyed extensive study in computer science. In this problem, given a matrix $$Y$$, one wishes to construct a decomposition $$Y = AX$$ such that each column of $$X$$ is $$k$$-sparse. In the settings typically studied, such a decomposition is guaranteed to exist. This paper initiates the dictionary testing problem, in which one wishes to test whether $$Y$$ admits such a decomposition, or is far from it. The authors provide a very elegant characterization — it admits a decomposition iff the columns of $$Y$$ have a sufficiently narrow Gaussian width.

The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs, by Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan (arXiv). This paper provides streaming algorithms for estimating the size of the maximum matching in graphs with some underlying sparsity. They give a one-pass algorithm for insert-only and dynamic streams with bounded arboricity, and a two-pass algorithm for random order streams with bounded degree. Interestingly, for the latter result, they use local exploration techniques, which have seen previous use in the property testing literature.

Testing Assignments to Constraint Satisfaction Problems, by Hubie Chen, Matt Valeriote, and Yuichi Yoshida (arXiv). Given a CSP from a finite relational structure $$A$$ and query access to an assignment, is the CSP satisfied? This paper shows a dichotomy theorem which characterizes which $$A$$ admit a constant-query test. They go on to show a trichotomy theorem for when instances may include existentially quantified variables, classifying which $$A$$ admit a constant-query test, which admit only a sublinear-query test, and which require a linear number of queries.

Minimizing Quadratic Functions in Constant Time, by Kohei Hayashi and Yuichi Yoshida (arXiv). This paper investigates the problem of (approximately) minimizing quadratic functions in high-dimensions. Prior approaches to this problem scale poorly with the dimension — at least linearly, if not worse. This work uses a sampling-based method to avoid paying this cost: the resulting time complexity is completely independent of the dimension!

# Distribution Testing: a short non-survey

The past years have witnessed a great deal of activity in the field of distribution testing— so much, in fact, that it has become a bit of a challenge to even keep track of what’s happening and what has already happened. While this blog post is not going to be a comprehensive (comprehensible, maybe?) survey or summary, it hopefully will at least shed some light on what distribution testing is — and where to look at if interested in it.

Obligatory XKCD comic.

Recall that in the “vanilla” property testing setting, one is provided with access (typically query access) to an object $$\mathcal{O}$$ from some universe $$\Omega$$ (typically the space of Boolean functions over $$\{0,1\}^n$$ or the set of $$n$$-vertex graphs), a distance measure $$\textrm{d}(\cdot,\cdot)$$ between such objects (usually the Hamming distance), a property $$\mathcal{P}\subseteq \Omega$$ (the subset of “good objects”) and a parameter $$\varepsilon \in (0,1]$$. The goal is to design a randomized algorithm that (i) accepts with high probability if $$\mathcal{O}\in\mathcal{P}$$; and (ii) rejects with high probability if $$d(\mathcal{O},\mathcal{O}’)>\varepsilon$$ for every single object $$\mathcal{O}’\in\mathcal{P}$$.

The distribution testing setting is very much like this one, with of course crucial twists. The object is now a probability distribution $$D$$ over a (discrete) set, typically $$[n]=\{1,\dots,n\}$$; the access model is usually a sample oracle, providing independent samples from $$D$$ on demand; the distance is the total variation distance between distributions (or, equivalently, the $$\ell_1$$ norm between the probability mass functions).

$$\textrm{d}_{\rm TV}(D,D’) = \sup_{S\subseteq [n]} \left( D(S) – D'(S) \right) = \frac{1}{2} \sum_{i\in[n]} \left|D(i)-D'(i) \right|$$

This brings in some quite important distinctions: for instance, randomness is now inherent: even taking a number of samples beyond any reasonable bound will not bring the probability of failure of the algorithm to zero, this time.

## A Brief History

The field made its debut in TCS with a work of Goldreich and Ron [5], who considered the question of testing if an expander graph had mixed — that is, if the distribution induced on its nodes was uniform. However, it is only a tad later, in a work of Batu, Fortnow, Rubinfeld, Smith, and White [6] that the setting was formally defined and studied, for the specific problem of testing whether two unknown random variables have the same distribution (closeness, or equivalence testing).

This started a line of work, which analyzed the sample complexity of testing uniformity (is my random variable uniformly distributed?), identity (is it distributed according to this nice distribution I fully know in advance?), independence (are my random variables independent of each other?), monotonicity (is the probability density function increasing?), and many more. Over the course of the past 16 years, many breakthroughs, new techniques, new questions, and new answers to old problems were made and found. With surprising twists (if your domain size is n, then surely $$\Theta(n)$$ samples are enough and necessary to learn an arbitrary distribution; but who would have thought that $$\Theta(n/\log n)$$ was the right answer to anything? [7]). And game changers (general theorems that apply to many questions or blackbox lemmas are really nice to have in one’s toolbox).

And that’s barely the tip of the iceberg! For more in-depth or better written prose, the interested Internet wanderer may want to consult one of the following (usual non-exhaustivity caveats apply):

• Ronitt Rubinfeld’s introduction [1]
• Dana Ron’s survey on Property testing [2] (Section 11.4)
• Oded Goldreich’s upcoming book [3] (Chapter 11)
• or my own survey [4] on distribution testing (but I cannot promise better written prose for this one).

For the more video-inclined, there are also quite a few survey talks available online, the most recent I know of being this talk by Ronitt Rubinfeld at COLT’16.

## A Brief Comparison

Note: this section is taken verbatim from my survey [4] (Section 1.2).

It is natural to wonder how the above approach to distribution testing compares to classic methods and formulations, as studied in Statistics. While the following will not be a thorough comparison, it may help shed some light on the difference.

• Null and alternative hypotheses. The standard take on hypothesis testing, simple hypothesis testing, relies on defining two classes of distributions, the null and alternative hypotheses $$\mathcal{H}_0$$ and $$\mathcal{H}_1$$. A test statistic is then tailored specifically to these two classes, in order to optimally distinguish between $$\mathcal{H}_0$$ and $$\mathcal{H}_1$$— that is, under the underlying assumption that the unknown distribution $$D\in\mathcal{H}_0\cup\mathcal{H}_1$$.

Something like that.

The test then rejects the null hypothesis $$\mathcal{H}_0$$ if statistical evidence is obtained that $$D\notin\mathcal{H}_0$$. In this view, the distribution testing formulation would be to set $$\mathcal{H}_0$$ to be the property $$\mathcal{P}$$ to be tested, and define the alternative hypothesis as “everything far from $$\mathcal{P}$$.” In this sense, the latter captures a much more adversarial setting, where almost no structure is assumed on the alternative hypothesis— setting known in Statistics as composite hypothesis testing.
• Small-sample regime. A second and fundamental difference resides in the emphasis given to the question. Traditionally, statisticians tend to focus on asymptotic analysis, characterizing— often exactly— the rate of convergence of the statistical tests under the alternative hypothesis, as the number of samples $$m$$ grows to infinity. Specifically, the goal is to pinpoint the error exponent $$\rho$$ such that the probability of error (failing to reject the null hypothesis) asymptotically decays as $$e^{-\rho m}$$. However, this asymptotic behavior will generally only hold for values of $$m$$ greater than the size of the domain (“alphabet”). In contrast, the computer science focus is on the small-sample regime, where the number of samples available is small with regard to the domain size, and one aims at achieving a fixed probability of error.
• Algorithmic flavor. At a more practical level, a third point on which the two approaches deviate is the set of techniques used in order to tackle the question. Namely, the Statistics literature very often relies on relatively simple-looking and “natural” tests and estimators, which need not be computationally efficient. (This is for instance the case for the generalized likelihood ratio test that requires to compute the maximum likelihood of the sequence of samples obtained under the two hypotheses $$\mathcal{H}_0$$ and $$\mathcal{H}_1$$; which is not in general tractable.) On the other hand, works in distribution testing are predominantly algorithmic, with a computational emphasis on the testing algorithms thus obtained.

## Exciting times!

And while the above has hinted at all that has been done so far in the area, there is still plenty ahead — many new questions to be posed, answered, revisited under another light, tackled with new insights and techniques, and connections to be made. As a small sample (sic), we list below a few of the recent developments or trends that herald these exciting times.

### New tools

•  Appeared in the 1800’s thanks to Pearson, testers based on (some tailored variant of) a $$\chi^2$$-based statistic are now aplenty. Yielding simple, concise, very often optimal testing algorithms… [8], [9], [10], [11]
•  Information theory. Techniques and results from this related area have started to permeate distribution testing— mostly to prove lower bounds in an easier or conceptually simpler way. [12], [13,14]
• They were there from the beginning. Now, the $$\ell_2$$-testing subroutines strike back. (For neat, elegant testing algorithms that use $$\ell_2$$ as a proxy “in the right way.”) [5,6,15][14]

### New models

What if modeling the access to the data as a source of i.i.d. samples was not enough? Because we can— the situation allows it— or because we want— what we are trying to achieve is best modeled a bit differently? If we get more ways to query it, can we do more, and by how much?

This recent line of work [20,21,22,23,…] offers many challenges, ranging from the right way to ask (what to model, and how?) to the right way to answer (intuition if often off the hook, in these uncharted waters?). New tools to design, intuition to build, and problems to consider…

### New questions

And of course, even if many of the “classic” problems have by now been fully understood, there are many more that await… from class testing [16,11,17,18] to twists on old questions (uneven sample size for closeness?) to new questions entirely [19], there is no shortage. Not to forget there is a stronger connection with practice to build, and asking the “real-worlders” what makes sense to consider is also unlikely to stop producing challenges.

[1] Ronitt Rubinfeld. Taming Big Probability DistributionsXRDS, 19(1):24-28, 2012. http://xrds.acm.org/article.cfm?aid=2331052
[2] Dana Ron. Algorithmic and Analysis Techniques in Property Testing. Foundations and Trends in Theoretical Computer Science: Vol. 5: No. 2, pp 73-205. 2010. http://dx.doi.org/10.1561/0400000029
[3] Oded Goldreich. Introduction to Property Testing. (In preparation) http://www.wisdom.weizmann.ac.il/~oded/pt-intro.html
[4] Clément Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue? 2015. http://eccc.hpi-web.de/report/2015/063/
[5] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 7:20, 2000.
[6] Tuğkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. In Proceedings of FOCS, pages 189-197, 2000.
[7] Gregory Valiant and Paul Valiant. The power of linear estimators. In Proceedings of FOCS, pages 403-412, October 2011.
[8] Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of SODA, pages 1193-1203. Society for Industrial and Applied Mathematics (SIAM), 2014.
[9] Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In Proceedings of FOCS, 2014.
[10] Ilias Diakonikolas, Daniel M. Kane, and Vladimir Nikishkin. Testing Identity of Structured Distributions. In Proceedings of SODA, pages 1841-1854. Society for Industrial and Applied Mathematics (SIAM), 2015.
[11] Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath. Optimal Testing for Properties of Distributions. In Advances in Neural Information Processing Systems (NIPS), 2015.
[12] Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. ArXiV, abs/1407.0381, 2014.
[13] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. ArXiV, abs/1504.01227, 2015.
[14] Ilias Diakonikolas and Daniel Kane. A New Approach for Testing Properties of Discrete Distributions. In Proceedings of FOCS, 2016. (To appear. Also available as abs/1601.05557)
[15] Tuğkan Batu, Eldar Fischer, Lance Fortnow, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. InProceedings of FOCS, pages 442-451, 2001.
[16] Tuğkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of STOC, pages 381-390, New York, NY, USA, 2004. ACM.
[17] Clément L. Canonne, Ilias Diakonikolas, Themis Gouleakis, and Ronitt Rubinfeld. Testing Shape Restrictions of Discrete Distributions, In Proceedings of STACS, pages 25:1–25:14, 2016.
[18] MIT Theory Blog. Distribution Testing: Do It With Class! https://mittheory.wordpress.com/2015/08/09/distribution-testing-do-it-with-class/
[19] Maryam Aliakbarpour, Eric Blais, Ronitt Rubinfeld. Learning and Testing Junta Distributions. In Proceedings of COLT, 2016. pp. 19–46, 2016
[20] Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295-347, 2013.
[21] Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, and Arie Matsliah. On the power of conditional samples in distribution testing. In Proceedings of ITCS, pages 561-580, New York, NY, USA, 2013. ACM.
[22] Clément L. Canonne, Dana Ron, and Rocco A. Servedio. Testing probability distributions using conditional samples. SIAM Journal on Computing, 2015.
[23] Clément L. Canonne and Ronitt Rubinfeld. Testing probability distributions underlying aggregated data. In Proceedings of ICALP, pages 283-295, 2014.

# News for July 2016

After a few slow months, property testing is back with a bang! This month we have a wide range on results ranging from classic problems like junta testing to new models of property testing.

Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism, by Eric Blais, Clément L. Canonne, Talya Eden, Amit Levi, and Dana Ron (arXiv). The problem of junta testing requires little introduction. Given a boolean function $$f:\{-1,+1\}^n \mapsto \{-1,+1\}$$, a $$k$$-junta only depends on $$k$$ of the input variables. A classic problem has been that of property testing $$k$$-juntas, and a rich line of work (almost) resolves the complexity to be $$\Theta(k/\epsilon)$$. But what of tolerant testing, where we want to tester to accept functions close to being a junta? Previous work has shown that existing testers are tolerant, but with extremely weak parameters. This paper proves: there is a $$poly(k/\epsilon)$$-query tester that accepts every function $$\epsilon/16$$-close to a $$k$$-junta and rejects every function $$\epsilon$$-far from being a $$4k$$-junta. Note that the “gap” between the junta classes is a factor of 4, and this is the first result to achieve a constant gap. The paper also gives testers when we wish to reject functions far from being a $$k$$-junta (exactly matching the definition of tolerant testng), but the tester has an exponential dependence on $$k$$. The results have intriguing connections with isomorphism testing, and there is a neat use of constrained submodular minimization.

Testing Pattern-Freeness, by Simon Korman and Daniel Reichman (arXiv). Consider a string $$I$$ of length $$n$$ and a “pattern” $$J$$ of length $$k$$, both over some alphabet $$\Sigma$$. Our aim is to property test if $$I$$ is $$J$$-free, meaning that $$I$$ does not contains $$J$$ as a substring. This problem can also be generalized to the 2-dimensional setting, where $$I$$ and $$J$$ are matrices with entries in $$\Sigma$$. This formulation is relevant in image testing, where we need to search for a template pattern in a large image. The main results show that these testing problems can be solved in time $$O(1/\epsilon)$$, with an intriguing caveat. If the pattern $$J$$ has at least $$3$$ distinct symbols, the result holds. If $$J$$ is truly binary, then $$J$$ is not allowed to be in a specified set of forbidden patterns. The main tool is a modification lemma that shows how to “kill” a specified occurrence of $$J$$ without introducing a new one. This lemma is not true for the forbidden patterns, resulting in the dichotomy of the results.

Erasure-Resilient Property Testing, by Kashyap Dixit, Sofya Raskhodnikova, Abhradeep Thakurta, and Nithin Varma (arXiv). Property testing begins with a query model for $$f: \mathcal{D} \mapsto \mathcal{R}$$, so we can access any $$f(x)$$. But what if an adversary corrupted some fraction of the input? Consider monotonicity testing on the line, so $$f: [n] \mapsto \mathbb{R}$$. We wish to test if $$f$$ is $$\epsilon$$-far from being monotone, but the adversary corrupts/hides an $$\alpha$$-fraction of the values. When we query such a position, we receive a null value. We wish to accept if there exists some “filling” of the corrupted values that makes $$f$$ monotone, and reject if all “fillings” keep $$f$$ far from monotone. It turns out the standard set of property testers are not erasure resilient, and fail on this problem. This papers gives erasure resilient testers for properties like monotonicity, Lipschitz continuity, and convexity. The heart of the techniques involves a randomized binary tree tester for the line that can avoid the corrupted points.

Partial Sublinear Time Approximation and Inapproximation for Maximum Coverage, by Bin Fu (arXiv). Consider the classic maximum coverage problem. We have $$m$$ sets $$A_1, A_2, \ldots, A_m$$ and wish to pick the $$k$$ of them with the largest union size. The query model allows for membership queries, cardinality queries, and generation of random elements from a set. Note that the size of the input can be thought of as $$\sum_i |A_i|$$. Can one approximate this problem without reading the full input? This paper gives a $$(1-1/e)$$-factor approximation that runs in time $$poly(k)m\log m$$, and is thus sublinear in the input size. The algorithm essentially implements a greedy approximation algorithm in sublinear time. It is shown the the linear dependence on $$m$$ is necessary: there is no constant factor approximation that runs in $$q(n) m^{1-\delta}$$ (where $$n$$ denotes the maximum cardinality, $$q(\cdot)$$ is an arbitrary function, and $$\delta > 0$$).

Local Testing for Membership in Lattices, by Karthekeyan Chandrasekaran, Mahdi Cheraghchi, Venkata Gandikota, and Elena Grigorescu (arXiv). Inspired by the theory of locally testable codes, this paper introduces local testing in lattices. Given a set of basis vectors $$b_1, b_2, \ldots$$ in $$\mathbb{Z}^n$$, the lattice $$L$$ is the set of all integer linear combinations of the basic vectors. This is the natural analogue of linear error correcting codes (where everything is done over finite fields). Given some input $$t \in \mathbb{Z}^n$$, we wish to determine if $$t \in L$$, or is far (defined through some norm) from all vectors in $$L$$, by querying a constant number of coordinates in $$t$$. We assume that $$L$$ is fixed, so it can be preprocessed arbitrarily. This opens up a rich source of questions, and this work might be seen as only the first step in this direction. The papers shows a number of results. Firstly, there is a family of “code formula” lattices for which testers exists (with almost matching lower bounds). Furthermore, with high probability over random lattices, testers do not exist. Analogous to testing codes, if a constant query lattice tester exists, then there exists a canonical constant query tester.

# News for June 2016

This month isn’t the most exciting for property testing. Just one paper to report (thanks to Clément for catching!), of which only a minor part is actually related to property testing.

Approximating the Spectral Sums of Large-scale Matrices using Chebyshev Approximation, by Insu Han, Dmitry Malioutov, Haim Avron, and Jinwoo Shin (arXiv). The main results of the paper deal with approximating the trace of matrix functions. Consider symmetric matrix $$M \in \mathbb{R}^{d\times d}$$, with eigenvalues $$\lambda_1, \lambda_2, \ldots, \lambda_d$$. The paper gives new algorithms to approximate $$\sum_i f(\lambda_i)$$. Each “operation” of the algorithm is a matrix-vector product, and the aim is to minimize the number of such operations. The property testing application is as follows. Suppose we wish to test if $$M$$ is positive-definite (all $$\lambda_i > 0$$). We consider a matrix $$\epsilon$$-far from being positive-definite if the smallest eigenvalue is less than $$-\epsilon \|M\|_F = -\epsilon \sum_i \lambda^2_i$$. The main theorems in this paper yield a testing algorithm (under this definition of distance) that makes $$o(d)$$ matrix-vector products. While this is not exactly sublinear under a standard query access model, it is relevant when $$M$$ is not explicitly represented and the only access is through such matrix-vector product queries.

# News for May 2016

Last month witnessed two new property testing papers go online, which May be of interest to the community.

Reducing testing affine spaces to testing linearity, by Oded Goldreich (ECCC). In his recent guest post, the author outlined a reduction between testing if a Boolean function is an $$(n-k)$$-dimensional affine subspace of $$\{0,1\}^n$$, and testing linearity of functions $$f\colon\{0,1\}^n\to \{0,1\}$$. This preprint encompasses details of this reduction, as part as a general approach to testing monomials (resolving along the way one of the open problems from April). Namely, it shows that testing if $$f$$ is a $$k$$-monomial can be reduced to testing two properties, of $$f$$ and of a (related) function $$g\colon\{0,1\}^n\to \{0,1\}^k$$. Moreover, it establishes that the general case ($$k\geq 1$$) of the second part  can itself be reduced to the simpler case ($$k=1$$), giving a unified argument.

Testing Equality in Communication Graphs, by Noga Alon, Klim Efremenko, Benny Sudakov (ECCC). Given a connected graph on $$k$$ nodes, where each node has an $$n$$-bit string, how many bits of communication are needed to test if all strings are equal? This paper investigates this problem for many interesting graphs, resolving the complexity up to lower order terms. For example, if the graph is Hamiltonian, they show that $$\frac{kn}{2} + o(n)$$ bits are sufficient, while at least $$\frac{kn}{2}$$ bits are required.

Edit (06/16): updated the description of the first preprint, which was not entirely accurate.

# Call for feedback: upcoming “Introduction to Property Testing”

Forwarding an announcement by Oded Goldreich:

I would like to seek the help of this community regarding my forthcoming book Introduction to Property Testing (see http://www.wisdom.weizmann.ac.il/~oded/pt-intro.html). Specifically, I would be grateful for feedback regarding any aspect and part of the current text, although detailed comments are likely to be more helpful at this time than suggestions for drastic and/or structural changes. Please do not shy from indicating relevant works that may be mentioned (rather than described in detail), since it is quite possible that I was not aware of them or forgot to mention them (rather than chose not to include a detailed description due to various considerations).

Since I am likely to continue revising the text and posting the revisions on the website, please indicate the text-locations to which you refer in a manner that is most robust to revision (e.g., indicate relative location with respect to sectioning and/or theorem-environment numbers rather than with respect to page numbers).

# News for April 2016

April has been kind to us, providing us with a trio of new papers in sublinear algorithms. Also, in case you missed them, be sure to check out Oded Goldreich’s guest post and the associated open problem.

Sparse Fourier Transform in Any Constant Dimension with Nearly-Optimal Sample Complexity in Sublinear Time, by Michael Kapralov (ArXiv). There is a rich line of work focused on going beyond the celebrated Fast Fourier Transform algorithm by exploiting sparsity in the signal’s Fourier representation. While the one-dimensional case is very well understood, results in higher dimensions have either been lacking with respect to time complexity, sample complexity, or the approximation guarantee. This work is the first to give an algorithm which runs in sublinear time, has near-optimal sample complexity, and provides an arbitrarily accurate approximation guarantee for the multivariate case.

Sublinear Time Estimation of Degree Distribution Moments: The Arboricity Connection, by Talya Eden, Dana Ron, C. Seshadhri (ArXiv). This work revisits the problem of approximating the moments of the degree distribution of a graph. Prior work has shown that the $$s$$-th moment of the degree distribution can be computed with $$\tilde O(n^{1 – 1/(s+1)})$$ queries, which is optimal up to $$\mathrm{poly}(\log n)$$ factors. This work provides a new algorithm which requires only $$\tilde O(n^{1 – 1/s})$$ queries on graphs of bounded arboricity, while still matching previous near-optimal bounds in the worst case. Impressively, the algorithm is incredibly simple, and can be stated in just a few lines of pseudocode.

A Local Algorithm for Constructing Spanners in Minor-Free Graphs, by Reut Levi, Dana Ron, Ronitt Rubinfeld (ArXiv). This paper addresses the problem of locally constructing a sparse spanning graph. For an edge $$e$$ in a graph $$G$$, one must determine whether or not $$e$$ is in a sparse spanning graph $$G’$$ in a manner that is consistent with previous answers. In general graphs, the complexity of this problem is $$\Omega(\sqrt{n})$$, leading to the study of restricted families of graphs. In minor-free graphs (which include, for example, planar graphs), existing algorithms require $$(d/\varepsilon)^{\mathrm{poly}(h)\log(1/\varepsilon)}$$ queries and induce a stretch of $$\mathrm{poly}(d, h, 1/\varepsilon)$$, where $$d$$ is the maximum degree, $$h$$ is the size of the minor, and $$G’$$ is of size at most $$(1 + \varepsilon)n$$. The algorithm in this paper shows the complexity of this problem to be polynomial, requiring only $$\mathrm{poly}(d, h, 1/\varepsilon)$$ queries and inducing a stretch of $$\tilde O(h \log (d)/\varepsilon)$$.

# Open problem for April 2016

As open problems of the month, we state here the two questions discussed in our recent guest post by Oded Goldreich:

Open Problem 1 (obtaining a one-sided error reduction): The reduction from testing affinity of a subspace to that of testing affinity of a function has two-sided error probability. We wonder whether a one-sided error reduction of similar complexity can be found.

Note that this will yield a one-sided error tester for affinity, which we believe is not known. Actually, we would also welcome a two-sided error reduction that, when combined with the linearity tester, yields a tester of complexity $$O(1/\epsilon)$$ rather than $$\tilde{O}(1/\epsilon)$$.

Turning to the task of testing monomials, we recall that the solution of [PRS02] is based on employing self-correction to the following test that refers to the case that $$H=\{x:AX=b\}$$ is an $$(\ell-k)$$-dimensional affine space. Basically, the test selects a random $$x\in H$$ and a random $$y\in\{0,1\}^\ell$$, and checks whether it holds that $$y\in H$$ if and only if $$x\wedge y \in H$$, where $$\wedge$$ denotes the bit-by-bit product of vectors. It is painfully shown in [5] that if $$H$$ is not a translation by $$1^\ell$$ of an axis-aligned linear space, then the check fails with probability $$\Omega(2^{-k})$$. Furthermore, it is shown that for a constant fraction of the $$x$$’s (in $$H$$), the check fails on a $$\Omega(2^{-k})$$ fraction of the $$y$$’s (in $$\{0,1\}^\ell$$). This strengthening is important, since selecting $$x\in H$$ uniformly requires $$\Theta(2^k)$$ trials. Recall that proving the foregoing assertion for $$k=1$$ is quite easy (cf. [5]), which leads us to ask

Open Problem 2 (can a simpler proof be found for the case of $$k>1$$): Is there a relatively simple reduction of the foregoing claim for general $$k$$ to the special case of $$k=1$$?

[PRS02] M. Parnas, D. Ron, and A. Samorodnitsky. Testing Basic Boolean Formulae. SIAM Journal on Disc. Math. and Alg., Vol. 16 (1), pages 20–46, 2002.

# Guest Post: Oded Goldreich, “Reducing testing affine spaces to testing linearity”

[We are delighted to have this month a blog post by Oded Goldreich, on a reduction from testing affinity of subspaces to testing linearity of Boolean functions. Oded also gave us open problems directly related to this post.]

We consider the task of testing whether a Boolean function $$f\colon \{0,1\}^\ell\to\{0,1\}$$ is the indicator function of an $$(\ell-k)$$-dimensional space. An optimal tester for this property was presented by Parnas, Ron, and Samorodnitsky (SIDMA, 2002), by mimicking the celebrated linearity tester (of Blum, Luby and Rubinfeld, JCSS, 1993) and its analysis. We show that this task can be reduced to testing the linearity of a related function $$g\colon\{0,1\}^\ell\to\{0,1\}^k$$, yielding an almost optimal tester.