# News for October 2018

As October draws to a close, we are left with four new papers this month.

Testing Matrix Rank, Optimally, by Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang (arXiv). This work investigates the problem of non-adaptively testing matrix properties, in both the standard query model and the more general sensing model, in which the algorithm may query the component-wise inner product of the matrix with “sensing” matrices. It proves tight upper and lower bounds of $$\tilde \Theta(d^2/\varepsilon)$$ for the query model, and eliminating the dependence on $$\varepsilon$$ in the sensing model. Furthermore, they introduce a bounded entry model for testing of matrices, in which the entries have absolute value bounded by 1, in which they prove various bounds for testing stable rank, Schatten-$$p$$ norms, and SVD entropy.

Testing Halfspaces over Rotation-Invariant Distributions, by Nathaniel Harms (arXiv). This paper studies the problem of testing from samples whether an unknown boolean function over the hypercube is a halfspace. The algorithm requires $$\tilde O(\sqrt{n}/\varepsilon^{7})$$ random samples (which has a dependence on $$n$$ which is tight up to logarithmic factors) and works for any rotation-invariant distribution, generalizing previous works that require the distribution be Gaussian or uniform.

Testing Graphs in Vertex-Distribution-Free Models, by Oded Goldreich (ECCC). While distribution-free testing has been well-studied in the context of Boolean functions, it has not been significantly studied in the context of testing graphs. In this context, distribution-free roughly means that the algorithm can sample nodes of the graph according to some unknown distribution $$D$$, and must be accurate with respect to the measure assigned to nodes by the same distribution. The paper investigates various properties which may be tested with a size-independent number of queries, including relationships with the complexity of testing in the standard model.

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice, by Hendrik Fichtenberger and Dennis Rohde (arXiv). In the $$k$$-nearest neighbor problem, we are given a set of points $$P$$, and the answer to a query $$q$$ is the set of the $$k$$ points in $$P$$ which are closest to $$q$$. This paper considers the following property testing formulation of the problem: given a set of points $$P$$ and a graph $$G = (P,E)$$, is each point $$p \in P$$ connected to its $$k$$-nearest neighbors, or is it far from being a $$k$$NN graph? The authors prove upper and lower bounds on the complexity of this problem, which are both sublinear in the number of points $$n$$.

# News for August 2018

Three papers this month close out Summer 2018.

Test without Trust: Optimal Locally Private Distribution Testing, by Jayadev Acharya, Clément L. Canonne, Cody Freitag, and Himanshu Tyagi (arXiv). This work studies distribution testing in the local privacy model. While private distribution testing has recently been studied, requiring that the algorithm’s output is differentially private with respect to the input dataset, local privacy has this requirement for each individual datapoint. The authors prove optimal upper and lower bounds for identity and independence testing, using a novel public-coin protocol named RAPTOR which can outperform any private-key protocol.

Testing Graph Clusterability: Algorithms and Lower Bounds, by Ashish Chiplunkar, Michael Kapralov, Sanjeev Khanna, Aida Mousavifar, and Yuval Peres (arXiv). This paper studies the problem of testing whether a graph is $$k$$-clusterable (based on the conductance of each cluster), or if it is far from all such graphs — this is a generalization of the classical problem of testing whether a graph is an expansion. It manages to solve this problem under weaker assumptions than previously considered. Technically, prior work embedded a subset of the graph into Euclidean space and clustered based on distances between vertices. This work uses richer geometric structure, including angles between the points, in order to obtain stronger results.

Near log-convexity of measured heat in (discrete) time and consequences, by Mert Saglam (ECCC). Glancing at the title, it might not be clear how this paper relates to property testing. The primary problem of study is the quantity $$m_t = uS^tv$$, where $$u, v$$ are positive unit vectors and $$S$$ is a symmetric substochastic matrix. This quantity can be viewed as a measurement of the heat measured at vector $$v$$, after letting the initial configuration of $$u$$ evolve according to $$S$$ for $$t$$ time steps. The author proves an inequality which roughly states $$m_{t+2} \geq t^{1 – \varepsilon} m_t^{1 + 2/t}$$, which can be used as a type of log-convexity statement. Surprisingly, this leads to lower bounds for the communication complexity of the $$k$$-Hamming problem, which in turns leads to optimal lower bounds for the complexity of testing $$k$$-linearity and $$k$$-juntas.

# News for May 2018

Six papers for May, including new models, hierarchy theorems, separation results, resolution of conjectures, and a lot more fun stuff. A lot of things to read this month!

Lower Bounds for Tolerant Junta and Unateness Testing via Rejection Sampling of Graphs, by Amit Levi and Erik Waingarten (ECCC). This paper proves a number of new lower bounds for tolerant testing of Boolean functions, including non-adaptive $$k$$-junta testing and adaptive and non-adaptive unateness testing. Combined with upper bounds for these and related problems, these results establishes separation between the complexity of tolerant and non-tolerant testing for natural properties of Boolean functions, which have so far been elusive. As a technical tool, the authors introduce a new model for testing graph properties, termed the rejection sampling model. In this model, the algorithm queries a subset $$L$$ of the vertices, and the oracle will sample an edge uniformly at random and output the intersection of the edge endpoints with the query set $$L$$. The cost of an algorithm is measured as the sum of the query sizes. In order to prove the above lower bounds (in the standard model), they show a non-adaptive lower bound for testing bipartiteness (in their new model).

Hierarchy Theorems for Testing Properties in Size-Oblivious Query Complexity, by Oded Goldreich (ECCC). This work proves a hierarchy theorem for properties which are independent of the size of the object, and depend only on the proximity parameter $$\varepsilon$$. Roughly, for essentially every function $$q : (0,1] \rightarrow \mathbb{N}$$, there exists a property for which the query complexity is $$\Theta(q(\varepsilon))$$. Such results are proven for Boolean functions, dense graphs, and bounded-degree graphs. This complements hierarchy theorems by Goldreich, Krivelevich, Newman, and Rozenberg, which give a hierarchy which depends on the object size.

Finding forbidden minors in sublinear time: a $$O(n^{1/2+o(1)})$$-query one-sided tester for minor closed properties on bounded degree graphs, by Akash Kumar, C. Seshadhri, and Andrew Stolman (ECCC). At the core of this paper is a sublinear algorithm for the following problem: given a graph which is $$\varepsilon$$-far from being $$H$$-minor free, find an $$H$$-minor in the graph. The authors provide a (roughly) $$O(\sqrt{n})$$ time algorithm for such a task. As a concrete example, given a graph which is far from being planar, one can efficiently find an instance of a $$K_{3,3}$$ or $$K_5$$ minor. Using the graph minor theorem, this implies analogous results for any minor-closed property, nearly resolving a conjecture of Benjamini, Schramm and Shapira.

Learning and Testing Causal Models with Interventions, by Jayadev Acharya, Arnab Bhattacharyya, Constantinos Daskalakis, and Saravanan Kandasamy (arXiv). This paper considers the problem of learning and testing on causal Bayesian networks. Bayesian networks are a type of graphical model defined on a DAG, where each node has a distribution defined based on the value of its parents. A causal Bayesian network further allows “interventions,” where one may set nodes to have certain values. This paper gives efficient algorithms for learning and testing the distribution of these models, with $$O(\log n)$$ interventions and $$\tilde O(n/\varepsilon^2)$$ samples per intervention

Property Testing of Planarity in the CONGEST model, by Reut Levi, Moti Medina, and Dana Ron (arXiv). It is known that, in the CONGEST model of distributed computation, deciding whether a graph is planar requires a linear number of rounds. This paper considers the natural property testing relaxation, where we wish to determine whether a graph is planar, or $$\varepsilon$$-far from being planar. The authors show that this relaxation allows one to bypass this linear lower bound, obtaining a $$O(\log n \cdot \mathrm{poly(1/\varepsilon))}$$ algorithm, complemented by an $$\Omega(\log n)$$ lower bound.

Flexible models for testing graph properties, by Oded Goldreich (ECCC). Usually when testing graph properties, we assume that the vertex set is $$[n]$$, implying that we can randomly sample nodes from the graph. However, this assumes that the tester knows the value of $$n$$, the number of nodes. This note suggests more “flexible” models, in which the number of nodes may be unknown, and we are only given random sampling access. While possible definitions are suggested, this note contains few results, leaving the area ripe for investigation of the power of these models.

# News for February 2018

February had a flurry of conference deadlines, and they seem to have produced six papers for us to enjoy, including three on estimating symmetric properties of distributions.

Locally Private Hypothesis Testing, by Or Sheffet (arXiv). We now have a very mature understanding of the sample complexity of distributional identity testing — given samples from a distribution $$p$$, is it equal to, or far from, some model hypothesis $$q$$? Recently,  several papers have studied this problem under the additional constraint of differential privacy. This paper strengthens the privacy constraint to local privacy, where each sample is locally noised before being provided to the testing algorithm.

Distribution-free Junta Testing, by Xi Chen, Zhengyang Liu, Rocco A. Servedio, Ying Sheng, and Jinyu Xie (arXiv). Testing whether a function is a $$k$$-junta is very well understood — when done with respect to the uniform distribution. In particular, the adaptive complexity of this problem is $$\tilde \Theta(k)$$, while the non-adaptive complexity is $$\tilde \Theta(k^{3/2})$$. This paper studies the more challenging task of distribution-free testing, where the distance between functions is measured with respect to some unknown distribution. The authors show that, while the adaptive complexity of this problem is still polynomial (at $$\tilde O(k^2)$$), the non-adaptive complexity becomes exponential: $$2^{\Omega(k/3)}$$. In other words, there’s a qualitative gap between the adaptive and non-adaptive complexity, which does not appear when testing with respect to the uniform distribution.

The Vertex Sample Complexity of Free Energy is Polynomial, by Vishesh Jain, Frederic Koehler, and Elchanan Mossel (arXiv). This paper studies the classic question of estimating (the logarithm of) the partition function of a Markov Random Field, a highly-studied topic in theoretical computer science and statistical physics. As the title suggests, the authors show that the vertex sample complexity of this quantity is polynomial. In other words, randomly subsampling a $$\mathrm{poly}(1/\varepsilon)$$-size graph and computing its free energy gives a good approximation to the free energy of the overall graph. This is in contrast to more general graph properties, for the vertex sample complexity is super-exponential in $$1/\varepsilon$$.

Entropy Rate Estimation for Markov Chains with Large State Space, by Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, and Tiancheng Yu (arXiv). Entropy estimation is now quite well-understood when one observes independent samples from a discrete distribution — we can get by with a barely-sublinear sample complexity, saving a logarithmic factor compared to the support size. This paper shows that these savings can also be enjoyed in the case where we observe a sample path of observations from a Markov chain.

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance, by Yanjun Han, Jiantao Jiao, and Tsachy Weissman (arXiv). Speaking more generally of the above problem: there has been significant work into estimating symmetric properties of distributions, i.e., those which do not change when the distribution is permuted. One natural method for estimating such properties is to estimate the sorted distribution, then apply the plug-in estimator for the quantity of interest. The authors give an improved estimator for the sorted distribution, improving on the results of Valiant and Valiant.

INSPECTRE: Privately Estimating the Unseen, by Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang (arXiv). One final work in this area — this paper studies the estimation of symmetric distribution properties (including entropy, support size, and support coverage), but this time while maintaining differentially privacy of the sample. By using estimators for these tasks with low sensitivity, one can additionally obtain privacy for a low or no additional cost over the non-private sample complexity.

# News for November 2017

A quiet month, with only two papers. Perhaps the calm before the storm? Please let us know in the comments if something slipped under our radar.

Agreement tests on graphs and hypergraphs, by Irit Dinur, Yuval Filmus, and Prahladh Harsha (ECCC). This work looks at agreement tests and agreement theorems, which argue that if one checks if a number of local functions agree, then there exists a global function which agrees with most of them. This work extends previous work on direct product testing to local functions of higher degree, which corresponds to agreement tests on graphs and hypergraphs.

Testing Conditional Independence of Discrete Distributions, by Clément L. Canonne, Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv). This paper focuses on testing whether a bivariate discrete distribution has independent marginals, conditioned on the value of a tertiary discrete random variable. More precisely, given realizations of $$(X, Y, Z)$$, test if $$X \perp Y \mid Z$$. Unconditional independence testing (corresponding to the case when $$Z$$ is constant) has been extensively studied by the community, with tight upper and lower bounds showing that the sample complexity has two regimes, depending on the tradeoff between the support size and the accuracy desired. This paper shows gives upper and lower bounds for this more general problem, showing a rich landscape depending on the relative value of the parameters.

# News for August 2017

This month has been comparatively slow, with only five papers. I suppose we’re lucky that five papers is considered to be a slow month!

Local Testing and Decoding of High-Rate Error-Correcting Codes, by Swastik Kopparty and Shubhangi Saraf (ECCC). This is a survey article, covering recent results in locally testable, correctable, and decodable codes. A good place to start for any who have fallen behind on the recent literature.

Sample-Optimal Identity Testing with High Probability, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv). Distribution testing results are  often stated in the regime where the probability of failure is some constant, i.e. $$\delta = 1/3$$. These can be boosted to arbitrarily high probability of success at a multiplicative cost of $$\log (1/\delta)$$ using the “median trick” — repeat the test $$\log (1/\delta)$$ times and choose the majority result. This paper shows that this method is suboptimal for identity testing with small values of $$\delta$$. In particular, they give upper and lower bounds for the entire tradeoff curve between $$n, \varepsilon$$, and $$\delta$$.

Sample Efficient Estimation and Recovery in Sparse FFT via Isolation on Average, by Michael Kapralov (arXiv). The Sparse Fourier Transform problem has seen a wealth of study in the past few years, with various tradeoffs between sample and time complexity. This paper gives the first result achieving the optimal sample complexity of $$O(k \log n)$$ while maintaining a time complexity which is sublinear in the length of the signal, $$n$$. In order to obtain this, the author introduces a new technique for analyzing noisy hashing schemes.

Generalized Uniformity Testing, by Tuğkan Batu and Clément L. Canonne (arXiv). Uniformity testing is one of the most studied problems in distribution testing, and it is well known that the complexity is $$\Theta(\sqrt{n})$$. This paper studies a slightly different question: given samples from a distribution $$p$$, is it uniform over some (unknown) subset of its domain? The authors give upper and lower bounds for this question, showing that the complexity is $$\Theta(1/\|p\|_3)$$. Interestingly, when $$p$$ is uniform over a set of size $$\Omega(n)$$, the complexity is $$\Theta(n^{2/3})$$, which is more than the $$\Theta(\sqrt{n})$$ cost of vanilla uniformity testing.

Boolean Unateness Testing with $$\tilde O(n^{3/4})$$ Adaptive Queries, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). At this point, we have a fairly mature understanding of testing monotonicity over the Boolean hypercube: for non-adaptive algorithms, the complexity is $$\tilde \Theta(\sqrt{n})$$. There exists a gap for adaptive algorithms — the best known lower bound is $$\tilde \Omega(n^{1/3})$$, while the best upper bound is the same as for the non-adaptive case $$\tilde O(n^{1/2})$$. However, many conjecture that adaptivity does not help for monotonicity testing. More recently, there has been study of testing unateness — a function is unate if it is monotone non-decreasing or non-increasing with respect to each coordinate. Interestingly, this work proves an adaptive upper bound of $$\tilde O(n^{2/3})$$ which beats the lower bound of $$\tilde \Omega(n)$$ for non-adaptive algorithms, thus showing that adaptivity does help for unateness testing.

# News for May 2017

May brought us a wealth of new papers, including four (!) on distributed property testing.

Distributed Property Testing for Subgraph-Freeness Revisited, by Orr Fischer, Tzlil Gonen, and Rotem Oshman (arXiv). This paper studies subgraph-freeness testing in the CONGEST model of distributed computation. This problem was previously studied — this paper provides a number of new algorithms, which either improve upon the running time of prior work or are the first non-trivial results for these problems. Some cases they study include testing for freeness of $$k$$-cycles, constant-size trees, and $$k$$-cliques. For the latter case, they prove that a dependence on $$\varepsilon$$ is necessary.

Faster and Simpler Distributed Algorithms for Testing and Correcting Graph Properties in the CONGEST-Model, by Guy Even, Reut Levi, and Moti Medina (arXiv). This paper also studies graph property testing in the CONGEST model. There appears to be some overlap in problems with the previous paper, including testing subgraph-freeness for cycles and constant size trees. As the title suggests, they also study graph property correction in the case of $$k$$-cycles, and other properties such as bipartiteness.

K-Monotonicity is Not Testable on the Hypercube, by Elena Grigorescu, Akash Kumar, and Karl Wimmer (arXiv, ECCC). Recent work introduced the concept of $$k$$-monotonicity, where a function may switch value between $$0$$ to $$1$$ at most $$k$$-times on any ascending chain. This generalizes the common notion of monotonicity, which corresponds to $$k = 1$$. Since $$1$$-monotonicity on the hypercube is testable in $$O(\sqrt{n})$$ queries, it was conjectured that $$k$$-monotonicity may be tested in $$poly(n^k)$$ queries. This paper disproves this conjecture, showing that even $$2$$-monotonicity requires a number of queries which is exponential in $$\sqrt{n}$$.

The coin problem for product tests, by Chin Ho Lee and Emanuele Viola (ECCC). The coin problem asks, if $$f$$ is a product of $$k$$ functions on disjoint inputs of length $$n$$ bits, what is the smallest $$\varepsilon$$ such that $$f$$ can distinguish between an input of $$m = nk$$ fair bits versus an input of $$m$$ $$\varepsilon$$-biased bits. This paper proves tight bounds in a few cases of interest — when the range of $$f$$ is $$\{0, 1\}$$ or $$\{\pm 1\}$$, the answer is $$\Theta(1/\sqrt{n\log k})$$, while if the range is the set of unit-norm complex numbers, the answer is $$\Theta(1/\sqrt{nk})$$.

Distributed Testing of Conductance, by Hendrik Fichtenberger and Yadu Vasudev (arXiv). Another paper on testing in the CONGEST model — this one studies the problem of testing conductance. In particular, they give an $$O(\log n)$$ round two-sided tester which distinguishes between graphs with conductance at least $$\Phi$$ and graphs which are $$\varepsilon$$-far from having conductance at least $$\Omega(\Phi)$$. They also prove a lower bound of $$\Omega(\log n)$$, even in the (easier) LOCAL model.

On The Multiparty Communication Complexity of Testing Triangle-Freeness, by Orr Fischer, Shay Gershtein, and Rotem Oshman (arXiv). The final paper in May on distributed graph property testing — in contrast to the other papers, this paper considers multiparty communication complexity in the coordinator model. A graph is divided up among $$k$$ players, and each player can only communicate with a coordinator and not each other. The authors show upper bounds on the communication, in both the vanilla and the simultaneous model (which allows only $$1$$ round of communication). They also show a near-matching lower bound for simultaneous protocols on graphs of constant degree.

Exponentially Small Soundness for the Direct Product Z-test, by Irit Dinur and Inbal Livni Navon (ECCC). This paper studies the problem of direct product testing, in which one tests whether the output of a function on a coordinate depends only on the input to that coordinate (approximately). The authors investigate the Z-test of Impagliazzo, Kabanets, and Wigderson, and show that it achieves the optimal soundness of $$\varepsilon \geq \exp(-O(k))$$.

# News for February 2017

This February, we had four exciting testing papers, covering domains from Boolean functions to distributions to graphs. Read on to catch up on the latest results!

Property Testing of Joint Distributions using Conditional Samples, by Rishiraj Bhattacharyya and Sourav Chakraborty (arXiv). This paper focuses on the problems of identity testing and independence testing for multivariate distributions in the conditional sampling model. Multivariate distribution testing has not previously been considered in the conditional sampling model. While previous algorithms for identity testing would generally carry over to this setting, the authors restrict themselves to algorithms where the queries are on subcubes of the domain. Interestingly, the authors show that the complexity of these problems is polynomial in the dimension (and that a polynomial dependence is necessary). That is, despite the “simple” structure of the queries, they are still powerful enough to avoid the curse of dimensionality which is present in vanilla multivariate distribution testing.

An Improved Dictatorship Test with Perfect Completeness, by Amey Bhangale, Subhash Khot, and Devanathan Thiruvenkatachari (arXiv, ECCC). A function is a dictator if it depends on exactly one variable. This paper provides an improved tester with perfect completeness for this property. In particular, they give a (non-adaptive) $$k$$-query test which never rejects a dictator function, and accepts a far-from-dictator function with probability at most $$\frac{2k +1}{2^k}$$. This improves upon the previous best result, which accepts a far-from-dictator function with probability at most $$\frac{2k+3}{2^k}$$.

An Adaptivity Hierarchy Theorem for Property Testing, by Clément L. Canonne and Tom Gur (arXiv, ECCC). This paper investigates the power of adaptivity in property testing. We most frequently think of algorithms as either adaptive or non-adaptive. In the former, an algorithm may specify each query after viewing the results of all previous queries, while in the latter, all the queries must be specified in advance. The authors focus on the natural interpolation, in which the algorithm is allowed to make queries in $$k$$ rounds, where the queries in a round may depend on the result of queries in all previous rounds. They show that the power of an algorithm can shift dramatically with just one more round of adaptivity — there exist properties which can be tested with $$\tilde O(k)$$ queries with $$k$$ rounds of adaptivity, but require $$\Omega(n)$$ queries with $$k-1$$ rounds of adaptivity.

Beyond Talagrand Functions: New Lower Bounds for Testing Monotonicity and Unateness, by Xi Chen, Erik Waingarten, and Jinyu Xie (arXiv). The main result in this paper is an improved lower bound for the well-studied problem of testing monotonicity of Boolean functions. The authors show that any two-sided and adaptive algorithm requires $$\tilde \Omega(n^{1/3})$$ samples to test whether a function is monotone or far from monotone. This improves upon the previous result of $$\tilde \Omega(n^{1/4})$$ by Belovs and Blais, which was the first to show that adaptive monotonicity testing requires polynomially-many queries. The construction of Belovs and Blais uses Talagrand’s random DNFs, for which they also showed $$\tilde O(n^{1/4})$$ queries are sufficient. This paper analyzes an extension of this family, denoted as two-level Talagrand functions. The result leaves open the tantalizing question: does adaptivity help with monotonicity testing? Or does the complexity remain $$\tilde \Theta(\sqrt{n})$$? The authors also prove lower bounds for the problem of testing unateness, a problem of recent interest and a natural generalization of monotonicity.

# News for November 2016

November was quite eventful for property testing, with six exciting new results for you to peruse.

Alice and Bob Show Distribution Testing Lower Bounds (They don’t talk to each other anymore.), by Eric Blais, Clément L. Canonne, and Tom Gur (ECCC). The authors examine distribution testing lower bounds through the lens of communication complexity, a-la Blais, Brody, and Matulef, who previously showed such a connection for property testing lower bounds in the Boolean function setting. In this work, the authors’ main result involves testing identity to a specific distribution $$p$$. While Valiant and Valiant showed tight bounds involving the $$\ell_{2/3}$$-quasinorm of $$p$$, this paper gives tight bounds using a different quantity, namely Peetre’s $$K$$-functional. Their techniques also give lower bounds for several other properties (some old and some new), including monotonicity, sparse symmetric support, and $$k$$-juntas in the PAIRCOND model.

Fast property testing and metrics for permutations, by Jacob Fox and Fan Wei (arXiv). This paper proves a general testing result for permutations. In particular, it shows that any hereditary property of permutations is two-sided testable with respect to the rectangular distance with a constant number of queries. While in many such testing results on combinatorial objects (such as graphs), a “constant number of queries” may be exorbitantly large (due to complexities arising from an application of the strong regularity lemma), surprisingly, the complexity obtained in this paper is polynomial in $$1/\varepsilon$$.

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation, by Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh (arXiv, ECCC). There has been a considerable deal of work recently on estimating several symmetric distribution properties, namely support size, support coverage, entropy, and distance to uniformity. One drawback of these results is that, despite the similarities between these properties, seemingly different techniques are required to obtain optimal rates for each. This paper shows that one concept, pattern maximum likelihood (PML), unifies them all. A PML distribution of a multiset of samples is any distribution which maximizes the likelihood of observing the multiplicities of the multiset, after discarding the labels on the elements. This can behave quite differently from the sequence maximum likelihood (SML), or empirical distribution. In particular, if a multiset of samples on support $$\{x, y\}$$ is $$\{x, y, x\}$$, then the SML is $$(2/3, 1/3)$$, while the PML is $$(1/2, 1/2)$$. The main result of this paper is, if one can approximate PML, then applying the plug-in estimator gives the optimal sample complexity for all of the aforementioned properties. The one catch is that efficient approximation of the PML is currently open. Consider the gauntlet thrown to all our readers!

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures, by Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart (arXiv, ECCC). While the main focus of this work is on lower bounds for distribution estimation in the statistical query (SQ) model, this paper also has some interesting lower bounds for multivariate testing problems. Namely, they show that it is impossible to achieve a sample complexity which is significantly sublinear in the dimension for either of the following two problems:

• Given samples from an $$n$$-dimensional distribution $$D$$, distinguish whether $$D = \mathcal{N}(0,I)$$ or $$D$$ is $$\varepsilon/100$$-close to any $$\mathcal{N}(\mu, I)$$ where $$\|\mu\|_2 \geq \varepsilon$$.
• Given samples from an $$n$$-dimensional distribution $$D$$, distinguish whether $$D = \mathcal{N}(0,I)$$ or $$D$$ is a mixture of $$k$$ Gaussians with almost non-overlapping components.

Collision-based Testers are Optimal for Uniformity and Closeness, by Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price (arXiv, ECCC). In the TCS community, the seminal results in the field of distribution testing include the papers of Goldreich and Ron and Batu et al., which study uniformity testing and $$\ell_2$$-closeness testing (respectively) using collision based testers. While these testers appeared to be lossy, subsequent results have attained tight upper and lower bounds for these problems. As suggested by the title, this paper shows that collision-based testers actually achieve the optimal sample complexities for uniformity and $$\ell_2$$-closeness testing.

Testing submodularity and other properties of valuation functions, by Eric Blais, and Abhinav Bommireddi (arXiv). This paper studies the query complexity of several properties which have been studied in the context of valuation functions in algorithmic game theory. These properties are real-valued functions over the Boolean hypercube, and include submodularity, additivity, unit-demand, and much more. The authors show that, for constant $$\varepsilon$$ and any $$p \geq 1$$, these properties are constant-sample testable. Their results are obtained via an extension of the testing by implicit learning method of Diakonikolas et al.

# News for August 2016

This summer is a very exciting one for property testing, with seven (!) papers uploaded during the month of August. In particular, there have been a number of results on testing unateness.

An $$\tilde O(n)$$ Queries Adaptive Tester for Unateness, by Subhash Khot and Igor Shinkar (arXivECCC). A Boolean function is unate if, for each $$i \in [n]$$, it is monotone non-increasing or monotone non-decreasing in coordinate $$i$$. This is a generalization of monotonicity, one of the most studied problems in the property testing literature. This paper gives an adaptive algorithm for testing if a Boolean function is unate. The algorithm requires $$\tilde O(n)$$ queries, improving on the $$O(n^{1.5})$$ query tester of Goldreich et al. from 2000.

A $$\tilde O(n)$$ Non-Adaptive Tester for Unateness, by Deeparnab Chakrabarty and C. Seshadhri (arXivECCC). The unateness testing continues! This paper also gives a $$\tilde O(n)$$ algorithm for this problem, but improves upon the previous work by making non-adaptive queries. The algorithm and analysis are very clean and elegant — with a theorem by Chakrabarty et al. in place, they take up just a single page!

Testing Unateness of Real-Valued Functions, by Roksana Baleshzar, Meiram Murzabulatov, Ramesh Krishnan S. Pallavoor, and Sofya Raskhodnikova (arXiv). The final paper this month on testing unateness, this work studies real-valued functions on the hypergrid $$[n]^d$$ (as opposed to Boolean functions on the hypercube). They give an adaptive algorithm requiring $$O\left(\frac{d \log (\max (d,n))}{\varepsilon}\right)$$ queries, generalizing the above result by Khot and Shinkar, which works only for Boolean functions on the hypercube and requires $$O\left(\frac{d \log d}{\varepsilon}\right)$$ queries. Additionally, they prove lower bounds for this setting (of $$\Omega(\min(d, |R|^2))$$, where $$R$$ is the range of the function) and the Boolean hypercube setting (of $$\Omega(\sqrt{d}/\varepsilon)$$). The latter lower bound leaves open a tantalizing question: does a $$O(\sqrt{d})$$ query algorithm exist? In other words, is testing unateness no harder than testing monotonicity?

Testing k-Monotonicity, by Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, Karl Wimmer (arXivECCC). And now, a different generalization of monotonicity! This paper studies testing $$k$$-monotonicity, where a Boolean function over a poset $$\mathcal{D}$$ is $$k$$-monotone if it alternates between the values $$0$$ and $$1$$ at most $$k$$ times on any ascending chain in $$\mathcal{D}$$. Classical monotonicity is then $$1$$-monotone in this definition. On the hypercube domain, the authors prove a separation between testing monotonicity and testing $$k$$-monotonicity and a separation between testing and learning. They also present a tolerant test for $$k$$-monotonicity over the hypergrid $$[n]^d$$ with complexity independent of $$n$$.

The Dictionary Testing Problem, by Siddharth Barman, Arnab Bhattacharyya, and Suprovat Ghoshal (arXiv). The dictionary learning problem has enjoyed extensive study in computer science. In this problem, given a matrix $$Y$$, one wishes to construct a decomposition $$Y = AX$$ such that each column of $$X$$ is $$k$$-sparse. In the settings typically studied, such a decomposition is guaranteed to exist. This paper initiates the dictionary testing problem, in which one wishes to test whether $$Y$$ admits such a decomposition, or is far from it. The authors provide a very elegant characterization — it admits a decomposition iff the columns of $$Y$$ have a sufficiently narrow Gaussian width.

The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs, by Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan (arXiv). This paper provides streaming algorithms for estimating the size of the maximum matching in graphs with some underlying sparsity. They give a one-pass algorithm for insert-only and dynamic streams with bounded arboricity, and a two-pass algorithm for random order streams with bounded degree. Interestingly, for the latter result, they use local exploration techniques, which have seen previous use in the property testing literature.

Testing Assignments to Constraint Satisfaction Problems, by Hubie Chen, Matt Valeriote, and Yuichi Yoshida (arXiv). Given a CSP from a finite relational structure $$A$$ and query access to an assignment, is the CSP satisfied? This paper shows a dichotomy theorem which characterizes which $$A$$ admit a constant-query test. They go on to show a trichotomy theorem for when instances may include existentially quantified variables, classifying which $$A$$ admit a constant-query test, which admit only a sublinear-query test, and which require a linear number of queries.

Minimizing Quadratic Functions in Constant Time, by Kohei Hayashi and Yuichi Yoshida (arXiv). This paper investigates the problem of (approximately) minimizing quadratic functions in high-dimensions. Prior approaches to this problem scale poorly with the dimension — at least linearly, if not worse. This work uses a sampling-based method to avoid paying this cost: the resulting time complexity is completely independent of the dimension!