# News for March 2019

Last month was very calm. Eerily calm in fact: no property testing or related sublinear-time algorithm in view!* Gird your loins for the April batch, I reckon?

$${}^\ast$$ That we saw, at least. if we missed some, please mention it in the comments below!)

Update: As mentioned in the comments, we did indeed missed two (related) works on distribution estimation from a competitive viewpoint. Namely, for a large class of properties (entropy, distance to uniformity, support size…), Hao, Orlitsky, Suresh, and Wu provide in the first paper (arXiv 1904.00070) an “instance-optimal(ish)” estimator which does “as well” with $$m/\sqrt{\log m}$$ samples than the natural and naive empirical estimator would do with $$m$$ (thus, in some sense, amplifying the data size). In the following paper (arXiv 1903.01432), Hao and Orlitsky improve this and remove the “ish” to get the optimal amplification $$m/\log m$$.

# News for February 2019

A lean month by our (now high) standards. Three papers, on local computational algorithms for spanners, sublinear set cover, and quantum distribution testing.

Local Computation Algorithms for Spanners, by Merav Parter, Ronitt Rubinfeld, Ali Vakilian, Anak Yodpinyanee (arXiv). Consider the problem of computing a $$k$$-spanner for a graph $$G$$. This is a subgraph $$H$$, where (for any pair of vertices) the shortest path distance in $$H$$ is at most $$k$$ times the corresponding distance in $$G$$. A Local Computation Algorithm (LCA) essentially gives sublinear query access to (some spanner) $$H$$, without any preprocessing on $$G$$. In other words, an LCA takes as input an edge $$(i,j)$$ in $$G$$, and in sublinear time says “yes” or “no”. Despite no preprocessing, it is guaranteed that all the “yes” answers form a spanner, and the number of “yes” answers is small. One of the main challenges in LCAs for graph problems is getting a polynomial dependence on $$\Delta$$, the maximum degree. (Many results end up with an exponential dependence on $$\Delta$$, essentially assuming bounded-degree.) This paper give an LCA outputting spanners of optimal size for $$k=3,5$$, with query complexity $$n^{1-\alpha}$$ (for constant $$\alpha$$ depending on $$k$$). The second result works for general $$k$$, but has a query complexity that is $$O(\Delta^4n^{2/3})$$.

Set Cover in Sub-linear Time, by Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Ali Vakilian, Anak Yodpinyanee (arXiv). Set cover needs no introduction. This paper considers sublinear algorithms for set cover. It is convenient to think of the instance as a binary $$m \times n$$ matrix, for $$m$$ sets over a universe of size $$n$$. The query model gives access to the $$j$$th non-zero entry in a desired column/row. The aim is this paper is to actually construct a set cover, and not just approximate its size (the latter problem with initiated in Nguyen-Onak). The first result shows a black-box conversion of any set cover algorithm into a sublinear query complexity algorithm. Any $$\alpha$$-approximation for set cover can be converted into an $$O(\alpha)$$-approximation, that makes $$O(m(n/k)^{\beta} + nk)$$ (where $$k$$ is the optimal set cover size, and $$\beta < 1$$ depends on $$\alpha$$). For large $$k$$, the paper gives an $$O(mn/k)$$ query, $$O(\log n)$$-approximation algorithm. There are various lower bounds given, that show these dependencies are necessary.

Distributional Property Testing in a Quantum World by András Gilyén and Tongyang Li (arXiv). There are two aspects to quantum distribution testing. First, we can think of faster distribution testing (for classical problems) using quantum computation. Second, we can generalize the problem of distribution testing to its quantum equivalent, called density operator testing (this papers gives the first result for this quantum generalization). A distribution is a non-negative vector in $$\mathbb{R}^n$$ with unit $$l_1$$-norm. The quantum generalization, a density operator, is a PSD matrix in $$\mathbb{C}^n$$ with unit trace. This paper has a number of results, along both aspects. For example, for the well-known problem of testing equality of distributions under total variation distance, the papers gives an $$\widetilde{O}(\sqrt{n}/\epsilon)$$ tester (an improvement over classical lower bounds). For testing equality of density operators under the same distance, the paper gives an $$O(n/\epsilon)$$ tester (note that the trivial bound is $$O(n^2)$$, since the operator is a matrix).

# News for January 2019

Minimax Testing of Identity to a Reference Ergodic Markov Chain, by Geoffrey Wolfer and Aryeh Kontorovich (arXiv). This work studies distributional identity testing on Markov chains from a single trajectory, as recently introduced by Daskalakis, Dikkala, and Gravin: we wish to test whether a Markov chain is equal to some reference chain, or far from it. This improves on previous work by considering a stronger distance measure than before, and showing that the sample complexity only depends on properties of the reference chain (which we are trying to test identity to). It additionally proves instance-by-instance bounds (where the sample complexity depends on properties of the specific chain we wish to test identity to).

Almost Optimal Distribution-free Junta Testing, by Nader H. Bshouty (arXiv). This paper provides a $$\tilde O(k/\varepsilon)$$-query algorithm with two-sided error for testing if a Boolean function is a $$k$$-junta (that is, its value depends only on $$k$$ of its variables) in the distribution-free model (where distance is measured with respect to an unknown distribution from which we can sample). This complexity is a quadratic improvement over the $$\tilde O(k^2)/\varepsilon$$-query algorithm of Chen, Liu, Servedio, Sheng, and Xie. This complexity is also near-optimal, as shown in a lower bound by Saglam (which we covered back in August).

Exponentially Faster Massively Parallel Maximal Matching, by Soheil Behnezhad, MohammadTaghi Hajiaghayi, and David G. Harris (arXiv). The authors consider maximal matching in the Massively Parallel Computation (MPC) model. They show that one can compute a maximal matching in $$O(\log \log \Delta)$$-rounds, with $$O(n)$$ space per machine. This is an exponential improvement over the previous works, which required either $$\Omega(\log n)$$ rounds or $$n^{1 + \Omega(1)}$$ space per machine. Corollaries of their result include approximation algorithms for vertex cover, maximum matching, and weighted maximum matching.

# News for December 2018

Happy near year, and best wishes to those close and $$\varepsilon$$-far! December concluded the year with 4 new preprints, spanning quite a lot of the property testing landscape:

Testing Stability Properties in Graphical Hedonic Games, by Hendrik Fichtenberger and Anja Rey (arXiv). The authors of this paper consider the problem of deciding whether a given hedonic game possesses some “coalition stability” in a property testing framework. Namely, recall that a hedonic game is a game where players (nodes) form coalitions (subsets of nodes) based on their individual preferences and local information about the considered coalition, thus resulting in a partition of the original graph.
Several notions exist to evaluate how good such a partition is, based on how “stable” the given coalitions are. This work focuses on hedonic games corresponding to bounded-degree graphs, introducing and studying the property testing question of deciding (for several such notions of stability) whether a given game admits a stable coalition structure, or is far from admitting such a partition.

Spectral methods for testing cluster structure of graphs, by Sandeep Silwal and Jonathan Tidor (arXiv). Staying among bounded-degree graphs, we turn to testing clusterability of graphs, the focus of this paper. Given an $$n$$-node graph $$G$$ of degree at most $$d$$ and parameters $$k, \phi$$, say that $$G$$ is $$(k, \phi)$$-clusterable if it can be partitioned in $$k$$ parts of inner conductance at least $$\phi$$.
Analyzing properties of a random walk on $$G$$, this work gives a bicriterion guarantee ($$(k, \phi)$$-clusterable vs. $$\varepsilon$$-far from $$(k, \phi^\ast)$$-clusterable, where $$\phi^\ast \approx \varepsilon^2\phi^2$$) for the case $$k=2$$, improving on previous work by Czumaj, Peng, and Sohler’15.

We then switch from graphs to probability distributions with our third paper:

Inference under Information Constraints I: Lower Bounds from Chi-Square Contraction, by Jayadev Acharya, Clément Canonne, and Himanshu Tyagi (arXiv). (Disclaimer: I’m one of the authors.) In this paper, the first of an announced series of three, the authors generalize the settings of two previous works we covered here and there to consider the general question of distribution testing and learning when the $$n$$ i.i.d. samples are distributed among $$n$$ players, which each can only communicate their sample to the central algorithm by respecting some pre-specified local information constraint (e.g., privacy, or noise, or communication budget). This paper develops a general lower bound framework to study such questions, with a systematic focus on the power of public vs. private randomness between the $$n$$ parties, and instantiate it to obtain tight bounds in the aforementioned locally private and communication-limited settings. (Spoiler: public randomness strictly helps, but not always.)

Finally, after games, graphs, and distributions, our fourth paper of the month concerns testing of functions:

Partial Function Extension with Applications to Learning and Property Testing, by Umang Bhaskar and Gunjan Kumar (arXiv). This work focuses on a problem quite related to property testing, that of partial function extension: given as input $$n$$ pairs point/value from a purported function on a domain $$X$$ of size $$|X| > n$$, one is tasked with deciding whether there does exist (resp., with finding) a function $$f$$ on $$X$$ consistent with these $$n$$ values which further satisfies a specific property, such as linearity or convexity. This is indeed very reminiscent of property testing, where one gets to query these $$n$$ points and must decide (approximate) consistency with such a well-behaved function. Here, the authors study the computational hardness of this partial function extension problem, specifically for properties such as subadditivity and XOS (a sub-property of subadditivity); and as corollaries obtain new property testers for the classes of subadditive and XOS functions.

As usual, if you know of some work we missed from last December, let us know in the comments!

# News for November 2018

Alas! Sorry for the delay, but November was a good month with five seven papers. We missed a few papers from September, so please look at the updated post for September as well. (Update: Thanks to Omri Ben-Eliezer and an anonymous poster for pointing out two missed papers.)

From Local to Robust Testing via Agreement Testing, by Irit Dinur, Tali Kaufman, and Noga Ron-Zewi (ECCC). Consider a code, which is simply a subset of $$C \subseteq \mathbb{F}^n_q$$. (Typically, we think of linear codes, which form a linear subspace.) The study of locally testable codes is a rich area. Consider an input $$x \in \mathbb{F}^n_q$$. A local tester for code/property $$C$$ queries, according to some distribution, a subset $$K$$ of $$Q$$ coordinates in $$x$$, and rejects if the “view” $$x|_K$$ is inconsistent with any codeword. The rejection probability should be proportional to the distance of $$x$$ to $$C$$, denoted $$dist(x,C)$$. A robust tester demands the average distance of $$x|_K$$ to the corresponding property $$C|_K$$ must be proportional to $$dist(x,C)$$. This is a much stronger demand than that of just local testing. The main result is local testability implies robust testability for the special class of lifted codes. The proof goes via agreement testing, where an algorithm gets access to some fragments of $$x$$, and must decide if these fragments are consistent with a codeword.

Testing local properties of arrays, by Omri Ben-Eliezer (ECCC). Consider a function $$f:[n]^d \to \Sigma$$, where $$\Sigma$$ is finite. Now, suppose we define a property $$\mathcal{P}$$ with respect to a set of forbidden $$[k]^d$$ (consecutive) subarrays. Such a property is called $$k$$-local. The inspiration for this work is a line of research on testing of image properties. Note that for $$k=2,3$$, this notion subsumes a number of properties, such as monotonicity, $$c$$-Lipschitz, submodularity, convexity, etc. The main result is a one-sided, non-adaptive tester for all such properties. Ignoring $$\epsilon, k$$ dependencies, the query complexity is $$O(\log n)$$ for $$d=1$$ and $$O(n^{d-1})$$ for $$d > 1$$. Remarkably, there is no dependence on the alphabet size $$|\Sigma|$$. Moreover, the bound above is optimal for one-sided, non-adaptive testers. The basic idea to pick blocks of various sizes, and query all points on the boundary. Then, the tester can determine (by sort of brute force search) if this is consistent with some function in $$\mathcal{P}$$. The key is to prove that for a function that is far from $$\mathcal{P}$$, there are many blocks that cannot be consistent with $$\mathcal{P}$$.

Erasures vs. Errors in Local Decoding and Property Testing, by Sofya Raskhodnikova, Noga Ron-Zewi and Nithin Varma (ECCC). The main theme of this paper is the notion of erasure-resilient testing. Consider property $$\mathcal{P}$$. Suppose $$\alpha$$-fraction of the coordinates of input $$x$$ is hidden. Thus, queries to these coordinates of $$x$$ return nothing. Our aim is to accept if there is some “filling in” of the hidden coordinates that make $$x$$ satisfy $$\mathcal{P}$$, and reject if for all fillings, $$x$$ remains from from $$\mathcal{P}$$. If $$\alpha = 0$$, this is standard property testing. Suppose there exists an $$(\alpha, \alpha+\epsilon)$$-tolerant tester for $$\mathcal{P}$$. (I’m fudging factors a bit here for readability.) We get an $$\epsilon$$-tester with $$\alpha$$-fraction of erasures, by simply filling $$x$$ in arbitrarily. So this model is in between vanilla and tolerant testing. The main result is a separation. There is a property that is constant-query testable under erasures, but requires $$n^{\Omega(1)}$$ queries to tolerant test (with the above parameters). One of the main technical results is a list decoder for the Hadamard code that can tolerate erasures of any $$\alpha \lt 1$$.

Domain Reduction for Monotonicity Testing: A o(d) Tester for Boolean Functions on Hypergrids, by Hadley Black, Deeparnab Chakrabarty, and C. Seshadhri (ECCC). Consider monotonicity testing of Boolean functions of the hypergrid, $$f:[n]^d \to \{0,1\}$$. It is known that there are $$O(d)$$ testers with complexity independent of $$n$$, while for $$n=2$$, there are $$o(d)$$ testers. The main result is a tester with both these properties. The paper gives an $$O(d^{5/6})$$ tester for such functions. The main technical ingredient is a domain reduction theorem. Consider restrictions of $$f$$ to a uniform random $$[poly(d\epsilon^{-1})]^d$$ hypergrid. The paper proves that if $$f$$ is $$\epsilon$$-far from monotone, then (with high probability) so is the restriction. Thus, for the purposes of monotonicity testing, one can simply assume that $$n = poly(d)$$. This gives a black-box method to get testers independent of $$n$$.

On the Testability of Graph Partition Properties, by Yonatan Nakar and Dana Ron (ECCC). Consider property testing on dense graphs. The seminar result of Goldreich-Goldwasser-Ron studies graph partitioning properties. Such a property is defined as follows. Fix constant $$k$$. We wish to partition the vertices of graph $$G$$ into $$V_1, V_2, \ldots, V_k$$. For each $$i$$, $$|V_i|/n \in [\rho^L_i, \rho^U_i]$$. Furthermore, for each $$i, j$$, the number of edges between $$V_i, V_j$$ must be in $$[\rho^L_{i,j}n^2, \rho^U_{i,j}n^2]$$. (All the $$\rho$$s are parameters of the property.) This subsumes properties like $$k$$-colorability, and the classic result is a $$poly(1/\epsilon)$$-tester for these properties. Note that the edge conditions are absolute, with respect to $$n^2$$, and not with respect to $$|V_i|\cdot |V_j|$$. This paper allows for relative bounds of the form $$[\rho^L_{i,j}|V_i|\cdot |V_j|, \rho^U_{i,j}|V_i|\cdot |V_j|]$$. This is far more expressive, subsuming properties like having large cliques, or large complete bipartite graphs. One of the main results is a (two-sided) $$poly(1/\epsilon)$$ tester for all such properties. Furthermore, there is a characterization of such properties that are one-sided testable in $$poly(1/\epsilon)$$ queries. For such properties, it is proven that the density conditions (between $$V_i, V_j$$) must either be unconstrained or have $$\rho^L_{i,j} = \rho^U_{i,j}$$ be either 1 or 0.

Limits of Ordered Graphs and Images, by Omri Ben-Eliezer, Eldar Fischer, Amit Levi, and Yuichi Yoshida (arXiv). The theory of graph limits is a deep topic, with connections to property testing, regularity lemmas, and analysis of real-world networks. Roughly speaking, consider a sequence $$\{G_n\}$$ of graphs where the (relative) frequency of any fixed subgraph converges. Then, we can define a limit of the graphs, called a graphon, which is a measurable function $$W:[0,1]^2 \to [0,1]$$. This paper discovers appropriate limits of ordered graphs. Note that images can be represented as ordered graphs. An ordered graph is a symmetric function $$G:[n]^2 \to \{0,1\}$$ (with the trivial automorphism group). The graphon definitions and limits do not generalize here. There is a sequence of ordered graphs where the ordered subgraph frequencies converge, but one cannot associate any graphon as the limit. This paper discovered a new object called an orderon, which is a function $$W:([0,1]^2)^2 \to [0,1]$$. Orderons can be shown to be the limits of sequences of ordered graphs.

Two Party Distribution Testing: Communication and Security, by Alexandr Andoni, Tal Malkin, and Negev Shekel Nosatzki (arXiv). This paper gives a new take on distribution testing. Suppose Alice and Bob have each have access to $$t$$ samples from (unknown) distributions $$p$$ and $$q$$ respectively. Their aim is to distinguish $$p = q$$ from $$\|p-q\|_{TV} > \epsilon$$. How much communication is required? In the standard setting, it is known that $$n^{2/3}$$ (ignoring $$\epsilon$$) is necessary and sufficient. In the communication setting, the bound turns out to basically be $$\Theta((n/t)^2)$$. Observe how for $$t=n^{2/3}$$, the communication is $$n^{2/3}$$ (saying that Alice is simply going to send all her samples to Bob). But for larger $$t$$, the communication decreases. Since Alice can learn more about her distribution, she can use sketching techniques to reduce her communication. There is an analogous result for testing independence of two distributions. Furthermore, one can even require these protocols to be secure, so Alice and Bob learn nothing about each others samples (other than the final answer).

# News for October 2018

As October draws to a close, we are left with four new papers this month.

Testing Matrix Rank, Optimally, by Maria-Florina Balcan, Yi Li, David P. Woodruff, Hongyang Zhang (arXiv). This work investigates the problem of non-adaptively testing matrix properties, in both the standard query model and the more general sensing model, in which the algorithm may query the component-wise inner product of the matrix with “sensing” matrices. It proves tight upper and lower bounds of $$\tilde \Theta(d^2/\varepsilon)$$ for the query model, and eliminating the dependence on $$\varepsilon$$ in the sensing model. Furthermore, they introduce a bounded entry model for testing of matrices, in which the entries have absolute value bounded by 1, in which they prove various bounds for testing stable rank, Schatten-$$p$$ norms, and SVD entropy.

Testing Halfspaces over Rotation-Invariant Distributions, by Nathaniel Harms (arXiv). This paper studies the problem of testing from samples whether an unknown boolean function over the hypercube is a halfspace. The algorithm requires $$\tilde O(\sqrt{n}/\varepsilon^{7})$$ random samples (which has a dependence on $$n$$ which is tight up to logarithmic factors) and works for any rotation-invariant distribution, generalizing previous works that require the distribution be Gaussian or uniform.

Testing Graphs in Vertex-Distribution-Free Models, by Oded Goldreich (ECCC). While distribution-free testing has been well-studied in the context of Boolean functions, it has not been significantly studied in the context of testing graphs. In this context, distribution-free roughly means that the algorithm can sample nodes of the graph according to some unknown distribution $$D$$, and must be accurate with respect to the measure assigned to nodes by the same distribution. The paper investigates various properties which may be tested with a size-independent number of queries, including relationships with the complexity of testing in the standard model.

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice, by Hendrik Fichtenberger and Dennis Rohde (arXiv). In the $$k$$-nearest neighbor problem, we are given a set of points $$P$$, and the answer to a query $$q$$ is the set of the $$k$$ points in $$P$$ which are closest to $$q$$. This paper considers the following property testing formulation of the problem: given a set of points $$P$$ and a graph $$G = (P,E)$$, is each point $$p \in P$$ connected to its $$k$$-nearest neighbors, or is it far from being a $$k$$NN graph? The authors prove upper and lower bounds on the complexity of this problem, which are both sublinear in the number of points $$n$$.

# News for September 2018

Last month was slower than usual, with only one property testing paper. We did miss a few papers, so hopefully this list is now up to date.

Property testing and expansion in cubical complexes, by David Garber, Uzi Vishne (arXiv). Consider the question of testing if an arbitrary function $$f\colon V\times V \to\{-1,1\}$$ is of the form $$f(x,y) = h(x)h(y)$$ for some $$h\colon V\to\{-1,1\}$$. An intuitive one-sided test, shown to work by Lubotzky and Kaufman (2014), is to pick uniformly random $$x,y,z\in V$$ and check that $$f(x,y)f(y,z)f(z,x)=1$$. This paper considers the high-dimensional generalization of testing the property that a function$$f\colon V\times V \times V\times V \to\{-1,1\}$$ is of the form $$f(w,x,y,z) = \alpha\cdot h(w,x)h(y,x)h(y,z) h(w,z)$$, for some $$h\colon V\times V\to\{-1,1\}$$ and sign $$\alpha\in\{-1,1\}$$. The authors derive necessary and sufficient conditions for testability of this property, by formulating it in the language of incidence geometry and exploiting this connection.

Local Computation Algorithms for the Lovász Local Lemma, by Dimitris Achlioptas, Themis Gouleakis and Fotis Iliopoulos (arXiv). There has been significant work in the past decade on constructive versions of the Lovász Local Lemma (LLL), since the seminal work of Moser-Tardos. This paper designs news Local Computation Algorithms (LCAs) for the LLL. It’s best to consider the problem of $$k$$-SAT. Consider a $$k$$-CNF $$\phi$$ with $$n$$ variables, $$m$$ clauses, where every variable is in at most $$d$$ clauses. By the LLL, if $$d \leq 2^k/ke$$, then $$\phi$$ is satisfiable. An LCA would compute any bit of a satisfying assignment, by making sublinear queries into $$\phi$$. This was first studied by Rubinfeld-Tamir-Vardi-Xie. Their LCA would make polylogarithmically many queries, but requires a stronger condition that what LLL achieves. This paper gives the first sublinear LCA with precisely the LLL conditions, though the number of queries is $$n^\beta$$ (for $$\beta \lt 1$$). The main result is an LCA for an abstract LLL formulation, that also leads to LCAs for graph coloring. Roughly speaking, for a graph with maximum degree $$\Delta$$ where all neighborhoods are sufficiently far from cliques, the LLL shows that the chromatic number bound of $$\Delta + 1$$ can be beaten. This result gives an LCA for graph coloring under these LLL conditions.

Sublinear Time Low-Rank Approximation of Distance Matrices, by Ainesh Bakshi and David P. Woodruff (arXiv). Consider two sets of points $$P$$ and $$Q$$ in a metric space, with $$m$$ and $$n$$ points respectively. The $$m \times n$$ distance matrix $$A$$ contains all pairwise distances between them. This paper studies approximating $$A$$ using a low rank representation, without reading all the entries in $$A$$. The main result is as follows. For rank parameter $$k$$, let $$A_k$$ be the closest (by Frobenius norm) rank-$$k$$-approximation to $$A$$. There is a $$O(m^{1+\gamma} + n^{1+\gamma}poly(k\epsilon^{-1}))$$ (for arbitrary $$\gamma > 0$$) algorithm that outputs a rank $$k$$-matrix $$B$$ with the following property: $$\|A-B\|^2_F \leq \|A-A_k\|^2_F + \epsilon \|A\|^2_F$$. Interestingly, there is a lower bound showing that a $$o(mn)$$ algorithm cannot get a multiplicative approximation. One technical ingredient is a method to sample column norms of $$A$$, under an approximate triangle inequality constraint. This allows one to compute smaller matrices that approximate $$A$$, on which one can directly compute an approximate rank-$$k$$ decomposition.

On Solving Linear Systems in Sublinear Time, by Alexandr Andoni, Robert Krauthgamer, Yosef Pogrow (arXiv). Solving Laplacian linear systems is an immensely deep area, with lots of exciting recent work. This paper studies sublinear algorithms for such problems. Consider a Laplacian matrix $$L$$ (think of $$I – A/d$$, for adjacency matrix $$A$$ of a $$d$$-regular graph). The aim is to solve $$Lx = b$$, for $$x, b \in {\mathbb R}^n$$. Let the solution be $$x^*$$. The main result shows that one can approximate any entry in sublinear time. Specifically, for any coordinate $$i$$, one can output an approximate $$\hat{x_i}$$ such that $$|\hat{x_i} – x^*_i| \leq \|x^*\|_\infty$$. The running time is essentially $$d\epsilon^{-2}\kappa^3$$, where $$\kappa$$ is a bound on the condition number of $$L$$. There are generalizations for Symmetrically Diagonally Dominant (SDD) matrices, a generalization of Laplacians. There is an $$\Omega(n^{1/d^2})$$ lower bound for solving general PSD systems, and a lower bound showing that $$\Omega(\kappa^2)$$ queries into $$b$$ are necessary.

# News for August 2018

Three papers this month close out Summer 2018.

Test without Trust: Optimal Locally Private Distribution Testing, by Jayadev Acharya, Clément L. Canonne, Cody Freitag, and Himanshu Tyagi (arXiv). This work studies distribution testing in the local privacy model. While private distribution testing has recently been studied, requiring that the algorithm’s output is differentially private with respect to the input dataset, local privacy has this requirement for each individual datapoint. The authors prove optimal upper and lower bounds for identity and independence testing, using a novel public-coin protocol named RAPTOR which can outperform any private-key protocol.

Testing Graph Clusterability: Algorithms and Lower Bounds, by Ashish Chiplunkar, Michael Kapralov, Sanjeev Khanna, Aida Mousavifar, and Yuval Peres (arXiv). This paper studies the problem of testing whether a graph is $$k$$-clusterable (based on the conductance of each cluster), or if it is far from all such graphs — this is a generalization of the classical problem of testing whether a graph is an expansion. It manages to solve this problem under weaker assumptions than previously considered. Technically, prior work embedded a subset of the graph into Euclidean space and clustered based on distances between vertices. This work uses richer geometric structure, including angles between the points, in order to obtain stronger results.

Near log-convexity of measured heat in (discrete) time and consequences, by Mert Saglam (ECCC). Glancing at the title, it might not be clear how this paper relates to property testing. The primary problem of study is the quantity $$m_t = uS^tv$$, where $$u, v$$ are positive unit vectors and $$S$$ is a symmetric substochastic matrix. This quantity can be viewed as a measurement of the heat measured at vector $$v$$, after letting the initial configuration of $$u$$ evolve according to $$S$$ for $$t$$ time steps. The author proves an inequality which roughly states $$m_{t+2} \geq t^{1 – \varepsilon} m_t^{1 + 2/t}$$, which can be used as a type of log-convexity statement. Surprisingly, this leads to lower bounds for the communication complexity of the $$k$$-Hamming problem, which in turns leads to optimal lower bounds for the complexity of testing $$k$$-linearity and $$k$$-juntas.

# News for July 2018

Three Four papers for July. New sublinear graph algorithms, distribution testing under new models, and sublinear matrix algorithms. Onward ho…
(Sorry Amit, for missing your paper on sublinear matrix algorithms.)

Metric Sublinear Algorithms via Linear Sampling, by Hossein Esfandiari and Michael Mitzenmacher (arXiv). Consider a weighted clique $$G = (V,E)$$ where $$V$$ is a set of points in a metric space and edge weights are metric distances. In this setting, sublinear algorithms are those that make $$o(n^2)$$ edge queries. This paper studies problems like densest subgraph and maxcut in this setting. The key method is a sparsifying algorithm that achieves the following. (I paraphrase their language.) Consider a positive parameter $$\alpha$$, and let $$w(e)$$ denote the weight of edge $$e$$. The aim is to get a subgraph $$H$$ that contains every edge $$e$$ (in $$G$$) with independent probability $$\min(w(e)/\alpha, 1)$$. Furthermore, this subgraph should be obtained in time linear in the number of edges in $$H$$ (hence the title of the paper). For problems like 1/2-approximating the densest subgraph and PTASes for maxcut, the results show that for a carefully chosen $$\alpha$$, approximate solutions on $$H$$ give solutions of comparable quality on $$G$$. These results cleanly generalize to settings where edge weights satisfy triangle inequality with some multiplicative penalty.

Sublinear Algorithms for ($$\Delta$$ + 1) Vertex Coloring, by Sepehr Assadi, Yu Chen, and Sanjeev Khanna (arXiv). Arguably, the first thing you learn about vertex coloring is that a graph with maximum degree $$\Delta$$ admits a $$(\Delta+1)$$-coloring, that can be found in linear time. But what about sublinear time/space? I like this! You take a simple classical fact, throw in sublinear constraints, and it opens up a rich theory. This paper shows a non-adaptive $$O(n^{3/2})$$-time algorithm for this problem, and gives a nearly matching lower bound. There are also results for streaming and parallel computation, but let’s focus on the sublinear result. It is remarkable that there is no loss in colors in going to sublinear time. (In contrast, the papers shows an $$\Omega(n^2)$$ lower bound for constructing a maximal matching.) The main technical tool is a list coloring result, where each vertex is given a list of colors and much choose its own from that list. Obviously, if each list is $$[\Delta + 1]$$, such a coloring is possible. The paper proves that even if each list is an independent $$O(\log n)$$-sized sample of $$[\Delta+1]$$, a valid coloring is still possible. The final algorithm is pretty involved, and uses this meta-algorithm as a building block.

Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing, by Gautam Kamath and Christos Tzamos (ECCC). The standard model for distribution testing is access to samples from the unknown distribution $$\mathcal{D}$$ with support $$[n]$$. This has attracted much attention with a rich set of results, and the complexity classic problems of uniformity, identity, and equivalence are well understood. But there are alternate models, such as the model of conditional samples (Chakraborty-Fischer-Goldhirsh-Matsliah ’13 and Canonne-Ron-Servedio ’14). For any subset $$S \subseteq [n]$$, we can get a random sample from $$\mathcal{D}$$ restricted to $$S$$. This adds an algorithmic dimension to distribution testing. This paper studies the power of non-adaptive conditional (NACOND) queries. The main result is that uniformity, identity, and equivalence are testable with $$\mathrm{poly}(\log n)$$ queries. (There are existing $$\Omega(\log n)$$ lower bounds for all these problems.) The heart of these algorithms is a procedure ANACONDA that tries to find a set $$S$$ where some element has a high probability, relative to the mass of $$S$$.

Sublinear-Time Quadratic Minimization via Spectral Decomposition of Matrices, by Amit Levi and Yuichi Yoshida (arXiv). When it comes to fundamental problems, it’s hard to beat quadratic minimization. Given a matrix $$A \in \mathbb{R}^{n\times n}$$, we wish to find $$v \in \mathbb{R}^n$$ that minimizes $$v^TAv$$. (This is basically a singular value/vector problem.) One may have additional terms in the objective, depending on $$v^Tv$$ or $$v^Tb$$ (for fixed vector $$b$$). This paper gives sublinear algorithms for this problem. A natural approach is to simply subsample $$k$$ rows and columns to get submatrix $$B$$, solve the problem for $$B$$, and hope for the best. This idea has a rich history from seminal work of Frieze-Kannan. Recently, Hayashi-Yoshida show that constant $$k$$ (only depending on error parameters) suffice for getting a non-trivial approximation for this problem. Unfortunately, the solution error depends on the $$\ell_\infty$$-norm of the solution. This paper shows that for polylogarithmic $$k$$, one can get an error depending on the $$\ell_2$$-norm of the solution. This is a significant improvement, especially for sparse solution vectors. The main technical workhorse is a new matrix decomposition theorem, that shows how any matrix can be written as a sum of a few block matrix, and a low-norm “error” matrix. Admirably, the paper shows a number of experiments,
showing the effectiveness of this technique for eigenvalue computations. It’s very nice to see how ideas from sublinear algorithms might have a practical impact.

# News for June 2018

The summer gets off to a flying start, with three property testing papers, spanning differential privacy, distribution testing, and juntas in Gaussian space!

On closeness to $$k$$-wise uniformity, by Ryan O’Donnell and Yu Zhao (arXiv)
In this paper, the authors consider the following structural question about probability distributions over the Boolean hypercube $$\{-1,1\}^n$$: ” what is the relation between total variation distance $$\delta$$ to $$k$$-wise independence, and bound $$\varepsilon$$ on the Fourier coefficients of the distribution on degrees up to $$k$$?”

While this question might seem a bit esoteric at first glance, it has direct and natural applications to derandomization, and of course to distribution testing (namely, to test $$k$$-wise independence and its generalization, $$(\varepsilon, k)$$-wise independence of distributions over the hypercube).

The main contribution here is to improve (by a $$(\log n)^{O(k)}$$ factor) the bounds on $$\delta (n,k,\varepsilon)$$ over the previous work by Alon et al. [AAK+07], making them either tight (for $$k$$ even) or near-tight. To do so, the authors introduce a new hammer to the game, using linear programming duality in the proof of both their upper and lower bounds.

Property Testing for Differential Privacy, by Anna Gilbert and Audra McMillan (arXiv)
Differential privacy, as introduced by Dwork et al., needs no introduction. Property testing, especially on this website, needs even less. What about a combination of the two? Namely, given black-box access to an algorithm claiming to perform a differentially private computation, how to test whether this is indeed the case?

Introducing and considering this quite natural question for the first time [01/31/2019: see erratum below], this work shows, roughly speaking, that testing differential privacy is hard. Specifically, they show that for many notions of differential privacy (pure, approximate, and their distributional counterparts), testing is either impossible or possible but not with a sublinear number of queries (even when the tester is provided with side information about the black-box). In other terms, as the authors put it: trusting the privacy of an algorithm “requires compromise by either the verifier or algorithm owner” (and, in the latter case, even then it’s not a simple matter).

Is your data low-dimensional?, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv)
(Well, is it?) To state it upfront, I am biased here, as it is a problem I was very eager to see investigated to begin with. To recap, the question is as follows: “given query access to some unknown Boolean-valued function $$f\colon \mathbb{R}^n \to \{-1,1\}$$ over the high-dimensional space $$\mathbb{R}^n$$ endowed with the Gaussian measure, how can one check whether $$f$$ only depends on “few” (i.e., $$k \ll n$$) variables?”

This is the continuous, Gaussian version of the (quite famous) junta testing problem, which has gathered significant attention over the past years (the Gaussian version has, to the best of my knowledge, never been investigated). Now, the above formulation has a major flaw: specifically, it is uninteresting. In Gaussian space*, who cares about the particular basis I expressed my input vector in? So a more relevant question, and that that the authors tackle, is the more robust and natural one: “given query access to some unknown Boolean-valued function $$f\colon \mathbb{R}^n \to \{-1,1\}$$ over the high-dimensional space $$\mathbb{R}^n$$ endowed with the Gaussian measure, how can one check whether $$f$$ only depends on a low-dimensional linear combination of the variables?” Or, put differently, does all the relevant information for $$f$$ live in a low-dimensional subspace?

De, Mossel, and Neeman show how can do this, non-adaptively, with a query complexity independent of the dimension $$n$$ (hurray!), but instead polynomial in $$k$$, the distance parameter $$\varepsilon$$, and the surface area $$s$$ of $$f$$. And since this last parameter may seem quite arbitrary, they also proceed to show that a polynomial dependence in this $$s$$ is indeed required.

*”In Gaussian space, no one can hear you change basis?”

Erratum (01/31/2019): It was brought to our attention that our overview of “Property Testing for Differential Privacy” was overlooking a key part of the literature; specifically, a work of Dixit et al. (TCC 2013)  which introduces this very question. From the abstract:

How does one ensure that [those third-parties] have implemented their algorithms in a way which meet the specifications of the privacy requirements? […] In this work, we propose a new approach to the above problem which we call privacy testing. We do this by formulating the above problem in the well-studied framework of property testing.