Author Archives: Clement Canonne

Looking back at WOLA’21: Videos available

The fifth Workshop on Local Algorithms (WOLA’21) took place earlier this month, and the recordings of the invited talks are now available on YouTube. If you missed the workshop, or want to refresh your memory, here are the recordings (ordered by the workshop schedule):

Thanks again to the speakers and organizers, and looking forward to WOLA’22!

Announcing WOLA’21 (Workshop on Local Algorithms)

The fifth WOLA (Workshop on Local Algorithms) will be virtual, and take place June 14-15. Registration is free, but required: please fill this form by June 10th to attend.

Local algorithms — that is, algorithms that compute and make decisions on parts of the output considering only a portion of the input — have been studied in a number of areas in theoretical computer science and mathematics. Some of the related areas include sublinear-time algorithms, distributed algorithms, streaming algorithms, (massively) parallel algorithms, inference in large networks, and graphical models. These communities have similar goals but a variety of approaches, techniques, and methods. This workshop is aimed at fostering dialogue and cross-pollination of ideas between the various communities.

This year, the workshop will include:

  • A poster session: Please submit your poster proposal (title and abstract) at by May 26th. Everyone is invited to contribute. This session will take place on gather.town.
  • Invited long talks: the tentative schedule is available, and features talks by James Aspnes, Jelani Nelson, Elaine Shi, Christian Sohler, Uri Stemmer, and Mary Wootters.
  • Junior-Senior social meetings
  • An AMA (Ask Me Anything) session, moderated by Merav Parter
  • A Slack channel
  • An Open Problems session

The Program Committee of WOLA 2021 is comprised of:

  • Venkatesan Guruswami (CMU)
  • Elchanan Mossel (MIT)
  • Merav Parter (Weizmann Institute of Science)
  • Sofya Raskhodnikova (chair) (Boston University)
  • Gregory Valiant (Stanford)

and the organizing committee:

  • Sebastian Brandt (ETH)
  • Yannic Maus (Technion)
  • Slobodan Mitrović (MIT)

For more detail, see the website; to stay up to date with the latest announcements concerning WOLA, join our mailing list!

News for April 2021

A somewhat “sublinear” month of April, as far as property testing is concerned, with only one paper. (We may have missed some; if so, please let us know in the comments!)

Graph Streaming Lower Bounds for Parameter Estimation and Property Testing via a Streaming XOR Lemma, by Sepehr Assadi and Vishvajeet N (arXiv). This paper establishes space vs. pass trade-offs lower bounds for streaming algorithms, for a variety of graph tasks: that is, of the sort “any \(m\)-pass-streaming algorithm for task \(\mathcal{T}\) must use memory at least \(f(m)\).” The tasks considered include graph property estimation (size of the maximum matching, of the max cut, of the weight of the MST) and property testing for sparse graphs (connectivity, bipartiteness, and cycle-freeness). The authors obtained exponentially improved lower bounds for those, via reductions to a relatively standard problem, (noisy) gap cycle counting, for which they establish their main lower bound. As a key component of their proof, they prove a general direct product result (XOR lemma) for the streaming setting, showing that the advantage for solving the XOR of \(\ell\) copies of a streaming predicate \(f\) decreases exponentially with \(\ell\).

News for January 2021

The first month of 2021 has brought with it 5 papers, covering graph testing, Boolean function testing, and distribution testing — as well as database theory. Let’s dive into it.

Random walks and forbidden minors III: \(\mathrm{poly}(d/\varepsilon)\)-time partition oracles for minor-free graph classes, by Akash Kumar, C. Seshadhri, and Andrew Stolman (arXiv). Minor-closed bounded-degree graphs have a very nice property: denoting by \(d\) the degree bound and \(n\) the number of edges, it is always possible to partition any such graph into components of constant size, \(O_\varepsilon(1)\), just by removing a linear number of edges, merely \(\varepsilon dn\) (\(\varepsilon\) being a small parameter). This is a crucial result in many graph property testing algorithms, those which rely on something called a “partition oracle”: loosely speaking, a routine which makes “few” queries to the graph, and is able to indicate which component of an underlying partition any given vertex belongs to. But what is “few” here? The first partition oracles made \(d^{\mathrm{poly}(d,1/\varepsilon)}\) queries to the graph to answer any such request. This later got significantly improved to \(d^{\mathrm{log}(d/\varepsilon)}\). Using spectral graph theory techniques previously developed by the authors (hence the “III” in the title), this work settles the question, achieving partition oracles which as good as it gets: making only \(\mathrm{poly}(d,1/\varepsilon)\) queries to the graph! This in turns has immediate consequences for graph property testing, which the paper details.

And since we are on the topic of oracles… more exciting news on that front:

Spectral Clustering Oracles in Sublinear Time, by Grzegorz Gluch, Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, Christian Sohler (arXiv). Given a graph \(G\) and an integer \(k\), one often want to partition the graph into components \(C_1,C_2,\dots,C_k\), such that each \(C_i\) is well-connected and has few edges to the other \(C_j\)’s. But can we get ultra efficient algorithms for such spectral clusterings? Specifically, can we design oracles for them: sublinear-time algorithms which provide implicit access to an underlying “good” spectral clustering \(\hat{C}_1,\hat{C}_2,\dots,\hat{C}_k\), by returning on any query vertex \(v\) the index of the cluster \(\hat{C}_i\) to which \(v\) belongs? This paper introduces the question, and answers it in the affirmative: in more detail, it provides a spectral clustering oracle which, for any \(\varepsilon>0\), has preprocessing time \(2^{\mathrm{poly}(k/\varepsilon)}n^{1/2+O(\varepsilon)}\), query time \(n^{1/2+O(\varepsilon)}\), and space \(n^{1/2+O(\varepsilon)}\); and provides access to a clustering with relative error \(O(\varepsilon\log k)\) per cluster. The paper also allows tradeoffs between query time and space, and discusses applications to the Local Computation Algorithms (LCA) model.

Next stop, distribution testing…

The Sample Complexity of Robust Covariance Testing, by Ilias Diakonikolas and Daniel M. Kane (arXiv). Suppose you have i.i.d. samples from some high-dimensional Gaussian \(\mathcal{N}(0,\Sigma)\) in \(d\) dimensions, and want to test whether the unknown covariance matrix \(\Sigma\) is the identity, versus \(\varepsilon\)-far from it (in Frobenius norm). Good news: we know how to do that, and \(\Theta(d/\varepsilon^2)\) samples are necessary and sufficient. (To learn \(\Sigma\), that’d be \(\Theta(d^2/\varepsilon^2)\).) Bad news: you don’t have i.i.d. samples from some high-dimensional Gaussian \(\mathcal{N}(0,\Sigma)\); what you have is i.i.d. samples from a noisy version of it, \((1-\alpha)\mathcal{N}(0,\Sigma) + \alpha B\), where \(B\) is an arbitrary “bad” distribution (not necessarily Gaussian itself). You still want to test whether the covariance \(\Sigma\) is the identity, but now you have that extra \(\alpha\) fraction of noisy samples, and you need to do that testing robustly… The good news is, you can still do that by learning the covariance matrix \(\Sigma\), robustly, with \(O(d^2/\varepsilon^2)\) samples. The bad news is the main result of this paper: that’s also the best you can do. That is, \(\Omega(d^2)\) samples are necessary: if you have to be robust to noise, testing is no longer easier than learning….

Onto Boolean functions!

Junta Distance Approximation with Sub-Exponential Queries, by Vishnu Iyer, Avishay Tal, and Michael Whitmeyer (ECCC). If you follow this blog, you may have seen over the past couple years a flurry of results about tolerant junta testing: “given query access to some Boolean function \(f\colon\{-1,1\}^n\to\{-1,1\}\), how close is \(f\) to only depending on \(k \ll n\) of its variables?”
This paper contains several results on this problem, including an improved bicriteria tolerant testing algorithm: an efficient algorithm to distinguish between \(\varepsilon\)-close to \(k\)-junta and \(1.1\varepsilon\)-far from \(k’\)-junta making \(\mathrm{poly}(k,1/\varepsilon)\) queries (and \(k’ = O(k/\varepsilon^2)\)). But the main result of the paper, and the one giving it its name, is for the non-relaxed version where \(k’=k\): while all previous works had a query complexity \(2^{O(k)}\), here the authors show how to break that exponential barrier, giving a fully tolerant testing algorithm with query complexity \(2^{\tilde{O}(\sqrt{k})}\)!

And finally, a foray into database theory:

Towards Approximate Query Enumeration with Sublinear Preprocessing Time, by Isolde Adler and Polly Fahey (arXiv). In this paper, the authors are concerned with the task of (approximate) query enumeration on databases, aiming for ultra efficient (i.e., sublinear-time) algorithms. Leveraging techniques from property testing (specifically, in the bounded-degree graph model), they show the following:
On input databases of bounded degree and bounded tree-width, every (fixed) first-order definable query can be enumerated approximately in time linear in the output size, after only a sublinear-time preprocessing phase.

That’s all for this month! If you noticed a paper we missed, please let us know in the comments.

News for October 2020

Sorry for the delay in writing this monthly digest: we hope you didn’t spend the week frantically refreshing your browsers! We found four papers this month: let’s dive in.

A Structural Theorem for Local Algorithms with Applications to Coding, Testing, and Privacy, by Marcel Dall’Agnol, Tom Gur, and Oded Lachish (ECCC). This paper introduces the notion of “robust (local) algorithm,” an abstraction which encompasses may types of algorithms: e.g., property testing algorithms, locally decodable codes, etc. The main result of this work is that any (possibly adaptive) \(q\)-query robust local algorithm can be transformed into a non-adaptive, sample-based one making \(n^{1-1/(q^2\log q)}\) queries (where \(n\) is the input size). Here, “sample-based” means that the algorithm doesn’t get to make arbitrary queries, but just gets to observe randomly chosen coordinates of the input. As application of this result, the authors derive new upper and lower bounds for several of the types of local algorithms mentioned above, resolving open questions from previous works.

Testing Tail Weight of a Distribution Via Hazard Rate, by Maryam Aliakbarpour, Amartya Shankha Biswas, Kavya Ravichandran, Ronitt Rubinfeld (arXiv). The authors consider, from the point of view of distribution testing, the question of deciding whether data is heavy-tailed: as would, for instance, data following a power law. The paper first sets out to formalize the question, and discusses various possible definitional choices before setting on one; after which it provides a test, and analyzes its sample complexity as a function of various parameters (such as smoothness of the unknown distribution). The authors finally back these results with empirical evaluation of their algorithm.

On Testing of Samplers, by Kuldeep S. Meel, Yash Pote, Sourav Chakraborty (arXiv). Suppose you are given a (known) distribution \(p\) over some domain \(\Omega\), and want to sample from it conditioned on some predicate \(\varphi\). Now someone comes to you with an algorithm which does exactly that, efficiently, and cheaply: great! But can you easily check that you’re not getting fooled, and that this sampler actually does what it claims? This paper provides this: an algorithm which accepts if the sampled distribution is \(\varepsilon\)-close to what it should (roughly, in a multiplicative, KL divergence sense), and rejects if it’s \(\varepsilon’\)-far (in total variation distance). The number of samples required is polynomial in \(\varepsilon’-\varepsilon\), and depends on some characteristic of \(p\) and \(\varphi\), the “tilt” (ratio between max and min probability of the conditional distribution).

Finally, an omission from late September:
Sample optimal Quantum identity testing via Pauli Measurements, by Nengkun Yu (arXiv). The abstract is concise and clear enough to speak for itself: “In this paper, we show that \(\Theta(\textrm{poly}(n)\cdot\frac{4^n}{\varepsilon^2})\) is the sample complexity of testing whether two \(n\)-qubit quantum states \(\rho\) and \(\sigma\) are identical or \(\varepsilon\)-far in trace distance using two-outcome Pauli measurements.”

Please let us know if you spotted a paper we missed!

Videos from the WoLA 2020 workshop

The 4th Workshop on Local Algorithms (WoLA 2020) recently concluded: aimed at “fostering dialogue and cross-pollination of ideas between the various communities” related to local algorithms, broadly construed, it featured invited and contributed talks on a variety of topics, many (if not most) very relevant to the sublinear algorithms and property testing community.

Because of the online format of the workshop (imposed by the current circumstances), all talks were recorded and posted online. As such, all videos all available on the workshop’s website and YouTube channel: a good list of resources to peruse!

News for July 2020

We hope you’re all staying safe and healthy! To bring you some news (and distraction?) during this… atypical summer,here are the recent papers on property testing and sublinear algorithms we saw appear this month. Graphs, probability distributions, functions… there is a something for everyone.

On Testing Hamiltonicity in the Bounded Degree Graph Model, by Oded Goldreich (ECCC). The title sort of gives it away: this relatively short paper shows that testing whether an unknown bounded-degree graph has a Hamiltonian path (or Hamiltonian cycle) in the bounded-degree model requires a number of queries linear in \(n\), the number of nodes. The results also hold for directed graphs (with respect to directed Hamiltonian path or cycle), and are shown via a local reduction to a promise problem of satisfiability of 3CNF formulae. Also included: a complete proof of the linear lower bound for another problem, Independent Set Size; and an open problem: what is the query complexity of testing graph isomorphism in the bounded-degree model?

Local Access to Sparse Connected Subgraphs Via Edge Sampling, by Rogers Epstein (arXiv). Given access to a connected graph \(G=(V,E)\), can we efficiently provide access to some sparse connected subgraph \(G’=(V,E’)\subseteq G\) with \(|E’|\ll |E|\)? This question, well-studied in particular for the case where \(G\) had bounded degree and the goal is to achieve \(|E’|\leq (1-\varepsilon)|V|\), is the focus of this paper which provides a trade-off between the query complexity of the oracle and \(|E’|\). Specifically, for every parameter \(T\), one can give oracle access to \(G’\) with \(|E’|=O(|V|T)\), with a query complexity \(=\tilde{O}(|E|/T)\).

Switching gears, we move from graphs to probability distributions:

Tolerant Distribution Testing in the Conditional Sampling Model, by Shyam Narayanan (arXiv). In the conditional sampling model for distribution testing, which we have covered a few times on this blog, the algorithm at each step gets to specify a subset \(S\) of the domain, and observe a sample from the distribution conditioned on \(S\). As it turns out, this can speed things up a lot: as Canonne, Ron, and Servedio (2015) showed, even tolerant uniformity testing, which with i.i.d. samples requires a near-linear (in the domain size \(n\)) number of samples, can be done in a constant number of conditional queries. Well, sort of constant: no dependence on \(n\), but the dependence on the distance parameter \(\varepsilon\) was, in CRS15, quite bad: \(\tilde{O}(1/\varepsilon^{20})\). This work gets rid of this badness, and shows the (nearly) optimal \(\tilde{O}(1/\varepsilon^{2})\) query complexity! Among other results, it also generalizes it to tolerant identity testing (\(\tilde{O}(1/\varepsilon^{4})\)), for which previously no constant-query upper bound was known. Things have become truly sublinear.

Interactive Inference under Information Constraints, by Jayadev Acharya, Clément Canonne, Yuhan Liu, Ziteng Sun, and Himanshu Tyagi (arXiv). Say you want to do uniformity/identity testing (or learn, but let’s focus on testing) on a discrete distribution, but you can’t actually observe the i.i.d. samples: instead, you can only do some sort of limited, “local” measurement on each sample. How hard is the task, compared to what you’d do if you fully had the samples? This setting, which captures things like distributed testing with communication or local privacy constraints, erasure channels, etc., was well-understood from previous recent work in the non-adaptive setting. But what if the “measurements” could be made adaptively? This paper shows general lower bounds for identity testing and learning, as a function of the type of local measurement allowed: as a corollary, this gives tight bounds for communication constraints and local privacy, and shows the first separation between adaptive and non-adaptive uniformity testing, for a type of “leaky” membership query measurement.

Efficient Parameter Estimation of Truncated Boolean Product Distributions, by Dimitris Fotakis, Alkis Kalavasis, and Christos Tzamos (arXiv). Suppose there is a fixed and unknown subset \(S\) of the hypercube, a “truncation” set, which you can only accessible via membership query; and you receive i.i.d. samples from an unknown product distribution on the hypercube, truncated on that set \(S\) (for instance, because your polling strategy or experimental measurements have limitations). Can you still learn that distribution efficiently? Can you test it for various properties, as you typically really would like to? (or is it just me?) This paper identifies some natural sufficient condition on \(S\), which they call fatness, under which the answer is a resounding yes. Specifically, if \(S\) satisfies this condition, one can actually generate honest-to-goodness i.i.d. samples (non-truncated) from the true distribution, given truncated samples!

Leaving distribution testing, our last paper is on testing functions in the distribution-free model:

Downsampling for Testing and Learning in Product Distributions, by Nathaniel Harms and Yuichi Yoshida (arXiv). Suppose you want to test (or learn) a class of Boolean functions \(\mathcal{C}\) over some domain \(\Omega^n\), with respect to some (unknown) product distribution (i.e., in the distribution-free testing model, or PAC-learning model). This paper develops a general technique, downsampling, which allows one to reduce such distribution-free testing of \(\mathcal{C}\) under a product distribution to testing \(\mathcal{C}\) over \([r]^d\) under the uniform distribution, for a suitable parameter \(r=r(d,\varepsilon,\mathcal{C})\). This allows the authors, among many other things and learning results, to easily re-establish (and, in the second case, improve upon) recent results on testing of monotonicity over \([n]^d\) (uniform distribution) and over \(\mathbb{R}^d\) (distribution-free).

News for April 2020

April is now behind us, and we hope you and your families are all staying safe and healthy. We saw six seven property papers appear online last month, so at least there is some reading ahead of us! A mixture of privacy, quantum, high-dimensional distributions, and juntas (juntæ?). A lot of distribution testing, overall.

Connecting Robust Shuffle Privacy and Pan-Privacy, by Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao (arXiv). This paper considers a recent notion of differential privacy called shuffle privacy, where users have sensitive data, a central untrusted server wants to do something with that data (for instance, say… testing its distribution), and a trusted middle-man/entity shuffles the users’ messages u.a.r. to bring in a bit more anonymity. As it turns out, testing uniformity (or identity) of distributions in the shuffle privacy model is (i) much harder than without privacy constraints; (ii) much harder than with ‘usual’ (weaker) differential privacy (iii) much easier than with local privacy; (iv) related to the sample complexity under another privacy notion, pan-privacy. It’s a brand exciting new world out there!

(Note: for the reader interested in keeping track of identity/uniformity testing of probability distributions under various privacy models, I wrote a very short summary of the current results here.)

Entanglement is Necessary for Optimal Quantum Property Testing, by Sebastien Bubeck, Sitan Chen, and Jerry Li (arXiv). The analogue of uniformity testing, in the quantum world, is testing whether a quantum state is equal (or far from) the maximally mixed state. It’s known that this task has “quantum sample complexity” (number of measurements) \(\Theta(d/\varepsilon^2)\) (i.e., square root dependence on the dimension of the state, \(d^2\)). But this requires entangled measurements, which may be tricky to get (or, in my case, understand): what happens if the measurements can be adaptive, but not entangled? In this work, the authors show that, under this weaker access model \(\Omega(d^{4/3}/\varepsilon^2)\) measurements are necessary: adaptivity alone won’t cut it. It may still help though: without either entanglement nor adaptivity, the authors also show a \(\Omega(d^{3/2}/\varepsilon^2)\) measurements lower bound.

Testing Data Binnings, by Clément Canonne and Karl Wimmer (ECCC). More identity testing! Not private and not quantum for this one, but… not quite identity testing either. To paraphrase the abstract: this paper introduces (and gives near matching bounds for) the related question of identity up to binning, where the reference distribution \(q\) is over \(k \ll n\) elements: the question is then whether there exists a suitable binning of the domain \([n]\) into \(k\) intervals such that, once binned, \(p\) is equal to \(q\).”

Hardness of Identity Testing for Restricted Boltzmann Machines and Potts models, by Antonio Blanca, Zongchen Chen, Daniel Štefankovič, and Eric Vigoda (arXiv). Back to identity testing of distributions, but for high-dimensional structured ones this one. Specifically, this paper focuses on the undirected graphical models known as restricted Boltzmann machines, and provides efficient algorithms for identity testing and conditional hardness lower bounds depending on the type of correlations allowed in the graphical models.

Robust testing of low-dimensional functions, by Anindya De, Elchanan Mossel, and Joe Neeman (arXiv). Junta testing is a classical, central problem in property testing, with motivations and applications in machine learning and complexity. The related (and equally well-motivated) question of junta testing of functions on \(\mathbb{R}^d\) (instead of the Boolean hypercube) was recently studied by the same authors; and the related (and, again, equally well-motivated) question of tolerant junta testing on the Boolean hypercube was also recently studied (among other works) by the same authors. Well, this paper does it all, and tackles the challenging (and, for a change, equally well-motivated!) question of tolerant testing of juntas on \(\mathbb{R}^d\).

Differentially Private Assouad, Fano, and Le Cam, by Jayadev Acharya, Ziteng Sun, and Huanyu Zhang (arXiv). Back to probability distributions and privacy. This paper provides differentially private analogues of the classical eponymous statistical inference results (Assouad’s lemma, Fano’s inequality, and Le Cam’s method). In particular, it gives ready-to-use, blackbox tools to prove testing and learning lower bounds for distributions in the differentially private setting, and shows how to use them to easily derive, and rederive, several lower bounds.

Edit: We missed one!

Learning and Testing Junta Distributions with Subcube Conditioning, by Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten (arXiv). This paper focuses on the subcube conditioning model of (high-dimensional) distribution testing, where the algorithm can fix some variables to values of its choosing and get samples conditioned on those variables. Extending and refining techniques from a previous work by a (sub+super)set of the authors, the paper shows how to optimally learn and test junta distributions in this framework—with exponential savings with respect to the usual i.i.d. sampling model.

News for January 2020

The first month of 2020 is behind us, and it’s already looking good! Four papers last month, spanning quantum, graphs, and probability distributions.

On Efficient Distance Approximation for Graph Properties, by Nimrod Fiat and Dana Ron (arXiv). In the dense graph (i.e., adjacency-matrix) model, one is given a distance parameter \(\varepsilon\) and granted query access to the adjacency matrix of a graph \(G\), and seeks to determine something about the distance of \(G\) to a prespecified property \(\mathcal{P}\) of interest (i.e., the fraction of the matrix that needs to be changed for \(G\) to satisfy the property). Testing requires to distinguish whether that distance is zero, or at least \(\varepsilon\); many results over the past years have shown that many properties can be tested with a number of queries depending only on \(\varepsilon\) (and not on \(n=|G|\). This work focuses on the harder problem of distance estimation, or, equivalently, tolerant testing: that is, estimate this distance up to \(\pm\varepsilon\). The authors introduce a general framework to get distance approximation algorithms from a “complicated” property by decomposing it into simpler properties and using algorithms for those. Applying this framework to a few flagship properties, they then show that \(P_3\)-freeness, induced \(P_4\)-freeness, and cordiality are properties which have efficient distance estimation algorithms (independent of \(n\), and polynomial in \(1/\varepsilon\)).

Minimax Optimal Conditional Independence Testing, by Matey Neykov, Sivaraman Balakrishnan, and Larry Wasserman (arXiv). Given i.i.d. draws from a triple \((X,Y,Z)\), how hard is it to check whether \(X \perp Y \mid Z\), that is, “\(X\) and \(Y\) are independent conditioned on \(Z\) (or far from it)?” This is the problem on conditional independence testing, which was covered back in the days for the case where \(X,Y,Z\) are discrete. Well, this new work takes the fight out of the discrete world: extending the results and techniques from the discrete case, it provides optimal bound for the continuous case: where \(Z\) is on \([0,1]\), and then when all three \(X,Y, Z\) are continuous.

How symmetric is too symmetric for large quantum speedups?, by Shalev Ben-David and Supartha Podder (arXiv); and Can graph properties have exponential quantum speedup?, by Andrew M. Childs and Daochen Wang (arXiv). Both these works independently study the relation between the (bounded-error) randomized and quantum query complexities of any graph property \(f\), in the dense graph (adjacency-matrix) model. In particular, how much advantage do quantum algorithms provide for those?
As it turns out, not so much: for those functions, both papers show the two quantities are always polynomially related (\(R(f) \leq Q(f) \leq O(R(f)^6))\)) in the dense-graph model. As a corollary, this implies that testing any such property won’t benefit much from quantum (that is, at most polynomially)…. at least in this model. In the adjacency list model (also known as the bounded-degree graph model), the first paper conjectures that exponential query complexity improvements are possible; and the second paper provides an example, establishing it. Altogether, this fully settles an open problem of Ambainis, Childs, and Liu, and Montanaro and de Wolf.

News for October 2019

Last month was a lean month, with only three papers: one on direct product testing, one on finding forbidden patterns in a sequence, and one (an update of a paper which we had missed in the Spring) on quantum distribution testing.

Direct sum testing – the general case, by Irit Dinur and Konstantin Golubev (ECCC). Say a function \(f\colon \prod_{i=1}^d [n_i] \to \mathbb{F}_2\) is a direct product if it can be factored as \(f(x_1,\dots,x_d)=\sum_{i=1}^d f_i(x_i)\), where \(f_i\colon [n_i]\to\mathbb{F}_2\).
This paper provides a 4-query tester (i.e., a proximity oblivious tester (POT)) for the direct product property, reminiscent of (and relying on) the BLR linearity test: specifically, draw two subsets \(S,T\subseteq [d]\) and two inputs \(x,y\in \prod_{i=1}^d [n_i]\) u.a.r., and accept iff
\(f(x)+f(x_Sy)+f(x_Ty)+f(x_{S\Delta T}y) = 0\,.\)
The main theorem of the paper is to show that the probability that this simple test rejects is lower bounded (up to a constant factor) by the distance of \(f\) to direct-product-ness. (The authors also provide a different POT making \(d+1\) queries, but with a simpler analysis.)

Finding monotone patterns in sublinear time, by Omri Ben-Eliezer, Clément Canonne, Shoham Letzter, and Erik Waingarten (ECCC). Given a function \(f\colon [n]\to\mathbb{R}\), a monotone subsequence of size \(k\) is a \(k\)-tuple of indices \(i_1 < \dots <i_k\) such that \(f(i_j) < f(i_{j+1})\) for all \(j\). This work considers (non-adaptive) one-sided testing of monotone-subsequence-freeness, or, equivalently, the task of finding such a monotone subsequence in a function promised to contain many of them. (This, in particular, generalizes the problem of one-sided monotonicity testing, which is the case \(k=2\).) The main result is a full characterization of the query complexity of this question (for constant \(k\)): strange as the exponent may seem, \(\Theta_\varepsilon( (\log n)^{\lfloor \log_2 k\rfloor} )\) queries are necessary and sufficient. The proof relies on a structural dichotomy result, stating that any far-from-free sequence either contains “easy to find” increasing subsequences with increasing gaps between the elements, or has a specific hierarchical structure.

Quantum Closeness Testing: A Streaming Algorithm and Applications, by Nengkun Yu (arXiv). This paper is concerned with quantum distribution testing in the local model, which only allows a very restricted (albeit, as the author argues, more natural and easier to implement) type of measurements, and is particularly well-suited to a streaming setting. The main contribution of this paper is to show a connection to classical distribution testing, allowing one to obtain quantum distribution testing upper bounds from their classical distribution testing counterparts. In more detail, the paper shows that, from local measurements to two \(d\)-dimensional quantum states \(\rho,\sigma\), one can provide access to two classical distributions \(p,q\) on \(\approx d^2\) elements such that (i) \(\| p-q\|_2 \approx \|\rho-\sigma\|_2/d\) and (ii) \(\| p\|_2,\| q\|_2 = O(1/d)\).
Using this connection, the paper proceeds to establish a variety of upper bounds for testing several distribution properties in the local quantum model.