# News for June 2024

A nice diverse month, this June 2024. We have a selection of papers on quantum property testing, shortest paths, polynomial threshold functions, and nearest neighbor search. And a nice survey on a closely related topic. Onward with the papers!

Invitation to Local Algorithms by Václav Rozhoň (arXiv). This is a great survey on local algorithms, a topic (roughly) about message passing distributed algorithms on graphs. While that is not a sublinear algorithm per se, the core combinatorial question has relevance to our readers. Informally, a “local problem” is one where, if a solution is incorrect, that can be determined by looking at a small radius around some vertex. For example, consider problems like maximal independent set, matching, etc. The problem is to construct such a solution in a distributed manner by only looking at low radius balls. There is a direct connection with Local Computation Algorithms (LCAs), with the primary difference being the complexity resource. In Local Algorithms as defined, one is interested in the number of rounds of computation, corresponding to the radius of balls. In LCAs, we care about the number of vertices seen, which is the size of the balls. The survey gives a detailed discussion of these connections.

Testably Learning Polynomial Threshold Functions by Lucas Slot, Stefan Tiegel, and Manuel Wiedmer (arXiv). Consider the classic model of agnostic learning with a distributional assumption. Our aim is to learn a function $$f: X \to \{-1,1\}$$. The learner gets labeled samples $$(x,f(x))$$, where $$x$$ is drawn from an unknown distribution $$\mathcal{D}$$. In agnostic learning, we aim to output a hypothesis $$\hat{f}$$ such that $$\|f – \hat{f}\|$$ is small, where distance is measured according to $$\mathcal{D}$$. Often one makes distributional assumptions (like $$\mathcal{D}$$ is Gaussian) to get non-trivial results. This is a severe limitation, since such distributional assumptions cannot be validated. A recent model of Rubinfeld-Vasilyan asks for “tester-learner” pairs that address this issue. We first run a “tester” to check the property (say, Gaussian) of the distribution. If the tester passes, then we run the learner. If $$\mathcal{D}$$ satisfies the distributional property, the tester is guaranteed to pass (whp). If the tester passes, then the learner is guaranteed to output a valid hypothesis. Thus, the tester is allowed to pass distributions that may be far from satisfying the property, but in such situations, the learner outputs a correct hypothesis. This paper gives such tester-learner pairs for learning polynomial threshold functions with polynomially many samples. Previously, such learning results were not known even for degree-2 PTFs with respect to the Gaussian distribution.

A Sublinear Algorithm for Approximate Shortest Paths in Large Networks by Sabyasachi Basu, Nadia Kōshima, Talya Eden, Omri Ben-Eliezer, C. Seshadhri (arXiv). This paper brings a mix of a classic problem, a real-world observation, and a new graph class. Finding shortest paths in graphs is about as classic as it gets. There is a plethora of applied work on shortest paths algorithms for real-world networks, much of which is based on landmarking. One precomputes distances between chosen landmarks, and then uses those to output shortest path distance queries. This paper asks: is it possible to get approximation shortest paths in sublinear time? Hence, we are not even allowed to preprocess the entire graph. This paper builds on previous work in social networks about “core-periphery” structure. The core is a denser, small region of the graph; the rest of the vertices form the periphery that connect to the core. The hypothesis is that shortest paths in real-world graphs typically go via the core. Given the (small) core, one can build small sublinear data structures that quickly answer shortest path queries. The paper uses insights from previous work on core-periphery structure, and gives a practical sublinear algorithm for computing approximate shortest paths. If the graph comes from a Chung-Lu distribution, then queries can be provably answered in $$n^{o(1)}$$ time.

Quantum Property Testing Algorithm for the Concatenation of Two Palindromes Language by Kamil Khadiev and Danil Serov (arXiv). An early result of property testing is that all regular languages can tested in constant time. But the simple (non-regular) language $$\mathcal{L}$$ of strings formed by concatenating two palindromes requires super-constant queries. Specifically, the complexity of property testing $$\mathcal{L}$$ is $$\Theta(\sqrt{n})$$ (ignoring log and $$\varepsilon$$ factors). Here, $$n$$ denotes the string size. This paper shows that there exist quantum property testing algorithms that beat this bound. There is a quantum property tester for the language $$\mathcal{L}$$ that makes $$\widetilde{O}(n^{1/3})$$ queries. The paper also shows that quantum query complexity of the exact decision problem is $$\Theta(\sqrt{n})$$. Thus, one gets non-trivial quantum speedup for both the property testing and exact problem.

A Bi-metric Framework for Fast Similarity Search by Haike Xu, Sandeep Silwal, and Piotr Indyk (arXiv). Nearest-neighbor (NN) search is one of the most important algorithmic tasks in modern data analysis. The usual setting is that we have a set of $$n$$ points over a metric, denoted by $$D$$. We preprocess the points and construct a data structure. Given a query point $$q$$, we wish to find the (approximate) closest points from our data set, in time sublinear in the data size. This paper introduces a bi-metric setting. Think of $$D$$ as expensive to compute, while there exists a cheap “proxy metric” $$d$$. For example, if the points represent images, the expensive metric $$D$$ could be some complex (but semantically accurate) metric, while $$d$$ may represent a simple Euclidean metric over an embedding. Formally, $$d$$ is a (large) constant factor approximation of $$D$$. This paper gives an algorithm that performs nearest neighbor search, but only preprocesses over $$d$$. Given, a query $$q$$, it makes sublinear queries to the expensive $$D$$. All in all, it is a truly sublinear algorithm in terms of $$D$$, and leverages the preprocessing over the cheap metric $$d$$. Technically, this paper gives a meta-algorithm that can be applied to any “base” NN algorithm. The paper proves results for two popular NN algorithms, and gives extensive empirical evidence for the underlying approach.