# Workshop on Algorithms for Large Data: We found WALD(O), and so can you!

Ainesh Bakshi, Rajesh Jayaram, and Samson Zhou are organizing a 3-day Workshop on Algorithms for Large Data (nicely abbreviated as WALD(O), the O standing for Online), featuring many talks which should be of interest to the readers of this blog, as well as an open problems and a poster sessions, and a junior/senior lunch. As the organizers describe it:

This workshop aims to foster collaborations between researchers across multiple disciplines through a set of central questions and techniques for algorithm design for large data. We will focus on topics such as sublinear algorithms, randomized numerical linear algebra, streaming and sketching, and learning and testing.

The workshop will take place on August 23 — August 25 (ET). Attendance is free, but registration is required by August 20th. More details at https://waldo2021.github.io/

# New for July 2021

This month saw three papers appear online, together covering a rather broad range of topics: testing of regular languages, distribution testing under differential privacy, and local testability from high-dimensional expanders. Let’s dive in!

Property Testing of Regular Languages with Applications to Streaming Property Testing of Visibly Pushdown Languages, by Gabriel Bathie and Tatiana Starikovskaya (paper). Let $$L\in \Sigma^\ast$$ be a regular language recognized by an automation with $$m$$ states and $$k$$ connected components: given as input a word $$u\in \Sigma^n$$, what is the query complexity to test membership to $$L$$ in Hamming distance? Edit distance? Or, more generally, weighted edit distance, where each letter of the word $$u$$ comes with a weight? In this paper, the authors focus on non-adaptive, one-sided errors testing algorithms, for which they show an upper bound of $$q=O(k m \log(m/\varepsilon)/\varepsilon)$$ queries (with running time $$O(m^2 q)$$), which they complement by a query complexity lower bound of $$\Omega(\log(1/\varepsilon)/\epsilon)$$, thus matching the upper bound for languages recognized by constant-size automata. The guarantee for the upper bound is with respected to weighted edit distance, and thus implies the same upper bound for testing with respect to Hamming distance.
To conclude, the authors use an existing connection to streaming property testing to obtain new algorithms for property testing of visibly pushdown languages (VPL) in the streaming model, along with a new lower bound in that model.

High dimensional expansion implies amplified local testability, by Tali Kaufman and Izhar Oppenheim (arXiv). This paper sets out to show that codes that arise from high-dimensional expanders are locally testable (membership to the code can be tested using very few queries). To do so, the authors define a new notion of high-dimensional expanding system (HDE system), as well as that of amplified local testability, a stronger notion than local testability; they then prove that a code based on a HDE system satisfies this stronger notion. Moreover, they show that many well-known families of codes are, in fact, HDE system codes, and therefore satisfy this stronger notion of local testability as well.

Finally, a survey on differential privacy, with a foray into distribution testing:

Differential Privacy in the Shuffle Model: A Survey of Separations, by Albert Cheu (arXiv). If you are familiar with differential privacy (DP), you may recall that there are several notions of DP, each meant to address a different “threat model” (depending on whom you trust with your data). Shuffle DP is one of them, intermediate between “central” DP and the more stringent “local” DP. Long story short: with shuffle DP, the tradeoff between privacy and accuracy can be strictly in-between what’s achievable in central and local DP, and that’s the case for one of the usual suspects of distribution testing, uniformity testing (“I want to test if the data uniformly distributed, but now, with privacy of that data in mind”). The survey discusses what is known about this in Sections 3.3 and 6, and what the implications are; but there are quite a few questions left unanswered… Long story short: a very good introduction to shuffle privacy, and to open problems in that area!