Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering

William Kuszmaul (MIT Mathematics)

28-Oct-2021, 21:30-23:00

Abstract: The linear-probing hash table is one of the oldest and most widely used data structures in computer science. However, linear probing also famously comes with a major drawback: as soon as the hash table reaches a high memory utilization, elements within the hash table begin to cluster together, causing insertions to become slow. This phenomenon, now known as "primary clustering", was first captured by Donald Knuth in 1963; at a load factor of $1 - 1/x$, the expected time per insertion becomes $\Theta(x^2)$, rather than the more desirable $\Theta(x)$.
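The gap between $\Theta(x^2)$ and $\Theta(x)$ follows from Knuth's classical estimates, assuming idealized, fully random hash values and approximating the cost of an insertion by the expected number of probes in an unsuccessful search at load factor $\alpha$. Linear probing is compared against idealized uniform probing, where every probe hits an independent random slot:

```latex
U_{\text{linear}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right),
\qquad
U_{\text{uniform}}(\alpha) \approx \frac{1}{1-\alpha}.

% Substituting \alpha = 1 - 1/x, so that 1 - \alpha = 1/x:
U_{\text{linear}}\!\left(1 - \tfrac{1}{x}\right) \approx \frac{1 + x^2}{2} = \Theta(x^2),
\qquad
U_{\text{uniform}}\!\left(1 - \tfrac{1}{x}\right) \approx x = \Theta(x).
```

For example, at 90% occupancy ($x = 10$) the linear-probing estimate is about 50 probes, versus about 10 for uniform probing.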

We show that there is more to the story than the classic analysis would seem to suggest. It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions. If these design decisions are made correctly, then even a hash table that is continuously at a load factor $1 - \Theta(1/x)$ can achieve average insertion time $\tilde{O}(x)$. A key insight is that the tombstones left behind by deletions cause a surprisingly strong "anti-clustering" effect, and that when insertions and deletions are one-for-one, the anti-clustering effects of deletions actually overpower the clustering effects of insertions.
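For readers unfamiliar with tombstones, the sketch below shows the basic mechanism in a linear-probing set: a deletion overwrites the key with a TOMBSTONE marker instead of emptying the slot, so searches continue past it, while insertions may reuse it. This is a minimal illustration, not the specific deletion policy analyzed in the paper. It assumes hashable keys, Python's built-in `hash`, and a fixed capacity with no resizing or rebuilding. The class `LinearProbingTable`, its methods, and the choice to reuse the earliest tombstone passed during an insertion are all hypothetical.

```python
from typing import Hashable, List, Optional

# Sentinels: EMPTY marks a never-used slot, TOMBSTONE marks a deleted one.
EMPTY = object()
TOMBSTONE = object()


class LinearProbingTable:
    """Hypothetical linear-probing set with tombstone deletions (sketch only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: List[object] = [EMPTY] * capacity

    def _probe(self, key: Hashable):
        # Visit slots h(key), h(key)+1, ..., wrapping around the table once.
        start = hash(key) % self.capacity
        for i in range(self.capacity):
            yield (start + i) % self.capacity

    def contains(self, key: Hashable) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False  # a never-used slot ends the search
            if slot is not TOMBSTONE and slot == key:
                return True   # tombstones are skipped, not treated as gaps
        return False

    def insert(self, key: Hashable) -> int:
        """Insert key and return the number of slots probed."""
        first_tombstone: Optional[int] = None
        probes = 0
        for idx in self._probe(key):
            probes += 1
            slot = self.slots[idx]
            if slot is EMPTY:
                break
            if slot is TOMBSTONE:
                if first_tombstone is None:
                    first_tombstone = idx  # remember the earliest reusable slot
            elif slot == key:
                return probes  # key already present; nothing to do
        else:
            # Scanned the whole table without finding an EMPTY slot.
            if first_tombstone is None:
                raise RuntimeError("table is full")
            self.slots[first_tombstone] = key
            return probes
        # Stopped at an EMPTY slot: prefer the earliest tombstone passed, if any.
        target = first_tombstone if first_tombstone is not None else idx
        self.slots[target] = key
        return probes

    def delete(self, key: Hashable) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not TOMBSTONE and slot == key:
                # Leave a tombstone so keys stored further along stay reachable.
                self.slots[idx] = TOMBSTONE
                return True
        return False
```

As a usage example, with capacity 8 the integers 1, 9, 17, and 25 all hash to slot 1. After inserting 1, 9, and 17 and deleting 9, slot 2 holds a tombstone: a search for 17 skips over it, and a later insertion of 25 probes four slots (stopping at the empty slot 4) but lands in the tombstone at slot 2, so the deleted key's slot is filled again.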

Based on joint work with Michael A. Bender and Bradley C. Kuszmaul (arxiv.org/abs/2107.01250). To appear in FOCS 2021.

Computer science, Mathematics, Physics

Audience: researchers in the topic


MIT Simple Person's Applied Mathematics Seminar

Organizers: André Lee Dixon*, Ranjan Anantharaman, Aaron Berger
*contact for this listing
