Hashing, Randomness and Dictionaries

Rasmus Pagh

October 2002


This thesis is centered around one of the most basic information retrieval problems, namely that of storing and accessing the elements of a set. Each element in the set has some associated information that is returned along with it. The problem is referred to as the dictionary problem, due to the similarity to a bookshelf dictionary, which contains a set of words and has an explanation associated with each word. In the static version of the problem the set is fixed, whereas in the dynamic version, insertions and deletions of elements are possible.

The approach taken is that of the theoretical algorithms community. We work (almost) exclusively with a model, a mathematical object that is meant to capture essential aspects of a real computer. The main model considered here (and in most of the literature on dictionaries) is a unit cost RAM with a word size that allows a set element to be stored in one word.

We consider several variants of the dictionary problem, as well as some related problems. The problems are studied mainly from an upper bound perspective, i.e., we try to come up with algorithms that are as efficient as possible with respect to various computing resources, mainly computation time and memory space. To some extent we also consider lower bounds, i.e., we attempt to show limitations on how efficient algorithms are possible. A central theme in the thesis is randomness. Randomized algorithms play an important role, in particular through the key technique of hashing. Additionally, probabilistic methods are used in several proofs.

Available as PostScript, PDF.


Last modified: 2004-03-29 by webmaster.