Linear Hashing

Noga Alon
Martin Dietzfelbinger
Peter Bro Miltersen
Erez Petrank
Gábor Tardos

June 1997


Consider the set tex2html_wrap_inline27 of all linear (or affine) transformations between two vector spaces over a finite field F. We study how good tex2html_wrap_inline27 is as a class of hash functions, namely we consider hashing a set S of size n into a range having the same cardinality n by a randomly chosen function from tex2html_wrap_inline27 and look at the expected size of the largest hash bucket. tex2html_wrap_inline27 is a universal class of hash functions for any finite field, but with respect to our measure different fields behave differently.

If the finite field F has n elements then there is a bad set tex2html_wrap_inline47 of size n with expected maximal bucket size tex2html_wrap_inline51. If n is a perfect square then there is even a bad set with largest bucket size always at least tex2html_wrap_inline55. (This is worst possible, since with respect to a universal class of hash functions every set of size n has expected largest bucket size below tex2html_wrap_inline59.)

If, however, we consider the field of two elements then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was tex2html_wrap_inline61. We reduce this upper bound to tex2html_wrap_inline63. Note that this is not far from the guarantee for a random function. There, the average largest bucket would be tex2html_wrap_inline65.

In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset S of a vector space D over tex2html_wrap_inline71, and consider a random linear mapping of D to a smaller vector space R. If the cardinality of S is larger than tex2html_wrap_inline79 then with probability tex2html_wrap_inline81, the image of S will cover all elements in the range

Available as PostScript, PDF, DVI.


Last modified: 2003-06-08 by webmaster.