            <<< Note 1120.0 by TRACE::GILBERT "Ownership Obligates" >>>
                         -< Rating a Hashing function >-
>> Is there a formula for rating a hashing function?  Hashing a set of values
>> should give a set of numbers uniformly distributed in the range 0..N-1.
>> Given the distribution, the number of numbers that fall into each of these
>> N buckets, how may the uniformity of the distribution be rated?
    
    The performance of any hashing function is crucially dependent on the
    distribution of the input numbers; if the distribution is such that you
    get plenty of collisions with your given hashing algorithm, you will
    get very bad performance even though the table may be nowhere near
    full.
    
    Having stated the standard caveat, I would suggest the following:
    
    1.	Get a representative sample of the input distribution.
    
    2.	Feed the representative sample through the hash function and note
    	the number of elements that fall in each bucket.
    
    3.	See if the number of collisions is excessive (it's up to you to
    	decide what "excessive" means).  If the number of collisions is
    	excessive, get a new algorithm.
    
    4.	Run a chi-square test against a uniform distribution.  (There are
    	some tests that don't depend on an underlying distribution, but (a)
    	these tests tend to be less powerful than the "standard" ones (because
    	they make fewer assumptions) and (b) the standard tests tend to give
    	fairly good results even when some of their underlying assumptions are
    	violated.)
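
    Steps 2 through 4 can be sketched roughly as below.  This is only an
    illustration, not a definitive implementation: the function names, the
    sample keys and the deliberately poor hash are all made up for the
    example.  The statistic is the usual Pearson one, the sum over buckets
    of (observed - expected)^2 / expected with expected = n/N, on N-1
    degrees of freedom.  The p-value uses the Wilson-Hilferty normal
    approximation so that only the standard library is needed; a proper
    statistics package would give an exact tail probability.

```python
import math
from collections import Counter


def bucket_counts(keys, hash_fn, n_buckets):
    """Step 2: count how many sample keys land in each of the N buckets."""
    tally = Counter(hash_fn(k) % n_buckets for k in keys)
    return [tally.get(b, 0) for b in range(n_buckets)]


def collisions(counts):
    """Step 3: keys that landed in a bucket that was already occupied."""
    return sum(c - 1 for c in counts if c > 0)


def chi_square_uniform(counts):
    """Step 4: Pearson chi-square statistic against equal expected counts.

    Returns (statistic, degrees_of_freedom).
    """
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1


def approx_p_value(stat, df):
    """Approximate upper-tail p-value (Wilson-Hilferty approximation).

    This is an approximation chosen to avoid external dependencies.
    """
    if stat <= 0:
        return 1.0
    spread = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical sample and a deliberately poor hash: multiplying by 4
    # means only every fourth bucket out of 16 can ever be used.
    sample = range(1000)
    poor_hash = lambda k: k * 4

    counts = bucket_counts(sample, poor_hash, 16)
    stat, df = chi_square_uniform(counts)
    print("bucket counts:", counts)
    print("collisions:   ", collisions(counts))
    print("chi-square:   ", stat, "on", df, "degrees of freedom")
    print("approx p:     ", approx_p_value(stat, df))
```

    A very small p-value says the bucket counts are unlikely to have come
    from a uniform distribution.  The usual rule of thumb is that the
    chi-square approximation wants an expected count of about 5 or more
    per bucket, so the sample should be several times larger than N.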
    