            <<< Note 1210.0 by TIXEL::ARNOLD "Real men don't set for stun" >>>
               -< Median, Mean, & Standard Deviation questions >-
    >> Suppose you have 'n' data points; for example:
    >>    5   6   1   3   14   4   11   3   8   3   5
    >> There are a variable number ('n') of these data points.  Figuring the
    >> AVERAGE is easy.  (I haven't been away from it *that* long!)  But is
    >> there a mathematical formula to figure the MEDIAN, the MEAN, and the
    >> STANDARD DEVIATION of these data points?  That's where I get lost.
    
    	The median requires you to sort the data points (in ascending or
    descending order, whichever you fancy) and pick the middle one; that is
    the median.  If n is even, there is no single middle value, so one
    normally takes the average of the two middle ones and calls that
    average the median.
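    	A minimal sketch of the sort-and-pick rule in Python (the function
    name median is only illustrative).  It sorts ascending, returns the
    middle element when n is odd, and averages the two middle elements
    when n is even:

```python
def median(values):
    s = sorted(values)              # ascending order
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]               # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average of the two middle values

data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(median(data))  # prints 5 (sorted: 1 3 3 3 4 5 5 6 8 11 14)
```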
    
    	The mean (strictly, the arithmetic mean) is the same thing as the
    average: SUM(X(i))/N.
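    	For completeness, the same thing in Python (the name mean is only
    illustrative); for the eleven points above it gives 63/11:

```python
def mean(values):
    """Arithmetic mean: SUM(X(i))/N."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(mean(data))  # 63/11, roughly 5.727
```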
    
    	You can calculate the standard deviation (*in double precision*, if
    N is large) as
    
    		SQRT((SUM(X(i)*X(i)) - SUM(X(i))*SUM(X(i))/N)/(N-1))
    
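    (this formula needs only one pass over the data, since you can
    accumulate SUM(X(i)) and SUM(X(i)*X(i)) in two separate scalars as you
    go).  Dividing by N-1 gives the sample standard deviation, the usual
    choice when the data points are a sample from a larger population; if
    they are the entire population, divide by N instead.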
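    	A sketch of that one-pass formula in Python, assuming Python floats
    (IEEE double precision on mainstream platforms) are adequate for your
    data.  The function name stddev_one_pass is hypothetical; the max()
    guard is an addition of mine, since rounding can push the variance
    slightly below zero when all points are nearly equal:

```python
import math

def stddev_one_pass(values):
    n = 0
    s = 0.0   # running SUM(X(i))
    ss = 0.0  # running SUM(X(i)*X(i))
    for x in values:
        n += 1
        s += x
        ss += x * x
    if n < 2:
        raise ValueError("need at least two data points")
    var = (ss - s * s / n) / (n - 1)  # sample variance, N-1 divisor
    return math.sqrt(max(var, 0.0))   # guard against tiny negative rounding

data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
print(round(stddev_one_pass(data), 3))  # prints 3.875
```

    	For these points SUM(X(i)) = 63 and SUM(X(i)*X(i)) = 511, so the
    formula works out to SQRT((511 - 63*63/11)/10).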
    
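    	The reason you need double precision for large N is that
    SUM(X(i)*X(i)) is frequently very close to SUM(X(i))*SUM(X(i))/N, so
    the subtraction cancels most of the significant digits and what is
    left is dominated by rounding error.  The problem is worst when the
    standard deviation is small compared with the mean.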
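    	If even double precision is not enough, one well-known alternative
    (not the formula above; it is Welford's updating method) still makes a
    single pass but tracks a running mean and a running sum of squared
    deviations from it, so it never subtracts two large, nearly equal
    sums.  A sketch in Python, with the name stddev_welford hypothetical:

```python
import math

def stddev_welford(values):
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the already-updated mean
    if n < 2:
        raise ValueError("need at least two data points")
    return math.sqrt(m2 / (n - 1))  # sample standard deviation, N-1 divisor

data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
shifted = [1e9 + x for x in data]  # same spread, huge mean
print(stddev_welford(data), stddev_welford(shifted))
```

    	Adding a large constant such as 1e9 to every point leaves the
    standard deviation unchanged; the sum-of-squares formula can lose most
    or all of its digits on such data even in double precision, whereas
    the updating method should still agree closely with the unshifted
    result.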
    
    	WARNING: Be very careful about plugging means and standard
    deviations into formulae, because many of the standard formulae assume
    that the underlying distribution is normal (i.e., that the X(i), if
    plotted in a histogram, look like a bell curve) and may lead you astray
    if this assumption is not satisfied.  *Before* you plug values into a
    standard formula that assumes normality, plot a histogram and verify
    that it looks reasonably like a bell curve.
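    	If no plotting package is handy, a crude text histogram is often
    enough for that eyeball check.  A sketch in Python; the function name
    text_histogram and the choice of equal-width bins are assumptions of
    mine, and the number of bins is something you should vary:

```python
def text_histogram(data, bins=5):
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1  # avoid zero width if all values are equal
    counts = [0] * bins
    for x in data:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # the maximum value goes in the last bin
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * width          # left edge of this bin
        lines.append(f"{left:8.2f} | {'*' * c}")
    return lines

data = [5, 6, 1, 3, 14, 4, 11, 3, 8, 3, 5]
for line in text_histogram(data):
    print(line)
```

    	With only eleven points the picture is rough, but a long tail on one
    side (as with the 11 and 14 here) is a hint that normal-theory
    formulae deserve suspicion.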