| Title: | Mathematics at DEC |
| Moderator: | RUSURE::EDP |
| Created: | Mon Feb 03 1986 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2083 |
| Total number of notes: | 14613 |
I hope you don't disapprove of me entering the following on behalf of
my son. I'm afraid I haven't looked at it myself yet, but he is keen to
get comments on it. I hope nothing got chopped in the conversion from
Word to txt.
Data exists as recorded values X1,X2,X3,...,Xn
with associated values Y1,Y2,Y3,...,Yn.
A power curve is to be fitted by rearranging the power relationship :
y = t x^u
...to the linear relationship :
ln y = u ln x + ln t
...by taking natural logs on each side, then using least squares
(in a vertical residual direction) to obtain values of t and u.
Does minimising (E represents a sigma sign from i=1 to i=n) :
E(ln Yi - u ln Xi - ln t)^2 -(I)
...by finding suitable values of t and u, imply that :
E(Yi - t Xi^u)^2 -(II)
...is also minimised with these same values of t and u ?
(I) can be multiplied out to give :
R = u^2 E(ln Xi)^2 + E(ln Yi)^2 - 2u E(ln Xi ln Yi)
+ 2u ln t E(ln Xi) - 2ln t E(ln Yi) + n (ln t)^2
This gives :
'partial' dR/du = 2u E(ln Xi)^2 + 2ln t E(ln Xi) -2 E(ln Xi ln Yi)
'partial' dR/dt = (2u / t) E(ln Xi) - (2 / t) E(ln Yi) + (2n ln t / t)
It is known, and proved by algebraic methods, that :
u = ( n E(ln Xi ln Yi) - E(ln Xi) E(ln Yi) ) / ( n E(ln Xi)^2 - (E(ln Xi))^2 )
ln t = ( E(ln Xi)^2 E(ln Yi) - E(ln Xi) E(ln Xi ln Yi) ) / ( n E(ln Xi)^2 - (E(ln Xi))^2 )
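For anyone who wants to experiment, here is a short Python sketch of exactly
those closed-form solutions (the function and variable names are my own):

```python
import math

def fit_log_linear(xs, ys):
    """Least-squares fit of ln y = u ln x + ln t; returns (t, u)."""
    n = len(xs)
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    sx = sum(lx)                                # E(ln Xi)
    sy = sum(ly)                                # E(ln Yi)
    sxx = sum(v * v for v in lx)                # E(ln Xi)^2
    sxy = sum(a * b for a, b in zip(lx, ly))    # E(ln Xi ln Yi)
    d = n * sxx - sx * sx                       # common denominator
    u = (n * sxy - sx * sy) / d
    ln_t = (sxx * sy - sx * sxy) / d
    return math.exp(ln_t), u
```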
(II) can be multiplied out to give :
S = t^2 E(Xi^2u) + E(Yi^2) - 2t E(Xi^u Yi)
This gives :
'partial' dS/du = 2 t^2 E(Xi^2u ln Xi) - 2t E(Xi^u Yi ln Xi)
'partial' dS/dt = 2t E(Xi^2u) - 2 E(Xi^u Yi)
Q1 : What are the solutions to 'partial' dS/du = 0 and 'partial' dS/dt = 0 ?
(A numerical sketch follows the questions.)
Q2 : If the previous solutions for t and u, obtained from (I),
were substituted into 'partial' dS/du and 'partial' dS/dt,
would both partial derivatives evaluate to zero ?
Q3 : The data to follow indicates that the answer to Q2 is "No".
The sum of squares of the residuals for the curve y = t x^u
best fitted (or seemingly best fitted) is GREATER than the
sum of squares of the residuals for the simpler SUBSET of
curves y = s x (i.e. u = 1, t = s).
It may be that this is just computer inaccuracy,
but the effect persists with the spreadsheet working to 16 significant figures.
Question 3 is "IS THE METHOD FOR CALCULATING THE BEST FIT
POWER CURVE INACCURATE AND THEREFORE INCORRECTLY NAMED ???".
This is the real question that requires answering - the rest
is just preliminary 'garbage'.
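As a side note on Q1: there is no closed form, but 'partial' dS/dt = 0 can be
solved for t with u held fixed, giving t = E(Xi^u Yi) / E(Xi^2u), and
substituting that into 'partial' dS/du = 0 leaves a single equation in u.
A minimal Python sketch (the names, the bracketing interval, and the
assumption that it contains a sign change are all mine):

```python
import math

def t_of_u(xs, ys, u):
    # From 'partial' dS/dt = 0:  t = E(Xi^u Yi) / E(Xi^2u)
    return (sum(x ** u * y for x, y in zip(xs, ys))
            / sum(x ** (2 * u) for x in xs))

def g(xs, ys, u):
    # 'partial' dS/du = 0 with t eliminated (divided through by 2t)
    t = t_of_u(xs, ys, u)
    return (t * sum(x ** (2 * u) * math.log(x) for x in xs)
            - sum(x ** u * y * math.log(x) for x, y in zip(xs, ys)))

def solve_u(xs, ys, lo=0.1, hi=5.0, iters=100):
    # Plain bisection; assumes g(lo) and g(hi) have opposite signs
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(xs, ys, lo) * g(xs, ys, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```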
DATA : ( x , y ) (14,394) (17,371) (23,779) (26,1044) (27,661) (28,828)
(30,1701) (31,1251) (32,719) (33,886) (33,1010) (35,971)
(37,837) (42,1141) (43,1357)
[The erratic entry of (30,1701) may be omitted without altering the
trend in the results. Data is from
"Differential Equations & Numerical Analysis" by Andrew Paterson, page 111.]
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1866.1 | If I understand your question... | CADSYS::COOPER | Topher Cooper | Fri Apr 22 1994 15:55 | 66 |
Let me take a different tack --
We can think of regression as a process of looking through a family of
available curves -- distinguished from each other by some number of
parameters -- and choosing a curve from the whole family which
minimizes some penalty function. In least-squares regression that
penalty function is the sum of the squares of difference from our
observed points.
Think about one of those points, which will be contributing to the
final cost function. Assume it is at location (x0, y0) (both >0).
Now think of two of our potential curves, one of which has a value
at x0 of y0+e, and the other of which has a value at x0 of y0-e, for
"e" a small positive value. The contribution to the sum-of-squares
measure for both curves is equal, and is e^2.
Now let's look at the situation after we have transformed our space by
taking the log of both the x and the y coordinates. The point has
been transformed to (ln(x0), ln(y0)), while the transformed curves
now pass, at ln(x0), through ln(y0+e) and ln(y0-e) respectively.
The contribution of that point to the sum-of-squares score will now
be (ln(y0+e) - ln(y0))^2 for the first curve and (ln(y0) - ln(y0-e))^2
for the second. If you look at the shape of the ln curve you will see
that the lower curve will have a larger "error" value than the upper
curve. (To demonstrate this algebraically, note that the expansion of
the first squared difference around e=0 is:
e^2/y^2 - e^3/y^3 + 11e^4/(12y^4) - 5e^5/(6y^5) + ...
while the second one is:
e^2/y^2 + e^3/y^3 + 11e^4/(12y^4) + 5e^5/(6y^5) + ...
The odd-order terms, which are positive, are added in the latter and
subtracted in the former, so the latter is bigger than the former.)
This means that a least-squares fit in the transformed space will
prefer, as far as this point is concerned, the upper curve to the lower,
even though in the untransformed space they are equal. The linearized
least-squares regression, therefore, will not be equal to the non-linear
least-squares fit. In fact, it will consistently run high.
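A quick numeric illustration of that asymmetry, with example values of my
own choosing:

```python
import math

y0, e = 100.0, 5.0
upper = (math.log(y0 + e) - math.log(y0)) ** 2  # curve above the point
lower = (math.log(y0) - math.log(y0 - e)) ** 2  # curve below the point
print(upper, lower)  # lower > upper: the log fit penalizes the lower
                     # curve more, hence the tendency to run high
```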
But...
There is nothing magic about the least-squares criterion. There are
situations where it has genuinely strong justification, but most of the
time it is simply used because it is analytically convenient.
Depending on why you want the regression, your log-transformed
criterion, though a little hard to characterize, may be no more
arbitrary and just as useful as a "vanilla" least squares.
Furthermore, if you really want a strict least-squares fit, this
procedure will generally get you a good first estimate for an iterative,
non-linear least-squares regression calculation. I think that Numerical
Recipes discusses such procedures.
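For example, a sketch using SciPy (my choice of tool; Numerical Recipes
covers the Levenberg-Marquardt style algorithms underneath), with the
linearized fit as the starting point:

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the base note
x = np.array([14, 17, 23, 26, 27, 28, 30, 31, 32, 33,
              33, 35, 37, 42, 43], dtype=float)
y = np.array([394, 371, 779, 1044, 661, 828, 1701, 1251,
              719, 886, 1010, 971, 837, 1141, 1357], dtype=float)

# First pass: linearized (log-log) fit gives the starting estimate
u0, ln_t0 = np.polyfit(np.log(x), np.log(y), 1)

# Second pass: true non-linear least squares on y = t * x^u
(t_hat, u_hat), _ = curve_fit(lambda xv, t, u: t * xv ** u,
                              x, y, p0=[np.exp(ln_t0), u0])
# t_hat, u_hat now minimize the untransformed sum of squares directly
```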
Topher
| 1866.2 | When really justified. | CADSYS::COOPER | Topher Cooper | Mon Apr 25 1994 17:17 | 24 |
Someone asked me via EMail just when the least-squares criterion is
really justified.
It is justified when the dependent variable ("y") can reasonably be
assumed to be the result of a sum of:
1) A deterministic process parameterized by the precisely known
independent variables ("x's").
2) A stochastic, normally distributed process (e.g., measurement
error) which is not dependent on the independent variables.
There are variant versions of least square which allow imprecision in
the independent variables and/or precisely characterized variation in
the variance with position, but the essence is a normally distributed
error on a deterministic process.
This is a parameter estimation procedure, and you still need a criterion
for selecting what the best estimators are. It turns out that virtually
all the reasonable criteria, including the important "maximum
likelihood" criterion, agree on least squares as the right measure given
these conditions.
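In symbols (a standard derivation, sketched in LaTeX notation): if
Yi = f(Xi) + ei with the ei independent N(0, sigma^2), the log-likelihood is

```latex
\ln L(\theta, \sigma)
  = -\frac{n}{2}\ln(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(Y_i - f_\theta(X_i)\bigr)^2
```

so for any fixed sigma, maximizing the likelihood over the curve
parameters theta is exactly minimizing the sum of squared residuals.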
Topher