| Title: | Mathematics at DEC |
| Moderator: | RUSURE::EDP |
| Created: | Mon Feb 03 1986 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2083 |
| Total number of notes: | 14613 |
I hope you don't disapprove of me entering the following on behalf of
my son. I'm afraid I haven't looked at it myself yet, but he is keen to
get comments on it. I hope nothing got chopped in the conversion from
Word to txt.
Data exists as recorded values X1,X2,X3,...,Xn
with associated values Y1,Y2,Y3,...,Yn.
A power curve is to be fitted by rearranging the power relationship :
y = t x^u
...to the linear relationship :
ln y = u ln x + ln t
...by taking natural logs on each side, then using least squares
(in a vertical residual direction) to obtain values of t and u.
Does minimising (E represents a sigma sign from i=1 to i=n) :
E(ln Yi - u ln Xi - ln t)^2 -(I)
...by finding suitable values of t and u, imply that :
E(Yi - t Xi^u)^2 -(II)
...is also minimised with these same values of t and u ?
(I) can be multiplied out to give :
R = u^2 E(ln Xi)^2 + E(ln Yi)^2 - 2u E(ln Xi ln Yi)
+ 2u ln t E(ln Xi) - 2ln t E(ln Yi) + n (ln t)^2
This gives :
'partial' dR/du = 2u E(ln Xi)^2 + 2ln t E(ln Xi) -2 E(ln Xi ln Yi)
'partial' dR/dt = (2u / t) E(ln Xi) - (2 / t) E(ln Yi) + (2n ln t / t)
It is known, and proved by algebraic methods, that :
u = ( n E(ln Xi ln Yi) - E(ln Xi) E(ln Yi) ) / ( n E(ln Xi)^2 - (E(ln Xi))^2 )
ln t = ( E(ln Xi)^2 E(ln Yi) - E(ln Xi) E(ln Xi ln Yi) ) / ( n E(ln Xi)^2 - (E(ln Xi))^2 )
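For anyone who wants to experiment, here is a short Python sketch of exactly
those closed-form solutions (the function and variable names are my own):

```python
import math

def fit_log_linear(xs, ys):
    """Least-squares fit of ln y = u ln x + ln t; returns (t, u)."""
    n = len(xs)
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    sx = sum(lx)                                # E(ln Xi)
    sy = sum(ly)                                # E(ln Yi)
    sxx = sum(v * v for v in lx)                # E(ln Xi)^2
    sxy = sum(a * b for a, b in zip(lx, ly))    # E(ln Xi ln Yi)
    d = n * sxx - sx * sx                       # common denominator
    u = (n * sxy - sx * sy) / d
    ln_t = (sxx * sy - sx * sxy) / d
    return math.exp(ln_t), u
```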
(II) can be multiplied out to give :
S = t^2 E(Xi^2u) + E(Yi^2) - 2t E(Xi^u Yi)
This gives :
'partial' dS/du = 2 t^2 E(Xi^2u ln Xi) - 2t E(Xi^u Yi ln Xi)
'partial' dS/dt = 2t E(Xi^2u) - 2 E(Xi^u Yi)
Q1 : What are the solutions to 'partial' dS/du = 0 and 'partial' dS/dt = 0 ?
(A numerical sketch follows the questions.)
Q2 : If the previous solutions for t and u, obtained from (I),
were substituted into 'partial' dS/du and 'partial' dS/dt,
would both partial derivatives evaluate to zero ?
Q3 : The data to follow indicates that the answer to Q2 is "No".
The sum of squares of the residuals for the curve y = t x^u
best fitted (or seemingly best fitted) is GREATER than the
sum of squares of the residuals for the simpler SUBSET of
curves y = s x (i.e. u = 1, t = s).
It may be that this is just computer inaccuracy,
but the effect persists with the spreadsheet working to 16 significant figures.
Question 3 is "IS THE METHOD FOR CALCULATING THE BEST FIT
POWER CURVE INACCURATE AND THEREFORE INCORRECTLY NAMED ???".
This is the real question that requires answering - the rest
is just preliminary 'garbage'.
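As a side note on Q1: there is no closed form, but 'partial' dS/dt = 0 can be
solved for t with u held fixed, giving t = E(Xi^u Yi) / E(Xi^2u), and
substituting that into 'partial' dS/du = 0 leaves a single equation in u.
A minimal Python sketch (the names, the bracketing interval, and the
assumption that it contains a sign change are all mine):

```python
import math

def t_of_u(xs, ys, u):
    # From 'partial' dS/dt = 0:  t = E(Xi^u Yi) / E(Xi^2u)
    return (sum(x ** u * y for x, y in zip(xs, ys))
            / sum(x ** (2 * u) for x in xs))

def g(xs, ys, u):
    # 'partial' dS/du = 0 with t eliminated (divided through by 2t)
    t = t_of_u(xs, ys, u)
    return (t * sum(x ** (2 * u) * math.log(x) for x in xs)
            - sum(x ** u * y * math.log(x) for x, y in zip(xs, ys)))

def solve_u(xs, ys, lo=0.1, hi=5.0, iters=100):
    # Plain bisection; assumes g(lo) and g(hi) have opposite signs
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(xs, ys, lo) * g(xs, ys, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```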
DATA : ( x , y ) (14,394) (17,371) (23,779) (26,1044) (27,661) (28,828)
(30,1701) (31,1251) (32,719) (33,886) (33,1010) (35,971)
(37,837) (42,1141) (43,1357)
[The erratic entry of (30,1701) may be omitted without altering the
trend in the results. Data is from
"Differential Equations & Numerical Analysis" by Andrew Paterson, page 111.]
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1866.1 | If I understand your question... | CADSYS::COOPER | Topher Cooper | Fri Apr 22 1994 15:55 | 66 |
Let me take a different tack --
We can think of regression as a process of looking through a family of
available curves -- distinguished from each other by some number of
parameters -- and choosing a curve from the whole family which
minimizes some penalty function. In least-squares regression that
penalty function is the sum of the squares of difference from our
observed points.
Think about one of those points, which will be contributing to the
final cost function. Assume it is at location (x0, y0) (both >0).
Now think of two of our potential curves, one of which has a value
at x0 of y0+e, and the other of which has a value at x0 of y0-e, for
"e" a small positive value. The contribution to the sum-of-squares
measure for both curves is equal, and is e^2.
Now let's look at the situation after we have transformed our space by
taking the log of both the x and the y coordinates. The point has
been transformed to (ln(x0), ln(y0)), while the transformed curves
now pass, at ln(x0), through ln(y0+e) and ln(y0-e) respectively.
The contribution of that point to the sum-of-squares score will now
be (ln(y0+e) - ln(y0))^2 for the first curve and (ln(y0) - ln(y0-e))^2
for the second. If you look at the shape of the ln curve you will see
that the lower curve will have a larger "error" value than the upper
curve. (To demonstrate this algebraically, note that the expansion of
the first squared difference around e=0 is:
e^2/y^2 - e^3/y^3 + 11e^4/(12y^4) - 5e^5/(6y^5) + ...
while the second one is:
e^2/y^2 + e^3/y^3 + 11e^4/(12y^4) + 5e^5/(6y^5) + ...
The odd-order terms, which are positive, are added in the latter and
subtracted in the former, so the latter is bigger than the former.)
This means that a least-squares fit in the transformed space will
prefer, as far as this point is concerned, the upper curve to the lower,
even though in the untransformed space they are equal. The linearized
least-squares regression, therefore, will not be equal to the non-linear
least-squares fit. In fact, it will consistently run high.
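A quick numeric illustration of that asymmetry, with example values of my
own choosing:

```python
import math

y0, e = 100.0, 5.0
upper = (math.log(y0 + e) - math.log(y0)) ** 2  # curve above the point
lower = (math.log(y0) - math.log(y0 - e)) ** 2  # curve below the point
print(upper, lower)  # lower > upper: the log fit penalizes the lower
                     # curve more, hence the tendency to run high
```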
But...
There is nothing magic about the least-squares criterion. There are
situations where it has genuinely strong justification, but most of the
time it is simply used because it is analytically convenient.
Depending on why you want the regression, your log-transformed
criterion, though a little hard to characterize, may be no more
arbitrary and just as useful as a "vanilla" least squares.
Furthermore, if you really want a strict least-squares fit, this
procedure will generally get you a good first estimate for an iterative,
non-linear least-squares regression calculation. I think that Numerical
Recipes discusses such procedures.
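For example, a sketch using SciPy (my choice of tool; Numerical Recipes
covers the Levenberg-Marquardt style algorithms underneath), with the
linearized fit as the starting point:

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the base note
x = np.array([14, 17, 23, 26, 27, 28, 30, 31, 32, 33,
              33, 35, 37, 42, 43], dtype=float)
y = np.array([394, 371, 779, 1044, 661, 828, 1701, 1251,
              719, 886, 1010, 971, 837, 1141, 1357], dtype=float)

# First pass: linearized (log-log) fit gives the starting estimate
u0, ln_t0 = np.polyfit(np.log(x), np.log(y), 1)

# Second pass: true non-linear least squares on y = t * x^u
(t_hat, u_hat), _ = curve_fit(lambda xv, t, u: t * xv ** u,
                              x, y, p0=[np.exp(ln_t0), u0])
# t_hat, u_hat now minimize the untransformed sum of squares directly
```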
Topher
| 1866.2 | When really justified. | CADSYS::COOPER | Topher Cooper | Mon Apr 25 1994 17:17 | 24 |
Someone asked me via EMail just when the least-squares criterion is
really justified.
It is justified when the dependent variable ("y") can reasonably be
assumed to be the result of a sum of:
1) A deterministic process parameterized by the precisely known
independent variables ("x's").
2) A stochastic, normally distributed process (e.g., measurement
error) which is not dependent on the independent variables.
There are variant versions of least square which allow imprecision in
the independent variables and/or precisely characterized variation in
the variance with position, but the essence is a normally distributed
error on a deterministic process.
This is a parameter estimation procedure, and you still need a criterion
for selecting what the best estimators are. It turns out that virtually
all the reasonable criteria, including the important "maximum
likelihood" criterion, agree on least squares as the right measure given
these conditions.
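In symbols (a standard derivation, sketched in LaTeX notation): if
Yi = f(Xi) + ei with the ei independent N(0, sigma^2), the log-likelihood is

```latex
\ln L(\theta, \sigma)
  = -\frac{n}{2}\ln(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(Y_i - f_\theta(X_i)\bigr)^2
```

so for any fixed sigma, maximizing the likelihood over the curve
parameters theta is exactly minimizing the sum of squared residuals.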
Topher