| Title: | Mathematics at DEC |
| Moderator: | RUSURE::EDP |
| Created: | Mon Feb 03 1986 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2083 |
| Total number of notes: | 14613 |
Hello,
I was wondering if any of the readers of this conference
would have some ideas on the types of functions best suited
for a curve fit to a NON-DECREASING INTEGRAL data set. The
problem that I'm running into is that the data has lots of
flat spots which makes things more difficult. All comments,
suggestions, or references are welcome.
Restrictions:
1. There MUST be a way to reconstruct the original data points
(integers) from the function.
i.e. data(i) = round( f(i) )
The behavior of the function between the data points does
not matter.
2. The function must be somewhat easy to express, i.e. A one
one hundred term polynomial is NOT desirable. I'd prefer
a function that has < 10 coefficients, exponents, and
constants. That is, the number of coefficients + the number
of exponents + a constant is less than 10 or so.
Here's an example data set and a plot of it.
i : 0 1 2 3 4 5 6 7 8 9
DATA(i): 0 1 1 4 5 5 5 6 7 8
9
8 *
7 *
6 *
DATA(i) 5 * * *
4 *
3
2
1 * *
0 *
0 1 2 3 4 5 6 7 8 9
i
Find a function that fits this data such that:
1. data(i) = round(f(i))
2. f(i) is not too messy (I know this is vague!).
Note that in reality the data set will be much larger,
perhaps n > 500.
Thanks in advance for your input.
Sincerely,
John Argentati
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 977.1 | Let's solve something else - this one's too messy | AKQJ10::YARBROUGH | I prefer Pi | Wed Nov 16 1988 09:32 | 14 |
Your only hope here appears to be in segmenting the function so that each segment can be fitted with a low-order polynomial. The 'flat spots', or regions with constant derivative, suggest that an overall polynomial fit will have wild ups and downs between points. In other words, if you fit a continuous differentiable function Y through 500 points at random, there probably exists an x-value for which Y(x) is substantially larger in absolute value than any of the input data points. However, we are probably already looking too closely at the details of (one possible) implementation before really knowing what *problem* you are trying to solve. If you were to describe that, we might be able to see an approach that completely avoids the issue you raised. Lynn Yarbrough | |||||
| 977.2 | HPSTEK::XIA | Wed Nov 16 1988 11:52 | 8 | ||
re. 0
SPLINE should be a good way of doing it. As .1 said, if you use
high order polynomials, there is a danger of large fluctuation between
data points. On the other hand, if you are sure intermediate points
do not matter, then.... Check any numerical analysis book on how
to do SPLINE.
Eugene
| |||||
| 977.3 | DWOVAX::YOUNG | Note early. Note Often. | Thu Nov 17 1988 00:46 | 7 | |
Re .2:
Splines are too messy. An exact fit spline for a 500 data point
set would probably be even more complex, in number of terms and
segments, than a polynomial.
-- Barry
| |||||
| 977.4 | HPSTEK::XIA | Thu Nov 17 1988 01:31 | 10 | ||
re .2 .3
Well, if as .1 said that 500 points are already fine enough,
why not just link these points with straight lines. That will be
easy enough. On the other hand 500 is not unmanagable by SPLINE.
What it amounts to is solving a tri-diagnal matrix (maybe even simpler
than that. I have not touch the stuff for two years. So....) of
500X500 which shouldn't be too bad if you have an efficient matrix
package.
Eugene
| |||||
| 977.5 | HPSTEK::XIA | Thu Nov 17 1988 01:34 | 5 | ||
re .2
An important point I want to emphasize is that a polynomial
of degree 499 is unsafe.
Eugene
| |||||
| 977.6 | AITG::DERAMO | Daniel V. {AITG,ZFC}:: D'Eramo | Thu Nov 17 1988 18:43 | 120 | |
I remember a calendar day-of-the-week algorithm that came up
with a good representation for the running total of the
number of days in a month, modulo seven. This was clever,
but having only twelve data points must have helped a lot.
The calendar stuff follows. It is interesting but won't
help for five hundred data points.
Dan
I saw something like this used to encode an almost linear
function used in a day-of-the-week algorithm. Our calendar
has a period of 400 years. Leap year occurs in years
divisible by 4, unless also divisible by 100 when there is
no leap year, unless also divisible by 400 when there is.
So every 400 years, the leap year cycle repeats. It so
happens that these 400 years have a number of days which is
an exact multiple of 7. So after 400 years the next leap
year cycle starts on the same day of the week as the first
previous cycle. This led someone to develop a day of the
week formula.
So number the days of the modulo 7, with Sunday = 0 up to
Saturday = 6. Let y, m, and d be the year, month, and day
numbers. So for today, Thursday November 17, 1988, we have
y = 1988, m = 11, d = 17, and the day of the week is 4.
It is easiest to handle leap day as the last day of the
year, so we slightly massage the y, m, d inputs by treating
March as the first month, and February as the twelfth month
of the previous year. So the first step in the formula is
to make this transformation from y, m, d to Y, M, D:
{ y - 1 if m = 1 or m = 2
Y = {
{ y if 3 <= m <= 12
{ M + 10 if m = 1 or m = 2
M = {
{ M - 2 if 3 <= m <= 12
D = d
Now the day of the week number will be the number of days
past since some reference date added to the day of the week
number of that reference date, modulo 7. The contribution
of Y to the sum will be 365 = 1 mod 7 days per year, plus 1
for every leap year, for a total of
Y + [Y/4] - [Y/100] + [Y/400]
Here the notation [x] means the greatest integer <= x.
The contribution for the day will just be D. Now here is
where the "almost linear" function stuff starts, in
computing the contribution for the adjusted month number M.
For M=1 (March) the contribution will be 0; for M=2 the
contribution will be the number of days in March, modulo 7;
for M=3 the contribution will be the number of days in March
and April, modulo 7; etc. Now write this out, for each next
value of M adding the number of days of the previous month
modulo 7, but not reducing 7's from the sum:
M days days modulo 7 sum
-- ---- ------------- ---
1 0 0 0
2 31 3 3
3 30 2 5
4 31 3 8
5 30 2 10
6 31 3 13
7 31 3 16
8 30 2 18
9 31 3 21
10 30 2 23
11 31 3 26
12 31 3 29
or
M 1 2 3 4 5 6 7 8 9 10 11 12
f(M) 0 3 5 8 10 13 16 18 21 23 26 29
The problem was to express this function in an easy to
compute way. The solution, and I don't know who to credit
this to, is to use:
f(M) = [2.6 M - 2.1] = [2.6M - 2.2]
I had not memorized this, I just worked it out again, it
isn't too hard for this particular example. By the way,
in most programming languages, use truncating integer
arithmetic and let f(M) = (26 * M - 21)/10 or (13 * M - 11)/5
So the overall formula becomes
Y + [Y/4] - [Y/100] + [Y/400] + [2.6M - 2.1] + D + k
where k is the constant that gives the correct value for the
initial day. For today, with expected result 4, we get
y, m, d = 1988, 11, 17 Y, M, D = 1988, 9, 17
4 = 1988 + [1988/4] - [1988/100] + [1988/400] + [2.6 * 9 - 2.2] + 17 + k
(all that modulo 7, of course) or k = 2.
The [... - 2.1] and + 2 can be combined to give
>> f(Y, M, D) = Y + [Y/4] - [Y/100] + [Y/400] + [2.6M - 0.1] + D
You can precompute the result of the Y terms and remember
them for the next year or two. So, for example,
f(1988, M, D) = [2.6M - 0.1] + D + 6 modulo 7
f(1989, M, D) = [2.6M - 0.1] + D modulo 7
Of course, don't forget the transformation from y,m -> Y,M.
Dan
| |||||