| T.R | Title | User | Personal Name
 | Date | Lines | 
|---|
| 1601.1 | sample branches | MOCA::BELDIN_R | All's well that ends | Wed Apr 29 1992 12:02 | 18 | 
|  |    Re:  <<< Note 1601.0 by CLARID::DEVAL "0-� Carefull with that vax eug�ne �-0" >>>
One quick and dirty suggestion:
STATISTICAL ANALYTIC APPROACH:
   1) Sample a few branches.  (You need to be able to estimate
      what fraction of the entire network you are including.)
   
   2) Sample the traffic along those branches at various times
      of day.  Calculate some statistics like mean, min, max,
      number of bytes per [second, minute, hour, day] depending
      on just how much data you collect.
      
   3) Extrapolate your sample statistics from 2) to the entire
      network using the sampling fraction in 1).
      
Dick
 | 
| 1601.3 | If I understand you correctly... | CADSYS::COOPER | Topher Cooper | Wed Apr 29 1992 15:47 | 57 | 
|  |     Not absolutely sure I understand the problem so first let me try to
    describe it.
    You have a network of N nodes, with n links (n will be even since they
    are paired, back and forth, but I don't think that that matters). 
    Bytes are transmitted between the N nodes over the n links following
    paths which minimize the number of links that the byte needs to be
    transmitted over.  You want to know the number of bytes transmitted
    (where a byte transmitted from machine A through machine B to machine C
    is counted once).  What you have is the number of bytes transmitted
    through each of the n links, which means that you overcount each link.
    If that is a fair summary, here is a solution.
    First of all you can assume that every message is sent to an adjacent
    node.  That means that there is *no* overcounting and this provides
    a maximum number of messages consistent with your n link-counts.  That
    maximum is simply the sum of your n link-counts.
    The minimum is less trivial but not too hard.
    Imagine a path from one machine, A, to another, B, over some number of
    links,p (p is the length of the path).  The path is directional and is
    the minimum path from A to B.  The maximum amount of traffic along that
    (directional) path is the minimum number of bytes transmitted over any
    of its links.  Any byte transmitted from A to B will be counted (in the
    maximum count) p times in the maximum count.  The maximum number of
    counts for the path in the maximum count is therefore, the p times
    the minimum number of bytes transmitted over any of its links.
    To obtain the minimum possible traffic we want to see how much over-
    counting could possibly have occured.  Here is the procedure:
    Start with a "minimum count", C, of 0.  Have a list of all N(N-1) paths.
    Let B[i] be the maximum number of bytes transmitted along path i (see
    above), and p[i] be the length of path i.  Find that path, j, which has
    a maximum value for B[j]*p[j].
    Add B[j] to C.  Subtract B[j] from each of the links in path j.  At
    least one of those links (more if there were ties for minimum traffic
    link) will go to zero.  Drop every path which includes those links
    (including path j) from the list (you don't actually have to do this
    since their maximum traffic becomes zero, but its probably better to
    get them out of the way -- a table indexing paths by links might come
    in handy).
    Repeat until you run out of (non-zero traffic) paths.  C will be your
    answer.
    I think that that will produce an absolute minimum, but I haven't
    absolutely convinced myself.  In practice it shouldn't matter: it is
    certainly conservative enough so it can be taken as a practical lower
    limit unless some very bizzare things are taking place in the network.
    Hope that helps.
				    Topher
 | 
| 1601.4 | Time for a SWAG | SSAG::LARY | Laughter & hope & a sock in the eye | Wed Apr 29 1992 16:24 | 27 | 
|  | I think that to get a tighter result than Topher's bounds in .3 you will need
more information. Some information of a heuristic nature that might help would
be:
- Stick a network monitor on each link and sample sources and destinations of
messages, which will let you construct a profile of the actual distribution of
the number of hops per message on the links 
- Make assumptions about the magnitude of the traffic between any pair of
sites and use those assumption to derive the path length distribution.
The kinds of assumptions that you might be able to reasonably defend are:
* Assume the traffic between any pair of sites is proportional to the product
of the number of Digital employees at the two sites
* Assume the traffic between any pair of sites is proportional to the product
of the number of nodes at the two sites
* Any better assumption than the above based on the business activity and
interrelations of the sites - f'rinstance, information tends to flow out of
Geneva more than it flows in :-)
If you do any of the above, you wind up with a function p(L,d,n) which is the
estimated probability that traffic on link L in direction d is part of an n-hop
path; then, if the traffic on each link is T(L,d), your "best guess" at the
total amount of traffic would be SUM(T(L,d)*p(L,d,n)/n) where the sum is over
all L, d, and n. 
 | 
| 1601.5 |  | TRACE::GILBERT | Ownership Obligates | Fri May 01 1992 11:10 | 28 | 
|  | .3> Find that path, j, which has a maximum value for B[j]*p[j].
I'll call this the B*p greedy algorithm.  It doesn't quite work.
If something like it *will*, then I think the expression should
be B[j] * (p[j] - 1).  I'll explain.
Whenever B bytes of traffic on two adjacent links are 'connected' (i.e.,
made part of a longer path) by the algorithm, the bytes 'transmitted'
decreases by B.  When links form a path of length p, there are only p-1
'connections' between the links, and the bytes 'transmitted' is reduced
by B*(p-1).  Thus, I think B*(p-1) should be better.
Also, the following is a counterexample to the B*p greedy algorithm (but
note that the B*(p-1) greedy algorithm minimizes the transmissions).
	X ----- Y ----- Z
	    5       2
The B*p greedy algorithm takes:
	5 bytes connecting X-Y, then
	2 bytes connecting Y-Z,
for a total transmission of 5+2 = 7.
The B*(p-1) greedy algorithm takes:
	2 bytes connecting X-Y-Z, then
	3 bytes connecting Y-Z,
for a total transmission of 2+3 = 5.
 | 
| 1601.6 | Good point. | CADSYS::COOPER | Topher Cooper | Fri May 01 1992 12:53 | 6 | 
|  | RE: .5
    Your right.  The "overcount" (p-1) is more appropriate then the "count"
    (p).
					    Topher
 |