| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4158.1 | Dump file analysis | VMSNET::P_NUNEZ |  | Fri Feb 14 1997 11:02 | 17 | 
|  |     if it helps, license server dump analysis is returning:
    
    Condition signalled to take dump:
    %SYSTEM-F-ABORT, abort
    %SYSTEM-F-ABORT, abort
    -SYSTEM-S-NOMSG, Message number 0000F941
    
    DBG> show calls
     module name     routine name                     line     rel PC   abs PC
    *LOGGING         PLog                             3998    00000299 0000F941
     IC                                                       00000000 0000E848
     IC                                                       00000000 0000BF7B
     SHARE$PWRK$CSSHR
                                                              00000000 00096085
     MTS$MAIN                                                 00000000 0000581F
                                                              00000000 8962038D
    
 | 
| 4158.2 | Questions | CPEEDY::KENNEDY | Steve Kennedy | Fri Feb 14 1997 13:04 | 24 | 
|  |     Paul-
    
    .0> Customer has several license server dump files.  The license server
    .0> logs report the error:
    
    Is this an occasional occurrence, or is this happening all the time?
    ("all the time" meaning that the license server won't run at all)
    
    Is the customer running PATHWORKS on both nodes of the cluster?  If so,
    is the license server configured to run on both nodes?  If so, does the
    license server always fail on both nodes?  Or does it fail sometimes?
    If sometimes, is it always/usually while trying to start up on one node
    as a result of failing over from the other node?  
    
    I know it's a lot of work, but has the customer tried to troubleshoot
    this by configuring for a single transport to see which one (or more)
    fails?
    
    \steve
    
    
    
    
    
 | 
| 4158.3 | update | VMSNET::P_NUNEZ |  | Fri Feb 14 1997 14:53 | 96 | 
|  |     Steve,    
    
    .0> Customer has several license server dump files.  The license server
    .0> logs report the error:
    
    >Is this an occasional occurrence, or is this happening all the time?
    >("all the time" meaning that the license server won't run at all)
    
    It appeared to be happening all the time.  But (1) with a version limit
    of 5 I only have files from today and (2) the customer just upgraded to
    v5.0E over the weekend.  He did note he's had problems with the license
    server since upgrading that required him to stop/restart it several
    times before it worked (but see below for how he was managing his
    license server in the cluster).  So, based on that, and the fact that
    I'm seeing one strange netbios license server name on a cluster running
    v5.0E in our lab that is similar to what I saw on the customer's, I
    gotta believe this is new to v5.0E.
    
>    Is the customer running PATHWORKS on both nodes of the cluster?  If so,
>    is the license server configured to run on both nodes?  If so, does the
>    license server always fail on both nodes?  Or does it fail sometimes?
>    If sometimes, is it always/usually while trying to start up on one node
>    as a result of failing over from the other node?  
    
    He has a DSSI cluster of two VAX 4000-500A systems, hardware model type
    453 (BIGBRD and CLONE), and is running PATHWORKS on them both.  Yes, it
    was failing on both nodes.  
    
    I think your hunch about starting the license server on one node after
    it's failed on the other is a good one.  Due to misconceptions on their
    part, they thought they should only run pwrk$license_s on node BIGBRD
    (because that's the name the license server grabbed initially). 
    Because they weren't aware of the inhibit logical, they accomplished
    this by running pwrk$license_shutdown on CLONE after PATHWORKS was
    running on both nodes in the cluster.  They could start PATHWORKS on
    either node first (they didn't have a policy on this).  And I would
    think that if this is the issue, then the order that likely causes the
    problem is:
    
    start PATHWORKS on clone first (becomes active license server)
    start PATHWORKS on bigbrd
    stop license server on clone  (license server fails over to bigbrd)
    
    If it were the other way around, stopping the license server on clone
    wouldn't cause any failover to occur.  
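
    The ordering argument above can be checked with a toy model.  This is
    only a sketch under stated assumptions: the first node to start
    PATHWORKS becomes the active license server, and stopping the license
    server on the active node fails it over to the other running node.
    The function name and return shape are hypothetical; only the node
    names come from this note.

```python
# Toy model of the start/stop sequence (assumptions: first node started is
# the active license server; stopping the active one triggers failover).

def simulate(start_order, stop_node):
    """Return (active_before, active_after, failover_occurred)."""
    running = list(start_order)      # nodes running a license server process
    active = running[0]              # first starter is assumed to be active
    before = active
    running.remove(stop_node)        # license server shut down on stop_node
    failover = False
    if stop_node == active:
        # The active server went away; the remaining node (if any) takes over.
        active = running[0] if running else None
        failover = active is not None
    return before, active, failover


if __name__ == "__main__":
    for order in (["CLONE", "BIGBRD"], ["BIGBRD", "CLONE"]):
        print(order, simulate(order, "CLONE"))
```

    In this model only the CLONE-first order produces a failover when the
    license server is stopped on CLONE, which matches the reasoning above.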
    
    >I know it's a lot of work, but has the customer tried to troubleshoot
    >this by configuring for a single transport to see which one (or more)
    >fails?
    
    By the time we figured out how to get the license server started, the
    customer wanted to leave it alone until Monday.  I've got a cluster I'm
    going to try to duplicate it on (in .0 I did show one strange netbios
    name related to the license server exists on our cluster already).
    
    We found our way around it when I noticed that the pwrk$lbigbrd\20,
    pwrk$lbigbrd\43, and pwrk$ls\47 NETBIOS names still existed ($ mc
    pcsa_claim_name /status) on bigbrd after stopping the license server
    there. 
    
    So I:
    
    $ mc pcsa_claim_name /delete pwrk$lbigbrd
    $ mc pcsa_claim_name /delete pwrk$lbigbrd\43
    $ mc pcsa_claim_name /delete pwrk$ls\47
    $ @sys$startup:pwrk$license_startup
    
    and it worked.  So it seems the netbios names (possibly just DECnet
    netbios names) aren't being deleted when pwrk$license_s is stopped.  
    This would explain the license server log error "Name 'PWRK$LBIGBRD    '
    is in use by Another License Server".  
    
    Comments?
    
    I still don't understand how those odd netbios names are getting
    created?  I checked the customer's license server log and state file
    and they have the correct name of just BIGBRD.  Same on our cluster. 
    Here's the one odd name that existed on our internal cluster (which
    seemed to be running fine - no dumps/etc) that uses the license server
    name PWRK$LALFPW1:
    
                NetBIOS name    Last  Numb  Status
    
                PWRK$LALFPW1R01  50    11    04
    
    In all cases where "R0n" is appended to the name, the last byte is 50. 
    On the customer's system I saw it had names for PWRK$LBIGBRDR01 -
    PWRK$LBIGBRDR0M and all had a last byte of 50.  When he viewed these
    names from DOS with SHOW ASTAT BIGBRD, the names ended with a "P" (for
    example, PWRK$LBIGBRDR02P).  I don't see any names with a last byte of
    50 when things are "normal". 
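
    The trailing "P" seen from DOS is consistent with the "Last" column
    being the final byte of a 16-byte NetBIOS name, printed in hex.  A
    minimal sketch of that reading follows; the helper name is hypothetical
    and it assumes the name part is space padded to 15 characters.

```python
# NetBIOS names are 16 bytes: 15 name characters (space padded) plus one
# final byte. The status listing is assumed to print that byte in hex.

def full_netbios_name(name, last_byte_hex):
    """Return the 16-character form: padded name plus the decoded last byte."""
    if len(name) > 15:
        raise ValueError("name part is limited to 15 characters")
    return name.ljust(15) + chr(int(last_byte_hex, 16))


if __name__ == "__main__":
    for name, last in [("PWRK$LBIGBRDR02", "50"),
                       ("PWRK$LS", "47"),
                       ("PWRK$LBIGBRD", "43")]:
        print(repr(full_netbios_name(name, last)))
```

    With a last byte of hex 50 the sixteenth character decodes to "P",
    which is what the DOS listing showed.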
    
    I'm still dialed in if you need more info (but things are "normal" at
    this point)...
    
    Paul
 | 
| 4158.4 | More weirdness | VMSNET::P_NUNEZ |  | Fri Feb 14 1997 15:06 | 5 | 
|  |     
    Also note in .0 that we see on node CLONE that it's claimed the name
    PWRK$LCLONE R01 (and others).  But why isn't PWRK$LBIGBRD????
    
    Paul
 | 
| 4158.5 | Account issue? | VMSNET::P_NUNEZ |  | Fri Feb 14 1997 15:17 | 8 | 
|  |     Possibly another factor.  The customer noted that it seemed he had to
    run the pwrk$license_startup from the SYSTEM account even though he has
    a fully privileged VMS account.  I was using the FIELD account (with all
    privs enabled) to stop/start the license server process.  Could this be
    a factor?
    
    
    paul
 | 
| 4158.6 | PWRK$L<name>\4c ? | VMSNET::P_NUNEZ |  | Fri Feb 14 1997 15:21 | 6 | 
|  |     
    I'm trying to duplicate on our cluster.  I'm seeing an additional
    license server netbios name with a last byte of 4c.  I don't see this
    on the customer's systems?
    
    Paul
 | 
| 4158.7 | my thought is names aren't being deleted | CPEEDY::KENNEDY | Steve Kennedy | Fri Feb 14 1997 18:51 | 79 | 
|  |     .3> So it seems the netbios names (possibly just DECnet
    .3> netbios names) aren't being deleted when pwrk$license_s is stopped.  
    .3> This would explain the license server log error "Name 'PWRK$LBIGBRD    '
    .3> is in use by Another License Server".  
    .3> 
    .3> Comments?
    This was my suspicion.  I remembered we ran into a problem like this in
    our test lab, but I couldn't remember if it was while testing shipping
    software or prototype software.  In either case it looks like the
    problem is now in the field.  FWIW: when we saw it before it was DECnet
    only.
    .3> I still don't understand how those odd netbios names are getting
    .3> created?  
    The odd names that you can now see were introduced recently as an
    optimization to the license components' "PING client" functions.
    Essentially these names are created and serve as a 'cache' of network
    names which the LS (or LR) uses to ping clients for license information.
    Previously the license components created new names on the fly, which
    turns out to be very expensive (time-wise), especially in the LR case
    where the client is waiting in the middle of trying to establish a
    connection with the file server while this is going on.
    The "R01P" ("R01" plus the byte hex 50, ASCII "P") you see in the names
    is just a four-character tag appended to a "PWRK$Lname" name base to
    create a unique name (*). The first character of this tag indicates
    whether the name is associated with the license registrar ("R") or the
    license server ("S"). The next two characters are an alphanumeric
    counter used to create multiple unique names, where either character
    may be "0"-"9" or "A"-"Z". The last of the four characters is "P"
    (hex 50), indicating a "Ping" end-point.
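
    The tag layout just described can be written down as a small parser.
    This is a sketch of the description only, not the license component's
    actual code; the function and constant names are hypothetical.

```python
import string

# Either counter character may be "0"-"9" or "A"-"Z", per the description.
COUNTER_CHARS = string.digits + string.ascii_uppercase
ROLES = {"R": "license registrar", "S": "license server"}


def parse_ping_tag(tag):
    """Split a four-character tag such as "R01P" into (role, counter)."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly four characters")
    role, counter, kind = tag[0], tag[1:3], tag[3]
    if role not in ROLES:
        raise ValueError("first character must be R or S")
    if any(c not in COUNTER_CHARS for c in counter):
        raise ValueError("counter must be two characters from 0-9, A-Z")
    if kind != "P":
        raise ValueError("last character must be P (ping end-point)")
    return ROLES[role], counter


# Two positions with 36 choices each bound the number of distinct counters.
MAX_NAMES_PER_ROLE = len(COUNTER_CHARS) ** 2

if __name__ == "__main__":
    print(parse_ping_tag("R01P"), MAX_NAMES_PER_ROLE)
```

    Names such as PWRK$LBIGBRDR0M reported in .3 fit this pattern once the
    last byte (hex 50) is appended as "P".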
    .4> Also note in .0 that we see on node CLONE that it's claimed the name
    .4> PWRK$LCLONE R01 (and others).  But why isn't PWRK$LBIGBRD????
                   _^_
    (*) This is a registrar name, so the name base is formed using "PWRK$L"
        plus the node name (ie in this case CLONE), so as not to conflict 
        with other LRs in a cluster).  I believe the LS uses the LS name as 
        its name base when forming these names.
    
    
    .5> Possibly another factor.  The customer noted that it seemed he had to
    .5> run the pwrk$license_startup from the SYSTEM account even though he has
    .5> a fully privileged VMS account.  I was using the FIELD account (with all
    .5> privs enabled) to stop/start the license server process.  Could this be
    .5> a factor?
    I can't think of a reason why this is a factor, but I won't dismiss it
    as a possibility. 
    I did notice on my system that the LS group names, "PWRK$LS...G" and
    "PWRK$Lname...L", are not cleaned up when the license server is shut
    down using PWRK$LICENSE_SHUTDOWN (though these leftovers shouldn't
    cause the conflict the customer is seeing). I wonder if it might be a
    timing thing where the failover happens too quickly and the name on the
    other node of the cluster isn't cleaned up?  That said, I would only
    expect this to be a possibility if there were changes in this area,
    since we haven't seen this type of problem before with cluster
    configurations.
    .6> I'm trying to duplicate on our cluster.  I'm seeing an additional
    .6> license server netbios name with a last byte of 4c.  I don't see this
    .6> on the customer's systems?
    Hex 4c is ASCII "L".  The "L" is a tag which the license server uses (in
    addition to the other tags listed in Note 2479.4).  I can't remember its
    exact use off the top of my head, but I think it indicates some sort
    of listener thread for the license server.
    Let us know the results or any info you glean from your testing.  
    Also, this seems to be a problem which will require a code change
    solution - probably should escalate.
    \steve
 | 
| 4158.8 | New Features, eh? | VMSNET::P_NUNEZ |  | Mon Feb 17 1997 09:39 | 18 | 
|  |     Steve,
    
>    Let us know the results or any info you glean from your testing.  
    From your reply, our cluster is working normally and I was unable to 
    duplicate the customer's "duplicate name" problems by stopping/starting
    license server many times...
    
>    Also, this seems to be a problem which will require a code change
>    solution - probably should escalate.
    
    I had the customer run the gather info procedure and ftp the saveset to
    me last Friday, but it didn't make it intact; I'll have him send it on
    tape, but is there anything else I should get?
    Appreciate the help,
    
    Paul
 | 
| 4158.9 | feature? we think so ;-) | CPEEDY::KENNEDY | Steve Kennedy | Mon Feb 17 1997 12:50 | 37 | 
|  |     Paul-
    re: "New Features, eh?"
    We thought so ;-)  Here's why: When server-based licensing is being
    used, caching NETBIOS names for use by the license registrar in pinging
    the client will save ~3 seconds in the turnaround time back to the
    client (the three seconds it takes to claim a new network name that the
    LR used to ping the client). In V6 things potentially get worse because
    the 3+ second delay will turn into ~5 seconds if WINS is being used
    (due to the extra time to go to the name server).  Caching network
    names for this purpose allows us to eliminate this very long delay in
    most cases.
    
    Feature? ;-}
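
    The saving can be illustrated with a toy name pool.  This is a sketch
    only: the class, its methods, and the decimal counter are hypothetical,
    and the 3-second claim cost is simply the approximate figure quoted
    above, accumulated as a number instead of actually waiting.

```python
CLAIM_SECONDS = 3.0   # approximate cost of claiming a new name, per the note


class NamePool:
    """Reuse already-claimed ping names instead of claiming one per ping."""

    def __init__(self, base):
        self.base = base
        self.free = []          # claimed names that are currently idle
        self.claimed = 0        # how many names have been claimed so far
        self.delay = 0.0        # simulated seconds spent claiming names

    def acquire(self):
        if self.free:
            return self.free.pop()          # cache hit: no claim delay
        self.claimed += 1
        self.delay += CLAIM_SECONDS         # cache miss: pay the claim cost
        # Hypothetical name shape; a decimal counter is a simplification.
        return f"{self.base}R{self.claimed:02d}P"

    def release(self, name):
        self.free.append(name)              # keep the name for the next ping


if __name__ == "__main__":
    pool = NamePool("PWRK$LCLONE")
    for _ in range(5):                      # five sequential client pings
        pool.release(pool.acquire())
    print(pool.claimed, pool.delay)
```

    For sequential pings the pool pays the claim cost once; claiming a
    fresh name for every ping would pay it every time.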
    .8> [...] and I was unable to duplicate the customer's "duplicate name"
    .8> problems by stopping/starting license server many times...
    Someone will try to reproduce this here once we get a CLD.
    I'm now wondering if this isn't just a timing issue during failover in
    a cluster, where the conflict is caused by the license server's name
    not being cleaned up quickly enough on one node before the other node
    tries to claim it. The reason I'm leaning this way now is that the
    "PWRK$Lname" didn't show up in the PCSA_CLAIM_NAME list, so it's not
    like something just lost track and didn't clean up the name.  Since the
    name isn't "hanging around" in the name tables, I'm assuming there must
    have been some intermittent conflict.
    .8> [...] anything else I should get?
    
    I can't think of any other info that the customer's going to have that
    you can ask for.
    
    thanks,
    \steve
 |