| Title: | + OpenVMS Clusters - The best clusters in the world! + |
| Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
| Moderator: | PROXY::MOORE |
| Created: | Fri Aug 26 1988 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 5320 |
| Total number of notes: | 23384 |
I am having some trouble adding a Satellite to my two node SCSI cluster.
I shut down one node just to take it out of the picture, reducing the problem
to just one boot node, one satellite, one system disk with two roots. The LAN
can be Ethernet or FDDI; it makes no difference. I am using Port Allocation
Classes on the boot node. This is the SSB version of OpenVMS V7.1 with OSI.
While the boot node is in its 'waiting for <satellite> to boot' loop,
the following is seen on the satellite:
>>>b -fl 0,1
(boot fwa0.0.0.12.0 -flags 0,1)
Trying MOP boot.
.........
Network load complete.
Host name: MOLD
Host address: aa-00-04-00-63-30
bootstrap code read in
base = 1f2000, image_start = 0, image_bytes = 71077
initializing HWRPB at 2000
initializing page table at 1e4000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
%VMScluster-I-MOPSERVER, MOP server for downline was node MOLD
%VMScluster-I-SYSDISK, Satellite system disk is _$1$DKA100:
%VMScluster-I-SYSROOT, Satellite system root is <SYS10.>
%VMScluster-I-BUSONLINE, LAN adapter is now running 08-00-2B-B4-18-80
%VMScluster-I-VOLUNTEER, System disk service volunteered by node MOLD
AA-00-04-00-63-30
%VMScluster-I-CREATECH, Creating channel to node MOLD
08-00-2B-B4-18-80 00-00-F8-4A-A0-10
%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD
%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
node MOLD
%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node MOLD FF-7F-00-00-83-00
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
And back to "%VMScluster-I-MSCPCONN", an endless loop.
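The loop described above can be spotted mechanically in a captured console log. The following is only a rough sketch: the helper names `message_idents` and `count_cycles` are hypothetical, and the second repetition in `sample` is an assumed continuation of the trace (the post only says the sequence repeats endlessly).

```python
import re

# Matches the severity and ident of a console message,
# e.g. "%VMScluster-E-NOT_SERVED, ..." -> ("E", "NOT_SERVED").
MSG_RE = re.compile(r"%VMScluster-([IWEF])-([A-Z_]+)")


def message_idents(log_text):
    """Return the message idents in the order they appear in the log."""
    return [m.group(2) for m in MSG_RE.finditer(log_text)]


def count_cycles(idents, cycle=("MSCPCONN", "NOT_SERVED", "REINIT_WAIT")):
    """Count non-overlapping consecutive occurrences of `cycle`."""
    count, i, n = 0, 0, len(cycle)
    while i + n <= len(idents):
        if tuple(idents[i:i + n]) == cycle:
            count += 1
            i += n
        else:
            i += 1
    return count


# First repetition is copied from the trace; the second is assumed.
sample = """\
%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD
%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
node MOLD
%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node MOLD FF-7F-00-00-83-00
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
node MOLD
%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node MOLD FF-7F-00-00-83-00
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
"""

idents = message_idents(sample)
cycles = count_cycles(idents)
print(idents[0], cycles)
```

A log that shows only repeated REINIT_WAIT lines would give a cycle count of zero, which is one way to tell the two failure patterns apart.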
Relevant params on boot node:
Parameters in use: Active
Parameter Name    Current  Default     Min.        Max.  Unit       Dynamic
--------------    -------  -------  -------     -------  ----       -------
VAXCLUSTER              2        1        0           2  Coded-valu
EXPECTED_VOTES          3        1        1         127  Votes
VOTES                   1        1        0         127  Votes
RECNXINTERVAL          20       20        1       32767  Seconds    D
DISK_QUORUM     "$1$DKA100 " " " " " "ZZZZ"              Ascii
QDSKVOTES               1        1        0         127  Votes
QDSKINTERVAL           10       10        1       32767  Seconds
ALLOCLASS               1        0        0         255  Pure-numbe
LOCKDIRWT               0        0        0         255  Pure-numbe
CLUSTER_CREDITS        10       10       10         128  Credits
NISCS_CONV_BOOT         0        0        0           1  Boolean
NISCS_LOAD_PEA0         1        0        0           1  Boolean
NISCS_PORT_SERV         0        0        0           3  Bitmask
MSCP_LOAD               1        0        0       16384  Coded-valu
TMSCP_LOAD              0        0        0           3  Coded-valu
MSCP_SERVE_ALL          1        0        0           2  Coded-valu
TMSCP_SERVE_ALL         0        0        0           3  Coded-valu
MSCP_BUFFER           128      128       16          -1  Coded-valu
MSCP_CREDITS            8        8        2         128  Coded-valu
MSCP_CMD_TMO          600      600        0  2147483647  CNTLRTMOs  D
TAPE_ALLOCLASS          0        0        0         255  Pure-numbe
NISCS_MAX_PKTSZ      1498     1498     1080        8192  Bytes
NISCS_LAN_OVRHD        18       18        0         256  Bytes
CWCREPRC_ENABLE         1        1        0           1  Bitmask    D
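As a sanity check on the voting parameters above, here is a small worked calculation. It uses the documented OpenVMS Cluster quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction truncated, and assumes the satellite contributes no votes and that the running cluster really derives quorum from the EXPECTED_VOTES value shown. The function name `quorum` is just for illustration.

```python
def quorum(expected_votes):
    """Quorum per the documented formula: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


# Values taken from the parameter listing above.
EXPECTED_VOTES = 3
VOTES = 1        # the one boot node still running
QDSKVOTES = 1    # quorum disk $1$DKA100

needed = quorum(EXPECTED_VOTES)
present = VOTES + QDSKVOTES   # assumes the satellite has VOTES = 0

print("quorum needed:", needed)
print("votes present:", present)
print("quorum held:", present >= needed)
```

Under those assumptions, the single boot node plus the quorum disk holds exactly enough votes, so a loss of quorum does not look like the reason the satellite cannot keep its system disk connection.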
System Disk params:
Disk $1$DKA100: (MOLD), device type DEC RZ28M, is online, mounted, file-oriented
device, shareable, served to cluster via MSCP Server, error logging is
enabled.
Error count 3 Operations completed 31065
Owner process "" Owner UIC [SYSTEM]
Owner process ID 00000000 Dev Prot S:RWPL,O:RWPL,G:R,W
Reference count 283 Default buffer size 512
Total blocks 4110480 Sectors per track 86
Total cylinders 2988 Tracks per cylinder 16
Allocation class 1
Volume label "GROUT71" Relative volume number 0
Cluster size 4 Transaction count 381
Free blocks 1701652 Maximum files allowed 411048
Extend quantity 5 Mount count 1
Mount status System Cache name "_$1$DKA100:XQPCACHE"
Extent cache size 64 Maximum blocks in extent cache 170165
File ID cache size 64 Blocks currently in extent cache 19572
Quota cache size 0 Maximum buffers in FCP cache 754
Volume owner UIC [1,1] Vol Prot S:RWCD,O:RWCD,G:RWCD,W:RWCD
Volume Status: subject to mount verification, protected subsystems enabled,
file high-water marking, write-through caching enabled.
So, what am I doing wrong here?
Steve
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 5233.1 | BSS::JILSON | WFH in the Chemung River Valley | Wed Feb 19 1997 16:40 | 1 | |
What is SCSCONNCNT? Increase it to 50 or 60 and try again.
| |||||
| 5233.2 | will try | CPEEDY::CONWAY | Thu Feb 20 1997 08:49 | 5 | |
Thanks, I will try that as soon as I can pry the system away from
my "customer".
Steve
| |||||
| 5233.3 | CPEEDY::CONWAY | Fri Feb 21 1997 11:43 | 13 | ||
In addition to the symptoms in .0, I also see, during "$show clus/con"
(add connections), two things:
LOCAL_PROC_NAME CON_STA
--------------- -------
SCS$DIR_LOOKUP CON_SEN ! This keeps coming and going
MSCP$DISK OPEN ! This is steady
I still haven't gotten a chance to try changing SCSCONNCNT yet.
Steve
| |||||
| 5233.4 | STAR::PITCHER | Steve Pitcher/Pathworks for OpenVMS | Fri Feb 21 1997 13:14 | 8 | |
I'm working on this same cluster. I tried setting SCSCONNCNT to 60...
It didn't help.
Any other thoughts?
Thanks.
- stp
| |||||
| 5233.5 | Questions, Guesses, No Answers... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri Feb 21 1997 16:05 | 4 |
What (non-zero) disk allocation classes are in use, and on which nodes?
Does a bootstrap over the NI controller complete?
Are there any errors logged?
Is there a version of CLUSTER_AUTHORIZE in SYS$SPECIFIC:[SYSEXE]?
| |||||
| 5233.6 | Same problem here | PRSSOS::MENICACCI | Mon Mar 24 1997 11:29 | 96 | |
.0, Steve, did you find a solution?
Is it a known problem? If an IPMT is needed, what other information would be
useful?
Regards,
Maria.
Configuration :
-------------
AlphaServer 1000A 5/400 and 2000 4/233, SCSI cluster running OpenVMS V7.1.
------
| | Station alpha 4/233
| | root SYS10
| | (METEOR)
| |
------
|
| ETHERNET
------------------------------------------------------------------------
| |
| |
| |
---------- _____________ Alphaserver
| | Alphaserver 1000a 5/400 | | 2000 4/233
| | root SYS1 (MATEMA) | | root SYS0
| | | | (METIS)
| | SCSI Port allo class 200 | |
| | ____ | |SCSI Port Allo
| |--------| |$200$DKA0 | | Class 300
| | SCSI ---- | | ____
| | | |___| |$300$DKA0
| KZPSA | |KZPSA|KZPSA | ----
---------- -------------
|SCSI Port SCSI Port | |SCSI Port
|allo class allo class| |allo class
|301 301 | |302 _____
| | |----|TAPE|$302$MKA400
| SCSI SCSI | | -----
|__________________|------|_______________________| | _____
| | -$301$DKA100(SYSTEM DISK) |----|TAPE|$302$MKA500
| | _$301$DKA200 -----
| | _$301$DKA300
| | _$301$DKA400
-------
! VMS$DEVICES.DAT
CLUSTER_CONFIG created 17-mar-1997 14:25:50
[Port MATEMA$PKA]
allocation class = 301
[Port MATEMA$PKB]
allocation class = 200
!
!
[Port METIS$PKA]
allocation class = 300
[Port METIS$PKB]
allocation class = 301
[Port METIS$PKC]
allocation class = 302
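To cross-check a file like this against the diagram, the ports can be grouped by allocation class: ports on a shared SCSI bus should report the same class on every node. This is only a sketch that parses the text exactly as quoted above. The function name `ports_by_class` is hypothetical, and the parser is not a full reader for the real file format.

```python
import re
from collections import defaultdict

PORT_RE = re.compile(r"^\[Port\s+(\w+)\$(\w+)\]$")
CLASS_RE = re.compile(r"^allocation class\s*=\s*(\d+)$")


def ports_by_class(text):
    """Map each port allocation class to its list of (node, port) pairs."""
    groups = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        m = PORT_RE.match(line)
        if m:
            current = (m.group(1), m.group(2))
            continue
        m = CLASS_RE.match(line)
        if m and current:
            groups[int(m.group(1))].append(current)
            current = None
    return dict(groups)


# File contents as quoted in this note.
devices_dat = """\
[Port MATEMA$PKA]
allocation class = 301
[Port MATEMA$PKB]
allocation class = 200
!
[Port METIS$PKA]
allocation class = 300
[Port METIS$PKB]
allocation class = 301
[Port METIS$PKC]
allocation class = 302
"""

groups = ports_by_class(devices_dat)
# Classes that appear on more than one node, i.e. candidate shared buses.
shared = {cls for cls, ports in groups.items()
          if len({node for node, _ in ports}) > 1}
print(sorted(groups))
print(shared)
```

Here only class 301 spans both nodes (MATEMA$PKA and METIS$PKB), which agrees with the shared bus holding $301$DKA100 in the diagram.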
>>> B fl 0,0 ewa0
...
%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
longer served by node METIS
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
....
On the two boot nodes, $ show dev/service ==> all disks are available.
No errors logged.
Only one cluster_authorize.dat in sys$common:[sysexe]
Boot was also done with only one boot node and scsconncnt = 60.
| |||||
| 5233.7 | Start the IPMT... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Mar 24 1997 13:57 | 6 |
:Is it a known problem ? If an IPMT is needed, what else information would be
:useful ?

Please start the IPMT, with the information here...
If additional information is needed, it will be asked for.
| |||||
| 5233.8 | Lots Of Ethernet Addresses Here... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Mar 24 1997 14:02 | 22 |
:Network load complete.
:Host name: MOLD
:Host address: aa-00-04-00-63-30
:%VMScluster-I-BUSONLINE, LAN adapter is now running 08-00-2B-B4-18-80
:%VMScluster-I-VOLUNTEER, System disk service volunteered by node MOLD
: AA-00-04-00-63-30
:%VMScluster-I-CREATECH, Creating channel to node MOLD
:08-00-2B-B4-18-80 00-00-F8-4A-A0-10
:%VMScluster-I-OPENVC, Opening virtual circuit to node MOLD
:%VMScluster-I-MSCPCONN, Connected to a MSCP server for the system disk,
: node MOLD
:%VMScluster-E-NOT_SERVED, Configuration change, the system disk is no
:longer served by node MOLD FF-7F-00-00-83-00
:%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server

I have no idea if this is significant, but MOLD appears to have more
than a few Ethernet addresses here...

Can you bring the satellite and the boot host onto the same LAN segment
(a private LAN segment is even better), and eliminate any weird network
hardware that might be lurking?
| |||||
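One of the addresses in that trace can be explained directly: by convention, a DECnet Phase IV node sets its LAN address to AA-00-04-00 followed by its 16-bit node address in little-endian byte order (area in the top 6 bits, node number in the low 10 bits). The sketch below decodes such an address. The function name `decnet_phase_iv` is hypothetical, and it says nothing about where the other addresses in the log come from.

```python
def decnet_phase_iv(mac):
    """Return (area, node) for an AA-00-04-00-xx-yy address, else None."""
    octets = [int(b, 16) for b in mac.replace(":", "-").split("-")]
    if len(octets) != 6 or octets[:4] != [0xAA, 0x00, 0x04, 0x00]:
        return None
    # The last two octets hold the 16-bit DECnet address, low byte first.
    value = octets[4] | (octets[5] << 8)
    return value >> 10, value & 0x3FF


for mac in ("aa-00-04-00-63-30", "08-00-2B-B4-18-80", "FF-7F-00-00-83-00"):
    print(mac, decnet_phase_iv(mac))
```

With the address from the trace this yields area 12, node 99, so AA-00-04-00-63-30 would simply be MOLD's DECnet address 12.99 and not a separate adapter. The other addresses do not follow the DECnet pattern and decode to `None`.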
| 5233.9 | CPEEDY::CONWAY | Wed Mar 26 1997 08:39 | 20 | ||
re .8 HOFFMAN:
> I have no idea if this is significant
Don't underestimate yourself. It turns out we have an "invalid connection"
(PHY Status LED is flashing amber) on one of our DECswitch 900 EFs
that is almost certainly causing my problem (not yet verified).
re .6 MENICACCI
>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
>%VMScluster-I-REINIT_WAIT, Waiting for access to the system disk server
I do not get a steady stream of REINIT_WAITs. I get a stream of
MSCPCONN => NOT_SERVED => REINIT_WAIT. So it does not sound like the
same problem to me. It looks to me like my satellite is repeatedly
discovering the boot node and then losing it.
Steve
| |||||
| 5233.10 | DECnet MOP and LANCP MOP | PRSSOS::MENICACCI | Thu Apr 10 1997 08:52 | 8 | |
Engineering has found the solution.
On this cluster, DECnet MOP and LANCP MOP were both up and running.
The solution was to stop DECnet MOP and use LANCP MOP only.
The satellite then booted OK.
| |||||