| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 5297.1 |  | BSS::JILSON | WFH in the Chemung River Valley | Fri Apr 25 1997 08:42 | 1 | 
The easiest thing to do is add a quorum system.
| 5297.2 | Please Learn About Quorum Scheme... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri Apr 25 1997 09:37 | 40 | 
: I have been seeing brochures of multiple site clustering but I have
: not read on how this is done, especially if there are only 2 nodes.
   Please take the time to read through the _Guidelines for VMScluster
   Configurations_ manual.
   Do not attempt to bypass the quorum scheme -- if you have three
   votes configured, set the EXPECTED_VOTES to three.  The quorum
   scheme is present to prevent user and system data corruption;
   it is not something that was implemented just to be bypassed.
   *Severe* user and system data corruptions can arise -- the quorum
   scheme is deliberately difficult to bypass, but there are a few
   configurations where folks have managed to corrupt their disks.
   You will want to configure the nodes:
	peer:
		equal nodes.  Failure of either will cause
		the other to enter a "user data integrity
		interlock" (aka: quorum hang).  This is the
		classic "two node VMScluster", and this tends
		to be a configuration most folks avoid.
	primary-secondary:
		the AlphaServer 8400 has all the votes, and
		the AlphaServer 4100 can boot or fail without
		affecting the AlphaServer 8400.  Failure of
		the AlphaServer 8400 will cause the AlphaServer
		4100 to enter a "user data integrity interlock"
		(aka: quorum hang).
	with a third (voting) node:
		The best solution.  Failure of any one node
		will not affect the other two nodes.  Each
		node has VOTES=1 and each has EXPECTED_VOTES=3.
    If a third node is not available, systems are often configured with
    a non-served shared quorum disk on a shared interconnect -- but with
    the distance you are considering, there are no shared storage
    interconnects available.
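
   As a minimal sketch of what this looks like in MODPARAMS.DAT (the
   quorum-disk device name below is made up for illustration; run
   AUTOGEN after editing):

	! Three voting nodes -- the preferred configuration, on each node:
	VOTES = 1
	EXPECTED_VOTES = 3		! total votes across the full cluster

	! Two-node alternative with a shared quorum disk (not an option
	! at the distances discussed above):
	!  VOTES = 1
	!  EXPECTED_VOTES = 3		! 1 + 1 + QDSKVOTES
	!  DISK_QUORUM = "$1$DUA10"	! hypothetical device name
	!  QDSKVOTES = 1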
| 5297.3 | Check into the BRS products | CSC32::B_HIBBERT | When in doubt, PANIC | Fri Apr 25 1997 11:00 | 28 | 
    Just to reinforce Steve's emphasis on using the quorum scheme
    correctly, I will give an example of WHY you don't want each node
    to be able to continue independently in a 2 node multisite cluster.
    
    Consider a cluster that has 1 node at each of 2 sites.  Each site has 1
    member of dual member shadow sets for data protection.  The cluster
    has its voting scheme set incorrectly so that each node can run
    independently of the other.  A failure occurs in the interconnect
    (backhoe cuts the fiber, bridge dies, etc.).  Both nodes continue
    running, the shadow sets split, data is being entered by users at both
    sites.  Sounds good, right?  WRONG!!!  Each copy of the database is being
    updated independently of the other.  When the interconnect problem is
    repaired you will have 2 completely different copies of the database. 
    If you reform the shadow sets, any modifications made from 1 of the
    systems will be lost when the shadow copy is done.  Data is LOST!!!
    If you set the cluster up to hang when quorum is lost, or give 1 node
    all the votes so that it can continue while the other one hangs, you
    will be better off.  Or better yet, add a 3rd node as a quorum vote
    server.  A sketch of the parameter settings follows.
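
    A minimal sketch of the contrast in MODPARAMS.DAT terms (my
    illustration, assuming 1 vote per machine):

    	! Mis-set -- each node can form a cluster alone (partitioning!):
    	!   both nodes:      VOTES = 1, EXPECTED_VOTES = 1
    	! Correct -- a lone survivor hangs instead of diverging:
    	!   both nodes:      VOTES = 1, EXPECTED_VOTES = 2
    	! Best -- third voting node; any single failure is survived:
    	!   all three nodes: VOTES = 1, EXPECTED_VOTES = 3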
    
    Please consider the BRS services for a multisite data center.  This
    product set is intended to discover issues like the above and to
    provide solutions for the problems.  The product may seem expensive on
    the surface, but it is well worth it if your customer's data is
    important.
    
    Brian Hibbert
    
| 5297.4 | EXPECTED_VOTES should be called TOTAL_VOTES | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Fri Apr 25 1997 14:26 | 17 | 
>   On the 8400 and
>    4100 I have set votes=2 and 1 for 8400 and 4100 respectively and
>    expected votes=2 so that 8400 will still work even if the 4100 goes
>    down.
There seems to be a common misconception about what EXPECTED_VOTES means.
Many people think it's the number of votes required for the cluster to
keep running, so they set it to just over 1/2 the total number of votes
that are available.  This is wrong.
EXPECTED_VOTES is supposed to be the *total* number of votes in all nodes
that might ever participate in the cluster.  VMS automatically figures out
a quorum (a value just over 1/2 of EXPECTED_VOTES) that prevents you from
running a partitioned cluster, as explained in .3.
So in the example quoted at the top of the note, EXPECTED_VOTES should be
set to 2+1=3, since the two nodes have a total of 3 votes.
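
As a worked example (using the documented formula, with integer division):

	quorum = (EXPECTED_VOTES + 2) / 2  =  (3 + 2) / 2  =  2

So the 8400 alone (VOTES=2) meets quorum and keeps running, both nodes
together (3 votes) run, and the 4100 alone (VOTES=1) hangs -- which is
the behavior the quoted configuration was after.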
| 5297.5 | Thanks. will check BRS. | MANM01::NOELGESMUNDO |  | Sat Apr 26 1997 02:44 | 15 | 
    Thanks for all the replies.
    
    We have sold this setup to this customer and are expecting this to work
    without an additional node (the best solution!).  I have already
    informed one of the customer's staff about the quorum scheme and that
    if the node with the higher number of votes fails, the whole cluster
    fails as well and we may have to reboot the remaining good system with
    modified parameters.
    
    I have come across BRS in the Guidelines for OpenVMS Cluster
    Configurations and will have to do some research on this area with the
    hope that it will help in this kind of situation.
    
    Regards.
    
    Noel
| 5297.6 | also check IPC mechanism | HAN::HALLE | Volker Halle MCS @HAO DTN 863-5216 | Sat Apr 26 1997 06:14 | 20 | 
    Noel,
    
    in case the node with the 'higher' votes fails, the other node will
    just 'hang'. You don't have to reboot it, just use the IPC interrupt
    and recalculate quorum. You can do this from the console with the
    following commands:
    
    	<CTRL/P>		! halt the hung node into its console
    	>>> D SIRR C		! deposit C in the software interrupt
    				! request register (requests IPL C)
    	>>> C			! continue -- the IPC> prompt appears
    	IPC> Q			! recalculate quorum
    	IPC> <CTRL/Z>		! exit IPC; the node resumes
    
    Check the documentation.  It works!  As a general warning: the customer
    should be well aware of the cluster mechanisms and the system manager
    needs to be well trained.  Running a cluster like this without 'cluster
    knowledge' and well-tested 'emergency' procedures puts your customer's
    data at risk!  That's what BRS is for...
    
    Volker.
| 5297.7 | EXPECTED_VOTES is a "blade guard"... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Apr 28 1997 13:22 | 8 | 
   And if EXPECTED_VOTES is mis-set, it will be corrected automatically
   as soon as it is noticed by the VMScluster connection manager -- but
   if the incorrect setting is not detected during bootstrap (due to
   some connectivity problem; that is, due to exactly the sort of
   situation the EXPECTED_VOTES setting is designed to detect), severe
   user and system disk data corruptions can be expected, and have
   occurred.
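
   As an illustration (my numbers, not from the note): a node booted
   with EXPECTED_VOTES = 2 that then connects to members carrying 3
   total votes will have its expected votes raised to 3, and quorum
   recomputed as (3 + 2) / 2 = 2.  The connection manager only adjusts
   the value upward on its own; lowering quorum takes explicit action
   (e.g. the IPC procedure in .6).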