	Feedback on review of CLUSTER_CHAP1           Ron Stahly 5/6/92
       -----------------------------------------------------------------
      Page 1-5a
	Guidelines for VAXcluster Configurations is now at rev -005
	Note also that this document is available to customers.
	Note that the VAXcluster Principles manual is internal use only.
	There is also a Version 5.4 VAXcluster Principles Update
	order number EK-VAXCP-UP-001 (also internal use only)
	A "new" VAXcluster principles manual(based on version 5.5) is
	in the works and should be available in the next few months.
	Plans are to also make this available to customers.
	-------------------------------------------------------------
      Page 1-5 locatio   -   needs an "n"
	-------------------------------------------------------------
      Page 1-7
	Add FDDI to the interconnects, maybe list network interconnects
	with Ethernet and FDDI
	I would suggest that you add to your DSSI quote of 4 megabytes,
	the statement of 32 megabits/sec so that you are using the same 
	term (megabits) that was used for CI and NI speed. This makes the
	comparison easier.
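	A quick check of the conversion suggested above, as a minimal sketch.
	The function name is hypothetical; the only fact used is 8 bits per byte.

```python
def megabytes_to_megabits(megabytes_per_sec):
    """Convert a transfer rate from megabytes/sec to megabits/sec (8 bits per byte)."""
    return megabytes_per_sec * 8


# DSSI is quoted at 4 megabytes/sec on page 1-7; express it in the
# same unit (megabits/sec) used there for CI and NI.
dssi_megabits = megabytes_to_megabits(4)
print(f"DSSI: 4 megabytes/sec = {dssi_megabits:g} megabits/sec")
```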
	-------------------------------------------------------------
      Page 1-8a
	On the Star Coupler discussion, you say "These may be either VAX
	systems or HSC subsystems in any combination that makes sense"
	We limit the number of CPUs attached to a star coupler to 16
	This is stated in the V5.5 SPD
	Also note that multiple star couplers can be used in the cluster.
	-------------------------------------------------------------
      Page 1-8 CI VAXcluster System Advantages
	add Maximum of 16 VMS systems after Minimum of one VMS system
	-------------------------------------------------------------
      Page 1-10a - under security data. 
	The statements:  Using DSSI does not change the essential 
	character of ethernet-based system, since the connectivity is 
	established over the ethernet. It does, however, offer the
	option of providing dual-hosted system disks to the cluster.
	are quite misleading. They are true if the DSSI adapter is a
	KFQSA (QBUS based), but incorrect for all other DSSI adapters:
	those on the 3300/3400 and 4200/4300/4500/4600, and the KFMSA (XMI based).
	All of these adapters allow full SCS traffic and are preferred
	over the ethernet during connection formation.
	-------------------------------------------------------------
      Page 1-11a
	Discussing SCS communication over the DSSI. You state supported 
	only for 3300/3400 systems using the EDA640 adapters and PIDRIVER.
	This is also true for the embedded adapters on 4200/4300/4500/4600
	and the KFMSA(XMI-based) adapters using PADRIVER.
	-------------------------------------------------------------
      Page 1-11 statement: The Ethernet cable is a single point of failure.
	So is the ethernet adapter, but we now support multiple adapters,
	and you could use multiple segments. This is not the issue that
	it used to be.
	-------------------------------------------------------------
      Page 1-12a
	
	Again SCS and the DSSI.
	Add EDAs on 4200/4300/4500/4600 and KFMSA(XMI-based) or drop
	the list and keep the point that the KFQSA does not support it.
	-------------------------------------------------------------
      Page 1-12  DSSI
	add EDA on 4200 (single DSSI bus)
	add EDA on 4300/4500/4600 (two DSSI buses)
	add KFMSA (XMI based) on 6xxx/9xxx (two DSSI buses)
	also you discuss Dual-Host, but not Multi-Host. We support
	several multi-host systems of three nodes today, and soon more.
	maybe add more details to instructor page 1-12.a
	-------------------------------------------------------------
      Page 1-15 Mixed-Interconnect - statement on DSSI service for
	up to 2 MicroVAX members per DSSI
	We currently have configurations available of 3 VAX systems
	-------------------------------------------------------------
      Page 1-20a
	Work around for tape ported to a single member
	Starting with VMS V5.5 we can serve TMSCP tapes to other
	nodes in the cluster.
	-------------------------------------------------------------
      Page 1-21a - Reference to JBCSYSQUE.DAT - needs to be updated!
	-------------------------------------------------------------
      Page 1-21  - Reference to JBCSYSQUE.DAT - needs to be updated!
	-------------------------------------------------------------
      Page 1-26 - VAXcluster-Supplied components
	Add TMSCP server 
	-------------------------------------------------------------
      Page 1-27a  VAXcluster Troubleshooting Reference Set
	These are now revision -002
	Very good reference
	Also available as a set (EK-VCSRS-PK-002 I think, but will check)
	-------------------------------------------------------------
      Page 1-28  Statement: Cluster state transitions occur when a node
	joins or leaves the cluster, and when the cluster recognizes
	a quorum disk.
	add to the statement about the quorum disk, that a transition
	also occurs when the quorum disk becomes unavailable
	-------------------------------------------------------------
      Page 1-29a Instructors notes on changing quorum
	use of IPC to regain quorum should also be included here in 
	the instructor notes.
	-------------------------------------------------------------
      Page 1-30a Instructor notes for resources on lock manager
	could add:   Distributed Lock Manager course material, and
	the Version 5.4 VAXcluster Principles Update (EK-VAXCP-UP-001)
		   
	-------------------------------------------------------------
      Page 1-31a wording on the statement (VMS V5.4) When a member
	leaves the cluster(SHUTDOWN), part of the process before the
	cluster state transition is a Lock Database Rebuild.
	I really disagree with this wording. A rebuild is not really
	performed before the transition; instead, before the transition
	begins, any resources mastered on the node shutting down are
	re-mastered on another node interested in the resource.
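	The distinction can be illustrated with a toy model. This is only a
	sketch of the idea, not the lock manager's actual algorithm: the data
	layout, the function name, and the rule for picking the new master
	(first node in sorted order) are all assumptions made for illustration.

```python
def remaster_on_shutdown(resources, leaving_node):
    """Toy model: move mastership of resources off a node that is shutting down.

    resources maps a resource name to {"master": node, "interested": set of nodes}.
    Only resources mastered on leaving_node change master; nothing else is rebuilt.
    """
    result = {}
    for name, info in resources.items():
        interested = set(info["interested"]) - {leaving_node}
        master = info["master"]
        if master == leaving_node:
            if not interested:
                # No remaining node is interested, so the resource goes away.
                continue
            # ASSUMPTION: arbitrary deterministic choice of the new master.
            master = sorted(interested)[0]
        result[name] = {"master": master, "interested": interested}
    return result


before = {
    "RES_A": {"master": "NODE1", "interested": {"NODE1", "NODE2", "NODE3"}},
    "RES_B": {"master": "NODE2", "interested": {"NODE1", "NODE2"}},
    "RES_C": {"master": "NODE1", "interested": {"NODE1"}},
}
after = remaster_on_shutdown(before, "NODE1")
for name in sorted(after):
    print(name, "is mastered on", after[name]["master"])
```

	Note that RES_B, mastered elsewhere, keeps its master untouched.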
	-------------------------------------------------------------
      Page 1-31 on LOCKDIRWT and dynamic remastership.
	Your info "The distribution procedure operates as follows:"
	is for version 5.2 - 5.4-3
	Version 5.5 information is also needed here.
	I can provide a write-up if needed.
	-------------------------------------------------------------
      Page 1-32 statement: You may wish to set LOCKDIRWT to zero on
	MicroVAX and workstation members.
	I would suggest using the term satellite nodes rather than
	specifically pointing out MicroVAX and workstation members.
	-------------------------------------------------------------
      Page 1-35   SHOW DEVICE?FULL
	I think you wanted a / and not a ?
	-------------------------------------------------------------
      Page 1-36 - Reference to JBCSYSQUE.DAT - needs to be updated!
 
-------------------------------------------------------------------------------
 Two things that I want to ask concerning the course material:
 -------------------------------------------------------------
 1) We need to present current release information. On the issue of
    BATCH/PRINT: should materials only contain V5.5 release 
    information, or should they contain both pre-V5.5 and version
    5.5 material (especially the instructor pages) ?
2)  The chapter appears to contain VAXcluster System Management
    material with some new additions. Will the VAXcluster System
    management course be updated and continue to be delivered, or
    will these two new courses replace it ? ( Andy and/or Sherry
    can you please respond to this question...)
Ron Stahly
DTN 523-2134
719-260-2134
NEURON::STAHLY
Sorry, but 2 & 1/2 weeks was the best I could do...
These are some of the items I've found that need further explanation or
corrections for the Cluster module.  I'll *NOT* be repeating RON's or
SUSAN's comments, as they are on target and need to be incorporated.
General comment --  please make sure that somewhere in the early pages of 
the book (or this chapter) we mention that the material constantly uses
the word 'cluster' as common 'slang' for the word VAXcluster, which is
a trademarked word.
1-5a, 2nd sentence --
  "Details about common environment and multiple environment are in the 
   'Building' module."  But we mention them on 1-6 (and 1-6a).  This will
  probably require that these topics are discussed in some detail at this
  time.  Can we not move it all to the Building Chapter?
  By the way, I've searched through the VAXcluster manual and didn't find any 
  references to anything called a 'Heterogeneous' cluster.  Did we make that
  up?
1-5, 8th bullet --
  "Potentially perform I/O to any disk OR TAPE storage subsystem in the 
   cluster" -- new as of 5.5
1-6a, 1st sentence --
  I suggest that you make an extra overhead of figure 1-3 and keep that
  available all week -- very good example to use when you need a sample
  configuration to display.
1-6, definition of multiple environment --
  
  If you read the sentence defining this environment it is possible to think
  that you can have 2 'clusters' sharing one set of hardware.  Make sure that
  there is a note either on this page or the instructor's page that specifies
  that all systems attached to a common star coupler must be in the same 
  cluster.
  Or...kill this page entirely and bring up in the Building Chapter.
1-7a, 3rd paragraph --
  It states that we are to use NI VAXcluster instead of LAVc, but commonly
  throughout this module are many references to Local Area VAXcluster and LAVc
  that need to be updated -- see 1-10a, 1-10, and 1-11a for examples.
      4th paragraph -- 
  Please remove the word 'even', as it provides no benefit.
1-7, 5th bullet --
  Last I checked DSSI can now span 20 meters.
1-8a, last bullet --
  SPD limits the VAX CPUs to 16, but neither the hardware nor the software 
  enforce this.  There are many sites (in and out of DEC) with 17+ VAX systems
  on the extended CIs.
1-9 --
  
  Can we get rid of the shadowed-out part of the diagram?  It is not useful
  and provides too easy of an avenue for questions before their time.
  Add the tape drive to the HSCs.
  You picked the CPUs, now pick the HSCs.  I suggest a HSC50 and a HSC60, this
  will show both old and new HSCs (like the 785 in the cluster) work fine.
  The disks could be almost any disk - RA70, RA72, RP06, etc.  Let's just drop
  the types of disks entirely from the description since they are NOT members.
1-10a --
  I think Ron mentioned Dual-ported on a later page, but there are numerous
  references that need to be updated to Multi-Hosted system disks throughout
  the entire chapter.
1-10 --
  
  "Boot server", "Boot Server", "Bootserver" or "bootserver"??  This page shows
  two different spellings (3rd bullet from top and 2nd sentence from end).  Can
  we be consistent?
  4th bullet, 2nd dash --
  "Sends *initial* VMS image to node", since DECnet/MOP is only responsible for 
   NISCS_LAA.EXE and TERTIARY_VMB.EXE.
1-11a --
  Move the DSSI discussion to 1-12a.
1-11, 9th bullet --
  Since we can have Multi-hosted system disks via DSSI there isn't the need
  for a quorum disk, so: "...and POSSIBLY a quorum disk..."
1-12a --
  This page is very dated.  Multi-host vs Dual-host, no new hardware listed,
  etc.  I think Ron mentioned this as well, but what we need is a generic
  listing that won't date itself.  Such as using "for example" or "a partial
  list" type words to keep compatible if not current.
1-12 --
  1st sentence needs "(ISE)" after "Integrated Storage Element" if we are to
  use the acronym later in that paragraph.  Therefore remove it from bullet 2.
  3rd bullet needs updated hardware, list the 6000 and 4000 hardware.
1-13 --
  Valid diagram, but old.  How about updating with 4000s and 6000s?  Also, if
  we have the KFQSA listed, why not show the part for the other DSSI cable?
1-14 --
  Why does the Instructor page specify V5.4-3 while the student page says V5.5?
1-15a, 1st sentence --
  A pair of 6000s connected via both CI and DSSI *are* a MI cluster.  The 
  Ethernet hardware is *NOT* required!  More correct:  "Therefore, each
  SATELLITE and BOOT SERVER must reside on the Ethernet."  There might be
  other ways to word it, but be careful -- MI does *NOT* imply strictly
  CI-NI based clusters anymore.
1-15 --
  3rd bullet is misleading.  Last I checked all CI based cluster members *must*
  share a common star coupler, although some CPUs may have multiple Star
  Couplers.
  The quote from the SPD (V5.4) is...
     The maximum number of VAX CPU's supported in a VAXcluster system is 96.
     Up to 32 systems may be systems other than single user workstations.
  It also is misleading, but 16 is still the maximum 'supported' VAX systems
  on a Star Coupler.
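  To keep the three numbers apart, here is a minimal sketch that checks a
  configuration against the limits quoted above (96 CPUs, 32 systems other
  than single user workstations, 16 VAX systems per Star Coupler).  The
  function and constant names are hypothetical, and the result is only a
  list of "not supported" warnings, since neither hardware nor software
  enforces the Star Coupler limit.

```python
MAX_CLUSTER_CPUS = 96            # V5.4 SPD quote: maximum VAX CPUs in a VAXcluster
MAX_NON_WORKSTATIONS = 32        # V5.4 SPD quote: systems other than single user workstations
MAX_CPUS_PER_STAR_COUPLER = 16   # supported VAX systems on one Star Coupler


def check_supported(total_cpus, non_workstations, cpus_per_star_coupler):
    """Return warnings for every quoted support limit the configuration exceeds."""
    warnings = []
    if total_cpus > MAX_CLUSTER_CPUS:
        warnings.append(f"{total_cpus} CPUs exceeds {MAX_CLUSTER_CPUS} per cluster")
    if non_workstations > MAX_NON_WORKSTATIONS:
        warnings.append(
            f"{non_workstations} non-workstation systems exceeds {MAX_NON_WORKSTATIONS}"
        )
    for index, count in enumerate(cpus_per_star_coupler, start=1):
        if count > MAX_CPUS_PER_STAR_COUPLER:
            warnings.append(
                f"Star Coupler {index}: {count} CPUs exceeds {MAX_CPUS_PER_STAR_COUPLER}"
            )
    return warnings


# Example: 40 CPUs in total, 20 of them not workstations, two Star Couplers.
for line in check_supported(40, 20, [17, 3]):
    print("Not supported:", line)
```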
1-17 --
  The paragraph defining VAX PROCESSORS lists VAX or MicroVAX, but leaves off
  VAXstation and VAX-11 systems.  Best to just drop MicroVAX and let VAX
  account for all of them.
1-18a, 2nd paragraph --
  This leads us to believe that only HSC or DSSI based devices can be shared.
  Misleading since almost any disk (or tape) attached *locally* can also be 
  served.  This needs to be reworded to not exclude these devices.
1-19, 4th bullet --
  VAX 4000 (not a MicroVAX or a VAXstation) is also a valid satellite.  Maybe
  just adding "...and selected VAX systems" might fix it.
1-20a --
 
  1st bullet needs to add TAPE DRIVES
  8th bullet states that the Lock manager is in the Software Module, but it 
  seems to be a very complete discussion on pages 30-33 in THIS chapter.
  Last bullet, insert the following bullet for this one:
    "o  Serve the Tape Drive if it is capable (see 1-35a for a list)"
1-20, 1st sentence --
  Change "are" to "in" 
1-21a --
  1st bullet, "...must complete on that node,..." is a confusing statement.
  "Users can recover *more* quickly..."  "More"?  More quickly than what?  Just
  drop the word MORE.
1-21, 4th bullet --
  "All" is *NOT* correct.  Tried serving an RX23?  Change ALL to MOST.
1-22, 4th bullet --
  No longer true, see page 35 for details.  How about "Available to any system
  in the cluster provided that it is served."
  By the way, Page 1-18 is a bulleted version of Pages 1-21 and 1-22.  Can we
  kill page 1-18?
1-23, 3rd bullet --
  This is partially incorrect.  For proper system management of a VAXcluster
  you are required to have DECnet.  HOWEVER, the SYSMAN utility does NOT
  require DECnet to operate correctly in a cluster.  MONITOR commands like
  MONITOR CLUSTER and MONITOR NODE do need it, but SYSMAN only uses DECnet to 
  get to nodes that are OUTSIDE of the cluster, otherwise it uses SCS like 
  any good Cluster utility. 
1-25, table headers --
  Why is the Multiprocessor listed as a 6000?  9000 is also valid.  Drop the
  CPU type and let the header read "Multiprocessor".
1-28, 1st paragraph --
  "...if a majority of the expected VOTING MEMBER NODES are functioning."
  WRONG!!  Not correct!  We don't care how many *NODES* are present, what 
  we are looking for is "...a majority of the expected VOTES from MEMBER 
  NODES are present."  The only time the first statement will be true is if
  they all have the same number of votes.  If the systems have varying
  numbers of votes then it won't work.
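  A small worked example of the difference, as a sketch.  The quorum formula
  used here, (EXPECTED_VOTES + 2) / 2 with the fraction dropped, is the
  documented VMS rule rather than something stated on page 1-28; the function
  names and the vote values are made up for illustration.

```python
def quorum(expected_votes):
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present, expected_votes):
    """True if the VOTES contributed by the members present reach quorum."""
    return sum(votes_present) >= quorum(expected_votes)


# Hypothetical three-member cluster with unequal votes: 3, 1 and 1.
expected_votes = 3 + 1 + 1

# Two of the three nodes are up (a majority of NODES), but they hold
# only 2 of the 5 votes.
print("two 1-vote members present:", has_quorum([1, 1], expected_votes))

# One of the three nodes is up (a minority of NODES), but it holds 3 votes.
print("one 3-vote member present: ", has_quorum([3], expected_votes))
```

  With equal votes on every member the two wordings agree, which is why the
  original sentence only looks right in that special case.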
1-30a --
  For resources also see the Digital Technical Journal on VAXclusters.
1-30, 2nd bullet --
  So, what is a Lock Value Block, and what does it do and why does a system
  manager need to know what it is?  Drop "...through the lock value block"
  as it adds no benefit.
  Last bullet should also list DIGITAL-written applications.
1-31 --
  2nd bullet, 1st dash -- Nodes with a Zero value do *NOT* participate in the 
  directory function.
  "Eagerness" -- please define!  Do the systems block the other systems?  Is it
  a timing issue?  Some type of interprocessor signal?
  4th bullet -- when did this happen?  Is there a reference for this?  Please
  give more info on instructor's page.
1-32a, 3rd bullet, 2nd dash --
  Poor wording, please rewrite.
1-32, last bullet --
  replace "MicroVAX and workstations" with "slow CPUs and satellites"
1-34a, last bullet --
  replace "<REFERENCE>...stuff..." with a proper reference.
1-34, 1st bullet, 5th dash --
  If you are going to specify "...mixed-interconnect..." then you need a 
  parenthetical similar to the one in the previous bullet.
1-35a, last 2 bullets --
  Aren't these disks??
1-35 --
  1st bullet.  We don't need the cluster course to be a 'new features' or 
  a 'release notes' type material.  Let's just put things on the student
  pages that are pertinent to VAXcluster *management*.  Drop the reference
  of SDA and put it on the instructor's page.  The extra 'stuff' (like the
  bottom 1/2 of 1-34a) is OK on instructor pages, but PLEASE keep the info
  on the student pages focused on the purpose of the course and chapter.
  Last two bullets -- put 'compliant' behind TMSCP.
1-36, 4th bullet --
 
  Print queues don't have job limits.  There is no 'ratio' for print queues.
  These need to be two separate bullets.
1-37a, 1st bullet --
  We tease them here.  Point them to OPC$ENABLE_OPA0 in SYLOGICALS.COM.
1-38, 1st bullet --
  Where are VAXstations??