Hi,
I have seen this problem on three different BRS clusters after upgrading them to
BRS V1.4. In fact, it is reproducible on all of them.
Here is what I have figured out so far; it may or may not be 100% correct. The
BRS procedures are quite complex, and it took me a considerable amount of time
to dig in and work around this (and other) bugs.
If you leave the OMS menu without using the 'correct way' ( File -> Exit ),
the mailbox assigned to that process remains allocated and the BRS$SENTRY
process keeps writing to it. After some time, depending on the size of the
mailbox, the mailbox fills up and the writing process (BRS$SENTRY) goes into
the RWMBX state.
This scenario happens quite often if, for example, an operator ends his
X session ( Session -> End session ) without exiting the OMS menu first.
(By the way, there are other possible problems as well, such as huge
DECW$SERVER_0_ERROR.LOG files filling up the system disk if the OMS menu is not
exited before the session ends.)
A serious side effect of this: while the BRS$SENTRY process hangs, no OMS-station
failover is possible if the primary one fails.
My workaround:
I have a batch process running that checks the OMS station at regular intervals
for processes in RWMBX (and other SUSP) states. If one is found and the
process name is BRS$SENTRY, the appended command procedure is triggered. The
procedure deassigns the BRS$OMSMBX_x logical and cleans up the mailbox.
Afterwards the BRS$SENTRY process continues to work.
Maybe there is another, less complex way, but this is the only one I have found
so far to keep the OMS stations running without constantly rebooting them.
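
For illustration, the periodic check described above could be sketched roughly as
follows. This is not the actual SNS$CHECK_PROCESSES_STATES procedure (which is
not shown here); the temporary file name and the direct call at the end are
assumptions made for this example only. It takes a SHOW SYSTEM snapshot and
looks for a line that contains both the process name BRS$SENTRY and the state
RWMBX.

```dcl
$! Sketch only: TMP_FILE and the hand-off at the end are illustrative assumptions
$  set noon
$  tmp_file = "SYS$SCRATCH:CHECK_SENTRY.TMP"
$!
$! Take a snapshot of the local process list
$  show system /output='tmp_file'
$!
$! Look for a line that names BRS$SENTRY and shows the RWMBX state
$  search 'tmp_file' "BRS$SENTRY","RWMBX" /match=and /nooutput
$  found = $severity .eq. 1      ! 1 = match found, 3 = %SEARCH-I-NOMATCHES
$  delete /nolog 'tmp_file';*
$!
$! Hand over to the clean-up procedure appended below
$  if found then @scheduler_com:check_orphan_brs_mailboxes.com
$  exit
```

Scanning the SHOW SYSTEM display keeps the check simple; a real watchdog would
probably also cover the other states mentioned above and log what it found.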
Maybe you need to IPMT this.
regards,
Bernd
================================================================================
$! File :       SCHEDULER_COM:CHECK_ORPHAN_BRS_MAILBOXES.COM
$! Date :       08.10.96  B.Oberle
$! Usage:       release BRS connections to mailboxes which are no longer in
$!              use (maybe due to improper BRS-menu exit)
$! Note :       called by SYS$SYSDEVICE:[SNS$WATCHDOG]SNS$CHECK_PROCESSES_STATES
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$  set noon
$  counter = 0
$!
$  NODE         = F$GETSYI("NODENAME")
$  if node .NES. F$trnlnm("BRS$PRIMARY_OMS") then goto not_prim_oms
$!
$  loop:
$! -----
$!
$  counter = counter + 1
$  if counter .GT. 50 then goto end_run
$  mba_dev_name = F$trnlnm("BRS$OMSMBX_''counter'")
$  if mba_dev_name .EQS. "" then goto loop
$!
$  if F$getdvi(mba_dev_name,"REFCNT") .NE. 2 then goto loop
$!
$  write sys$output "==> orphan BRS mailbox found (''mba_dev_name') -- deleting logical ..."
$  deassign/system/user BRS$OMSMBX_'counter'
$!
$! ####  now empty the mailbox  ####
$! This is done in a batch job because the drain procedure hangs once the
$! mailbox is empty; the job is deleted again after a 10-second wait.
$!
$  submit -
        /noident                        -
        /param=("''mba_dev_name'")      -
        /queue=sys$batch                -
        /noprint                        -
        /nolog -
        scheduler_com:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$!
$  wait ::10
$  delete/entry='$ENTRY'
$!
$  goto loop
$!
$  not_prim_oms:
$! -------------
$!
$  write sys$output ""
$  write sys$output "==>  Procedure run on primary OMS-node only !!!"
$  write sys$output ""
$!
$  end_run:
$! --------
$!
$  exit
$!
================================================================================
$! File :       SCHEDULER_COM:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$! Date :       08.10.96  B.Oberle
$! Usage:       drain the mailbox by copying its content to SYS$OUTPUT
$!              (the job is submitted /NOLOG, so the output is discarded)
$! Note :       do not run interactively -- called by CHECK_ORPHAN_BRS_MAILBOXES.COM
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$  if p1 .EQS. "" then exit
$!
$  loop:
$! -----
$  copy 'p1' sys$output:
$  goto loop