| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 2688.1 | First Class remote mail goes via Sender | IOSG::MARSHALL |  | Thu May 22 1997 10:52 | 8 | 
|  | Sounds like there's something wrong with the Sender process on the originating
system.  Are there a lot of messages on the Sender queue for some reason?
I doubt your suggestion that the Fetcher on the receiving system is the culprit
here, if as you observe Express mail (which is sent directly by the originating
process and doesn't go near the Sender process) is fine.
Scott
 | 
| 2688.2 | EMD-E-OPENERR, Error opening | KERNEL::BURDENI |  | Thu May 22 1997 13:07 | 15 | 
|  |     Apologies, I meant Sender; there seems to be a queue of up to 200
    messages regularly waiting to be sent. They have reported an error in the
    Sender log as follows
    
    MTI$ERROR log has the following error every 10 minutes.
    
    'default sender'
    %EMD-E-OPENERR, Error opening link to message router.
    
    Mails do go, and there do not seem to be any failures to send.
    
    (This is information I should have posted in .0)
    
    Any ideas why this is happening ?
    Ivan.
 | 
| 2688.3 |  | IOSG::PYE | Graham - ALL-IN-1 Sorcerer's Apprentice | Thu May 22 1997 14:12 | 3 | 
|  |     Is the Message Router on only one (or some) nodes of a cluster, and
    hence the sender is taking a lot of retries to find the node that
    works?
 | 
| 2688.4 | Both machines | KERNEL::BURDENI |  | Thu May 22 1997 15:16 | 4 | 
|  |     The cluster has two nodes; each has a logical MR$NODE set to the
    node name, and each has OA$PRIMARY_NODE set to the cluster alias.
    ALL-IN-1 (spelt correctly:-) is on both machines.
    	
 | 
| 2688.5 |  | IOSG::MARSHALL |  | Fri May 23 1997 15:11 | 14 | 
|  | Hmmm, .4 doesn't answer the question: on which node(s) is MR running?
Are the nodes VAXes or Alphas?
What are the values of the "Remote Mail", "Remote MR" and "MR node" fields in
A1CONFIG?
What is the value, if any, of the OA$MTI_MR_NODE logical?  Note that ALL-IN-1
doesn't use the MR$NODE logical.
What does SHOW A1 (or whatever your A1 mailbox is called) yield from MRMAN? 
Does this match what the documentation says it should be?
Scott
 | 
| 2688.6 | More info | KERNEL::BURDENI |  | Wed May 28 1997 09:24 | 22 | 
|  |     The customer has a VAX cluster, with the following setup from ALL-IN-1
    
               ALL-IN-1 SYSTEM CONFIGURATION INFORMATION - continued (1)
    
     Remote Mail: 1   Direct Type: 0   Direct Level: 0   ASCII Translate: 0
    
     Remote MR: 0   MR Node:          MR Mailbox: A1
    
    >>>>This is the same on both nodes.
    
    The logical OA$MTI_MR_NODE is not set on either machine, and the A1
    Mailbox is set up as follows.
    
            This is MRMAN V3.3-313
    MRM> sho a1
    A1,                       Owner=ALLIN1 Notify=OA$NOTIFY_MBX 
    Suppress_Delivery_Reports Complete_Messages Ignore_Sender Service_Messages
    MRM>
    
    I hope this helps
    Ivan.
    
 | 
| 2688.7 |  | IOSG::MARSHALL |  | Fri May 30 1997 17:28 | 25 | 
|  | All the information in .6 seems in order.
Again: please tell us on which nodes of the cluster Message Router is running. 
In your environment, MR should be running on all nodes (and that is the only
supported configuration), but if for some reason it's only running on one, that
could explain this problem.
On which nodes of the cluster do you have ALL-IN-1 Senders running?
The reason for the backlog of messages on the Sender queue is that every time
    %EMD-E-OPENERR, Error opening link to message router
occurs, the Sender waits ten minutes before trying again.  So if this error
happens a lot, you end up with a lot of dead time when nothing is happening.
You should probably check the Message Router log files to see if there is any
information there which would help explain why ALL-IN-1 can't connect to MR.  If
everything's running on the same node, it's not likely to be network problems,
so there are no external factors which could be at fault here.
It might also be worth re-setting the A1 mailbox password, just in case some
discrepancy there is causing problems.  Then there's always the option of
shutting everything down and restarting it, in case that clears the problem.
Scott
 | 
| 2688.8 | More info | KERNEL::BURDENI |  | Tue Jun 03 1997 12:35 | 23 | 
|  |     Thanks, I have asked the customer a few more questions and the results
    are as follows.  He has message router started on both nodes, though
    there is only the MRLOGGER process running on node B.  There are
    Sender and Fetchers running only on node A along with the transfer
    service.
    
    In the MRERR_ALL.INF log there is an error which matches the Sender
    error for timing (every 10 minutes) as follows :
    
    %MROUTER-I-FAILOG_LSTN_S ' date/time ', The application ALL-IN-1 on
    nodes CHECC1 identified to Mailbox A1, is sending a message.
    %EXPO-E-TEXT,!AS
    
    The CHECC1 is the cluster alias.
    
    I really should have picked this up sooner, apologies. But the customer
    reported no errors in the MR logs.  Is this any help ?
    
    The systems have been rebooted a number of times since this problem
    began.
    
    Cheers
    Ivan
 | 
| 2688.9 | Maybe this will fix it | IOSG::MARSHALL |  | Wed Jun 04 1997 11:35 | 38 | 
|  | Hmm, the two error messages you're seeing don't make sense:
>> 'default sender'
>> %EMD-E-OPENERR, Error opening link to message router.
>> %MROUTER-I-FAILOG_LSTN_S ' date/time ', The application ALL-IN-1 on
>> nodes CHECC1 identified to Mailbox A1, is sending a message.
So ALL-IN-1 thinks it can't connect to Message Router, but according to the
Message Router log, ALL-IN-1 has connected and is sending a message.  Plus, as
you confirm, your messages are being sent, albeit with some delays.
My guess is that the Sender process is using the cluster alias to connect to
Message Router, so there's a 50% chance DECnet will try to connect it to node B,
where there is no Transfer Service, and that will cause the ALL-IN-1 error and a
ten minute wait.  Only when DECnet gives ALL-IN-1 a connection to node A will
the messages get through.
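
As a rough illustration of that guess, the arithmetic can be sketched in Python. This is only a back-of-envelope model, not a description of how DECnet actually picks a node behind the alias: it assumes each connection attempt independently lands on node A with probability 1/2, and it takes the ten-minute wait after each OPENERR from the Sender behaviour described in this topic. The function names are made up for the example.

```python
from fractions import Fraction

RETRY_WAIT_MINUTES = 10          # from this topic: Sender waits ten minutes after OPENERR
P_WORKING_NODE = Fraction(1, 2)  # ASSUMPTION: alias hands out node A or B with equal chance


def expected_failures(p_success):
    """Mean number of failed attempts before the first success (geometric distribution)."""
    return (1 - p_success) / p_success


def expected_delay_minutes(p_success, wait_minutes=RETRY_WAIT_MINUTES):
    """Mean dead time before a connection succeeds, ignoring the connect time itself."""
    return expected_failures(p_success) * wait_minutes


def prob_at_least_n_failures(n, p_success):
    """Chance that the first n attempts all fail (ASSUMES independent attempts)."""
    return (1 - p_success) ** n


if __name__ == "__main__":
    print("mean failed attempts:", expected_failures(P_WORKING_NODE))
    print("mean dead time (min):", expected_delay_minutes(P_WORKING_NODE))
    print("P(30+ min dead time):", prob_at_least_n_failures(3, P_WORKING_NODE))
```

Under those assumptions the Sender loses on average one attempt, so about ten minutes, each time it needs a new link, and one time in eight it sits idle for half an hour or more. That would be enough to let a queue build up even though nothing ever actually fails to send.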
But it's curious the two messages occur at the same time; how many Senders are
they trying to run on node A?
You have two options to fix this:
1) Start the transfer service on node B as well.  This is in fact the only
supported configuration of the components, so is the one the customer should use.
2) Persuade ALL-IN-1 not to connect to node B.  One way you may be able to do
this is to set 'Remote MR' to 1 in A1CONFIG, and define the remote MR node to be
node A.  Note this isn't an officially supported way of doing things, and I
don't know whether there would be any unwanted side effects - I'm not suggesting
there will be, and I don't think there will be, but I'm not entirely certain
there won't be.
If you do (2), don't forget to shut down and restart ALL-IN-1.  Also, do they use
NETWORK.DAT for mail addressing?  There are some subtle implications around that
if you change your MR node name; see topic 438 in this conference for info.
Scott
 |