| Title: | dec_mls_plus |
| Moderator: | SMURF::BAT |
| Created: | Mon Nov 29 1993 |
| Last Modified: | Thu Jun 05 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 534 |
| Total number of notes: | 2544 |
On an otherwise idle system (root was the only user logged on to
any node), the following command was issued on host
sebastian:
ls -sR | sort
This resulted in the message:
NFS Slookup failed for server bashful: RPC: Timed out
./ace1/accplib/NOS/PF/CYBER/IRWD2 not found
I then logged on to the host bashful and entered the following:
cd /ace1/accplib/NOS/PF/CYBER
lsacl IRWD2
which resulted in this output:
# file:IRWD2
# owner:accplib
# group:users
user::rwx
mask::rwx
user:rtdr:r--
user:tlcsrll:r--
user:tlrcwah:r--
user:tlrcfwp:r--
user:tlrcelh:r--
user:tlruwah:r--
user:tlruelh:r--
user:tlrcfts:r--
user:tlruhab:r--
group::rwx
group:rtdr:r--
group:tlrcfts:r--
other::r-x
I subsequently entered this on sebastian:
ls -sR ./ace1/accplib/NOS/PF/CYBER/IRWD2
with this result:
7 ./ace1/accplib/NOS/PF/CYBER/IRWD2
If this were an isolated incident, we would not be concerned,
but this happens frequently, sometimes on systems that are busy and
sometimes on systems that don't appear to have much of a load.
We would appreciate any suggestions and/or fixes to prevent this
problem.
Sam
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 505.1 | patch level | RHETT::AMAN | | Thu May 15 1997 15:08 | 9 |
One note on this customer site. He currently has Level 9 patches
installed. He plans to pull and install the Level 10 patches soon.
This also applies to note #506.
Thanks,
janet
csc/cs
770-514-1050
| |||||
| 505.2 | OK | SMURF::BAT | Segui la tua beatitudine | Thu May 15 1997 18:18 | 4 |
In note 466.4, Martin says that the customer installed PK#10. If they
are saying they only have PK#9 installed, I presume they de-installed
PK#10 because they were having performance problems. I never heard
whether they rebuilt their TNETDB, or whether the problems went away.
| |||||
| 505.3 | current level | RHETT::AMAN | | Wed May 21 1997 16:39 | 30 |
Here's what the customer says about the patch levels -
Janet,
Funny you should mention patch levels. We recently installed
patch level 9 (about a month ago), and the NFS timeouts seemed to
get much worse. About a week after we put this patch level in,
Martin Moore sent me an e-mail indicating we should have the
TNETDB file rebuilt as part of this. This was done. There were fewer
NFS timeouts, but our activity level had also subsided for other reasons.
I put patch level 10 in on Sunday, May 18. We installed a new
emulator early in the morning of Tuesday, May 20. In the past two days
there has been a lot more activity, with many more people logging in to
test the emulator, and we have had NO NFS timeouts. We'll keep an eye
out for these, but so far so good. Tonight I will try an mltape backup
that has consistently displayed the ACL/memory allocation error.
If your question was meant to imply that these problems began at a
particular patch level, the answer is that we have had the mltape problem
ever since we have had the large number of user files on the system
(several months), and we have had NFS timeout problems off and on ever
since the system was installed (May/June 1995). The I/O error from
NFS-served disks caused by unknown user indexes was only noticed recently.
----------
Thanks,
janet
770-514-1050
| |||||
| 505.4 | good | SMURF::BAT | Segui la tua beatitudine | Wed May 21 1997 20:15 | 15 |
Ah, now it is getting clearer.
PK#10 is the one where, after you install it, you must remove your
TNETDB before rebooting. That did not apply to PK#9. I can look back at
PK#9 and see if there was something NFS-related that might have explained
the timeout problem, but if they aren't complaining any more, why
bother?
PK#10 contained a major change to the way security attributes are hashed
into the structure of the TNETDB, i.e., how the attributes are stored
in buckets. It was expected to improve performance... but not
if you didn't start with a fresh TNETDB.
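As a purely generic illustration of why a rebuild matters when the bucket placement changes, here is a small Python sketch of a bucketed hash table. It is not the TNETDB format, and both hash functions are hypothetical stand-ins (the thread does not describe the old or the PK#10 scheme). The weak hash ignores byte order, so the six anagram keys all land in one bucket; the order-sensitive hash spreads the same keys out, but only for a table that is actually rebuilt with it.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical table size, chosen only for the demo


def old_hash(key: str) -> int:
    """Hypothetical weak hash: sums the bytes, so anagrams always collide."""
    return sum(key.encode())


def new_hash(key: str) -> int:
    """Hypothetical order-sensitive polynomial hash (multiplier 31, 32-bit wrap)."""
    h = 0
    for byte in key.encode():
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def build(keys, hash_fn):
    """Place every key in the bucket chosen by hash_fn; return bucket -> keys."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash_fn(key) % NUM_BUCKETS].append(key)
    return buckets


def longest_chain(buckets) -> int:
    """Length of the fullest bucket, a rough proxy for worst-case lookup cost."""
    return max(len(chain) for chain in buckets.values())


# Hypothetical attribute keys; real TNETDB contents are not shown in this note.
KEYS = ["user:abc", "user:acb", "user:bac", "user:bca", "user:cab", "user:cba"]

stale = build(KEYS, old_hash)  # layout left over from the old scheme
fresh = build(KEYS, new_hash)  # layout after deleting and rebuilding the store
print("longest chain, old layout:", longest_chain(stale))
print("longest chain, rebuilt:   ", longest_chain(fresh))
```

With these toy keys the old layout has one chain of six entries, while the rebuilt layout's longest chain is two.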
mltape is a different problem altogether.
| |||||
| 505.5 | they're back... | RHETT::AMAN | | Tue May 27 1997 10:09 | 18 |
Hi,
This update from the customer came in on Sat, 24 May 1997 12:59:44.
-------------
Janet,
In my last message I said we hadn't had an NFS timeout in 2 days.
Well, they're back. We have been having them sporadically since Tuesday.
(He then goes on to describe a new and different issue with a
particular user account. I'll work on that one separately, and may
enter another note...)
------------
Thanks,
janet
| |||||
| 505.6 | more more more info, please... | SMURF::SCHOFIELD | Rick Schofield, DTN 381-0116 | Mon Jun 02 1997 09:42 | 18 |
I spoke with Janet this morning and asked her to collect some more
details on the systems in use at PAFB. Specifically, I'm looking for
details on which machines are NFS clients/servers and which are NIS
master/slaves/clients. I also asked whether we can find out, when these
timeouts occur, which NIS server the NFS server is bound to. I'm
trying to eliminate NIS from consideration as part of this problem.
I also asked if Janet could determine whether the folks at PAFB would be
willing to run with modified versions of code (NFS server/client, etc.)
if we felt it would expedite the debugging process.
This just occurred to me too: do we know if there are any entries in
the var/adm/syslog.dated/.../*.log files when these timeouts occur? We
usually look at these first thing, but I don't see anything in the
history saying whether or not this was done.
Rick
| |||||
| 505.7 | information from the customer | RHETT::AMAN | | Tue Jun 03 1997 21:27 | 187 |
From the customer -
Now, we'd like to get on with our NFS timeout problems. Here is an
outline of our configuration.
Our MLS+ configuration:
dumbo (16.20.40.109)
alpha 3000-900
nis master server (there is no slave server.)
fddi interface to GIGAswitch via DECconcentrator
nfs client only
bashful (16.20.40.111)
alpha 3000-900
fddi interface to GIGAswitch via DECconcentrator
nfs server
serving the following disks:
/dev/rz50c /ace1
/dev/rz52c /ace4
/dev/rz29c /ace5
/dev/rz61c /ace6
/dev/rz45c /audit1
bashful also exports
/usr/local
kumba (16.20.40.107)
alpha 3000-900
fddi interface to GIGAswitch via DECconcentrator
nfs server
serving the following disks:
/dev/rz25c /ace2
/dev/rz29c /ace3
/dev/rz57c /ace7
/dev/rz52c /ace8
/dev/rz42c /sey
/dev/rz61c /audit
simba (16.20.40.105)
alpha 3000-900
fddi interface to GIGAswitch via DECconcentrator
nfs client only
goofy, thumper (16.20.40.104), (16.20.40.101)
alpha 3000-700
fddi interface to GIGAswitch via DECconcentrator
nfs client only
flower, pocohantas, flounder, pinnochio, sebastian
alpha 3000-300
thin wire interface to GIGAswitch via DECrepeater
nfs client only
We also have 23 LAT ports serving RS-232-type lines. Only about
6 of these currently have dumb terminals attached.
----------------
We have two NFS servers (bashful and kumba). Below are copies of
their fstab and exports files.
--------------- bashful fstab
/dev/rz16a / ufs rw 1 1
/dev/rz18a /usr ufs rw 1 2
/dev/rz16b swap1 ufs sw 0 0
/dev/rz32b swap2 ufs sw 0 0
/dev/rz34b swap3 ufs sw 0 0
/dev/rz18b /var ufs rw 1 2
/dev/rz50c /ace1 ufs rw 0 2
/dev/rz52c /ace4 ufs rw 0 2
/dev/rz29c /ace5 ufs rw 0 2
/dev/rz61c /ace6 ufs rw 0 2
/dev/rz45c /audit1 ufs rw 0 2
/ace2@kumba /ace2 nfs rw,bg,hard 0 0
/ace3@kumba /ace3 nfs rw,bg,hard 0 0
/ace7@kumba /ace7 nfs rw,bg,hard 0 0
/ace8@kumba /ace8 nfs rw,bg,hard 0 0
/home@kumba /home nfs rw,bg,hard 0 0
/audit@kumba /audit nfs rw,bg,hard 0 0
/sey@kumba /sey nfs rw,bg,hard 0 0
-------------- kumba fstab
/dev/rz16a / ufs rw 1 1
/dev/rz18c /usr ufs rw 1 2
/dev/rz16b swap1 ufs sw 0 0
/dev/rz37b swap2 ufs sw 0 0
/dev/rz40b swap3 ufs sw 0 0
/dev/rz20c /home ufs rw 1 2
/dev/rz25c /ace2 ufs rw 1 2
/dev/rz33c /ris1 ufs rw 1 2
/dev/rz61c /audit ufs rw 1 2
/dev/rz57c /ace7 ufs rw 1 2
/dev/rz52c /ace8 ufs rw 1 2
/dev/rz42c /sey ufs rw 1 2
/dev/rz29c /ace3 ufs rw 1 2
/ace1@bashful /ace1 nfs rw,bg,hard 0 0
/ace4@bashful /ace4 nfs rw,bg,hard 0 0
/ace5@bashful /ace5 nfs rw,bg,hard 0 0
/ace6@bashful /ace6 nfs rw,bg,hard 0 0
/usr/local@bashful /usr/local nfs rw,bg,hard 0 0
/audit1@bashful /audit1 nfs rw,bg,hard 0 0
------------- bashful exports
/ace1 -root=0
/ace3 -root=0
/ace4 -root=0
/ace5 -root=0
/ace6 -root=0
/usr/local -root=0
/audit1 -root=0
/ace9 -root=0
/mnt1 -root=0
------------- kumba exports
/home -root=0
/ace2 -root=0
/ace3 -root=0
/audit -root=0
/ris1 -root=0
/ace7 -root=0
/ace8 -root=0
/sey -root=0
--------
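One quick consistency check on the two servers' files is to look for exported paths that the exporting host itself mounts over NFS. The Python sketch below is a hypothetical helper, not a DEC or MLS+ utility; it assumes the six-field fstab layout used in these listings (source, mount point, type, options, dump, pass), skips swap entries, and takes the first field of each exports line as the exported path. Run against bashful's files it flags /ace3, which bashful exports but mounts from kumba. The check only reports that fact; it does not show whether it has anything to do with the timeouts.

```python
# Hypothetical consistency check; not a DEC utility. Data copied from bashful above.
BASHFUL_FSTAB = """
/dev/rz16a / ufs rw 1 1
/dev/rz18a /usr ufs rw 1 2
/dev/rz16b swap1 ufs sw 0 0
/dev/rz32b swap2 ufs sw 0 0
/dev/rz34b swap3 ufs sw 0 0
/dev/rz18b /var ufs rw 1 2
/dev/rz50c /ace1 ufs rw 0 2
/dev/rz52c /ace4 ufs rw 0 2
/dev/rz29c /ace5 ufs rw 0 2
/dev/rz61c /ace6 ufs rw 0 2
/dev/rz45c /audit1 ufs rw 0 2
/ace2@kumba /ace2 nfs rw,bg,hard 0 0
/ace3@kumba /ace3 nfs rw,bg,hard 0 0
/ace7@kumba /ace7 nfs rw,bg,hard 0 0
/ace8@kumba /ace8 nfs rw,bg,hard 0 0
/home@kumba /home nfs rw,bg,hard 0 0
/audit@kumba /audit nfs rw,bg,hard 0 0
/sey@kumba /sey nfs rw,bg,hard 0 0
"""

BASHFUL_EXPORTS = """
/ace1 -root=0
/ace3 -root=0
/ace4 -root=0
/ace5 -root=0
/ace6 -root=0
/usr/local -root=0
/audit1 -root=0
/ace9 -root=0
/mnt1 -root=0
"""


def parse_fstab(text):
    """Return {mount_point: (fs_type, source)}, skipping swap entries."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        # Swap lines use a name such as "swap1" instead of an absolute path.
        if len(fields) < 4 or not fields[1].startswith("/") or fields[3] == "sw":
            continue
        mounts[fields[1]] = (fields[2], fields[0])
    return mounts


def parse_exports(text):
    """Return the exported paths (first field of each non-blank line)."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def backing_mount(path, mounts):
    """Longest mount point containing path, matching whole path components."""
    best = None
    for mp in mounts:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def nfs_reexports(fstab_text, exports_text):
    """List (exported path, NFS source) pairs for exports backed by an NFS mount."""
    mounts = parse_fstab(fstab_text)
    flagged = []
    for path in parse_exports(exports_text):
        mp = backing_mount(path, mounts)
        if mp is not None and mounts[mp][0] == "nfs":
            flagged.append((path, mounts[mp][1]))
    return flagged


print(nfs_reexports(BASHFUL_FSTAB, BASHFUL_EXPORTS))
```

The longest-prefix match is there so that /usr/local resolves to the local /usr file system and /audit1 is not confused with the NFS-mounted /audit.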
The rest of the hosts have fstab and export files similar to the
following:
---------------- typical fstab
/dev/rz16a / ufs rw 1 1
/dev/rz18g /usr ufs rw 1 2
/dev/rz16b swap1 ufs sw 0 2
/dev/rz18b swap2 ufs sw 0 2
/dev/rz18a /var ufs rw 1 2
/ace1@bashful /ace1 nfs rw,bg,hard 0 0
/ace4@bashful /ace4 nfs rw,bg,hard 0 0
/ace5@bashful /ace5 nfs rw,bg,hard 0 0
/ace6@bashful /ace6 nfs rw,bg,hard 0 0
/usr/local@bashful /usr/local nfs rw,bg,hard 0 0
/audit1@bashful /audit1 nfs rw,bg,hard 0 0
/ace2@kumba /ace2 nfs rw,bg,hard 0 0
/ace3@kumba /ace3 nfs rw,bg,hard 0 0
/ace7@kumba /ace7 nfs rw,bg,hard 0 0
/ace8@kumba /ace8 nfs rw,bg,hard 0 0
/home@kumba /home nfs rw,bg,hard 0 0
/audit@kumba /audit nfs rw,bg,hard 0 0
/sey@kumba /sey nfs rw,bg,hard 0 0
-------------- typical exports file is empty
These are the yp passwd and group files from dumbo (nis server):
(I have the real copies of these files if you need them. The passwd
file has 178 accounts. The group file has 75 groups. janet)
and finally a copy of typical local passwd and group files:
(I have these files as well. The local passwd file has 24 entries. He
may have mistakenly sent the same group file twice. The one he sent as
the yp is identical to the local one. janet)
You asked about syslog.dated files. Here's the only one I found from
yesterday with any pertinent information:
---------------
Jun 2 13:12:20 bashful vmunix: NFS Slookup failed for server kumba:
RPC: Timed out
Jun 2 13:12:32 bashful vmunix: NFS Slookup failed for server kumba:
RPC: Timed out
Jun 2 13:13:41 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 7 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:42 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 3 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:44 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:15:06 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Timed out
------------------
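If more of these logs turn up, tallying them by NFS operation, server, and RPC error may make patterns easier to spot than reading them line by line. The Python sketch below is a hypothetical helper (not part of MLS+ or Digital UNIX) written only against the message format in the excerpt above. It rejoins "RPC: ..." lines that the excerpt wraps onto their own line, and it adds syslog's "last message repeated N times" counts to the NFS message immediately before them.

```python
import re
from collections import Counter

# Kernel messages as they appear in the excerpt, e.g.
# "NFS Sgetattr failed for server kumba: RPC: Timed out"
NFS_FAIL = re.compile(r"NFS (\w+) failed for server ([^\s:]+): RPC: (.+)$")
REPEAT = re.compile(r"last message repeated (\d+) times")


def join_wrapped(lines):
    """Rejoin 'RPC: ...' lines that were wrapped onto a line of their own."""
    joined = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("RPC:") and joined:
            joined[-1] += " " + line
        else:
            joined.append(line)
    return joined


def tally_nfs_failures(lines):
    """Count (operation, server, RPC error) triples, including syslog repeats."""
    counts = Counter()
    last_key = None
    for line in join_wrapped(lines):
        failure = NFS_FAIL.search(line)
        if failure:
            last_key = failure.groups()
            counts[last_key] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last_key is not None:
            counts[last_key] += int(repeat.group(1))
            continue
        last_key = None  # any other message breaks the repeat chain


    return counts


EXCERPT = """
Jun 2 13:12:20 bashful vmunix: NFS Slookup failed for server kumba:
RPC: Timed out
Jun 2 13:12:32 bashful vmunix: NFS Slookup failed for server kumba:
RPC: Timed out
Jun 2 13:13:41 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 7 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:42 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Remote system error
Jun 2 13:13:42 bashful last message repeated 3 times
Jun 2 13:13:42 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:13:44 bashful vmunix: rfs_dispatch: dispatch error, no reply
Jun 2 13:15:06 bashful vmunix: NFS Sgetattr failed for server kumba:
RPC: Timed out
"""

for (op, server, error), n in sorted(tally_nfs_failures(EXCERPT.splitlines()).items()):
    print(f"{n:4d}  {op:10s} {server:10s} {error}")
```

On this excerpt it reports 12 Sgetattr "Remote system error" failures against kumba, 2 Slookup timeouts, and 1 Sgetattr timeout. For real logs, pass the lines of whichever syslog.dated file holds the vmunix messages.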
Janet, I realize how difficult it is to diagnose this type of problem
remotely. Fortunately we are not in a production environment, so we can
and will put in any trap and/or debug code you need. So let us know what
we can do.
sam
----------------
Please let me know if you need additional information.
Thanks!
janet
770-514-1050
| |||||