| Title: | DCE Product Information |
| Notice: | Kit Info - See 2.*-4.* |
| Moderator: | TUXEDO::MAZZAFERRO |
| Created: | Fri Jun 26 1992 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2269 |
| Total number of notes: | 10003 |
Hi friends,
A customer is having strange problems with CDS replicas.
He is using DCE V1.4 on VMS 6.2.
He has two CDS servers, both holding CDS replicas for some
directories. Lately he discovered that CDSD was looping (100% CPU)
on the "Replica" CDS server.
He saw that a skulk was still pending:
cdscp show server:
Skulks Initiated = 1
Skulks Completed = 0
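
The two counters above are what reveal the pending skulk. A minimal sketch of that check follows, assuming the counter lines look exactly like the two quoted in this note; the function name `pending_skulks` is hypothetical, and the rest of the `show server` output is not modelled. A difference that stays non-zero only suggests a skulk that has not finished (or has failed); it does not say why.

```python
import re

# Matches the two counter lines quoted in the note, e.g. "Skulks Initiated = 1".
COUNTER_RE = re.compile(r"\s*(Skulks (?:Initiated|Completed))\s*=\s*(\d+)\s*$")

def pending_skulks(show_server_output: str) -> int:
    """Return initiated minus completed skulks found in the given text."""
    counters = {}
    for line in show_server_output.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    # A KeyError here means one of the two counters was not in the text.
    return counters["Skulks Initiated"] - counters["Skulks Completed"]

sample = "Skulks Initiated = 1\nSkulks Completed = 0"
print(pending_skulks(sample))
```
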
The directory where the skulk was pending had the master copy on this node.
So he moved the master copy to the other server.
cdscp set dir to new epoch master Masternode exclude "hanging server"
This worked fine and the cdsd stopped looping.
Now the questions:
cdscp show /.:/hosts/gdcw9e ( the replicated directory )
....
Timeout = :
Expiration = 1997-04-24-12:05:54.419
Extension = +1-00:00:00.000I0.000
MyName = /.../og.rzc2.ptt.com/hosts/gdcw9e
CDS_DirectoryVersion = 3.0
CDS_ReplicaState = dying replica
CDS_ReplicaType = readonly
What does "dying replica" mean? I remember that DNS had a tool
to fix such a replica. Surgeon?
Does the Expiration time mean that this replica will disappear
after this time?
Is there a fast way to remove a directory which was excluded?
If I try to write a new copy of the directory to this clearinghouse,
I see that the TLOG file grows very fast, but nothing happens.
I use the command:
cdscp set dir to new epoch master Masterserver readonly "hanging server"
Any comments are greatly appreciated.
Marco
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 2227.1 | perhaps corrupt database | TUXEDO::ZEE | There you go. | Thu May 01 1997 18:02 | 30 |
Sorry for the delay - on vacation, then sick.

A corrupt database could cause the CDS server process to take 100% of the CPU. I'm not sure whether DCE V1.4 includes the database fix that addresses this behavior. You should run the surgeon tool to -scanrx the .checkpoint file to check for any corruption. Then you would use the tool to excise the appropriate bad data. A previous bug caused index records to be placed incorrectly in the B-tree, so traversing the tree would result in an infinite loop.

>What does "dying replica" mean? I remember that DNS had a tool

This is a direct result of the "cdscp set dir to new epoch" command when you exclude a clearinghouse. The replica state will change from On to Dying. After a successful skulk, the replica state should change from Dying to Dead. My guess is that the skulk is not returning, perhaps because of the looping above.

>Does the Expiration time mean that this replica will disappear
>after this time?

I believe these fields go with the attribute above the Timeout: field, probably the CDS_ParentPointer attribute.

>Is there a fast way to remove a directory which was excluded?

You mean to say "replica" instead of "directory". It should be fast if the skulk is successful.

--Roger
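
The remark about misplaced index records can be illustrated generically. The sketch below is not the CDS server's code and does not reproduce what the surgeon tool does; it assumes a simplified structure in which each record points to the next one, and the names `next_of` and `walk_chain` are hypothetical. It shows why a record linked back to an earlier one makes a plain traversal spin forever, and how remembering visited records turns the loop into a detectable error.

```python
from typing import Dict, List, Optional

def walk_chain(next_of: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow next-pointers from start; raise if a record is reached twice."""
    seen = set()
    order: List[str] = []
    node: Optional[str] = start
    while node is not None:
        if node in seen:
            # Without this guard the loop would never terminate (100% CPU).
            raise ValueError(f"cycle detected at record {node!r}")
        seen.add(node)
        order.append(node)
        node = next_of[node]
    return order

# Hypothetical data: a healthy chain, and one where C wrongly points back to B.
healthy = {"A": "B", "B": "C", "C": None}
corrupt = {"A": "B", "B": "C", "C": "B"}
```

Calling `walk_chain(healthy, "A")` visits each record once, while `walk_chain(corrupt, "A")` raises instead of looping.
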
| 2227.2 | Replica still dying... | VIRGIN::BILL | BILL is my lastname !!! | Mon May 12 1997 10:29 | 23 |
Hi Roger,

The "poor" replica is still dying. I tried to reproduce a dying replica with the following steps:

- cdscp set dir to new epoch exclude a clearinghouse
- cdscp delete replica from the above clearinghouse

In this state I have my dying replica. As soon as I recreate the replica, the state is back to on, as expected.

As far as I understand, the exclude should only be used if you intend to bring the replica back to life and NOT if you'll delete it afterwards.

Anyway, the customer is not able to recreate the replica (TLOG grows rapidly), nor is he able to remove the dying replica. Is there any hard way to get rid of such a replica? Is it possible that the mentioned bug is still in the VMS CDS?

Thanks for any comment.

/Marco
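
The replica states described in .1 and in this reply can be summarised as a small transition table. This is only a sketch of the behaviour reported in this thread, not of CDS internals: the event names (`exclude`, `skulk_ok`, `skulk_failed`, `recreate`) are hypothetical labels, and any state or event not mentioned in the thread is left out.

```python
from enum import Enum

class ReplicaState(Enum):
    ON = "on"
    DYING = "dying replica"
    DEAD = "dead"

# Transitions as reported in the thread; event names are illustrative only.
TRANSITIONS = {
    (ReplicaState.ON, "exclude"): ReplicaState.DYING,     # new epoch excluding the clearinghouse
    (ReplicaState.DYING, "skulk_ok"): ReplicaState.DEAD,  # a skulk completes successfully
    (ReplicaState.DYING, "recreate"): ReplicaState.ON,    # the replica is created again
}

def step(state: ReplicaState, event: str) -> ReplicaState:
    """Return the next state; unknown or failed events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# The customer's situation: excluded, but the skulk never completes.
state = step(ReplicaState.ON, "exclude")
state = step(state, "skulk_failed")
print(state.value)
```

In this model a replica stays in the dying state for as long as no skulk succeeds, which matches what the customer is seeing.
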
| 2227.3 | | TUXEDO::ZEE | There you go. | Mon May 12 1997 11:42 | 25 |
>As far as I understand, the exclude should only be used if
>you intend to bring the replica back to life and NOT if you'll delete
>it afterwards.

This is generally true: if you wish to delete a replica, you do not need to "new epoch exclude" it first, just delete it. I have been assuming that the directory in question is not the root directory. Also, are any other directories replicated at the clearinghouse that contains the replica you wish to delete?

>Anyway, the customer is not able to recreate the replica (TLOG grows rapidly),
>nor is he able to remove the dying replica. Is there any hard way to
>get rid of such a replica? Is it possible that the mentioned bug is still in
>the VMS CDS?

Creating or recreating a replica would cause the TLOG file to grow rapidly. To remove a dying replica, try skulking the directory and note the error if it fails. Yes, the mentioned bug may be in that version of VMS CDS, but someone from VMS DCE would need to verify that. There is the brute-force method of deleting that clearinghouse altogether, but you would need to clean up all of the other directories that are replicated there.

--Roger
| 2227.4 | Database Corruption fix may not be in VMS | STAR::SWEENEY | | Mon May 12 1997 12:08 | 7 |
If the database corruption fix mentioned in .1 was released in Digital Unix ECO 1 for 1.3, then I do not believe OpenVMS has picked up the fix.

Roger, I will contact you offline about the fix. We have all the source differences for all the ECO 1 kit changes, but are having a difficult time determining exactly which source module changes are required for the database corruption fix.

Dave