
Conference tuxedo::dce-products

Title:	DCE Product Information
Notice:	Kit Info - See 2.-4.
Moderator:	TUXEDO::MAZZAFERRO

Created:	Fri Jun 26 1992
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2269
Total number of notes:	10003

2227.0. "CDS dying replica" by VIRGIN::BILL (BILL is my lastname !!!) Thu Apr 24 1997 05:27

Hi friends

A customer is having strange problems with a CDS replica.

He is using DCE 1.4 on VMS 6.2.

He has two CDS servers, both holding CDS replicas for some
directories. Recently he discovered that CDSD was looping (100% CPU)
on the "Replica" CDS server.

He saw that a skulk was still pending:

cdscp show server:
            Skulks Initiated = 1
            Skulks Completed = 0

The directory where the skulk was pending had the master copy on this node.
So he moved the master copy to the other server.
cdscp set dir to new epoch master Masternode exclude "hanging server"

This worked fine and the cdsd stopped looping. 

Now the questions:

cdscp show /.:/hosts/gdcw9e ( the replicated directory ) 
....
                     Timeout = :
                  Expiration = 1997-04-24-12:05:54.419
                   Extension = +1-00:00:00.000I0.000
                      MyName = /.../og.rzc2.ptt.com/hosts/gdcw9e
        CDS_DirectoryVersion = 3.0
            CDS_ReplicaState = dying replica
             CDS_ReplicaType = readonly

What does "dying replica" mean? I remember that DNS had a tool
to fix such a replica. Surgeon?

Does the Expiration time mean that this replica will disappear
after this time?

Is there a fast way to remove a directory which was excluded?
If I try to write a new copy of the directory to this clearinghouse,
I see that the TLOG file grows very fast, but nothing happens.
I use the command:
cdscp set dir  to new epoch  master Masterserver readonly "hanging server"

Any comments on this are greatly appreciated.

Marco
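
The pending-skulk check described above (Skulks Initiated higher than Skulks Completed) can be sketched in Python. This is only an illustration: it assumes the two counter lines look exactly like the "cdscp show server" excerpt quoted in this note, and the function name pending_skulks is hypothetical, not part of any DCE tool.

```python
import re


def pending_skulks(show_server_output: str) -> int:
    """Return Skulks Initiated minus Skulks Completed from the given text.

    Assumes counter lines of the form "Skulks Initiated = 1", as in the
    excerpt quoted in the note; other layouts are not handled.
    """
    counters = {}
    for line in show_server_output.splitlines():
        match = re.match(r"\s*Skulks (Initiated|Completed)\s*=\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    if set(counters) != {"Initiated", "Completed"}:
        raise ValueError("both skulk counters are required")
    return counters["Initiated"] - counters["Completed"]


# The two counter lines from the note.
sample = """
            Skulks Initiated = 1
            Skulks Completed = 0
"""
print(pending_skulks(sample))  # prints 1
```

A result above zero only says that a skulk was started and has not been counted as completed. It does not say why the skulk is stuck.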

T.R	Title	User	Personal Name	Date	Lines
2227.1	perhaps corrupt database	TUXEDO::ZEE	There you go.	`Thu May 01 1997 18:02`	30
Sorry for the delay - on vacation, then sick.

A corrupt database could cause the CDS server process to take 100% of
the CPU. I'm not sure whether DCE V1.4 includes a certain database fix
that would correct the above behavior. You should run the surgeon tool
to -scanrx the .checkpoint file to check for any corruption. Then you
would use the tool to excise the appropriate bad data. A previous bug
caused index records to be placed incorrectly in the B-tree, so
traversing the tree would result in an infinite loop.

>What does "dying replica" mean? I remember that DNS had a tool

This is a direct result of the "cdscp set dir to new epoch" command
when you exclude a clearinghouse. The replica state will change from
On to Dying. After a successful skulk, the replica state should change
from Dying to Dead. My guess is the skulk is not returning, perhaps
because of the looping above.

>Does the Expiration time mean that this replica will disappear
>after this time?

I believe these fields go with the attribute above the Timeout: field,
probably the CDS_ParentPointer attribute.

>Is there a fast way to remove a directory which was excluded?

You mean to say "replica" instead of "directory". It should be fast if
the skulk is successful.

--Roger
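
To illustrate why a misplaced index record can turn a tree traversal into an infinite loop, here is a toy Python sketch. It is not the CDS database code or its on-disk format: the dictionary-based tree, the node names, and the walk function are all hypothetical. It only shows how a pointer that leads back to an ancestor makes an unguarded walk spin forever, and how a visited set turns that into an error.

```python
def walk(tree, root):
    """Depth-first walk that raises instead of looping when a node repeats."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            # In a well-formed tree every node is reachable exactly once.
            raise ValueError(f"node {node!r} reached twice: index is corrupt")
        seen.add(node)
        order.append(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(tree.get(node, [])))
    return order


# Hypothetical trees: each key maps a node to its list of children.
healthy = {"root": ["a", "b"], "a": ["a1"], "b": []}
# Same tree, but a misplaced record makes "a1" point back at "root".
corrupt = {"root": ["a", "b"], "a": ["a1"], "a1": ["root"], "b": []}

print(walk(healthy, "root"))
try:
    walk(corrupt, "root")
except ValueError as err:
    print(err)
```

Without the seen set, the second call would never return, which matches the 100% CPU symptom in general terms. Whether that is what happened on the customer's server is exactly the open question in this thread.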
2227.2	Replica still dying...	VIRGIN::BILL	BILL is my lastname !!!	`Mon May 12 1997 10:29`	23
Hi Roger

The "poor" replica is still dying.

I tried to reproduce a dying replica with the following steps:

- cdscp set dir to new epoch, excluding a clearinghouse
- cdscp delete replica from the above clearinghouse

In this state I have my dying replica. As soon as I recreate the
replica, the state is back to on, as expected.

As far as I understand, the exclude should only be used if you intend
to bring the replica back to life and NOT if you'll delete it
afterwards.

Anyway, the customer is able neither to recreate the replica (TLOG
grows rapidly) nor to remove the dying replica. Is there any hard way
to get rid of such a replica? Is it possible that the mentioned bug is
still in the VMS CDS?

Thanks for any comment.

/Marco
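
The replica state changes discussed in .1 and .2 can be summarized as a small transition table. The Python sketch below is a reading aid under stated assumptions, not CDS internals: the event names (exclude, skulk_ok, recreate) are hypothetical labels for the actions described in the thread, and the display string for the Dead state is assumed.

```python
from enum import Enum


class ReplicaState(Enum):
    ON = "on"
    DYING = "dying replica"
    DEAD = "dead"  # display string assumed; the thread only names the state


# (current state, event) -> next state, as described in .1 and .2.
TRANSITIONS = {
    # "set dir to new epoch" with the clearinghouse excluded
    (ReplicaState.ON, "exclude"): ReplicaState.DYING,
    # a skulk that completes successfully
    (ReplicaState.DYING, "skulk_ok"): ReplicaState.DEAD,
    # recreating the replica, as observed in .2
    (ReplicaState.DYING, "recreate"): ReplicaState.ON,
}


def step(state, event):
    """Return the next state; events with no listed transition change nothing."""
    return TRANSITIONS.get((state, event), state)


state = step(ReplicaState.ON, "exclude")
print(state.value)                        # prints dying replica
print(step(state, "skulk_failed").value)  # prints dying replica
print(step(state, "skulk_ok").value)      # prints dead
```

In this model a skulk that never returns leaves the replica in the dying state, which is consistent with what the customer sees.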
2227.3		TUXEDO::ZEE	There you go.	`Mon May 12 1997 11:42`	25
>As far as I understand, the exclude should only be used if
>you intend to bring the replica back to life and NOT if you'll delete
>it afterwards.

This is generally true, since if you wish to delete a replica, you do
not need to "new epoch exclude" it first, just delete it. I have been
assuming that the directory in question is not the root directory.
Also, are there any other directories replicated at the clearinghouse
containing the replica you wish to delete?

>Anyway, the customer is able neither to recreate the replica (TLOG
>grows rapidly) nor to remove the dying replica. Is there any hard way
>to get rid of such a replica? Is it possible that the mentioned bug is
>still in the VMS CDS?

Creating or recreating a replica would cause the TLOG file to grow
rapidly. To remove a dying replica, try skulking the directory and note
the error if it fails. Yes, the mentioned bug may be in that version of
VMS CDS, but someone from VMS DCE would need to verify that.

There is the brute-force method of deleting that clearinghouse
altogether, but you would need to clean up all of the other directories
that are replicated there.

--Roger
2227.4	Database Corruption fix may not be in VMS	STAR::SWEENEY		`Mon May 12 1997 12:08`	7
If the database corruption fix mentioned in .1 was released in Digital
Unix ECO 1 for 1.3, then I do not believe OpenVMS has picked up the
fix.

Roger, I will contact you offline about the fix. We have all the source
differences for all the ECO 1 kit changes, but are having a difficult
time determining exactly which source module changes are required for
the database corruption fix.

Dave