
Conference ssag::ask_ssag

Title:	Ask the Storage Architecture Group
Notice:	Check out our web page at http://www-starch.shr.dec.com
Moderator:	SSAG::TERZAN

Created:	Wed Oct 15 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6756
Total number of notes:	25276

6736.0. "Rz29 Alternate cylinder???" by MSAM03::RAHMAN () Fri May 30 1997 06:28

Hi,
1. I need clarification on why the total cylinder count of an RZ29B-VA dropped
by one after a "retries exhausted" error, as shown in the uerf entry below.

The disklabel before the error:

-----------------------------------------------------
# /dev/rrzc33c:
type: SCSI
disk: HSZ40
label: 
flags:
bytes/sector: 512
sectors/track: 113
tracks/cylinder: 20
sectors/cylinder: 2260
cylinders: 3707
sectors/unit: 8378028
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:       32        0    unused     1024  8192       	# (Cyl.    0 - 0*)
  b:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  c:  8378028        0    unused     1024  8192       	# (Cyl.    0 - 3707*)
  d:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  e:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  f:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  g:  4188998       32    unused     1024  8192       	# (Cyl.    0*- 1853*)
  h:  4188998  4189030    unused     1024  8192       	# (Cyl. 1853*- 3707*)
-----------------------------------------------------

Then SAP R/3, running on the Informix RDBMS, did some work. After some time
the Informix database crashed because a chunk went offline.
Informix uses partitions g and h as its raw devices.


When we checked uerf, the following output was found:

						  uerf version 4.2-011 (122)
********************************* ENTRY     2. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  199.     CAM SCSI 
SEQUENCE NUMBER                 16.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Thu May 29 16:43:51 1997
OCCURRED ON SYSTEM                      posmal1 
SYSTEM ID                 x00070016
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000

----- UNIT INFORMATION -----

CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0004
                              x0112     LUN x2
                                        TARGET x1

----- CAM STRING -----

ROUTINE NAME                            cdisk_complete 

----- CAM STRING -----

                                        Retries Exhausted 

----- CAM STRING -----

ERROR TYPE                              Hard Error Detected 

----- CAM STRING -----

DEVICE NAME                             DEC     HSZ4 

----- CAM STRING -----

                                        Active CCB at time of error 

----- CAM STRING -----

                                        CCB request completed with an error 
ERROR - os_std, os_type = 11, std_type = 10


----- ENT_CCB_SCSIIO -----

*MY ADDR                  x3FE29B28
CCB LENGTH                    x00C0
FUNC CODE            x01
CAM_STATUS                    x0084     CAM_REQ_CMP_ERR 
                                        AUTOSNS_VALID 
PATH ID              4.
TARGET ID            2.
TARGET LUN           2.
CAM FLAGS                 x00000442
                                        CAM_QUEUE_ENABLE 
                                        CAM_DIR_IN 
                                        CAM_SIM_QFRZDIS 
*PDRV_PTR                 x3FE29828
*NEXT_CCB                 x00000000
*REQ_MAP                  x3FE08400
VOID (*CAM_CBFCNP)()      x00526660
*DATA_PTR                 x400A5828
DXFER_LEN                 x00002000
*SENSE_PTR                x3FE29850
SENSE_LEN            xA0
CDB_LEN              x0A
SGLIST_CNT                    x0000
CAM_SCSI_STATUS               x0002     SCSI_STAT_CHECK_CONDITION 
SENSE_RESID          x8E
RESID                     x00002000
CAM_CDB_IO           x000000100000ACD47F000028
CAM_TIMEOUT               x0000003C
MSGB_LEN                      x0000
VU_FLAGS                      x4000
TAG_ACTION           x20

----- CAM STRING -----

                                        Error, exception, or abnormal 
                                         _condition 

----- CAM STRING -----

                                        ILLEGAL REQUEST - Illegal request or 
                                         _CDB parameter 

----- ENT_SENSE_DATA -----

ERROR CODE                    x0070     CODE x70
SEGMENT              x00
SENSE KEY                     x0005     ILLEGAL REQ 
INFO BYTE 3          x00
INFO BYTE 2          x00
INFO BYTE 1          x00
INFO BYTE 0          x00
ADDITION LEN         x0A
CMD SPECIFIC 3       x00
CMD SPECIFIC 2       x00
CMD SPECIFIC 1       x00
CMD SPECIFIC 0       x00
ASC                  x21
ASQ                  x00
FRU                  x00
SENSE SPECIFIC       x0200C0
ADDITIONAL SENSE    
0000:   02000000  00000000  00000000  00000000        *................*
0010:   00000000  00000000  00000000  00000000        *................*
0020:   00000000  00000000  00000000  00000000        *................*
0030:   00000000  00000000  00000000  00000000        *................*
0040:   00000000  00000000  00000000  00000000        *................*
0050:   00000000  00000000  00000000  00000000        *................*
0060:   00000000  00000000  00000000  00000000        *................*
0070:   00000000  00000000  00000000  00000000        *................*
0080:   00000000  00000000  00000000  00000000        *................*
0090:   7E250000  00005E3C  00000000  00000000        *..%~<^..........*


****************END Of uerf extraction****************************
After the error, the following steps were carried out:
1) disklabel -z /dev/rrzc33c
2) disklabel -wr /dev/rrzc33c hsz40

The total cylinder count is now one less: before the error it was 3707,
  after the error it is 3706.


# /dev/rrzc33c:
type: SCSI
disk: HSZ40
label: 
flags: dynamic_geometry
bytes/sector: 512
sectors/track: 113
tracks/cylinder: 20
sectors/cylinder: 2260
cylinders: 3706
sectors/unit: 8377528
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   200000        0    unused     1024  8192       	# (Cyl.    0 - 88*)
  b:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  c:  8377528        0    unused     1024  8192       	# (Cyl.    0 - 3706*)
  d:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  e:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  f:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  g:  4088700   200004    unused     1024  8192       	# (Cyl.   88*- 1897*)
  h:  4088700  4288708    unused     1024  8192       	# (Cyl. 1897*- 3706*)

*************End of disklabel*********************************

Can anyone explain this?
I have read some notes about SCSI timeout errors on very busy disks
connected to a SCSI controller. The notes say that a SCSI timeout error can
have several causes:
1. SCSI ID arbitration/priority can cause a lower-priority disk to
time out while it waits for a higher-priority disk to be serviced.
2. The application issues a long request to the disk. Informix uses raw
devices and handles the I/O requests itself.
3. The disk firmware does not allow for a longer timeout setting.
4. Others.

My feeling is that once the OS detects the hard error (as shown in the uerf)
it marks the block as bad, and Informix therefore considers the disk
corrupted. Since this is not a real hardware fault, i.e. the block was
marked bad only because of the "retries exhausted" condition, the disk firmware
performs a self-recovery by creating an "alternate cylinder" and recovering
the block onto that cylinder. But since all blocks of the disk are in
use, the alternate cylinder ends up on the Informix
raw device, which corrupts the Informix pages and crashes the Informix
database.

To eliminate this possibility, I have reserved the first few
cylinders for the alternate cylinder, if my theory is correct.
I have also reserved 4 blocks (2KB) before and after partitions g and h, to
eliminate the possibility of Informix writing across a disklabel
partition boundary. The Informix block size is 2KB.
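
The reserved gaps can be checked from the numbers in the second disklabel. The following is only a sketch: the partition sizes and offsets are copied from the label above, while the helper name `gaps` is made up for this illustration and is not part of any Digital UNIX tool.

```python
SECTOR_BYTES = 512  # bytes/sector from the disklabel

# (size, offset) in sectors, copied from the disklabel written after the error
partitions = {
    "a": (200000, 0),
    "g": (4088700, 200004),
    "h": (4088700, 4288708),
}
UNIT_SECTORS = 8377528  # size of partition c (whole unit)


def gaps(parts, unit_sectors):
    """Return the unused sectors between consecutive partitions and at the end."""
    ordered = sorted(parts.items(), key=lambda item: item[1][1])
    result = {}
    for (name, (size, offset)), (nxt, (_, nxt_offset)) in zip(ordered, ordered[1:]):
        result[f"{name}->{nxt}"] = nxt_offset - (offset + size)
    last_name, (size, offset) = ordered[-1]
    result[f"{last_name}->end"] = unit_sectors - (offset + size)
    return result


layout_gaps = gaps(partitions, UNIT_SECTORS)
for boundary, sectors in layout_gaps.items():
    print(boundary, sectors, "sectors =", sectors * SECTOR_BYTES, "bytes")
```

With these values the gaps before g and between g and h are 4 sectors (2048 bytes) each, and 120 sectors remain unused after h.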


Please, I need an explanation for this strange behaviour/bug. If possible,
I also need a permanent solution for this problem.

Thanks in advance,

rahman ibrahim@MSA

T.R	Title	User	Personal Name	Date	Lines
6736.1	from the drive side...	SUBSYS::VIDIOT::PATENAUDE	Ask your boss for ARRAY's...	`Fri May 30 1997 13:49`	11
	No idea why you have 1 less cylinder reported. The drive does not use any type of alternate cylinder for revectoring. It has spare sectors pre-mapped at the factory on each track and each head that it uses. If the drive has major media/head problems and uses up all of these sectors (which are NOT included in the capacity of the device), any subsequent reassign results in a Sense Key of 04, ASC of 19 (Defect List Error). Not an Illegal Request as your log indicated. Roger.
6736.5	SCSI Timeout is an Issue for heavy IO	MSAM03::RAHMAN		`Sun Jun 01 1997 02:48`	152
	Hi Roger,

	That is exactly the case: the block is not bad in a way you could detect during formatting at manufacturing. It is an ugly block, i.e. bad because difficulties arise when attempting to read the block. I believe the situation I am encountering is similar to the two notes I have attached below. Please analyse this situation. If there is no logical explanation, then the customer has the right to change Digital's hardware.

	I have checked the definitions of alternate sector and alternate cylinder in /usr/include/sys/disklabel.h, and it seems they are not used in /etc/disktab. Please verify the RZ29-VA (is it a Seagate Barracuda?) and whether the UNIX driver fails to comply with the SCSI commands from Seagate. MCS engineers have verified at the local Digital office that the "suspected" disk is OK!! I would say that, because of heavy I/O, the driver marks blocks as bad, and the alternate tracks are running out because of so many "UGLY" blocks.

	Please look into this matter more seriously. If you need info, please ask for it. I am very interested in solving this matter once and for all. Otherwise, tomorrow I walk into the customer and start selling a different vendor's box.

	rahman ibrahim@MSA
	SSU Malaysia.

	Topic #132: ``Bad RCT causes an err on BBR?''

	I believe the terms "good" block and "bad" block in the RCT should be clearly understood. The term "bad" generally implies unreadable. If the block is deemed bad at the factory (PBN entry in the FCT) or the formatter "detects" the block as bad, then it will format the header with header code "11", marking it unusable. If the block header is still "00" (good LBN) but difficulties arise attempting to read the block (continued uncorrectable ECC, smashed header, etc.), the block is again deemed bad. Alternate copies of the relative block will be accessed in the RCT during BBR or revector operations.

	There is, however, a condition I like to call "ugly". This is a block that is not bad but contains "bad data" with good ECC, EDC, etc.
	Alternate copies of these types of blocks WILL NOT BE ACCESSED under normal circumstances.

	Example: K.SDI fails and "forgets the HOST/RCT boundary" and writes a data pattern into the first few blocks of the RCT during periodics, for example. This corrupts the first copy of the RCT control block. The data happens to get written with good ECC, EDC. This could have a variety of effects during host mount of that disk. Continuing on, problems arise and the Field Engineer determines the K.SDI is bad and replaces it. Good! The disk is still corrupt but the symptoms may not be obvious.

	If the corruption "clobbered" word 4 in the RCT (BBR control word), the symptoms appear during each attempt to ONLINE the disk (VMS Mount, for example). If the P1 or P2 flags happen to be set, the system will attempt to finish a BBR that never really started. If the replaced LBN address field gets filled with this erroneous pattern, the HSC may attempt a BBR to a "non-existent" LBN and crash the HSC every time a mount is attempted. If undefined bits get set in the control word, the HSC will "data safety write-protect" the disk every time it is mounted. The list is endless, especially if the descriptor blocks become affected.

	The point is this: if blocks in the RCT get written with bad data but good ECC, then alternate copies of the blocks are NOT ACCESSED because the block is considered "good" (a better term is readable, not necessarily good). I can produce these symptoms manually, and they do happen in the field, fortunately infrequently (I hope). We had two occasions of K.SDI failure in our lab (CSSE lab) that produced these very same "subtle" but serious problems. I saved the printout for one and use it during my seminar (DSA troubleshooting) to teach FEs how to deal with logical failures usually resulting from hardware failures.

	Rule of thumb:
	If you have experienced any hardware problem that could affect the R/W data path to the disk (controller, SDI, disk electronics), you may have experienced corruption on the media, which stays around "after" the HW is resolved. I call it logical recovery.

	Mark Himes
	CX/CSSE

	Topic #5752: ``command timeout issue''

	Looks like HSJ01$DUA62 and HSJ04$DUA702 are suffering Command Timeouts. What rev firmware are they running? (If it's running V007, upgrade to 0016...if it's running 0014, then it should be OK). I've included a blitz that Roger Patenaude put out in relation to Command Timeouts. BTW, you should REALLY upgrade HSOF to V2.7 and SWEAT to X2.7

	Copyright (c) Digital Equipment Corporation 1995. All rights reserved.

	+---------------------------+TM
	|   |   |   |   |   |   |   |
	| d | i | g | i | t | a | l |        TIME DEPENDENT CASE
	|   |   |   |   |   |   |   |
	+---------------------------+

	TITLE: What are SCSI Command Timeouts Errors?
	AUTHOR: Roger Patenaude            DATE: August 16, 1995
	DTN: 237-3705                      TD #: 1904
	ENET: BABAGI::Patenaude            CROSS REFERENCE #'s:
	DEPT: Storage External Products    (PRISM/TIME/CLD#'s)
	      Continuation Engineering
	INTENDED AUDIENCE: All             PRIORITY LEVEL: 2
	(U.S./EUROPE/GIA)                  (1=TIME CRITICAL, 2=NON-TIME CRITICAL)
	=====================================================================

	PROBLEM:
	--------
	The purpose of this Blitz is to give you some insight as to what a SCSI "Command Timeout" error is. I've kept this very generic as more of an informational Blitz for a change.

	These errors are telling you that a specific "command" did not complete in a specified period of time. This can be caused by multiple sources and in most all cases can be recovered by the host system by reissuing the failed command. Some of the reasons for "Command Timeouts" are:

	1) The SCSI bus is too busy. The SCSI bus priority is designed using the drive's ID in arbitration with no regard for how many times the device wins the bus.
	So, if you have a bus with the highest priority device doing a VERY heavy workload ("hogging" the bus), then other devices on the bus will not be able to arbitrate and win the bus. These devices will then have commands outstanding that they cannot complete. The host will then log a "command timeout" error and sometimes follow it with a bus reset.

	2) The host issued a command to a drive that took too long to complete. This could be due to a broken device, but more commonly the device is doing a long command and does not have time to answer the host. Normal convention is that the host will only ask "how things are proceeding" (as in the case where you issued a rewind to a tape drive and are waiting for it to become ready) via a Test Unit Ready command, but if data-type (read/write) commands are continually issued to the unit, the first command cannot be completed and may time out.

	3) Operating system driver issues. The drivers may not be allowing reasonable enough time for the commands to complete. A case in point: VMS recently increased the command timeout values in MKDRIVER (TAPE) and DKDRIVER (DISK) (from 3 seconds to 10 in MK). This was because 3 was just too aggressive on a busy bus, and command timeouts and bus resets were occurring under heavy load.

	4) Device issues. The drive may not have enough horsepower to complete the commands it accepted in a reasonable amount of time. OR, the drive may not be working on commands it has accepted because it is too busy. RZ28B's running version 003 code are one such case: the drive will optimize its seeks by working commands that are in the local area of the heads. One side effect is that a command may time out if it was not in the local area where the drive is spending all its time, thus not getting serviced. RZ28B's running 006 do not have this issue.

	RESOLUTION/WORKAROUND:
	-----------------------
	For the most part these are just events and should be left alone.
	In the rare case where this is disruptive due to resets occurring, review the four points above and see how they fit into your environment. You may need to split heavily loaded devices between multiple busses, or you may need new firmware, or maybe move a device off to another bus.

	ADDITIONAL COMMENTS:
	--------------------
	None.

	** DIGITAL INTERNAL USE ONLY **
6736.6	Man you are ALL over the place...	SUBSYS::VIDIOT::PATENAUDE	Ask your boss for ARRAY's...	`Mon Jun 02 1997 11:51`	45
	> That is exactly the case.. the block is not bad as u could detect it
	> during the formating at the manufacturing. It is an ugly block, ie
	> bad because of difficulties arise when attempting to read the block.

	Exactly WHAT case????? You got a failure in the errorlog that said:

	----- CAM STRING -----
	ILLEGAL REQUEST - Illegal request or _CDB parameter

	The drive also returned status that said it got an invalid request! How are you equating that with a note about DSDF / RCT / FCT information that was written about SDI devices (RA81, RA82, RA90, etc...) and a note about command timeouts???????

	> Please verify the Rz29-va (is it seagate baracuda) and
	> is the unix driver does not comply to the SCSI command from seagate?

	It is a Seagate drive and YOU can dig through UNIX drivers. Not I.

	> MCS engineers has verified the "suspected" disk is OK at local digital
	> office!!

	So it's probably not the drive ;^)

	> Please look into this matter more seriously. If u need info please ask
	> for it.

	NOTES IS NOT AN ESCALATION PATH!!!!! You need to look at this more seriously and follow proper escalation to get this looked at. Have you tried any local sales and service support folk? (Don't answer, rhetorical question)

	> I am very interested to solve this matter once and for all. Otherwiese,
	> tommorow I walk into the customer and selling different vendors box.

	UNBELIEVABLE!!!! You have what most likely is a SOFTWARE problem and you are about to condemn our hardware. Unbelievable is all I can say. Glad I only have 250 shares of DEC stock as of today with this mindset.

	roger.
6736.7	Help is needed......	MSAM03::RAHMAN		`Mon Jun 02 1997 20:09`	8
	Thanks for your response to the problem. Oops! Sorry, this is not the ESCALATION path. I will be more careful next time. However, thanks for your time in looking into my problem. I will escalate this problem to our support people. Rahman
6736.8	Roger is right: escalate it	SUBSYS::BROWN	SCSI and DSSI advice given cheerfully	`Tue Jun 03 1997 07:19`	19
	I don't think it's clear whether this is a software problem or a configuration error. The SCSI sense data is 05/21/00, which means the software attempted to read a block beyond the drive's capacity.

	Now, we know the capacity after the error was smaller than the capacity before the error. We know the blocks being read (16 blocks, starting at 0x7fd4ac) were within the drive's capacity before the error, and outside the capacity after the error.

	We don't know when the capacity changed, or who changed it. The obvious candidates are:

	- the Informix software
	- the HSZ40 controller
	- a bus reset, causing the drive to return to the most recently saved capacity

	It may take a fair amount of time and engineering support to find the cause. Please escalate, so the right people can be identified and assigned.
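
The "16 blocks, starting at 0x7fd4ac" figure can be recovered from the CAM_CDB_IO field in the uerf entry. The sketch below is illustrative only: it assumes uerf prints that field with the last byte first (an inference from the 0x28 opcode appearing at the end of the string), and the function name `decode_read10` is made up here. The READ(10) layout used (opcode in byte 0, LBA in bytes 2-5, transfer length in bytes 7-8) is the standard SCSI one.

```python
OLD_CAPACITY = 8378028  # sectors/unit in the disklabel before the error
NEW_CAPACITY = 8377528  # sectors/unit in the disklabel after the error


def decode_read10(cam_cdb_io_hex):
    """Decode a READ(10) CDB from the hex string shown for CAM_CDB_IO."""
    raw = bytes.fromhex(cam_cdb_io_hex)
    cdb = raw[::-1]  # assumption: uerf prints the field last byte first
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("000000100000ACD47F000028")
last_block = lba + blocks - 1

print("first block:", lba, hex(lba))
print("block count:", blocks)
print("fits old capacity:", last_block < OLD_CAPACITY)
print("fits new capacity:", last_block < NEW_CAPACITY)
```

Under that assumption the request starts at block 8377516 and ends at block 8377531, which fits the old capacity but runs four blocks past the new one.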
6736.9	notes collision	WRKSYS::HOUSE	Kenny House, Workstations Engineering	`Tue Jun 03 1997 07:23`	26
	So far as I can tell, there are two issues in the basenote.

	(1) The error log is quite explicit about the HSZ40's complaining about an out-of-range logical block address used by a READ(10) command. The LBA requested was 8377516 (decimal), although the number of sectors claimed in the disklabel was 8378028 (decimal).

	(2) Writing over the disklabel changed the geometry, so that the number of sectors is now 8377528 (decimal). Note that the flags now have "dynamic_geometry" set, too.

	The whole concept of a simple sector/head/track geometry is an industry-wide falsehood. Zoned drives (with different number of sectors per track) and RAID volumes, for example, do not have this structure. It would be nice, however, if all logical blocks on this "geometry" were addressable -- this does not seem to be the case in (1) above.

	Do SAP or Informix bypass the normal file structure to get to the raw drive? Are they likely to be writing the disklabel?

	There is no indication of a "retry exhausted" error or "SCSI timeout" in the information presented in this note string to date. Nor is there clear evidence of a hardware problem.

	-- Kenny House
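
The one-cylinder difference in the basenote is consistent with plain integer arithmetic on the two sector counts. This is a sketch under an assumption: that the cylinder count shown by disklabel is the number of whole cylinders that fit in sectors/unit. The thread does not confirm how disklabel derives it, and the helper name `whole_cylinders` is hypothetical.

```python
SECTORS_PER_TRACK = 113
TRACKS_PER_CYLINDER = 20
SECTORS_PER_CYLINDER = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER  # 2260, as in the label


def whole_cylinders(sectors_per_unit):
    """Return (whole cylinders, leftover sectors) for a given unit size."""
    return divmod(sectors_per_unit, SECTORS_PER_CYLINDER)


before = whole_cylinders(8378028)  # label before the error
after = whole_cylinders(8377528)   # label after the error

print("before:", before)
print("after: ", after)
```

With the values from the two labels this gives 3707 whole cylinders plus 208 sectors before, and 3706 whole cylinders plus 1968 sectors after, so losing 500 sectors is enough to cross a cylinder boundary.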
6736.10		SSDEVO::ROLLOW	Dr. File System's Home for Wayward Inodes.	`Tue Jun 03 1997 09:05`	13
	Many database class applications on UNIX use the raw device; it avoids any issues of whether the file system buffers the data (sync, fsync or not) and it avoids a buffer copy. If you remember that disk reads and writes have to be multiples of the sector size it is also easy, using the same system calls as reading and writing files.

	Since Digital UNIX disklabels have been around for a few years, most vendors that use raw disks have either figured out where the label is and don't use it, or require the user to partition the disk to protect the label. If this is the same disklabel that got posted to the DIGITAL_UNIX conference this morning, that's what that 32 sectors is in the A partition.
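
As a rough illustration of reading in whole sectors with the ordinary file system calls, here is a minimal sketch. The helper `read_sectors` is hypothetical, and the demo runs against a temporary file rather than a real raw device so that it is safe to execute anywhere; on a real system the descriptor would come from opening the raw partition instead.

```python
import os
import tempfile

SECTOR_BYTES = 512  # bytes/sector from the disklabel in this thread


def read_sectors(fd, first_sector, count, sector_bytes=SECTOR_BYTES):
    """Read `count` whole sectors starting at `first_sector` using pread()."""
    if first_sector < 0 or count <= 0:
        raise ValueError("sector number must be >= 0 and count must be > 0")
    # Offset and length are both multiples of the sector size by construction.
    data = os.pread(fd, count * sector_bytes, first_sector * sector_bytes)
    if len(data) != count * sector_bytes:
        raise OSError("short read: request runs past the end of the device")
    return data


# Demo on a scratch file that stands in for a raw partition (4 sectors long).
with tempfile.TemporaryFile() as scratch:
    for value in range(4):
        scratch.write(bytes([value]) * SECTOR_BYTES)
    scratch.flush()
    demo = read_sectors(scratch.fileno(), first_sector=1, count=2)
    try:
        read_sectors(scratch.fileno(), first_sector=3, count=2)
        overrun_detected = False
    except OSError:
        overrun_detected = True

print(len(demo), overrun_detected)
```

The second call deliberately asks for sectors past the end of the scratch file, the same kind of out-of-range request the HSZ40 rejected in the error log; a real raw device may instead fail the read with an I/O error.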
6736.11	Not broken H/W	SMURF::KNIGHT	Fred Knight	`Wed Jun 04 1997 15:06`	19
	What most likely happened is that some user labeled this device BEFORE it was put into the HSZ40 (note that there is NO dynamic geometry in the first disklabel). Then, after installing it into the HSZ40, they just started to use it (with the WRONG disklabel). After the error, they put a NEW disklabel (now a correct one) on the media (now note that dynamic geometry IS set). And magically, it now works!

	The only other option is the HSZ40 firmware bug that has been BLITZed about, covering conditions when the firmware would change the size of a volume (not common, but still possible).

	In both cases, NOTHING is broken in the H/W. If it's case 1, then educate your customer; if case 2, use the documented firmware workaround.

	Fred Knight
6736.12	Hmm, did somebody INIT SAVE_CONFIG?	SSDEVO::JACKSON	Jim Jackson	`Wed Jun 04 1997 17:46`	25
	Sure, we've seen this type of error a bunch when folks got careless about reusing disks. Here's a recipe for the problem:

	1) Have a direct-connected SCSI disk. Put a filesystem onto it.
	2) Move the disk to an HSZ40
	3) INIT the disk from the HSZ40 console
	4) ADD UNIT

	At this point, the host sees a disk that has a valid filesystem on it. The only problem is that the last few blocks have been lopped off by the HSZ40 to contain its metadata.

	One of the rules we have in our lab is if you INIT it on the HSZ, then you have to put a new filesystem on it (VMS INIT, Unix ??). Our documentation has stated for eons that you should assume that an HSZ INIT destroys the user data on the disk.

	disklabel value   8378028
	new value         8377528
	                  -------
	difference            500

	500 blocks is exactly the number of blocks consumed by SAVE_CONFIG.

	So, in your case, it would appear that you had a JBOD with a filesystem on it, the disk got an INIT SAVE_CONFIG, and a new filesystem was not put in place.
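
The same subtraction can be applied per partition to see which parts of the old label no longer fit on the shrunken unit. This is only a sketch: the sizes and offsets come from the first disklabel in the basenote, and the helper name `overhang` is made up for the illustration.

```python
NEW_CAPACITY = 8377528  # sectors/unit reported after the error

# (size, offset) in sectors, from the disklabel in use before the error
old_partitions = {
    "a": (32, 0),
    "c": (8378028, 0),
    "g": (4188998, 32),
    "h": (4188998, 4189030),
}


def overhang(parts, capacity):
    """Return, per partition, how many sectors extend past `capacity`."""
    return {
        name: max(0, offset + size - capacity)
        for name, (size, offset) in parts.items()
    }


lost = overhang(old_partitions, NEW_CAPACITY)
for name, sectors in sorted(lost.items()):
    print(name, sectors)
```

Under the old label, partitions c and h each extend 500 sectors past the new end of the unit, while a and g are unaffected. That matches the failing read, which fell near the end of partition h.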