| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1242.1 | Does 3.1 really help? | PFSVAX::WUENSCHELL |  | Wed Apr 30 1997 08:15 | 7 | 
|  |     Keith;
    	We continue to fight battery failures in the field even with
    batteries dated 97.  Are you continuing to have good results with HSOF
    3.1?  If so, this may be a solution for some of our problem customers.
    
    By the way, are there any patches to 3.1 that you know of?  A note in
    this conference says there aren't, but then I saw a reference to 3.1-6.
 | 
| 1242.2 |  | SSDEVO::THOMPSON | Paul Thompson, Colorado Springs | Wed Apr 30 1997 13:35 | 5 | 
|  | Do the 1997 batteries with which you are having problems have white labels?
If so, the manufacturing date of those batteries pre-dates 1997.  The date
on the white label shows the date that the battery was most recently
re-charged.
 | 
| 1242.3 | White label has MAR 1997 | PFSVAX::WUENSCHELL |  | Thu May 01 1997 07:57 | 6 | 
|  |     Yes, the batteries have a white label with Mar 1997 on it.  They also
    have MAR 97 stamped in black on the edge.
    Are you saying that these batteries may have problems?  How do we
    determine which 1997 batteries are good or bad?
    
    
 | 
| 1242.4 | Batteries WITHOUT white labels | SSDEVO::THOMPSON | Paul Thompson, Colorado Springs | Thu May 01 1997 14:32 | 4 | 
|  | Batteries dated 1997 that do not have a white label on the face of the battery
with this date are good.  Batteries with the date on a white label on the face
of the battery were originally manufactured in 1996 and are subject to the
problem from the vendor's manufacturing defect.
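
The rule above can be written down as a small check. This is only a sketch: the function name and its argument are made up for illustration, and it covers only batteries dated 1997, since that is all this note addresses.

```python
def classify_1997_battery(date_is_on_white_label: bool) -> str:
    """Classify a battery dated 1997 using the rule stated in this note.

    date_is_on_white_label: True if the 1997 date appears on a white label
    on the face of the battery (hypothetical parameter name).
    """
    if date_is_on_white_label:
        # The white label shows the most recent re-charge date, so the
        # battery itself was manufactured in 1996.
        return "suspect: built in 1996, white label shows re-charge date"
    return "good: built in 1997"


print(classify_1997_battery(True))
print(classify_1997_battery(False))
```

By this rule, the batteries described in .3 (white label reading Mar 1997) fall in the suspect group.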
 | 
| 1242.5 |  | GEM::SHERGOLD | We are 100% sure; well almost!! | Tue May 06 1997 06:28 | 3 | 
|  |     OK Guys what about an answer to .0??? Any takers?
    
    Keith
 | 
| 1242.6 | Answers... | SSDEVO::FAVA | 4 Yrs of Eng Sch & Never Saw a Train | Tue May 06 1997 12:07 | 59 | 
|  | 	RE:  .5
	OK, I accept your challenge.
	I presume the questions in .0 that you would like answered are the 
	following:
>>
>>                                                  The strange thing was
>>    that the next morning the HSJ reported the battery as good with the
>>    added message of "Cache battery is now sufficiently charged". Can
>>    someone explain this? 
>>
	Yes, I can.  This entire battery problem has been extremely painful
	for everyone.  It has been caused by several problems, both hardware
	and software.
	Major changes were made to the battery diagnostic in V3.1 which 
	specifically corrected many of the software issues.  We know now that 
	the diagnostic in V2.7 and V3.0 was declaring many batteries "failed" 
	when there was no problem with them at all.  One of our tests here
	in the past few days was with a set of batteries which failed 
	consistently on V3.0 and passed consistently on V3.1.  These
	batteries have a date code of 10/94 and our testing shows that they 
	would still hold up a cache for 70 - 80 hours!!
	Keep in mind, however, some of the failures detected by the software 
	were true battery failures.  The big problem was to eliminate the 
	"false" failures while still detecting true failures.
>>
>>                          Have we been replacing batteries like mad because
>>    the old battery testing routine was bad (despite the patch)? 
>>
	As I mentioned above, some, but not all, of the problems were due 
	to the software falsely declaring some batteries bad.
>>
>>                                                                 As there
>>    is always some delay in getting all our customers up to V3.1 is there a
>>    better patch we can apply to V2.7 to facilitate the same response. 
>>
	NO.
>>
>>                                                                       OR
>>    (heaven forbid :-)  ) has the battery test been fudged to cope with all
>>    these failing batteries??
>>    
	I hope this suggestion was entirely facetious.  But if there is any 
	doubt, NO!!!, the test was NOT fudged simply to pass all batteries, 
	bad ones included.  Many MONTHS of effort by both hardware and 
	software people have been spent trying to resolve this serious
	customer satisfaction problem.  No one here has treated it lightly.
	The changes in V3.1 were a big step.  However, more work is going 
	on now.  This issue is still not closed to our satisfaction.
	Hope this helps.
						Tom Fava
						Colorado Springs
 | 
| 1242.7 | Fair's fair! | GEM::SHERGOLD | We are 100% sure; well almost!! | Fri May 09 1997 09:10 | 11 | 
|  |     Tom,
    
    Thanks for the reply. Not the one I wanted but at least it is an honest
    one and we know where we are.
    
    Oh and by the way the last part was facetious but I didn't know the
    symbol for "tongue in cheek".  [ :-Q  maybe??]
    
    Regards
    	Keith
    
 | 
| 1242.8 | Cache Battery Low Messages | BSS::BERGLING |  | Thu May 22 1997 09:15 | 11 | 
|  |     I have a new twist on this.
    
    We have installed 3.1 on about 26 HSJ40s. We are getting cache battery
    low notices. When we check the J later, say 8 hours on, it says:
    
    "Cache battery is now sufficiently charged"
    
    Any explanation for this?
    
    Thanks,
    Vern Bergling
 | 
| 1242.9 | Normal operation from what you describe | SSDEVO::RMCLEAN |  | Thu May 22 1997 09:49 | 4 | 
|  | Yup... When you first get batteries, or if you have batteries that have been
supporting the cache, there is some chance that they have been discharged somewhat.
Batteries sitting on the shelf or in an unpowered module discharge naturally.
Starting low and later becoming charged is perfectly normal.
 | 
| 1242.10 | Installed Batteries | BSS::BERGLING |  | Thu May 22 1997 16:53 | 5 | 
|  |     These J's have not reported the batteries being low before. Each seems to
    be running fine and then gets a low indication. After 8-12 hours this
    changes back to normal. Is the HSJ recharging the batteries or what?
    
    Thanks,
 | 
| 1242.11 | 3.1 Crashes on low Batteries???? | BSS::BERGLING |  | Fri May 23 1997 08:42 | 198 | 
|  |     The following is a console output from one of these "J"s. It seems that
    when we get the first DRAB interrupt the J crashes. It then logs a
    number of failure codes all pointing to the cache batteries. 4 hours
    later the batteries are again sufficiently charged. 
    
    Is this crash a new feature for 3.1?
    
    The batteries will be replaced today.
    
    Vern
    	
    22:01:30 HJ2202> SHOW THIS
    00:01:29 Controller:
    00:01:29         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    00:01:29         Configured for dual-redundancy with ZG61013838
    00:01:29             In dual-redundant configuration
    00:01:29         SCSI address 7
    00:01:29         Time: 31-MAR-1997 14:56:04
    00:01:29 Host port:
    00:01:29         Node name: HJ2202, valid CI node 15, 16 max nodes
    00:01:29         System ID 4200100FD4C0
    00:01:29         Path A is ON
    00:01:29         Path B is ON
    00:01:29         MSCP allocation class   30
    00:01:29         TMSCP allocation class  30
    00:01:29         CI_ARBITRATION = ASYNCHRONOUS
    00:01:29         MAXIMUM_HOSTS = 15
    00:01:29 Cache:
    00:01:29         32 megabyte write cache, version 2
    00:01:29         Cache is GOOD
    00:01:29         Battery is GOOD
    00:01:29         No unflushed data in cache
    00:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    00:01:29         CACHE_POLICY = A
    00:01:29         NOCACHE_UPS
    00:01:29 HJ2202> SHOW FAIL
    00:01:34 Name          Storageset                     Uses            Used by
    00:01:34 ----------------------------------------------------------------------
    00:01:34
    00:01:34 FAILEDSET     failedset
    00:01:34         Switches:
    00:01:34           NOAUTOSPARE
    00:01:34 HJ2202>
    01:22:45
    01:22:45 %LFL--HJ2202> --31-MAR-1997 16:17:20-- Last Failure Code: 010B2380
    01:22:55  Occurred on 31-MAR-1997 at 16:17:20
    01:22:55  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 17. Second
    01:22:55  Controller Model: HSJ40
    01:22:55  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:22:55  Controller Identifier:
    01:22:55   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:22:55  Firmware Version: V31J(31)
    01:22:55  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:22:55  Instance Code: 01010302
    01:22:55  Last Failure Code: 010B2380 (No Last Failure Parameters)
    01:22:55
    01:22:55  Additional information is available in Last Failure Entry: 4.
    01:23:44
    01:23:44 Copyright Digital Equipment Corporation 1993, 1997. All rights reserve
    01:23:44 HSJ40 Firmware version V31J-0, Hardware version  H09
    01:23:44
    01:23:44 Last fail code: 010B2380
    01:23:44
    01:23:44 Press " ?" at any time for help.
    01:23:44
    01:23:44
    01:23:44 Cache battery charge is low
    01:23:44 Write-back caching is disabled
    01:23:44 HJ2202>
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 01010302
    01:23:44  Template: 1.(01)
    01:23:44  Occurred on 01-MAR-1997 at 18:13:07
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 18. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0001
    01:23:44  Instance Code: 01010302
    01:23:44  Last Failure Code: 010B2380 (No Last Failure Parameters)
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 02052301
    01:23:44  Template: 18.(12)
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0002
    01:23:44  Memory Address: 00000000
    01:23:44  Instance Code: 02052301
    01:23:44 HJ2202
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:10-- Instance Code: 024B2401
    01:23:44  Template: 20.(14)
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0003
    01:23:44  Reported via low level DRAB interrupt
    01:23:44  Memory Address: 40000000
    01:23:44  Byte Count: 0.(00000000)
    01:23:44  DRAB Registers:
    01:23:55   DSR:  00000000  CSR:  00000000 DCSR:  00000000  DER:  00000000  EAR:
    01:23:55   EDR:  00000000  ERR:  00000000  RSR:  00000000  CHC:  00000000  CMC:
    01:23:55  Diagnostic Registers:
    01:23:55   RDR0: 00000000  RDR1: 00000000  WDR0: 00000000  WDR1: 00000000
    01:23:55  Instance Code: 024B2401
    01:23:55 HJ2202> SHOW THIS
    04:01:28 Controller:
    04:01:28         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    04:01:28         Configured for dual-redundancy with ZG61013838
    04:01:28             In dual-redundant configuration
    04:01:28         SCSI address 7
    04:01:28         Time: 31-MAR-1997 15:23:54
    04:01:28 Host port:
    04:01:28         Node name: HJ2202, valid CI node 15, 16 max nodes
    04:01:29         System ID 4200100FD4C0
    04:01:29         Path A is ON
    04:01:29         Path B is ON
    04:01:29         MSCP allocation class   30
    04:01:29         TMSCP allocation class  30
    04:01:29         CI_ARBITRATION = ASYNCHRONOUS
    04:01:29         MAXIMUM_HOSTS = 15
    04:01:29 Cache:
    04:01:29         32 megabyte write cache, version 2
    04:01:29         Cache is GOOD
    04:01:29         Battery is LOW
    04:01:29         No unflushed data in cache
    04:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    04:01:29         CACHE_POLICY = A
    04:01:29         NOCACHE_UPS
    04:01:29 Cache battery charge is low
    04:01:29 Write-back caching is disabled
    04:01:29 HJ2202> SHOW FAIL
    04:01:33 Name          Storageset                     Uses            Used by
    04:01:33 ----------------------------------------------------------------------
    04:01:34
    04:01:34 FAILEDSET     failedset
    04:01:34         Switches:
    04:01:34           NOAUTOSPARE
    04:01:34 Cache battery charge is low
    04:01:34 Write-back caching is disabled
    04:01:34 HJ2202> SHOW THIS
    08:01:28 Controller:
    08:01:28         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    08:01:28         Configured for dual-redundancy with ZG61013838
    08:01:28             In dual-redundant configuration
    08:01:28         SCSI address 7
    08:01:28         Time: 31-MAR-1997 19:23:54
    08:01:29 Host port:
    08:01:29         Node name: HJ2202, valid CI node 15, 16 max nodes
    08:01:29         System ID 4200100FD4C0
    08:01:29         Path A is ON
    08:01:29         Path B is ON
    08:01:29         MSCP allocation class   30
    08:01:29         TMSCP allocation class  30
    08:01:29         CI_ARBITRATION = ASYNCHRONOUS
    08:01:29         MAXIMUM_HOSTS = 15
    08:01:29 Cache:
    08:01:29         32 megabyte write cache, version 2
    08:01:29         Cache is GOOD
    08:01:29         Battery is GOOD
    08:01:29         No unflushed data in cache
    08:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    08:01:29         CACHE_POLICY = A
    08:01:29         NOCACHE_UPS
    08:01:29 Cache battery is now sufficiently charged
    08:01:29 HJ2202>
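
For anyone comparing captures like this across many controllers, the battery state changes can be pulled out of the log with a short script. The following is a rough sketch, not a supported tool: the function names are made up, and it assumes the leading hh:mm:ss on each line is a timestamp added by whatever captured the console (it is not the controller's own `Time:` value). The sample lines are copied from the three SHOW THIS outputs above.

```python
import re

# Matches lines such as "04:01:29         Battery is LOW".
# Assumption: the leading hh:mm:ss comes from the console capture.
BATTERY_LINE = re.compile(r"^\s*(\d\d):(\d\d):(\d\d)\s+Battery is (\w+)\s*$")


def battery_states(log_text):
    """Return a list of (seconds, state) for every 'Battery is ...' line."""
    states = []
    for line in log_text.splitlines():
        match = BATTERY_LINE.match(line)
        if match:
            hours, minutes, seconds, state = match.groups()
            stamp = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            states.append((stamp, state))
    return states


def transitions(states):
    """Keep only entries whose state differs from the previous entry."""
    changes = []
    previous = None
    for stamp, state in states:
        if state != previous:
            changes.append((stamp, state))
            previous = state
    return changes


sample = """\
    00:01:29         Battery is GOOD
    04:01:29         Battery is LOW
    08:01:29         Battery is GOOD
"""

for stamp, state in transitions(battery_states(sample)):
    print(f"{stamp // 3600:02d}:{stamp % 3600 // 60:02d}:{stamp % 60:02d} {state}")
```

Run against the full capture, this only sees the SHOW THIS samples, so it bounds when the battery went low and recovered rather than giving the exact moment.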
 | 
| 1242.12 | what is the date? | SSDEVO::RMCLEAN |  | Fri May 23 1997 09:49 | 3 | 
|  | The important question here is: what is the date on the batteries?  They may well
be very near failure, but they should still be able to hold up the cache for
100 hours.
 |