|  |     Gentlemen,
    
    
    MTTR and MTBF figures are very interesting but are almost meaningless
    today. For some disks we specify an MTBF of 800,000 hours! (Do you
    believe it?) Also, we must have corporate approval to give them to
    customers.
    Digital has developed a couple of services called High Availability
    Services under the leadership of Dave Varner @OGO. He is the corporate
    business manager for this. Engineering for the AVANTO application is
    located in Shrewsbury Mass. The engineer for AVANTO is Ron Rocheleau @SHR.
    In Holland we have made availability models for hundreds of systems
    during the past year and we have earned lots of revenue with it. This
    is a unique capability. A short write-up of what we can do is included.
    Please contact Dave Varner @OGO for more information. This is the best
    in the IT industry. With Availability Review and Partnership services
    we have taken out the competition many times in Holland so please 
    involve Dave.
    In its simplest form AVANTO can be used to produce hardware
    availability figures in a very professional way.
    
    
    
    Best Regards,
    
    
    Aad Kooijman @ UTO (The Netherlands, which is over in Europe)
    Business manager High Availability Services
                                                
    AVANTO
    Over the past four years, Digital's Multivendor Customer Services
    organization has developed and frequently applied an availability
    analysis application. This application is called AVANTO (AVailability
    ANalysis TOol) and has been used successfully in practice to determine
    the availability of many hundreds of configurations. It has also been
    empirically established that the predicted results are actually
    realized in 95% of all cases. This means a unique tool is now available
    to IT managers. Of course, this leaves aside other factors such as
    the management organization, the applications, etc. These aspects
    (domains) will be involved in an Availability Review Service conducted
    by Digital. So how do things proceed when using AVANTO?
    
    The essence of AVANTO is that it enables a system to be designed so
    that the anticipated availability can be made to correspond to the
    demands made upon it from the business. AVANTO is frequently used when
    configuring new application systems. AVANTO is an application that
    enables the availability of very complex systems to be modeled, so
    that it can be determined beforehand how the availability demands can
    be met without working in an arbitrary way.
    In the simplest of applications it is possible to calculate the average
    availability of a system's hardware by using MTTR and MTBF data.
    However, as stated above, this will result in an incomplete picture as
    many other aspects will codetermine the availability in practice. One
    example is the quality of the environment in which the equipment has
    been installed. Also very important is the organization of the helpdesk
    and the underlying second and third-line support of the various
    suppliers.
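
    As a minimal sketch of this simplest case (not AVANTO itself), the
    steady-state availability of a single component is commonly estimated
    as MTBF / (MTBF + MTTR). The function names and the 8-hour MTTR below
    are assumptions for illustration; only the 800,000-hour MTBF figure
    appears in the covering memo.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Assumed example: a disk with an MTBF of 800,000 hours and an MTTR of 8 hours.
disk = availability(800_000, 8)
print(f"availability: {disk:.6f}")
print(f"downtime:     {downtime_hours_per_year(disk):.3f} hours/year")
```

    A figure like this covers the hardware only, which is why the text
    treats it as an incomplete picture.
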
    When establishing the potential availability of an existing system, it
    will be necessary to investigate how the management, the environment,
    the software and the other domains have been set up. Besides the
    hardware, all these domains influence the level of availability. By
    incorporating parameters and setting up a business scenario, AVANTO can
    show availability as a function of the business requirements.
    
    Digital's Availability Review and Partnership Services are also
    conducted with the aid of AVANTO. An existing situation is scrutinized
    in an Availability Review and a very detailed investigation determines
    what can be done to improve availability management. Alternative
    situations can also be modelled. It is not difficult to imagine that
    this approach is far preferable to one in which measures are taken
    more or less by guesswork, after which we have to measure what
    effects these measures have had. Moreover, the costs incurred when
    improving availability in retrospect are generally much higher than
    those associated with conducting an analysis in advance. AVANTO is now
    in regular use by a large number of organizations within Change and
    Availability Management in order to determine the availability effects
    of scheduled configuration changes in advance.
    
    
    Availability investigation
    
    Let's assume that an IT organization with a given infrastructure wants
    to determine what availability can be offered to its users, or that
    the existing availability has to be increased. Digital Multivendor Customer
    Services can provide answers to these questions based on investigation
    and with the aid of AVANTO.
    So how does such an investigation proceed?
    At the start, the customer is consulted to establish which areas
    (domains) are to be involved in the investigation. If a decision is
    taken to limit the investigation to the hardware configuration, the
    result will be of limited value. In accordance with the information
    provided by ITIL (Information Technology Infrastructure Library), 
    there are five other domains besides the hardware
    that must be involved in an Availability Review:
    1. The environment, including climatic control, power supply, service
    contracts, etc.
    2. The system management, the organization, the procedures and the
    system configuration
    3. The network
    4. The system software
    5. The applications and the application management.
    
    Only when all these domains have been exhaustively charted will it be
    possible to determine what availability can be offered. If it is found
    that the availability to be offered is inadequate, alternative
    scenarios can be formulated with the aid of AVANTO. An Availability
    Review provides a complete picture of all the aspects of availability
    management.
    
    AVANTO business model
    When carrying out an Availability Review Service it will be established
    what availability can be offered with an existing configuration or a
    new one yet to be installed. Furthermore, extensive modelling
    activities are also possible using AVANTO. Based on an initial AVANTO
    model, the reference model, we can modify the redundancies and/or the
    service contracts to determine the costs at which the desired
    availability can be realized.
    In the first place, however, it will be necessary to indicate when a
    certain availability is to be offered. In this way, a central system
    with an important database may require 100% availability during the day
    while this level will also be required for the back-up equipment during
    the evening and night. It might also be so that not all the hardware is
    important for a particular application while a different application
    does require all the hardware in the system in order to operate.
    Charting these aspects is called setting up the business model or
    business scenario in AVANTO. This business scenario is entered into
    AVANTO, which always displays the availability as a function of the
    business model.
    
    Example of a simple business scenario

               Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7
    Shift 1     50%     50%    100%    100%     75%     50%     25%
    Shift 2    100%    100%    100%    100%    100%     50%     50%
    Shift 3    100%    100%    100%    100%    100%     50%     50%
    Shift 4     75%     75%     75%    100%    100%     50%     25%

    The percentages indicate when and to what extent availability is
    required.
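
    How AVANTO weights a business scenario internally is not described
    here. As a rough sketch under that caveat, one way to express
    "availability as a function of the business model" is to weight each
    outage by the requirement of the shift in which it occurs. The
    function name and the linear weighting are assumptions; the
    percentages are those of the example scenario.

```python
# Requirement weights per shift for Day 1..Day 7, taken from the example scenario.
SCENARIO = {
    "Shift 1": [0.50, 0.50, 1.00, 1.00, 0.75, 0.50, 0.25],
    "Shift 2": [1.00, 1.00, 1.00, 1.00, 1.00, 0.50, 0.50],
    "Shift 3": [1.00, 1.00, 1.00, 1.00, 1.00, 0.50, 0.50],
    "Shift 4": [0.75, 0.75, 0.75, 1.00, 1.00, 0.50, 0.25],
}


def weighted_availability(scenario, outages):
    """Availability over one week, weighted by the business requirement.

    `outages` maps (shift, day) to the fraction of that shift lost (0..1);
    `day` runs from 1 to 7 as in the scenario table.
    """
    required = sum(sum(row) for row in scenario.values())
    lost = sum(scenario[shift][day - 1] * fraction
               for (shift, day), fraction in outages.items())
    return 1.0 - lost / required


# The same outage (one whole shift) in a quiet slot and in a critical slot.
quiet = weighted_availability(SCENARIO, {("Shift 1", 7): 1.0})
busy = weighted_availability(SCENARIO, {("Shift 2", 3): 1.0})
print(f"quiet slot: {quiet:.4f}  critical slot: {busy:.4f}")
```

    Under this weighting, losing a shift that requires 100% availability
    counts four times as heavily as losing one that requires 25%.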
    
    AVANTO enables a business scenario to be created for each part of the
    configuration. This means, for example, that a system's ideal service
    mix can be modelled.
    
    Cost of downtime
    If known, the costs of the downtime can now be entered. In several
    cases it is possible to establish which costs are associated with the
    failure of the information system. This can also be entered into AVANTO
    as part of the business scenario. If the costs of downtime cannot be
    quantified, AVANTO will express the costs of downtime in a number of
    points per hour. The customer can then call upon the assistance of his
    financial department to convert this into the costs of downtime.
    Various risk-analysis techniques are available for determining the
    costs of downtime.
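
    A minimal sketch of the bookkeeping, assuming downtime is charged
    linearly per hour and scaled by the requirement percentage from the
    business scenario. The function name, the outage list and the rate
    are hypothetical; when real costs are unknown the rate can be read as
    points per hour.

```python
def downtime_cost(outages, rate_per_hour):
    """Sum of hours lost x requirement weight x rate per hour.

    `outages` is a list of (hours_down, requirement_weight) pairs. The rate
    can be a currency amount or, when costs are unknown, points per hour.
    """
    return sum(hours * weight * rate_per_hour for hours, weight in outages)


# Hypothetical outages: 2 hours at a 100% requirement, 4 hours at 50%.
outages = [(2.0, 1.0), (4.0, 0.5)]
cost = downtime_cost(outages, rate_per_hour=1000.0)
print(cost)
```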
    
    
      
     Figure 4  Redundancy model (Figures not included)
    Redundancy
    The diagram on the left displays the topology of a simple hardware
    configuration. This diagram shows that components A, B, D, E and F must
    function correctly for the operation of the entire chain. If component
    A fails, the chain will be broken and the application will come to a
    standstill. If component B1 fails then B2 will assume functionality.
    Here, component B has been implemented redundantly: B2 will take over
    the function of B1 automatically or via a manual procedure, and vice
    versa. It will be clear that the entire chain is more likely to fail
    as a consequence of component A than as a consequence of component B,
    especially if A and B are equally reliable. The reliability of the
    entire chain will decrease as more non-redundant components are
    included in the chain.
    For configurations as in Figure 1, the number of elements in the chain
    can easily rise to many dozens.
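
    The chain reasoning can be sketched with the standard series and
    parallel formulas. This assumes independent failures and an
    instantaneous, perfect switchover, which is a simplification of what
    the text describes; the 0.99 figure and the function names are
    assumptions for illustration.

```python
from math import prod


def parallel(*avail):
    """Availability of redundant units: down only if every unit is down."""
    return 1.0 - prod(1.0 - a for a in avail)


def series(*avail):
    """Availability of a chain: up only if every link is up."""
    return prod(avail)


a = 0.99                             # assumed availability of every single unit
b_pair = parallel(a, a)              # B1 backed up by B2
chain = series(a, b_pair, a, a, a)   # A, B (redundant), D, E, F
print(f"B pair: {b_pair:.6f}  chain: {chain:.6f}")
```

    With equal unit availability, the redundant pair B contributes far
    less unavailability than the single component A, and every extra
    non-redundant link lowers the product further.
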
    AVANTO also offers facilities to take account of the effects of
    activating redundancies and then switching back to the normal
    situation. If a redundancy measure is to be activated, it will often
    have consequences for the performance during the switching time. These
    effects of switching the functions on and off (invocation and
    devocation) are also included within AVANTO when determining the
    eventual availability and costs of downtime. When calculating the
    average availability, AVANTO will make use of MTBF and MTTR data of all
    components within the given configuration. The calculation normally
    takes place during a simulation period of twenty years and the result
    of the calculation represents an average expectancy. This means the
    calculated average availability will be realized in 95% of the cases.
    AVANTO does not take account of unscheduled failure as a consequence of
    human actions and other completely arbitrary factors. Nor is it
    possible to express the quality of the management organization as a
    figure. It is, however, possible to incorporate operational
    characteristics of the management organization in AVANTO. For example,
    the average throughput time at the help-desk and similar types of data.
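
    AVANTO's simulation algorithm is not described in this document. As a
    hedged illustration of what a twenty-year simulation from MTBF and
    MTTR data can look like, the sketch below simulates a single
    component, assuming exponentially distributed times to failure and
    repair times. Those distributions, the function name and the example
    figures are all assumptions.

```python
import random

HOURS_PER_YEAR = 8760.0


def simulate_availability(mtbf, mttr, years=20, seed=0):
    """Simulate one component over `years` and return its availability."""
    rng = random.Random(seed)
    horizon = years * HOURS_PER_YEAR
    clock = 0.0
    down = 0.0
    while True:
        clock += rng.expovariate(1.0 / mtbf)      # run until the next failure
        if clock >= horizon:
            break
        repair = min(rng.expovariate(1.0 / mttr), horizon - clock)
        down += repair                            # repair time counts as downtime
        clock += repair
    return 1.0 - down / horizon


# Assumed figures: MTBF 1,000 hours, MTTR 10 hours, averaged over 50 runs.
runs = [simulate_availability(1000.0, 10.0, seed=s) for s in range(50)]
print(f"mean availability: {sum(runs) / len(runs):.4f}")
```

    Averaged over many runs the result should approach
    MTBF / (MTBF + MTTR); the spread between runs gives a feel for how
    far a single twenty-year period can deviate from that average.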
    
    AVANTO can be used to model many hundreds of components. Each component
    can then have three redundancies.

    Levels of maintenance service

    Figure 5
    
    AVANTO can be used to calculate which form of maintenance agreement
    best suits the business scenario of the particular customer. A wide
    coverage in the maintenance agreement will not necessarily benefit
    availability. In other words, the yield per extra dollar spent on
    maintenance decreases as the coverage increases. In many cases in which
    the daytime availability must be 100%, but can be less at other times,
    it is sufficient to provide less coverage in the maintenance agreement.
    AVANTO can also calculate the optimum form of maintenance. A graph can
    be used to illustrate that an effect develops in which the added yield
    actually decreases. In this way, the customer can determine precisely
    where the optimum lies in respect of coverage in his maintenance
    agreement. The graph in Figure 5 clearly indicates that there is
    no point in the customer switching to a service contract providing
    more coverage than seven days at sixteen hours a day and a response
    time of four hours (7 x 16 + 4). A more expensive maintenance contract
    does not provide extra cost reductions for the downtime.
    If the customer in this example enters a maintenance agreement of six
    days a week at ten hours a day and a response time of four hours, then
    each extra guilder spent on maintenance will result in a 190 guilder
    reduction of the downtime costs.
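
    The diminishing-yield argument can be sketched as a comparison of
    successive contracts. The prices and downtime costs below are
    invented purely for illustration and are not taken from Figure 5;
    the function names and the rule "stop when an extra guilder saves
    less than a guilder" are likewise assumptions.

```python
# (coverage, annual contract price, expected annual downtime cost), in guilders.
# All amounts are hypothetical.
CONTRACTS = [
    ("5 x 8 + 8", 10_000, 400_000),
    ("6 x 10 + 4", 14_000, 200_000),
    ("7 x 16 + 4", 20_000, 100_000),
    ("7 x 24 + 2", 30_000, 98_000),
]


def marginal_yields(contracts):
    """Downtime cost saved per extra guilder when moving up one contract."""
    yields = []
    for prev, nxt in zip(contracts, contracts[1:]):
        extra_spend = nxt[1] - prev[1]
        saved = prev[2] - nxt[2]
        yields.append((nxt[0], saved / extra_spend))
    return yields


def best_contract(contracts):
    """Last contract for which the upgrade still saves more than it costs."""
    best = contracts[0][0]
    for name, gain in marginal_yields(contracts):
        if gain < 1.0:
            break
        best = name
    return best


for name, gain in marginal_yields(CONTRACTS):
    print(f"upgrade to {name}: {gain:.2f} guilders saved per extra guilder")
print("optimum:", best_contract(CONTRACTS))
```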
    
    Environmental factors
    
    A large number of parameters relating to environmental factors can be
    entered in AVANTO. This particularly concerns aspects relating to the
    quality of the power supply and air conditioning, no-breaks
    (uninterruptible power supplies) and possibly even diesel generators.
    AVANTO sees a diesel generator as a redundancy measure. For aspects
    like no-breaks it can now be clearly determined whether the investment
    provides sufficient yield. After all, the downtime costs will fall as
    a consequence of deploying a no-break.
    
    Performance aspects
    
    An important factor when establishing availability is the performance.
    It can be argued that there will be no downtime if one user can still
    work. In reality a relation exists between performance and
    availability. AVANTO will also take account of this provided it has
    been configured correctly.
    Consultants using AVANTO must very clearly understand which performance
    effects arise when redundancy measures are introduced. The loss of part
    of the configuration, for example memory, may also affect
    performance. Assume that when a redundant disk comes into operation the
    database performance temporarily drops by 25%. In that case, AVANTO
    will decrease availability accordingly and include this information in
    the final result of the calculation. So when setting up AVANTO, we must
    have a thorough knowledge of how the system elements and components
    operate.
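
    One simple way to fold performance into availability, offered as a
    sketch and not as AVANTO's actual rule, is to count each hour in
    proportion to the performance delivered during it. The linear
    weighting, the function name and the example hours are assumptions;
    the 25% drop is the figure used in the text.

```python
def effective_availability(periods):
    """Performance-weighted availability.

    `periods` is a list of (hours, performance) pairs, where performance is
    1.0 for normal service, 0.0 for a full outage, and in between when the
    system is degraded.
    """
    total_hours = sum(hours for hours, _ in periods)
    delivered = sum(hours * performance for hours, performance in periods)
    return delivered / total_hours


# Hypothetical year: 10 hours down and 40 hours at 75% performance
# while a redundant disk is in operation.
year = [(8710.0, 1.0), (40.0, 0.75), (10.0, 0.0)]
print(f"{effective_availability(year):.6f}")
```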
    
    Software and applications
    
    There are, of course, very limited possibilities for including
    quantitative data in AVANTO on the reliability of system software and
    applications. In recent years, increasing numbers of figures have been
    made available, such as Mean Time Between Crash (MTBC). However, these
    figures may differ considerably from one situation to the next. It is
    also practically impossible to take account of such aspects as
    programming errors and software bugs. When conducting an Availability
    Review, the investigation will focus primarily on the total management
    environment and the software and application management. Attention is
    paid in particular here to the procedures for reporting problems in the
    software, the applications, the help-desk organization and the second
    and third-line support. A well-designed infrastructure for
    problem-solving will make a considerable positive contribution to the
    availability of the applications.
    In several Availability Reviews, the investigation focuses on the
    availability of a certain application in the infrastructure. When
    setting up the AVANTO model and the Fault Tree analysis for this,
    account is clearly taken of the fact that the application in question
    uses only part of the hardware. In this way, AVANTO can also be used to
    create models for a certain group of users.
    
     
    System Health Check
    
    An equally important part of the Availability Review Service is the use
    of the System Health Check (SHC). As mentioned several times
    above, the intrinsic (hardware) availability is not the only aspect
    that is important for the availability of an information system. The
    way in which the system management is exercised is particularly
    important for the eventual result. During an Availability Review, the
    Digital consultant conducts an SHC on all the systems involved in the
    investigation. This involves scrutinizing a large number of aspects in
    the areas of:
    - Security
    - Performance
    - Capacity utilization and occupied space
    - etc.
    
    Hundreds of checks are carried out in an SHC by the specially developed
    software. The result is that a detailed fingerprint of the management
    is obtained.
    The consultant carrying out the Availability Review Service indicates
    all the bottlenecks in his report and gives concise advice on how
    certain matters might be solved. Taking note of this advice will make a
    clear contribution to improving the entire system performance in all
    the fields investigated. Please refer to the brochure on SHC for more
    information in this respect.
    
    Supplementary investigation
    
    The above description sets out a clear picture of the availability an
    IT organization can offer. An Availability Review that has been
    conducted exhaustively uses questionnaires to obtain even better
    insight into the organization of the system and network management.
    Some of the additional aspects to be scrutinized are physical security,
    reporting, change management, problem management and other aspects that
    are allied to availability management.
    
    Conclusion
    
    The previous sections indicate the facilities available for using
    AVANTO to model availability. This document also indicates general
    aspects of the power of the Availability Review Service in combination
    with AVANTO. An Availability Review was conducted at an Australian
    bank. This involved charting the availability of a network of cash
    dispensers and particularly the availability at particular points of
    issue. Finally, AVANTO was used to formulate advice for improving the
    availability of certain points at the lowest possible cost. The method
    described for this is universally applicable and not limited to Digital
    hardware and Digital users.
    It almost goes without saying that this method in combination with
    AVANTO is completely unique within the IT sector. There is no other
    application like AVANTO.
    
    Reports are made to the customer by means of a management summary with
    recommendations, a detailed report containing the background
    information and all the detailed information from AVANTO and the System
    Health Check. All this is complemented with the results of the
    supplementary investigation.
    
    The Availability Review Service is applied to situations such as those
    set out below:
    - The IT management must guarantee a certain availability and looks for
      facilities to realize this.
    - There is uncertainty about the availability that can be offered with
      the existing infrastructure.
    - A system is to be expanded and an investigation is to determine what
      effects this may have on the availability.
    - A new application system is to be configured with a view to a
      particular availability.
    - The availability demands are approaching one hundred percent.
    - For ITIL implementations and determining Service Levels.
    
    By conducting an Availability Review in combination with the
    application of AVANTO, it will be possible to configure an application
    system so that a predetermined objective relating to availability can
    actually be realized. In practice, systems have already been designed
    that exhibit on average fewer than four hours of (intrinsic) downtime
    per year. Of course, the management organization must again
    comply with the high quality requirements as, for example, described in
    ITIL.
    
    Digital's Multivendor Customer Services in the Netherlands is ISO 9001
    certified.
    
    
    Introduction
    Today, the availability of information systems is as natural as that of
    telephone and electricity facilities. Without information systems the
    greater part of today's economic activities would come to a standstill.
    Increasing numbers of business managers are aware of this and are
    taking measures to help safeguard the availability of the information
    supply. Government also plays a role here with, among other things, its
    publication of the Code for information security. One of the
    objectives of this Code is the promotion of business confidence. This
    objective clearly indicates the relation between economic activity on
    the one hand and supply of information using automated systems on the
    other. Indeed, many companies and government organizations rely
    entirely on information systems for their operations. In everyday
    situations however, very little account is unfortunately taken of the
    availability requirements that have to be imposed when developing and
    configuring new application systems. Availability management, if
    deployed explicitly at all, is almost always limited in practice to
    subsequent measurements and adjustments where this is possible.
    The computer industry has been successful in developing fault-tolerant
    systems for highly critical applications, which can operate
    practically without unscheduled interruptions. Fault-tolerant systems
    are frequently deployed particularly by financial institutions and
    logistics organizations. However, these systems are relatively
    expensive and the alternatives are limited.
    In addition, for many years there has been a trend among suppliers to
    design normal systems that can offer very high availability. Increasing
    numbers of suppliers are also providing lifelong guarantees for certain
    components. Several years ago personal computers used to break down
    with frightening regularity, but today, we anticipate that the
    technical life span of our PCs will far outstrip their economic life
    expectancy. There are also numerous possibilities for incorporating
    redundancy into computer configurations and the use of RAID technology
    is also applied with increasing frequency.
    There is on the one hand a greater need for information systems
    configured to provide high to very high availability, while on the
    other suppliers are offering more and more facilities for building
    reliable systems. However, the question that information managers are
    having to answer with increasing regularity is: How can I determine in
    advance what kind of availability I should offer my users? This is
    partly attributable to the development in which functional/business
    management determines the conditions to which the supply of information
    must comply. And this, of course, at the lowest possible cost.
    This is complicated by the fact that until recently there was no
    effective way to determine the availability of a complex information
    system before it was actually implemented. In addition to the many
    organizational aspects, the book titled Availability Management from
    the ITIL series only ventures on a mathematical approach to this
    problem in Appendix B.
    In practice, when establishing the availability figures in Service
    Level Agreements in advance, we usually take an arbitrary approach.
    Based on experience, the inclusion of a considerable dose of
    redundancy, negotiations and a sound maintenance agreement, it is
    thought that a certain guarantee can be provided to the users. Practice
    must then show whether the agreed availability can be achieved and how
    adjustments can be made if the availability is inadequate. This
    situation is far from ideal since adjustments are almost always
    associated with considerable frustration and unexpected, often high
    costs. If this relates to strategic
    applications, it is an unacceptable and also unnecessary state of
    affairs. Evidently the demand for a well-established foundation for the
    availability to be expected will become increasingly important.
    The solution, however, is not simply a well-designed configuration. We
    only need read a few publications on this subject. The reasons for
    unscheduled failure of information systems can be significantly
    attributed to aspects such as environmental factors, service,
    management, the applications and the network. We will therefore need to
    take a holistic approach if we wish to find out more about the subject
    of availability and not simply take the hardware into account.
    It is not without reason that ITIL has become very popular. If we look
    at management models described in ITIL, it will become clear that the
    success or failure of availability depends on the entire system of
    measures, methods and procedures. All this must be complemented with
    the clear safeguarding of quality and security measures. Indeed, good
    security is the wall surrounding our availability. If we examine the
    situation closely, the only functions of the management organization
    are making and keeping available the applications necessary for the
    business.
    A hardware configuration is at the basis of a high availability, and
    this hardware can offer a certain availability. This basic availability
    is called the intrinsic availability of a system. Intrinsic
    availability is always better than or the same as the actual
    availability to be realized for the user.
    A system or infrastructure will only be able to approach its intrinsic
    or nominal availability if the management and all other external
    factors are at an optimum. According to ITIL, these external factors
    are the environment, the software, the network, the applications and
    the management of the entire system.
    Investigations conducted during the mid-nineteen-eighties demonstrated
    that hardware is accountable for only about twenty percent of all
    instances of non-availability. This only holds true if we look at all
    the causes of system failure. The greatest threat to the organization,
    however, is based on unscheduled failure. Most of this unscheduled
    failure is clearly attributable to the hardware, the service, the
    network and the environmental factors. If they are well-organized and
    tested, the management, the applications and the software will cause
    far fewer unscheduled failures.
    This is why, when we design a new information system, we must pay great
    attention to the topology of the hardware configuration. It is not
    difficult to imagine that the failure of one single hard disk may have
    serious consequences for a very large database. An important role is
    played here not merely by the repair or replacement of the faulty unit
    but also by the time required to restore the database.
    Moreover, increasing numbers of systems are becoming part of a much
    wider infrastructure that is closely linked to the systems of other
    organizations. These may include, for example, systems for EDI,
    logistics, telephone sales and electronic payment transactions. But
    these may also include real-time applications for telecoms companies
    and within the chemical industry. The consequences of these kinds of
    computer systems being unavailable are catastrophic. The failure of
    information systems deployed in the dealing-rooms of banks may have
    disastrous consequences for the entire banking organization and extend
    far beyond the limits of that company alone. The reservation systems of
    airline companies are a case in point. In certain cases, we will need
    to design systems so that downtime will be limited to a few hours each
    year. In one situation, it was demanded that the application may never
    fail even in the event of a disaster. These very high availability
    demands are without exception prompted by the commercial importance of
    the application. We also know that as the need for availability
    increases, so the costs will increase exponentially.
    But how do we design a configuration to comply with such high
    availability requirements? Or, what must we do when we have to make a
    very critical application operational on an existing platform? Which
    measures must be taken to set up the management so that we can continue
    to meet the demands that have been set? We will be able to find the
    answers to these questions using Digital's Availability Review,
    Partnership Services and with the aid of AVANTO.
    
    
    
    
 |