[OPENVMS] How to Troubleshoot a Process in MUTEX State
     Any party granted access to the following copyrighted information
     (protected under Federal Copyright Laws), pursuant to a duly executed
     Digital Service Agreement may, under the terms of such agreement copy
     all or selected portions of this information for internal use and
     distribution only. No other copying or distribution for any other
     purpose is authorized.
Copyright (c) Digital Equipment Corporation, 1994, 1995. All rights reserved.
PRODUCT:    OpenVMS Alpha, All Versions                                         
            OpenVMS VAX, All Versions
COMPONENT:  Scheduler
SOURCE:     Digital Equipment Corporation
OVERVIEW:
This is a general troubleshooting article for processes hung in the
MUTEX wait state.  See the RELATED ARTICLES section for troubleshooting
steps on more specific problems involving the MUTEX wait state.
QUESTION:
The DCL command SHOW SYSTEM shows one or more processes hung in the
MUTEX wait state.  How do you determine what the processes are waiting
for, and why they are waiting?
   $ SHOW SYSTEM
    VAX/VMS V6.1  on node COORS  10-AUG-1994....
     Pid    Process Name    State  Pri
   20E00401 SWAPPER         HIB     16
   20E03402 wahkaw::Write   LEF      5
   20E02C03 DECW$TE_2C03    LEF      6
   20E00C05 SOFTBALL MANIAC LEF      7
   20E00406 CONFIGURE       HIB     10
   20E07008 J_HASSENPFEFF   LEF      5
   20E00E46 Marty           LEF      4
   20E08647 BOO_BOO         MUTEX    4 <--- Process hung in MUTEX
   20E00E48 Harv            LEF     16
   20E04E4A Dave            LEF      4
ANSWER:
The operating system uses MUTEXes (Mutual Exclusion Semaphores) as a
synchronization technique for shared data structures that do not
require the process to be operating at elevated IPL (Interrupt
Priority Level).
A MUTEX is a data structure consisting of a longword for OpenVMS VAX
systems, and longwords or quadwords for OpenVMS Alpha systems.
   Longword Format:
    31              16              0
     +-------------+-+--------------+
     |   Status    | |  Owner Count |
     +-------------+-+--------------+
                    ^
                    |
                    +-------------------------------------------+
                                                                |
       Bit 0 or 16 = Write-Pending or Write-in-Progress flag ---+
                                                                |
   Quadword Format:                                             |
                                                                |
    31                             0                            |
     +----------------------------+-+                           |
     |         Status             | | <-------------------------+
     +----------------------------+-+
     |       Owner Count            |
     +------------------------------+
   NOTE:
     The "Status" field of a MUTEX is undefined and reserved to DIGITAL.
     The "Owner Count" field is initialized to negative 1, i.e., all "F"s,
     so that a value of 0 indicates that there is 1 owner.
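
As an illustration of the longword format, the following Python sketch
splits a mutex longword into its number of owners and its write flag.
It is not part of OpenVMS or SDA, and the function name is hypothetical.
It assumes the layout drawn above: the owner count is a signed value in
bits 0-15, and bit 16 is the write-pending or write-in-progress flag.

```python
def decode_mutex_longword(value):
    """Split a 32-bit mutex longword into (owners, write_flag)."""
    count = value & 0xFFFF                 # bits 0-15: owner count field
    if count & 0x8000:                     # sign-extend the 16-bit field
        count -= 0x10000
    owners = count + 1                     # field starts at -1, so 0 means 1 owner
    write_flag = bool(value & (1 << 16))   # bit 16: write pending / in progress
    return owners, write_flag


print(decode_mutex_longword(0x00010000))   # (1, True)
print(decode_mutex_longword(0x0000FFFF))   # (0, False)
```

The first value is one owner holding the mutex for write access. The
second is an unowned mutex, with the owner count still at its initial
value of negative 1.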
A process is placed in the MUTEX state when it is unable to gain read or 
write access to a specified MUTEX.  The inability to gain access will be 
due to the write-pending or write-in-progress flag being equal to 1.
To determine what MUTEX a process is waiting for, examine the value for the 
"Event flag wait mask" field from the SDA command SHOW PROCESS.
EXAMPLE #1:  
                    Finding the Mutex
                    -----------------
The following information is a simple approach for troubleshooting a single 
process in MUTEX, with a single process blocking the acquisition of the 
mutex. For troubleshooting techniques involving multiple processes, see 
EXAMPLE #3.
1.  Invoke the System Dump Analyzer Utility (SDA) to examine the running
    system:
    $ ANALYZE/SYSTEM
2.  Read in the system definitions for SDA so that any MUTEX address
    can be interpreted.
    
    For OpenVMS Alpha
    -----------------
         SDA> READ SYS$LOADABLE_IMAGES:SYSDEF
    For OpenVMS VAX 
    ---------------
         SDA> READ SYS$SYSTEM:SYSDEF
3.  View the processes on the system, noting those processes in the 
    MUTEX state.
    SDA> SHOW SUMMARY
    Current process summary
    Extended Indx Process name    Username    State   Pri
    -- PID -- ---- --------------- ----------- ------- ---
    20E00401 0001 SWAPPER         SYSTEM       HIB     16
    20E03402 0002 Write_Crmp      KING         LEF      5
    20E02C03 0003 DECW$TE_2C03    SYSTEM       LEF      6
    20E00C05 0005 BASEBALL        ROCKIE       LEF      7
    20E00406 0006 CONFIGURE       SYSTEM       HIB     10
    20E07008 0008 HASSENDOODOO    BAYWTCH      LEF      5
    20E00E46 0246 Marty           MARTY        LEF      4
    20E08647 0247 BOO_BOO --+     HUNTER       MUTEX    4
    20E00E48 0248 Harv      |     HOGGIE       LEF     16
    20E04E4A 024A Dave      |     STUCKIE      LEF      4
                            +--------+
                                     |
4.  View the process hung in MUTEX.  |
                                     |
   SDA> SHOW PROCESS/INDEX=247 <-----+
  Process index: 0247   Name: BOO_BOO   Extended PID: 20E08647
  ------------------------------------------------------------
  Status : 02040001 res,phdres,inter
  Status2: 00000001 quantum_resched
  PCB address              840BE140    JIB address           83D58DC0
  PHD address              9CD08E00    Swapfile disk address 00000000
  Master internal PID      00210247    Subprocess count             0
  Internal PID             00210247    Creator internal PID  00000000
  Extended PID             20E08647    Creator extended PID  00000000
  State                       MUTEX    Termination mailbox       0000
  Current priority                7    AST's enabled             KESU
  Base priority                   4    AST's active              NONE
  UIC                [00022,000050]    AST's remaining            197
  Mutex count                     0    Buff I/O cnt/limt      100/100
  Waiting EF cluster              1    Direct I/O cnt/limt    100/100
  Starting wait time       1B001B1B    BIO byte cnt/limt  65344/65344
  Event flag wait mask     80004360    # open files allowed left   99
                              |
                              +----------+
5.  Translate the Event flag wait mask:  |
                                         |
  SDA> EXAMINE 80004360 <----------------+
  LNM$AL_MUTEX:  00010000
                    ^
                bit 16, "write" flag
  The process is waiting on the "Shared Logical Names Data Structure"
  MUTEX, LNM$AL_MUTEX, (see the list at the end of this article for
  other data structures protected by a MUTEX).  The MUTEX has a single 
  owner, i.e., Owner Count=0, who has write access to the structure, i.e.,
  bit 16=1.
  NOTE:
    If the "Event flag wait mask" for the process is the same as the
    "JIB address", see another database article titled:
        [OpenVMS] Discussion Of Unusual MUTEX Wait State
6.  To see approximately how many seconds the process has been in the
    wait state, issue the following SDA command. The value you see may
    not be 100% correct due to other areas of the operating system that
    affect PCB$L_WAITIME.                                         
    SDA> EVAL (@EXE$GL_ABSTIM_TICS-@(PCB+PCB$L_WAITIME))/64
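
The arithmetic in the EVAL command can be sketched in Python as shown
below.  The sketch rests on two assumptions that are not stated in this
article: that SDA reads the divisor 64 as hexadecimal (100 decimal), and
that both time values count 10-millisecond ticks.  The function name and
the sample values are hypothetical.

```python
TICKS_PER_SECOND = 0x64  # assumption: 10 ms ticks, so 64 hex = 100 per second


def seconds_in_wait(abstim_tics, waitime):
    """Mirror (@EXE$GL_ABSTIM_TICS - @(PCB+PCB$L_WAITIME)) / 64 from SDA."""
    return (abstim_tics - waitime) // TICKS_PER_SECOND


# Hypothetical values, chosen only to show the arithmetic.
print(seconds_in_wait(0x1B002EA3, 0x1B001B1B))  # 50
```

As with the SDA command itself, treat the result as an approximation.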
Determining why the process is blocked from gaining access to the
mutex requires that you determine which process owns the mutex.
Determining the owner is difficult because there is no owner field
defining this information.
When a process gains access to a mutex, its priority is raised to 16
to decrease the amount of time it has the resource.  The "Mutex
count" field for the process will also be incremented.  Use the SDA
command "SHOW SUMMARY" to determine which processes are at priority 16;
those processes are possibly blocking access to the mutex (ignore the
SWAPPER process, which is always at priority 16).
Isolate this list further by using the SDA command "SHOW PROCESS" for
each suspected process and checking whether its "Mutex count" field
is non-zero.
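
If the SHOW SUMMARY display has been saved as text, the first pass of
this search can be sketched in Python as shown below.  This is only an
illustration.  The function name is hypothetical, and the parsing
assumes the abbreviated column layout used in this article (PID, index,
process name, username, state, priority); a display with additional
columns would need a different parser.

```python
SUMMARY = """\
Extended Indx Process name    Username    State   Pri
-- PID -- ---- --------------- ----------- ------- ---
20E00401 0001 SWAPPER         SYSTEM       HIB     16
20E08647 0247 BOO_BOO         HUNTER       MUTEX    4
20E00E48 0248 Harv            HOGGIE       LEF     16
20E04E4A 024A Dave            STUCKIE      LEF      4
"""


def parse_summary(text):
    """Return one dict per process row found in the summary text."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A process row starts with a hex PID and ends with a numeric priority.
        if len(parts) < 6 or not parts[-1].isdigit():
            continue
        try:
            int(parts[0], 16)
        except ValueError:
            continue
        rows.append({
            "index": parts[1],
            "name": " ".join(parts[2:-3]),   # process names may contain spaces
            "state": parts[-2],
            "pri": int(parts[-1]),
        })
    return rows


rows = parse_summary(SUMMARY)
waiters = [r["name"] for r in rows if r["state"] == "MUTEX"]
# SWAPPER is always at priority 16, so it is ignored.
suspects = [r["name"] for r in rows
            if r["pri"] >= 16 and r["name"] != "SWAPPER"]
print(waiters)   # ['BOO_BOO']
print(suspects)  # ['Harv']
```

Each suspect still has to be confirmed by checking its "Mutex count"
field with SHOW PROCESS.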
EXAMPLE #2:  
                     Examining the Suspect Process
                     -----------------------------
  (For this example we'll use the displays from the first three commands
   in the previous example.)
  Notice in Step 3 of EXAMPLE #1 that process "Harv" has a priority of
  16 (the SWAPPER process is ignored as its priority is always 16).
1.  Look at process Harv in detail, check the "Mutex count":
  SDA> SHOW PROCESS/INDEX=248
  Process index: 0248   Name: Harv   Extended PID: 20E00E48
  ---------------------------------------------------------
  Status : 02040001 res,phdres,inter
  Status2: 00040001 quantum_resched
  PCB address              840D2E00    JIB address           83ECC880
  PHD address              A7F54600    Swapfile disk address 00000000
  Master internal PID      00030248    Subprocess count             0
  Internal PID             00030248    Creator internal PID  00000000
  Extended PID             20E00E48    Creator extended PID  00000000
  State                       LEF      Termination mailbox       0000
  Current priority               16    AST's enabled             KESU
  Base priority                   4    AST's active              NONE
  UIC                [00060,000044]    AST's remaining            197
  Mutex count                     1    Buff I/O cnt/limt       99/100
  Waiting EF cluster              0    Direct I/O cnt/limt    100/100
  Starting wait time       1B001B1B    BIO byte cnt/limt  65088/65344
  Event flag wait mask     DFFFFFFF    # open files allowed left   99
  The non-zero "Mutex count" indicates that Harv owns a MUTEX, so this
  process is the most likely suspect to be blocking the process hung in
  MUTEX from gaining access to the data structure.
From this point you need to determine why this process is not releasing
the mutex. However, this determination is beyond the scope of this
article. The process is probably hung.  To continue troubleshooting
this problem see another article titled:
    [OpenVMS] How To Troubleshoot a Hung Process
EXAMPLE #3:  
              Investigating Multiple MUTEX Processes
              --------------------------------------
Typically, when a mutex problem occurs, it will affect more than a
single process. There may also be more than 1 mutex that processes
are waiting on. A single process may be blocking those processes hung
in the MUTEX state, i.e., "Mutex count" field greater than 1, or
multiple processes may be hung and own a single mutex.
If multiple processes are hung in MUTEX and/or multiple processes have
a priority of 16 or higher, use SDA to produce a text file that can be
searched, as opposed to using single SDA commands for each process.
Use the following commands in SDA to produce a text file for your
search:
  SDA> SET OUTPUT <filename>
  SDA> SHOW SUMMARY
  SDA> SHOW PROCESS ALL
  SDA> SET OUTPUT TT:
You may now search the text file for the "Event flag wait mask" of
all processes hung in MUTEX, and/or for those processes with a
non-zero "Mutex count" field.
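
One way to search the text file is sketched below in Python.  This is
an illustration only, not a DIGITAL-supplied tool.  The function name is
hypothetical, and the patterns assume the SHOW PROCESS layout shown in
EXAMPLE #1 and EXAMPLE #2, in which the "State" line appears before the
"Mutex count" and "Event flag wait mask" lines for each process.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"Process index:\s*(\w+)\s+Name:\s*(.*?)\s+Extended PID:")
STATE = re.compile(r"^\s*State\s+(\S+)")
MUTEX_COUNT = re.compile(r"Mutex count\s+(\d+)")
WAIT_MASK = re.compile(r"Event flag wait mask\s+([0-9A-Fa-f]{8})")


def scan_sda_output(text):
    """Return ({wait mask: [waiting process names]}, [names owning a mutex])."""
    waiters = defaultdict(list)
    owners = []
    name = state = None
    for line in text.splitlines():
        m = HEADER.search(line)
        if m:                       # start of a new process display
            name, state = m.group(2), None
            continue
        if name is None:            # skip text before the first process
            continue
        m = STATE.match(line)
        if m:
            state = m.group(1)
        m = MUTEX_COUNT.search(line)
        if m and int(m.group(1)) > 0:
            owners.append(name)
        m = WAIT_MASK.search(line)
        if m and state == "MUTEX":  # the mask is a mutex address only in MUTEX state
            waiters[m.group(1)].append(name)
    return dict(waiters), owners


# Abbreviated sample taken from the displays in EXAMPLE #1 and EXAMPLE #2.
SAMPLE = """\
Process index: 0247   Name: BOO_BOO   Extended PID: 20E08647
State                       MUTEX    Termination mailbox       0000
Mutex count                     0    Buff I/O cnt/limt      100/100
Event flag wait mask     80004360    # open files allowed left   99
Process index: 0248   Name: Harv   Extended PID: 20E00E48
State                       LEF      Termination mailbox       0000
Mutex count                     1    Buff I/O cnt/limt       99/100
Event flag wait mask     DFFFFFFF    # open files allowed left   99
"""

waiters, owners = scan_sda_output(SAMPLE)
print(waiters)  # {'80004360': ['BOO_BOO']}
print(owners)   # ['Harv']
```

To use the sketch on real output, read the file written by SET OUTPUT
and pass its contents to scan_sda_output.  Grouping the waiting
processes by wait mask shows at a glance how many distinct mutexes are
involved.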
List of Data Structures Protected by Mutexes:
For both OpenVMS VAX and Alpha
------------------------------
  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | EXE$GL_CEBMTX      |  Common Event Block List                  |
  | EXE$GL_PGDYNMTX    |  Paged Dynamic Memory                     |
  | EXE$GL_GSDMTX      |  Global Section Descriptor List           |
  | UCB$L_LP_MUTEX     |  Line Printer Control Block               |
  | ORB$L_ACL_MUTEX    |  Object Rights Block Access Control List  |
  | CHANGE_MODE_MUTEX  |  System Service Database                  |
  | TFF$L_VEC_MUTEX    |  Terminal Fallback Database               |
  | CIA$GL_MUTEX       |  System Intruder List                     |
  +--------------------+-------------------------------------------+
For OpenVMS VAX
---------------
  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | LNM$AL_MUTEX       |  Shared Logical Name Data Structures      |
  | IOC$GL_MUTEX       |  I/O Database                             |
  | EXE$GL_SHMGSMTX    |  Shared Memory Global Section Descriptor  |
  | EXE$GL_SHMMBMTX    |  Shared Memory Mailbox Descriptor         |
  | EXE$GL_BASIMGMTX   |  Loadable Executive Image Data Structures |
  +--------------------+-------------------------------------------+
                                                                                
For OpenVMS Alpha
-----------------
  +--------------------+-------------------------------------------+
  | SYMBOL             |        MUTEX TYPE                         |
  +--------------------+-------------------------------------------+
  | LNM$AQ_MUTEX       |  Shared Logical Name Data Structures      |
  | IOC$GQ_MUTEX       |  I/O Database                             |
  | UCB$L_SO_MUTEX     |  Audio Device Unit Control Block          |
  | EXE$GQ_BASIMGMTX   |  Loadable Executive Image Data Structures |
  +--------------------+-------------------------------------------+ 
 
RELATED ARTICLES:
Other articles in the OPSYS database describe some specific problems
with processes in the MUTEX wait state.  These articles can be found 
using a search string of:           
                                                                              
        SHADOW_SERVER PROCESS MUTEX
        MUTEX HANG PATHWORKS 4.0
        SESSION MANAGER HANGS MUTEX CREATING APPLICATION
        DISCUSSION UNUSUAL MUTEX STATE
REFERENCES:
"VAX/VMS Internals and Data Structures, Version 5.2", 1991,
 (EY-C171E-DP)
"OpenVMS AXP Internals and Data Structures, Version 1.5", 1994,
 (EY-Q770E-DP)
"VMS System Dump Analyzer Utility Manual", April 1988, (AA-LA87A-TE),
 page(s) SDA-72
    