CHARON-VAX/CHARON-AXP - OpenVMS fails with CPUSPINWAIT fatal bug check
Problem
CHARON-VAX/CHARON-AXP / OpenVMS fails with the message "Fatal BUG CHECK: CPUSPINWAIT, CPU spinwait timer expired".
Solution
When running CHARON, the two most typical causes of this problem are:
- The CHARON node is overloaded and unable to cope with the workload. Upgrading the hardware or the CHARON flavor usually resolves the issue.
- The spinwait timeout calculated by OpenVMS is too short, for example because of the integer overflow described below.
The timeout that causes the bug check is controlled by two internal VMS parameters and by one parameter that can be set through SYSGEN (MODPARAMS.DAT).
The internal parameters, calculated automatically at OpenVMS startup, are CPU$L_TENUSEC and CPU$L_UBDELAY. They depend on the performance of the hardware (in the case of CHARON, the Intel host) and are out of our control.
The third parameter, which we can manage, is SGN$GL_SMP_SPINWAIT (SGN$GL_SMP_LNGSPINWAIT on older OpenVMS versions).
VMS takes these three parameters, multiplies them, and uses the result to calculate the loop counter that is used to measure the delay:
(SP) = SGN$GL_SMP_SPINWAIT * CPU$L_TENUSEC * CPU$L_UBDELAY
A potential issue is that all three source variables are LONG INT (32-bit longword) values, and so is the result (SP). If the product does not fit in 32 bits, it is truncated and can wrap around to a very small number, which makes the spinwait timeout far shorter than intended.
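The following Python sketch illustrates the wraparound by keeping only the low-order 32 bits of the product, as a longword result would. The CPU$L_TENUSEC and CPU$L_UBDELAY values are hypothetical (the real ones are calibrated by OpenVMS at startup and depend on the host), and the sketch does not model how VAX or Alpha hardware reports the overflow itself.

```python
# Sketch: how a 32-bit (longword) product can wrap around to a small value.
# ASSUMPTION: TENUSEC and UBDELAY are hypothetical stand-ins for the
# CPU$L_TENUSEC and CPU$L_UBDELAY values OpenVMS calibrates at startup.

TENUSEC = 716  # hypothetical CPU$L_TENUSEC
UBDELAY = 2    # hypothetical CPU$L_UBDELAY


def spinwait_loop_count(spinwait, tenusec=TENUSEC, ubdelay=UBDELAY):
    """Return SP = spinwait * tenusec * ubdelay, keeping only the low 32 bits."""
    return (spinwait * tenusec * ubdelay) & 0xFFFFFFFF


for spinwait in (3_000_000, 1_000_000):
    exact = spinwait * TENUSEC * UBDELAY      # mathematically correct product
    stored = spinwait_loop_count(spinwait)    # what a longword can hold
    print(f"spinwait={spinwait:>9,}  exact={exact:>13,}  stored={stored:>13,}")
```

With these made-up values, a spinwait value of 3,000,000 gives an exact product of 4,296,000,000, which does not fit in 32 bits and is stored as 1,032,704, roughly 4,000 times smaller than intended. With 1,000,000 the product (1,432,000,000) stays within range.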
Settings
We recommend reducing the value of SMP_LNGSPINWAIT to 1 million (1 000 000) and then testing system stability. If stability is still not satisfactory, set it to 300 000; if that does not help, try 100 000. If the problem persists, please contact Stromasys support.
We highly recommend not changing the value of SMP_SPINWAIT; leave it at its default value of 100 000. The same applies to SMP_SANITY_CNT, which should stay at its default value of 300.
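SMP_LNGSPINWAIT and SMP_SPINWAIT are expressed in 10-microsecond units and SMP_SANITY_CNT in 10-millisecond units (see the SYSGEN output and the definitions below), so each suggested value corresponds to a wall-clock timeout. This short Python sketch performs that conversion; the constant and function names are illustrative only.

```python
from fractions import Fraction

# Unit sizes in seconds (exact fractions avoid float rounding).
TEN_USEC = Fraction(10, 1_000_000)  # one SMP_SPINWAIT / SMP_LNGSPINWAIT unit
TEN_MSEC = Fraction(10, 1_000)      # one SMP_SANITY_CNT unit

# Suggested SMP_LNGSPINWAIT values, tried in this order.
LNGSPINWAIT_STEPS = [1_000_000, 300_000, 100_000]


def spinwait_seconds(value):
    """Timeout in seconds for an SMP_SPINWAIT or SMP_LNGSPINWAIT value."""
    return value * TEN_USEC


def sanity_seconds(value):
    """Timeout in seconds for an SMP_SANITY_CNT value."""
    return value * TEN_MSEC


print(f"default SMP_LNGSPINWAIT 3000000 -> {float(spinwait_seconds(3_000_000))} s")
for step in LNGSPINWAIT_STEPS:
    print(f"SMP_LNGSPINWAIT {step} -> {float(spinwait_seconds(step))} s")
print(f"SMP_SPINWAIT 100000 -> {float(spinwait_seconds(100_000))} s")
print(f"SMP_SANITY_CNT 300 -> {float(sanity_seconds(300))} s")
```

For example, the default SMP_LNGSPINWAIT of 3 000 000 corresponds to 30 seconds, and the suggested steps correspond to 10, 3 and 1 seconds.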
Example / OpenVMS 7.3-2 using SYSGEN
$ MC SYSGEN
SYSGEN> SHOW /MULTI
Parameters in use: Active
Parameter Name     Current  Default     Min.     Max.  Unit         Dynamic
--------------     -------  -------  -------  -------  ----         -------
SMP_CPUS                 1       -1        0       -1  CPU bitmask
MULTIPROCESSING          3        3        0        4  Coded-value
SMP_SANITY_CNT         300      300        1       -1  10ms.
SMP_SPINWAIT        100000   100000        1  8388607  10 usec.
SMP_LNGSPINWAIT    3000000  3000000        1  8388607  10 usec.
SYSGEN> USE CURRENT
SYSGEN> SET SMP_LNGSPINWAIT 1000000
SYSGEN> WRITE CURRENT
SYSGEN> SHOW SMP_LNGSPINWAIT
Parameter Name     Current  Default     Min.     Max.  Unit         Dynamic
--------------     -------  -------  -------  -------  ----         -------
SMP_LNGSPINWAIT    1000000  3000000        1  8388607  10 usec.
SYSGEN> EXIT
$
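You should also add the new value to the MODPARAMS.DAT file and run AUTOGEN, so that the new value survives reboots and later AUTOGEN runs (AUTOGEN recalculates system parameters and would otherwise replace a value set only through SYSGEN). Please refer to the documentation for your OpenVMS version.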
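A MODPARAMS.DAT entry has the form NAME = value, with comment lines starting with "!". The Python sketch below, which assumes the file content has already been read into a string, replaces any existing SMP_LNGSPINWAIT assignment or appends one if none exists. The function set_modparam is hypothetical and is not part of OpenVMS; after updating the file, AUTOGEN is run as usual (for example @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS, depending on your procedures).

```python
import re


def set_modparam(text, name, value):
    """Hypothetical helper: set 'NAME = VALUE' in MODPARAMS.DAT-style text.

    Replaces every existing assignment of NAME (case-insensitive, any
    spacing around '='); appends a new line if there is none. Comment
    lines starting with '!' never match, because the pattern only allows
    leading whitespace before the parameter name.
    """
    pattern = re.compile(rf"^\s*{re.escape(name)}\s*=.*$", re.IGNORECASE)
    new_line = f"{name.upper()} = {value}"
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Usage with sample (made-up) MODPARAMS.DAT content.
sample = "! Sample MODPARAMS.DAT content\nSMP_LNGSPINWAIT = 3000000\n"
print(set_modparam(sample, "SMP_LNGSPINWAIT", 1_000_000), end="")
```

Replacing rather than appending a second assignment keeps the file unambiguous about which value AUTOGEN should use.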
Definitions (from HP OpenVMS Systems Documentation)
SMP_LNGSPINWAIT: certain shared resources in a multiprocessing system take longer to become available than allowed by the SMP_SPINWAIT parameter. SMP_LNGSPINWAIT establishes, in 10-microsecond intervals, the length of time a processor in a multiprocessing system waits for these resources. A timeout causes a CPUSPINWAIT bugcheck.
SMP_SPINWAIT establishes, in 10-microsecond intervals, the amount of time a CPU in an SMP system normally waits for access to a shared resource. This process is called spinwaiting. A timeout causes a CPUSPINWAIT bugcheck.
© Stromasys, 1999-2024 - All the information is provided on the best effort basis, and might be changed anytime without notice. Information provided does not mean Stromasys commitment to any features described.