Cluster status monitoring is a new 1C:Enterprise platform feature intended to increase the reliability of corporate deployments and cloud services.
It improves server fault tolerance by protecting the server from faults that may occur in its working processes, such as the execution of incorrect applied solution code in a working process.
The goal of the cluster monitoring system is to detect failures promptly and fix them automatically.
The monitoring system is part of the server agent process. Every 10 seconds, it scans each cluster process. A cluster can include several working servers, each managed by its own server agent. Cluster process scans are performed only by the agent that controls the central server.
All processes running in the cluster are scanned, both cluster managers and working processes. A process running on a working server is scanned via that server's agent, so the availability of the agent is checked as well.
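The routing described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the `Agent` and `Process` classes, the `scan_cluster` function, the server and process names, and the outcome strings are all hypothetical and exist only for this example. Only the 10-second interval and the routing rules come from the text.

```python
from dataclasses import dataclass

SCAN_INTERVAL_SECONDS = 10  # scan interval stated in the text


@dataclass
class Agent:  # hypothetical model of a server agent
    server: str
    is_central: bool
    available: bool = True


@dataclass
class Process:  # hypothetical model of a cluster process
    name: str
    kind: str    # "cluster manager" or "working process"
    server: str  # working server that hosts the process


def scan_cluster(scanning_agent, agents, processes):
    """Run one scan pass and return {process name: outcome}."""
    # Only the agent that controls the central server performs scans.
    if not scanning_agent.is_central:
        return {}
    by_server = {agent.server: agent for agent in agents}
    outcomes = {}
    for process in processes:
        host_agent = by_server.get(process.server)
        # A process is reached through the agent of its own server,
        # so an unavailable agent is detected by the same pass.
        if host_agent is None or not host_agent.available:
            outcomes[process.name] = "agent unavailable"
        else:
            outcomes[process.name] = "scanned"
    return outcomes


# Illustrative data: one central server and one unreachable working server.
agents = [Agent("srv-central", True), Agent("srv-2", False, available=False)]
processes = [
    Process("manager-1", "cluster manager", "srv-central"),
    Process("worker-1", "working process", "srv-central"),
    Process("worker-2", "working process", "srv-2"),
]
print(scan_cluster(agents[0], agents, processes))
```

In a real deployment this pass would be repeated every `SCAN_INTERVAL_SECONDS`; the sketch runs it once so that the routing logic stays easy to follow.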
The system scans each working process against the following criteria:
• Connection to the process, which must be established within 20 seconds
• Standard query (performance test, database connection test, and disk operations test)
• Memory volume used by the process
• Number of errors per query (the ratio of EXCP messages to CALL messages per minute in the technological log)
• Completion of processes that have been removed from the cluster registry; such processes must finish within 20 minutes
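The criteria above can be sketched as a single check function. This is an illustration under stated assumptions, not the platform's code: the `ScanSample` fields and the `failed_criteria` function are hypothetical, and the memory limit and the error ratio threshold are invented values, since the text does not give them. Only the 20-second connection limit and the 20-minute shutdown limit come from the text.

```python
from dataclasses import dataclass
from typing import Optional

CONNECT_TIMEOUT_SECONDS = 20      # from the text
REMOVAL_DEADLINE_MINUTES = 20     # from the text
MEMORY_LIMIT_BYTES = 4 * 1024**3  # hypothetical limit, not from the text
MAX_ERRORS_PER_CALL = 0.5         # hypothetical threshold, not from the text


@dataclass
class ScanSample:  # hypothetical container for one scan of one process
    connect_seconds: float
    standard_query_ok: bool
    memory_bytes: int
    excp_per_minute: int   # EXCP messages in the technological log
    call_per_minute: int   # CALL messages in the technological log
    minutes_since_removal: Optional[float] = None  # None: still registered


def failed_criteria(sample):
    """Return the names of the criteria that the sample violates."""
    failures = []
    if sample.connect_seconds > CONNECT_TIMEOUT_SECONDS:
        failures.append("connection")
    if not sample.standard_query_ok:
        failures.append("standard query")
    if sample.memory_bytes > MEMORY_LIMIT_BYTES:
        failures.append("memory")
    # Errors per query: EXCP / CALL. With no calls in the window the
    # ratio is treated as 0, which is an assumption of this sketch.
    if sample.call_per_minute:
        ratio = sample.excp_per_minute / sample.call_per_minute
    else:
        ratio = 0.0
    if ratio > MAX_ERRORS_PER_CALL:
        failures.append("errors per query")
    removed = sample.minutes_since_removal
    if removed is not None and removed > REMOVAL_DEADLINE_MINUTES:
        failures.append("shutdown after removal")
    return failures


healthy = ScanSample(0.4, True, 512 * 1024**2, 2, 400)
noisy = ScanSample(0.4, True, 512 * 1024**2, 30, 40)
lingering = ScanSample(25.0, False, 512 * 1024**2, 0, 0, minutes_since_removal=21)
print(failed_criteria(healthy), failed_criteria(noisy), failed_criteria(lingering))
```

Returning the list of violated criteria, rather than a single pass/fail flag, keeps each check independent and makes the result easy to log.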
The scan results are recorded in the technological log.
The monitoring system can terminate faulty processes on its own, creating a process dump beforehand.
The cluster monitoring system is one of several features aimed at improving system reliability. Soon we will tell you about other enhancements that make cluster operation more predictable when the network connection is lost.