Use Case
The following example introduces a real-life scenario and describes how it can be fixed with Reaction.
Let's assume we have a Hermes CRM application. This system suffers from a memory leak that causes the application to crash, after which the system administrators have to restart it.
The application has two running instances, one on ACME00 and one on ACME01. A correct restart requires both application instances to be restarted. The application runs on a Weblogic application server.
The following video shows how Reaction can help when the memory leak emerges (the demo Docker container contains the execution flow of the use case and the event life records of the running flow).
The Quick set-up page describes how to set up the Reaction components to fix the issue.
Short explanation
Preliminary setup
The Hermes CRM system is modelled as a tree: a top-level (parent) system (Hermes CRM) and two low-level (child) systems (Hermes - ACME00, Hermes - ACME01) in which only the host names are set. All the common properties of the two running instances (e.g. the location of the log files, which is the same on both server machines, the maintenance window, etc.) are specified in the top-level system.
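The parent/child arrangement can be pictured as a property lookup that falls back to the parent. The following is only a minimal sketch and not Reaction's actual data model: the `System` class, its field names, the log path and the maintenance window value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class System:
    """Hypothetical stand-in for a Reaction system; not the real data model."""
    name: str
    parent: Optional["System"] = None
    properties: dict = field(default_factory=dict)

    def resolve(self, key: str):
        """Return the property set on this system, else inherit it from a parent."""
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        raise KeyError(key)


# Common properties live on the top-level system (the values are made up).
hermes = System("Hermes CRM", properties={
    "log_path": "/path/to/hermes/server.log",
    "maintenance_window": "02:00-04:00",
})
# The child systems only set the host name.
acme00 = System("Hermes - ACME00", parent=hermes, properties={"host": "ACME00"})
acme01 = System("Hermes - ACME01", parent=hermes, properties={"host": "ACME01"})

for child in (acme00, acme01):
    print(child.resolve("host"), child.resolve("log_path"))
```

Both children report their own host name but share the log path defined once on the parent, which is the point of keeping the common settings at the top level.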
The OS commands to be executed are specified in the execution flow, which also contains if-else branches and email-sending commands (first stop the Weblogic managed server on host ACME00 and check whether it is stopped -> if it is not, fail the flow and send a mail -> if it is stopped correctly, start it on host ACME00 and check whether it is started -> ...).
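The branching described above can be sketched in a few lines. This is an illustration only, not how Reaction stores or runs a flow: the `run` and `send_mail` hooks and the action names are hypothetical, and the sketch assumes that the same stop/check/start/check sequence is repeated for the second host.

```python
from typing import Callable, List


def restart_instance(host: str,
                     run: Callable[[str, str], bool],
                     send_mail: Callable[[str], None]) -> bool:
    """Stop, verify, start and verify one instance; mail and fail on a bad check.

    `run(host, action)` is a hypothetical hook that executes an OS command
    on the host and returns True on success.
    """
    steps = [
        ("stop", "check_stopped", "could not be stopped"),
        ("start", "check_started", "could not be started"),
    ]
    for action, check, problem in steps:
        run(host, action)
        if not run(host, check):
            send_mail(f"Flow failed: managed server on {host} {problem}")
            return False
    return True


def restart_all(hosts: List[str],
                run: Callable[[str, str], bool],
                send_mail: Callable[[str], None]) -> bool:
    # Hosts are handled one after another; the first failure aborts the flow.
    for host in hosts:
        if not restart_instance(host, run, send_mail):
            return False
    return True


# Demo with a fake command runner in which every command "succeeds".
mails: List[str] = []
print(restart_all(["ACME00", "ACME01"], lambda host, action: True, mails.append))
print(mails)
```

Passing the command runner and the mail sender in as functions keeps the branching logic testable without touching a real server.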
The error detector is where the systems and the execution flow are assigned to each other. The top-level system is selected (i.e. both low-level systems become part of the log file monitoring), a pattern to be monitored in the log files is set (.*java.lang.OutOfMemoryError: Java heap space.*), and the execution flow is chosen to be executed if the pattern is found.
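To see what the pattern does and does not catch, it can be tried against a few log lines. This is a sketch under assumptions: the sample log lines are made up, and matching the pattern against each whole line (as the leading and trailing `.*` suggest) is an assumption about how the reader worker applies it.

```python
import re

# Pattern taken from the error detector above. The unescaped dots in
# "java.lang" match any character, which is harmless for this purpose.
PATTERN = re.compile(r".*java.lang.OutOfMemoryError: Java heap space.*")


def find_incidents(lines):
    """Yield every log line that matches the pattern in full."""
    for line in lines:
        stripped = line.rstrip("\n")
        if PATTERN.fullmatch(stripped):
            yield stripped


# Made-up log lines for illustration only.
sample_log = [
    "<Info> <Server started in RUNNING mode>",
    'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space',
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
]
print(list(find_incidents(sample_log)))
```

Only the second line is reported: the third line is also an out-of-memory error, but of a different kind, so the pattern deliberately leaves it alone.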
The workers are installed on the ACME00 and ACME01 machines. The reader worker observes the log file based on the data set on the top-level system.
Execution flow in action
First, the out-of-memory error is triggered in Hermes CRM. As soon as it occurs, a mail is sent saying that a new incident has to be dealt with and that the start of the execution flow has to be confirmed. For demonstration purposes, a faulty command is added to the flow, which can be re-executed or skipped.
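The re-execute-or-skip choice for a failed command can be illustrated as follows. This is a hedged sketch, not Reaction's implementation: `run_flow`, the `decide` callback (standing in for the operator's choice) and the command names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# (name, action returning True on success)
Command = Tuple[str, Callable[[], bool]]


def run_flow(commands: Iterable[Command],
             decide: Callable[[str], str]) -> List[str]:
    """Run commands in order; on failure ask `decide` for "retry" or "skip".

    Any answer other than "skip" is treated as a retry, so a command that
    always fails and is always retried would loop until it is skipped.
    """
    log = []
    for name, action in commands:
        while True:
            if action():
                log.append(f"{name}: ok")
                break
            if decide(name) == "skip":
                log.append(f"{name}: skipped")
                break
            log.append(f"{name}: retry")
    return log


attempts = {"count": 0}


def flaky() -> bool:
    """Fails on the first call and succeeds afterwards."""
    attempts["count"] += 1
    return attempts["count"] > 1


print(run_flow(
    [("faulty command", flaky), ("always fails", lambda: False)],
    decide=lambda name: "retry" if name == "faulty command" else "skip",
))
```

In the demo the first command succeeds on its second attempt and the second one is skipped, after which the flow simply carries on.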
Another fake command is added to the flow, which pauses the execution for a few seconds. Its purpose is to show that if another similar incident emerges while the flow (which is supposed to fix the issue) is running, the new incident will not start the flow again while the first one is still running.
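One way to picture this behaviour is a guard that remembers which flows are currently running. The sketch below is an assumption about the general idea, not Reaction's actual mechanism: the `FlowGuard` class and the flow name `restart` are hypothetical.

```python
import threading


class FlowGuard:
    """Hypothetical guard: at most one running flow per (system, flow) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_start(self, system: str, flow: str) -> bool:
        """Mark the flow as running and return True, or return False if it already runs."""
        key = (system, flow)
        with self._lock:
            if key in self._running:
                return False
            self._running.add(key)
            return True

    def finish(self, system: str, flow: str) -> None:
        """Mark the flow as no longer running."""
        with self._lock:
            self._running.discard((system, flow))


guard = FlowGuard()
print(guard.try_start("Hermes CRM", "restart"))  # first incident starts the flow
print(guard.try_start("Hermes CRM", "restart"))  # similar incident is ignored
guard.finish("Hermes CRM", "restart")
print(guard.try_start("Hermes CRM", "restart"))  # a later incident may start it again
```

The lock matters because incidents may be reported concurrently from several workers; checking and marking the flow in one step avoids starting it twice.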