Agents can be monitored through SMA, but that is a manual activity, and the support team has to spend time monitoring these agents. We would like to implement an automated solution that monitors agents on all nodes and sends a notification if an agent is down or has thrown an exception.
Please suggest how to implement this solution.
Configured properly, an agent failure will trigger two notifications from AES. First, when an agent 'dies', a PEGA0010 (agent failure) alert is sent to AES. AES recognizes PEGA0010, associates it with the agent failure scorecard, and emails that scorecard to whoever has subscribed to it.
Second, each monitored node sends AES the count of runnable agents every two minutes in the HLTH0001 health status message. Inside AES, you configure decision table rules to evaluate the agent count for each node. Properly configured, you track the exact number of agents and consider it 'critical' if a node has more or fewer agents than expected. The health status messages trigger health status change scorecards.
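To make the agent-count evaluation concrete, here is a rough Python sketch of the logic such a decision table would express. This is not AES code or a Pega API; the node names, the expected counts, and the function name classify_agent_count are all made up for illustration, and the real rule would be configured inside AES.

```python
# Hypothetical expected runnable-agent counts per node (assumed values;
# the real numbers depend on how your nodes are configured).
EXPECTED_AGENTS = {"node-a": 12, "node-b": 12, "node-batch": 20}


def classify_agent_count(node, runnable_agents, expected):
    """Classify a node's reported agent count.

    Mirrors the rule described above: an exact match is 'normal',
    anything more or fewer than expected is 'critical'. A node with
    no configured expectation is reported as 'unknown'.
    """
    if node not in expected:
        return "unknown"
    if runnable_agents == expected[node]:
        return "normal"
    return "critical"


if __name__ == "__main__":
    # Simulated counts, as if taken from HLTH0001 health status messages.
    reported = {"node-a": 12, "node-b": 11, "node-batch": 21}
    for node, count in reported.items():
        print(node, classify_agent_count(node, count, EXPECTED_AGENTS))
```

Treating a higher-than-expected count as critical is deliberate: it follows the "more or fewer" wording above and catches an unexpected agent starting on a node, not only one stopping.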