Overview
Job Monitor takes over job monitoring and remediation tasks: it monitors the running status of jobs, analyzes job failures to identify the root cause, and automatically remediates issues based on a runbook (if available). If an issue requires human intervention, Job Monitor creates and assigns the issue in the ITSM tool and notifies the resolver group. Job Monitor reduces application support operating costs and lets the support team spend its effort on high-value work.
Job Monitor Skills
The following skills are available for deployment:
Skill | Description |
---|---|
Job Monitor.Job Status Monitoring | Enables proactive job monitoring based on a pre-defined schedule |
Job Monitor.Job Failure Analysis | Enables categorization of job failures using a machine learning model, based on application logs or on alerts/events received |
Job Monitor.Job Failure Remediation | Automates the remediation steps for job failure issues, provided a pre-defined runbook exists |
The Solution
Job Monitor automates the job monitoring workflow: it analyzes failure issues and resolves them based on a runbook. The solution has three key functional components (Data Collection, Investigation, and Remediation), as described in the solution diagram. The solution requires Job Monitor to interact with the Ticket Management System. Job Monitor ingests application and infrastructure logs to gather the data required for job failure analysis. It also interacts with different source systems to implement remediation actions.
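The flow through the three components can be illustrated with a minimal Python sketch. This is not Job Monitor's actual implementation or API. Every name here (`JobFailure`, `collect`, `investigate`, `remediate`, `RUNBOOKS`, the category labels) is hypothetical. Simple keyword matching stands in for the machine learning model, and a returned string stands in for the ticket created in the Ticket Management System.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JobFailure:
    job_id: str
    log_lines: List[str]


def collect(job_id: str, raw_logs: Dict[str, List[str]]) -> JobFailure:
    """Data Collection: gather the log lines recorded for a failed job."""
    return JobFailure(job_id=job_id, log_lines=raw_logs.get(job_id, []))


def investigate(failure: JobFailure) -> str:
    """Investigation: assign a failure category.

    Keyword matching is only a placeholder (assumption); the description
    above states that a machine learning model performs this step.
    """
    text = " ".join(failure.log_lines).lower()
    if "disk full" in text:
        return "disk_full"
    if "timeout" in text:
        return "timeout"
    return "unknown"


# Hypothetical runbooks, keyed by failure category.
RUNBOOKS: Dict[str, Callable[[JobFailure], str]] = {
    "disk_full": lambda f: f"cleaned temp files and restarted {f.job_id}",
    "timeout": lambda f: f"restarted {f.job_id} with a longer timeout",
}


def remediate(failure: JobFailure, category: str) -> str:
    """Remediation: run the matching runbook, or escalate if none exists."""
    runbook = RUNBOOKS.get(category)
    if runbook is None:
        # Stand-in for creating a ticket and notifying the resolver group.
        return f"ticket created for {failure.job_id} ({category})"
    return runbook(failure)


def handle_failure(job_id: str, raw_logs: Dict[str, List[str]]) -> str:
    """Chain the three components for a single failed job."""
    failure = collect(job_id, raw_logs)
    category = investigate(failure)
    return remediate(failure, category)


if __name__ == "__main__":
    logs = {
        "nightly_etl": ["ERROR: Disk full on /data"],
        "report_job": ["Segmentation fault"],
    }
    print(handle_failure("nightly_etl", logs))
    print(handle_failure("report_job", logs))
```

In this sketch, a failure with a matching runbook is resolved automatically, while an uncategorized failure falls through to the escalation path, mirroring the human-intervention case described in the Overview.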