

Job monitoring is a tedious process for an application support team. The team must monitor job run status (canceled, long-running), check job failures, perform runbook steps (if available) to resolve issues, create and assign issues in the ITSM tool, and notify the resolver group. These repetitive, mundane activities regularly take significant effort from the team.

Job Monitor takes over job monitoring and remediation tasks: it monitors job run status, analyzes job failures to identify the root cause, and auto-remediates issues based on a runbook (if available). If an issue requires human intervention, Job Monitor creates and assigns the issue in the ITSM tool and notifies the resolver group. Job Monitor reduces application support operating costs and lets the support team spend its effort on high-value work.

Job Monitor Skills

The following skills are available for deployment:

Skill | Description
Job Monitor.Job Status Monitoring | Enables proactive job monitoring based on a predefined schedule
Job Monitor.Job Failure Analysis | Enables categorization of job failures using a machine learning model, based on application logs or alerts/events received
Job Monitor.Job Failure Remediation | Automates the remediation steps for job failures, provided a predefined runbook exists

The Solution

Job Monitor automates the job monitoring workflow, analyzes failures, and resolves issues based on a runbook. The solution has three key functional components, Data Collection, Investigation, and Remediation, as described in the solution diagram. Job Monitor interacts with the Ticket Management System, ingests application and infrastructure logs to gather the data required for job failure analysis, and interacts with different source systems to implement remediation actions.
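The three components can be pictured as a small pipeline. The sketch below is illustrative only and is not Job Monitor's actual implementation: every name in it (`JobFailure`, `collect`, `investigate`, `remediate`, `handle_failure`, the keyword table, and the runbook table) is hypothetical, and a simple keyword lookup stands in for the machine learning model used for failure categorization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class JobFailure:
    """Hypothetical record produced by the data collection step."""
    job_id: str
    error_lines: List[str]


def collect(job_id: str, raw_logs: List[str]) -> JobFailure:
    # Data Collection: keep only the log lines relevant to failure analysis.
    errors = [line for line in raw_logs if "ERROR" in line]
    return JobFailure(job_id=job_id, error_lines=errors)


# Assumption: a keyword table stands in for the ML categorization model.
KEYWORD_CATEGORIES: Dict[str, str] = {
    "connection refused": "connectivity",
    "disk full": "storage",
}


def investigate(failure: JobFailure) -> str:
    # Investigation: assign the failure to a category.
    text = " ".join(failure.error_lines).lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in text:
            return category
    return "unknown"


def restart_job(failure: JobFailure) -> str:
    # Hypothetical runbook step; a real one would call the source system.
    return f"restarted {failure.job_id}"


# Assumption: runbooks are looked up by failure category.
RUNBOOKS: Dict[str, Callable[[JobFailure], str]] = {
    "connectivity": restart_job,
}


def remediate(failure: JobFailure, category: str) -> Optional[str]:
    # Remediation: run the runbook if one exists, otherwise return None.
    runbook = RUNBOOKS.get(category)
    return runbook(failure) if runbook else None


def handle_failure(job_id: str, raw_logs: List[str]) -> str:
    failure = collect(job_id, raw_logs)
    category = investigate(failure)
    outcome = remediate(failure, category)
    if outcome is None:
        # No runbook: hand over to humans via the ticketing system (stubbed).
        return f"ticket opened for {job_id} ({category})"
    return outcome
```

In this sketch, a failure with a matching runbook is resolved automatically, while any other failure falls through to a ticket, mirroring the hand-off to the resolver group described earlier.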

An Example of a Job Monitoring Workflow

Case Study