Incident Management Best Practices
Let's start by defining Incident Management
Incident management is the orchestration of personnel, technology and processes to resolve IT service interruptions quickly. When an IT service is interrupted, the priority is to restore normal service quality for users as fast as possible.
Incident management distinguishes between real incidents and service requests. Service requests such as resetting passwords or providing access to technologies are minor events which do not typically disrupt the provision of service. Incident management is instead focused on the handling of major incidents.
Adopting an incident management process can appear daunting. The important thing to realize is that an effective process needs its own dedicated tools and technologies.
Incident Management Best Practices - 1) Avoid email
Managing a critical incident through email is a recipe for disaster. Email offers little visibility into the situation, and a critical message is easily buried under a pile of other communications. Email also provides no immediacy.
Incident Management Best Practices - 2) Avoid home-grown solutions
Home-grown solutions are often created early in a company’s development by developers who have not yet thought through the implications for execution speed and uptime. To be fair, some home-grown solutions are benign and do help. Often, however, home-grown tools are meant to substitute for technologies that the company cannot afford at the time. Incident management tools often fall into this category.
The problem is that home-grown tools used for incident management often have significant quirks that make resolving an incident take longer than it should. In the end, these quirks hurt SLAs and cost the company in downtime and customer loyalty.
Incident Management Best Practices - 3) Don’t skip steps
When resolving an issue, it might seem tempting to go directly from learning about it to sending it to the project team that can resolve it. However, this shortcut skips ticketing, cataloguing and prioritizing the event. When steps are skipped, events that were already in the queue are pushed back and delayed.
Skipping steps also interferes with the workflow that IT teams have agreed to. Disrupting this flow leaves teams susceptible to alert fatigue and missed SLAs.
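To illustrate why the intake steps matter, here is a minimal Python sketch of a queue that tickets, catalogues and prioritizes every event before any team picks it up. The class names, severity levels and sample incidents are all hypothetical and are not taken from any particular tool.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical severity levels; a lower number means more urgent.
PRIORITY = {"critical": 0, "high": 1, "low": 2}


@dataclass(order=True)
class Ticket:
    priority: int
    seq: int  # tie-breaker: earlier tickets of equal priority go first
    summary: str = field(compare=False)
    category: str = field(compare=False)


class IncidentQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def intake(self, summary, category, severity):
        """Ticket, catalogue and prioritize an event before anyone works on it."""
        ticket = Ticket(PRIORITY[severity], next(self._counter), summary, category)
        heapq.heappush(self._heap, ticket)
        return ticket

    def next_ticket(self):
        """Return the most urgent, oldest ticket, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


queue = IncidentQueue()
queue.intake("Disk usage warning on backup host", "storage", "low")
queue.intake("Checkout API returning errors", "application", "critical")
queue.intake("VPN latency for remote staff", "network", "high")

# Drain the queue in the order teams would receive the tickets.
order = []
ticket = queue.next_ticket()
while ticket is not None:
    order.append(ticket.summary)
    ticket = queue.next_ticket()
print(order)
```

Because every event goes through `intake`, a late-arriving critical incident moves ahead of minor ones without silently displacing tickets of the same priority that were already waiting.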
Incident Management Best Practices - 4) Buy-in
It is very important to gain the buy-in of executives and upper management when designing a new incident management practice. Before adopting a new process, it’s important to have at least one person dedicated to overall project management and to ensuring adherence to incident management best practices.
Incident Management Best Practices - 5) Benchmarking
According to a maxim often attributed to Karl Pearson, that which is measured improves. If teams want to reduce the time it takes to get an incident to the right team, to catalogue an event or to resolve an issue, they need to measure how long each of these steps currently takes. These times then need to be compared with industry-standard times.
The gap between industry-standard times and the team’s performance provides the basis for designing how the process can be improved.
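The measurement described above can be sketched in a few lines of Python. The incident records and the target figures below are invented placeholders for illustration only; real benchmark values would have to come from your own data and a trusted industry source.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: when each was reported, routed and resolved.
incidents = [
    {"reported": "2024-01-08 09:00", "routed": "2024-01-08 09:12", "resolved": "2024-01-08 10:30"},
    {"reported": "2024-01-09 14:00", "routed": "2024-01-09 14:08", "resolved": "2024-01-09 14:50"},
]

# Placeholder targets in minutes, NOT real industry figures.
targets = {"time_to_route": 5.0, "time_to_resolve": 60.0}


def minutes_between(start, end):
    """Elapsed minutes between two 'YYYY-MM-DD HH:MM' timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def benchmark(records, targets):
    """Average each stage and report its gap against the target."""
    measured = {
        "time_to_route": mean(minutes_between(r["reported"], r["routed"]) for r in records),
        "time_to_resolve": mean(minutes_between(r["reported"], r["resolved"]) for r in records),
    }
    return {
        name: {
            "measured": measured[name],
            "target": targets[name],
            "gap": measured[name] - targets[name],  # positive means slower than target
        }
        for name in targets
    }


report = benchmark(incidents, targets)
for name, row in report.items():
    print(name, row)
```

A positive gap marks a stage that is slower than its target and is therefore a candidate for the improvement plan.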
Incident Management Best Practices - 6) Create a road map
Once the IT team’s management has benchmarked the current process, it can begin creating a road map for implementation. The road map should identify how long the various parts of the plan will take, who will have significant responsibilities and which processes will need to change.
Incident Management Best Practices - 7) Begin project implementation
Once a road map has been established, the project implementation can begin. A project plan needs to be created with actionable steps that are communicated all along the way.
OnPage is the perfect tool for Incident Management
OnPage’s escalation policies, redundancies, and scheduling algorithms ensure that a critical message is never missed. Infinitely more reliable and secure than emails, text messages and phone calls combined, OnPage reduces incident resolution time, thereby improving productivity and advancing the digital operations of your business.
OnPage is the perfect tool for Incident Management because teams can:
- Consolidate alerts from RMMs, PSA tools, monitoring tools and IoT sensors through out-of-the-box integrations.
- Elevate notifications to the right person on-call within seconds. Monitoring tools connected to critical systems can trigger critical alerts based on predefined criteria. Notifications are sent to your team on the OnPage application.
- Integrate OnPage with any system that sends off an e-mail notification.
- Manage your on-call team by entering them into OnPage’s foolproof scheduler. OnPage Enterprise users can handle the most complex employee, team and group calendar configurations with ease.
- Escalate alerts to individuals or groups. The scheduler mitigates human error by notifying everyone in the group when a time period in the OnPage scheduler is left blank.
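The fallback rule in the last item, notifying the whole group when a schedule slot is blank, can be sketched generically as follows. This is an illustrative Python example only, not OnPage’s API or implementation; the schedule layout, names and `recipients` function are all assumptions.

```python
from datetime import time

# Hypothetical group schedule as (start, end, on-call person) slots.
# The slot after 16:00 is deliberately left blank.
SCHEDULE = [
    (time(0, 0), time(8, 0), "alice"),
    (time(8, 0), time(16, 0), "bob"),
]
GROUP = ["alice", "bob", "carol"]


def recipients(at, schedule=SCHEDULE, group=GROUP):
    """Return who should be notified for an alert raised at time `at`."""
    for start, end, person in schedule:
        if start <= at < end:
            return [person]  # a covered slot: notify the person on call
    return list(group)  # a blank slot: fall back to the whole group


print(recipients(time(9, 30)))
print(recipients(time(20, 0)))
```

Falling back to the whole group trades some extra noise for the guarantee that a gap in the calendar never leaves an alert without a recipient.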
OnPage provides powerful integrations with mission-critical systems through the industry’s easiest integration framework.
OnPage has written several relevant whitepapers that can assist you in understanding the complexities of an effective IT on-call policy.
WHITE PAPER [IT]: How To Survive Being On-Call
WHITE PAPER [IT]: Mastering On-Call Scheduling
WHITE PAPER [IT]: 5 Ways To Conquer Alert Fatigue