The OnPage Guide to Mastering Escalation Management
Escalation for better communication
While effective communication can be challenging in the best of circumstances, it can be especially trying when an internal or external customer is facing an issue. The first goal of the incident management process is to restore normal service operation as quickly as possible and to minimize the impact on business operations, thus maintaining the best possible levels of service quality and availability.
To that end, IT engineers should have a number of tools at their disposal to expedite resolution of the issue.
- The first maxim of true incident management is to develop runbooks so that teams can manage incidents as independently as possible. When teams share information and build the judgment to act on it, escalations are less likely to occur. With runbooks, engineers are clear on which steps to take to handle an incident effectively and which precautions to take in responding to the situation.
- The second maxim of true incident management actually names a tool not to use: never use email to escalate and manage an event. Escalation can easily generate an overwhelming number of email notifications, which can derail the incident management process.
According to Harvard Business Review:
The only way to keep productive energy flowing through this [email] network is for everyone to continually check, send, and reply to the multitude of messages flowing past—all in an attempt to drive tasks, in an ad hoc manner, toward completion.
Email becomes the platform where all tasks get dumped – including important IT incidents whose speedy resolution is key to keeping customers happy and the business running. As such, teams should look to communicate with their colleagues on a separate messaging application that has immediacy as well as priority settings.
Challenges of digital escalation
Many IT teams define digital escalation as raising the priority of an issue by alerting the whole on-call group to an incident. Unfortunately, this practice often creates alert overload. Sending alerts to everyone all the time can result in alerts being treated as noise. If the whole team is alerted, each individual engineer is left to assume that another team member will respond. Hence, alerting the whole team fosters a culture of ignoring alerts or, as one Google engineer put it, “foo-alerts”.
Alert escalation best practices
- Best practices for escalating the actual alerts focus on tying alerts to a digital scheduler. That is, alerts should be tied to an on-call group schedule so that only the engineer on call receives the alert. This sort of design ensures accountability. When only one engineer is alerted, that engineer knows that he or she is responsible for fixing the issue.
- Ensure there’s an escalation order and a defined escalation group in your digital scheduler. This second point is needed to ensure back-up for the escalation. If the first engineer on call is unavailable or is occupied with another issue, the alert should escalate to the next engineer on call.
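The two practices above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular scheduling product works: the names Engineer, escalate, schedule and busy are hypothetical, and the acknowledges callback stands in for whatever mechanism a real tool uses to learn that an engineer has accepted an alert.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Engineer:
    name: str  # hypothetical: a real scheduler would also hold contact details


def escalate(
    alert: str,
    on_call_order: List[Engineer],
    acknowledges: Callable[[Engineer, str], bool],
) -> Tuple[Optional[Engineer], List[str]]:
    """Page one engineer at a time, in escalation order.

    Returns the engineer who acknowledged the alert (or None if nobody
    did) together with the names of everyone who was paged.
    """
    paged: List[str] = []
    for engineer in on_call_order:
        paged.append(engineer.name)
        if acknowledges(engineer, alert):
            return engineer, paged  # a single owner; stop escalating
    return None, paged  # escalation order exhausted


# Hypothetical schedule: primary first, then back-ups.
schedule = [Engineer("primary"), Engineer("secondary"), Engineer("tertiary")]
busy = {"primary"}  # assumption: the primary is tied up with another issue

owner, paged = escalate(
    "database latency high",
    schedule,
    acknowledges=lambda engineer, alert: engineer.name not in busy,
)
print(owner.name if owner else "unacknowledged", paged)
```

In this run the primary does not acknowledge, so the alert moves to the secondary, who becomes its single owner. The tertiary engineer is never paged, which is the point of escalating in order rather than alerting the whole group at once.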
As described above, the word escalation clues us in to the need to resolve an important issue. For IT, we need to understand how to communicate effectively and raise the message priority for effective escalations to take place. IT needs accessible tools that make these escalations possible.
To read more tips on how to master escalation management, download our whitepaper.