incident management process

Incident Management Process Explained

Pre-Incident Prep Work

When incoming tickets are bombarding you all day long on the front lines of IT, it’s common to fall into an autopilot, “find it and fix it” mode. In fact, many standard service desk metrics encourage agents to resolve as many issues as possible, and rightfully so.

So what’s our beef with root cause analysis? Nothing, except that it’s only a fraction of the true responsibility of the incident management process, and on its own it misses the opportunity to add value back to the business. Rather than just reacting to problems, the true purpose of problem management is, and always will be, to prevent the recurrence of incidents so that IT service can be continuous and problem-free.

Setting Up Alerts

The beginning of an incident is perhaps the point where you have the most control. Most systems under your care will raise an alarm when something is not right, and most of these notifications arrive as email. Email, however, is not an effective alerting channel: inboxes bury important alerts, and emails are easy to ignore because they don’t trigger blaring, audible alarms that draw your attention. Any system that sends email notifications should be integrated with a monitoring tool or an alerting app that can be accessed from any smartphone, anywhere.

Be Smart – Use a Smartphone

Smartphones are a miracle to those who work with random things that go bump in the night. The alternative is the antiquated pager. Pagers are unable to continue alerting until the messages are read. On the other hand, smartphones are readily available and can host apps that act like pagers.

While there are many pager apps available, the key is to invest in one that keeps broadcasting the alert until it is read, so that a response is ensured even if the recipient is unavailable when the page is first sent. This is not the case with pagers, whose messages are often missed if the intended recipient is unavailable or out of range.
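
To make "alert until read" concrete, here is a minimal Python sketch of a persistent alert loop. It is an illustration only, not how any particular pager app works: the function name `alert_until_read`, the `send` and `is_read` callbacks, and the default interval and attempt limit are all assumptions made for this example.

```python
import time
from typing import Callable


def alert_until_read(
    send: Callable[[str], None],
    is_read: Callable[[], bool],
    message: str,
    interval_seconds: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-send `message` until `is_read()` reports True or attempts run out.

    `send` and `is_read` are hypothetical hooks into whatever alerting
    backend is in use. Returns the number of times the alert was sent.
    """
    attempts = 0
    while attempts < max_attempts:
        send(message)               # broadcast (or re-broadcast) the alert
        attempts += 1
        sleep(interval_seconds)     # give the recipient time to respond
        if is_read():               # stop as soon as the alert has been read
            break
    return attempts
```

The attempt limit is a design choice: once it is reached, the alert should be handed to someone else rather than repeated forever.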

Catalog and Map Everything

The first thing you need to do is inventory your prospect’s business processes. Ask your prospect to describe the company’s overall business model. Then, assess the contribution of each IT application to the model. This will tell you what kind of protection you need to provide and expose any related applications that will need to be protected in kind. To protect your prospective customer’s business, it’s vital that you take a high-level business view of these operations.

A seasoned IT team draws a lot of information on how to deal with incidents from past experiences. In order to have a catalog of your clients’ past incidents, you need to document them. The best way to do this is by using ticketing tools that track the progress of the incident and everything that happens to it until it’s resolved. No incident response plan is complete without clear documentation of the policies and procedures—and personnel (including you)—charged with carrying them out.

Define Roles and Act as a Team

Problem management is not a single-person issue. It takes a village, and the first thing to do is define roles within the team. To properly prepare for and address incidents across the organization, a centralized incident response (IR) team should be formed. This team is responsible for analyzing security breaches and taking any necessary responsive measures. At its core, an IR team should consist of:

  • Incident response managers
  • Security analysts
  • Threat researchers

Incident Response Managers

The IR manager oversees and prioritizes actions during the detection, analysis and containment of an incident. The manager is also responsible for conveying the special requirements of high-priority incidents to the team by judging the severity of the alerts received and passing them along to the right person. In an ideal situation, everyone would be on the same messaging platform with elevated alerting capabilities.

Security Analysts

The manager is supported by a team of security analysts who work directly with the affected network to research the time, location and details of an incident. There are two types of analysts:

  • Triage Analysts: Filter out false positives and watch for potential intrusions. The right information can then be sent to those managing the incident using a priority messaging app tied in with ticketing tool triggers.
  • Forensic Analysts: Recover key artifacts and maintain integrity of evidence to ensure a forensically sound investigation.

Threat Researchers

Threat researchers complement security analysts by providing threat intelligence and context for an incident. They are constantly combing the internet and identifying intelligence that may have been reported externally. Combining this information with company records of previous incidents, they build and maintain a database of internal intelligence. This is where ticketing tools come in: they enable you to catalog all the various aspects of the unfolding incident, so that the record can be used later to provide insight.

Secure Communication Within and Across Teams is Critical

Communication during an incident should be conducted in a manner that protects the confidentiality of the information that is being disseminated. The IR manager should be the central point of all communication. Only those with a valid need-to-know should be included in communications regarding key incident details, indicators of compromise, adversary tactics and procedures.

Securing this communication, so that unsavory characters are unable to snoop on your messages, is vital to avoid tipping them off that an investigation is under way. Any indication that “you’re onto them” may lead to swift changes by the attackers to further mask their activity.

Have a "Plan B"

Not having a “plan B” means not having a backup person receiving the alerts when the first person alerted is unavailable.

Establishing an effective back-up plan requires:

  • Setting up an escalation policy
  • Setting up a fail-over policy
  • Alerting across communication channels

Set Up an Escalation Policy

Make sure your team is organized within an escalation policy. An escalation policy makes sure that if an incident is not acknowledged or resolved within a pre-determined amount of time, it will escalate to the next person. You can customize who you want to receive the alert, the amount of time to wait before escalating to the next user, and which user the alert should be escalated to.

Those who need to receive alerts are placed in one escalation group. The order in which the people are alerted should be adjusted according to who on your team wishes to be the first responder. Set “escalation intervals” (the time to wait before escalating) and “escalation factors” (the condition that stops an escalation, e.g., the message being read) to determine how the escalation policy behaves.
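
The escalation behavior described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not any product's implementation: `EscalationPolicy`, `current_responder`, and the responder names are hypothetical, the interval is the same for every step, and the only escalation factor modeled is acknowledgement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationPolicy:
    responders: List[str]   # ordered: first responder first
    interval_minutes: int   # the "escalation interval"

    def __post_init__(self) -> None:
        if self.interval_minutes <= 0:
            raise ValueError("interval_minutes must be positive")


def current_responder(
    policy: EscalationPolicy,
    minutes_since_alert: int,
    acknowledged: bool,
) -> Optional[str]:
    """Return who should be alerted right now, or None.

    None means either the alert was acknowledged (the escalation factor
    that stops escalation) or every responder in the group was tried.
    """
    if acknowledged:
        return None
    step = minutes_since_alert // policy.interval_minutes
    if step >= len(policy.responders):
        return None  # group exhausted; a fail-over policy should take over
    return policy.responders[step]
```

For example, with three responders and a five-minute interval, an unacknowledged alert sits with the first responder for minutes 0 to 4, moves to the second at minute 5, and to the third at minute 10.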

Set Up a Failover

If an alert is sent to an escalation group and does not reach anyone in the group, you need to have a fail-over policy in place that notifies the team leader, so that they can take the right actions. This can be as simple as sending an email with details of the unanswered alerts. Incident post-mortem reports are useful to track what happened with the alert and why it was not acknowledged.
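
A fail-over of this kind might look like the following Python sketch. The function `dispatch_with_failover`, the `notify` callback (assumed to return True when the recipient acknowledges), and the format of the summary text are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List


def dispatch_with_failover(
    alert: str,
    group: Iterable[str],
    notify: Callable[[str, str], bool],
    team_leader: str,
) -> List[str]:
    """Try each responder in order; fall back to the team leader.

    `notify(recipient, text)` is a hypothetical hook that returns True
    when the recipient acknowledges. Returns everyone contacted, in order.
    """
    contacted: List[str] = []
    for responder in group:
        contacted.append(responder)
        if notify(responder, alert):
            return contacted  # someone acknowledged; no fail-over needed
    # Nobody in the escalation group answered: tell the team leader
    # which alert went unanswered and who was tried.
    tried = ", ".join(contacted) or "nobody"
    notify(team_leader, f"UNANSWERED: {alert} (tried: {tried})")
    contacted.append(team_leader)
    return contacted
```

The returned list doubles as a simple record of who was contacted, which is the kind of detail a post-mortem report needs.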

Alert Across Communication Channels

When alerts are set up, it is imperative that alert redundancies are set in place in case the original alert fails to get delivered. Ideally, you want a system that can send the same alert to a team member’s smartphone via push notifications, SMS, automated phone calls describing the alert, or email.
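
One way to picture channel redundancy is as an ordered list of delivery attempts. The Python sketch below assumes each channel is a function that returns True on confirmed delivery and may raise on failure; `send_with_redundancy` and the channel names are hypothetical, and a real system could just as well send on all channels at once.

```python
from typing import Callable, List, Optional, Tuple

# A channel takes the message and returns True if delivery was confirmed.
Channel = Callable[[str], bool]


def send_with_redundancy(
    message: str,
    channels: List[Tuple[str, Channel]],
) -> Tuple[Optional[str], List[str]]:
    """Try each (name, channel) pair in order until one delivers.

    Returns the name of the channel that succeeded (or None) together
    with the names of the channels that failed before it.
    """
    failed: List[str] = []
    for name, deliver in channels:
        try:
            if deliver(message):
                return name, failed
        except Exception:
            pass  # treat an error (e.g. gateway down) as a failed delivery
        failed.append(name)
    return None, failed
```

Ordering the list as push notification, SMS, phone call, then email mirrors the channels named above, with the least attention-grabbing channel kept as the last resort.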

Get an Incident Management Tool

While you can catalog an incident using your IT service management (ITSM) ticketing tools, there is very little you can do to better manage the incident by simply using tickets. Furthermore, with most ticketing tools’ workflows, you are only able to receive a text or email when a ticket is created. This limitation inhibits a virtuous workflow and hinders the alerting process.

The answer is to integrate ticketing tools with an incident alert management system. This way, you can convert tickets into intelligent alerts that can be sent to responder teams whenever there is a critical incident.
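
As a rough illustration of converting tickets into alerts, the Python sketch below maps a ticket to an alert with a priority. Everything in it is an assumption: the `Ticket` and `Alert` fields, the numeric severity scale (1 is most severe), and the threshold are invented for the example and do not reflect any specific ticketing tool's schema.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    severity: int   # hypothetical scale: 1 = most severe


@dataclass
class Alert:
    ticket_id: str
    text: str
    priority: str   # "high" or "low"


def ticket_to_alert(ticket: Ticket, high_priority_max_severity: int = 2) -> Alert:
    """Turn a ticket into an alert, marking severe tickets high priority."""
    if ticket.severity <= high_priority_max_severity:
        priority = "high"
    else:
        priority = "low"
    text = f"[{ticket.ticket_id}] {ticket.summary}"
    return Alert(ticket.ticket_id, text, priority)
```

Keeping the ticket ID in the alert lets the responder trace the alert back to the ticket that holds the full incident history.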

The incident alert management tool needs to be more than just an alerting service. Here are some requirements of any incident alert management solution:

  • High-priority notifications that bypass the silent switch on mobile
  • Ability to integrate into ticketing tools
  • Persistent, distinguishable mobile alerts
  • A digital on-call scheduler
  • An alert escalation policy
  • Fail-over options if an alert escalation fails
  • Secure messaging to aid team communication
  • High- and low-priority alerting
  • The ability to track incoming and outgoing alerts and messages
  • Reporting to summarize and gain insights into historical data
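
To give a feel for one of these requirements, the digital on-call scheduler, here is a minimal Python sketch of a fixed rotation. It assumes equal-length shifts that start on a known date; the function `on_call`, its parameters, and the names in the rotation are hypothetical, and real schedulers also handle overrides, holidays and time zones.

```python
from datetime import date
from typing import List


def on_call(rotation: List[str], start: date, day: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` for a simple repeating rotation.

    `rotation` is the ordered list of responders, `start` is the first
    day of the first shift, and each shift lasts `shift_days` days.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    if shift_days <= 0:
        raise ValueError("shift_days must be positive")
    if day < start:
        raise ValueError("day is before the schedule starts")
    shifts_elapsed = (day - start).days // shift_days
    return rotation[shifts_elapsed % len(rotation)]
```

The person returned by a function like this would typically sit at the top of the escalation order for that shift.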
