Incident Management Process
Pre-incident Prep work
When incoming tickets are bombarding you all day long on the front lines of IT, it’s common to fall into an autopilot “find it and fix it” mode. In fact, many standard service desk metrics encourage agents to resolve as many issues as possible, and rightfully so.
So what’s my beef with root cause analysis? Nothing, except that it’s only a fraction of the true responsibility of the Problem Management process, and stopping there misses the opportunity to add value back to the business. Rather than just reacting to problems, the true purpose of problem management is to prevent incidents from recurring, so that IT service can be continuous and problem-free.
Setting up alerts
The beginning of an incident is perhaps the point where you have the most control. Most systems under your care will raise an alarm if something is not right, and most of these notifications arrive as email. Email, however, is not an effective alerting channel: inboxes bury important alerts, and emails tend to be ignored because they don’t come with a blaring audible alarm that draws your attention. Any system that sends an email notification should be integrated with a monitoring tool or an alerting app that can be accessed from any smartphone, anywhere.
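One way to picture that integration is a small bridge that reads the notification email and turns it into a structured alert that an alerting app could consume. The sketch below is only an illustration: the `URGENT_WORDS` list, the `email_to_alert` function, and the shape of the returned dictionary are assumptions made for this example, not the format of any particular tool.

```python
from email import message_from_string, policy

# Assumed keywords that mark a notification as urgent (hypothetical list).
URGENT_WORDS = ("CRITICAL", "DOWN", "FAILED")


def email_to_alert(raw_email):
    """Parse a raw notification email into a simple alert dictionary."""
    msg = message_from_string(raw_email, policy=policy.default)
    subject = str(msg["Subject"] or "(no subject)")
    return {
        "source": str(msg["From"] or "unknown"),
        "title": subject,
        # Flag the alert as urgent if the subject contains any keyword.
        "urgent": any(word in subject.upper() for word in URGENT_WORDS),
    }


raw = (
    "From: monitor@example.com\n"
    "Subject: CRITICAL: db-01 is down\n"
    "\n"
    "Host db-01 stopped responding to ping.\n"
)
print(email_to_alert(raw))
```

In practice the keyword match would be replaced by whatever severity rules your monitoring tool already exposes.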
Be smart – use a smartphone
Smartphones are a miracle to those who work with random things that go bump in the night. The alternative is the antiquated pager. Pagers are unable to continue alerting until the messages are read. Smartphones, on the other hand, are readily available and can host apps that act like pagers.
While there are a lot of pager apps out there, the key is to get one that continues to broadcast the alert until it is read, so that a response is ensured. If the recipient is unavailable when the page is first sent, the app keeps notifying until the message is read. This is not the case with pagers, whose messages are often missed if the intended recipient is unavailable or out of range.
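The "keep alerting until read" behavior can be sketched in a few lines. This is a simulation under stated assumptions, not how any specific pager app works: time is a plain counter in seconds, and `Page`, `repeat_until_read`, `read_at`, `interval`, and `max_repeats` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A single alert sent to one recipient (hypothetical model)."""
    message: str
    read: bool = False
    sent_at: list = field(default_factory=list)  # simulated send times, in seconds


def repeat_until_read(page, read_at, interval=60, max_repeats=10):
    """Re-send the page every `interval` seconds until it is read.

    `read_at` is the simulated time (in seconds) at which the recipient
    opens the message, or None if they never do.
    """
    now = 0
    for _ in range(max_repeats):
        if read_at is not None and now >= read_at:
            page.read = True
            break
        page.sent_at.append(now)  # a real app would push a notification here
        now += interval
    return page


page = repeat_until_read(Page("Disk usage above 95% on db-01"), read_at=150)
print(page.sent_at, page.read)
```

The `max_repeats` cap is there so that an alert nobody reads eventually stops repeating and can be handed to someone else.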
Catalog and Map everything
The first thing you need to do is inventory your prospect’s business processes. Ask your prospect to describe the company’s overall business model. Then assess the contribution of each IT application to the model. This will tell you what kind of protection you need to provide and expose any related applications that will need to be protected in kind. To protect your prospective customer’s business, it’s vital that you take a high-level, business view of these operations.
A seasoned IT team draws a lot of information on how to deal with incidents from past experience. To have a catalog of all your clients’ past incidents, you need to document them. The best way to do this is with a ticketing tool that tracks the progress of an incident and everything that happens to it until it is resolved. No incident response plan is complete without clear documentation of the policies and procedures, and of the personnel (including you) charged with carrying them out.
Define roles and act as a team
Problem management is not a single-person issue. It takes a village, and the first thing to do is to define roles within the team. To properly prepare for and address incidents across the organization, a centralized incident response team should be formed. This team is responsible for analyzing security breaches and taking any necessary responsive measures. At its core, an IR (Incident Response) team should consist of:
Incident Response Manager:
The IR manager oversees and prioritizes actions during the detection, analysis, and containment of an incident. The manager is also responsible for conveying the special requirements of high-severity incidents to the team, judging the severity of the alerts received and passing them along to the right person. In an ideal situation, everyone would be on the same messaging platform with elevated alerting capability.
The manager is supported by a team of security analysts who work directly with the affected network to research the time, location, and details of an incident. There are two types of analysts:
- Triage Analysts: Filter out false positives and watch for potential intrusions. The right information can then be sent out to those managing the incident using a priority messaging app tied in with ticketing tool triggers.
- Forensic Analysts: Recover key artifacts and maintain integrity of evidence to ensure a forensically sound investigation.
Threat researchers complement security analysts by providing threat intelligence and context for an incident. They are constantly combing the internet and identifying intelligence that may have been reported externally. Combining this information with company records of previous incidents, they build and maintain a database of internal intelligence. This is where ticketing tools come in: they let you catalog all the various aspects of the unfolding incident so that the record can be used later to provide insight.
Secure Communication Within and Across Teams is Critical
Communication during an incident should be conducted in a manner that protects the confidentiality of the information being disseminated. The incident response manager should be the central point of all communication, and only those with a valid need-to-know should be included in communications regarding key incident details, indicators of compromise, adversary tactics, and procedures. Securing this communication so that unsavory characters are unable to snoop on your messages is vital to avoid tipping them off that an investigation is under way. Any indication that ‘you’re onto them’ may lead to swift changes by the attackers to further mask their activity.
Have a Plan B
Not having a Plan B could mean having no backup person to receive the alerts when the first person alerted is unavailable. Let’s look at some solutions:
Set up an escalation policy
Make sure your team is organized into an Escalation Policy. An Escalation Policy makes sure that if an incident is not acknowledged or resolved within a pre-determined amount of time, it will escalate to the correct user(s). You can customize who you want to receive the alert, the amount of time to wait before escalating to the next user(s), and which user(s) the alert should be escalated to.
Those who need to receive alerts are put in one escalation group. The order in which people are alerted should ideally be adjusted according to who on your team wishes to be the first responder. Set the Escalation Interval (the time to wait before escalating) and the Escalation Factor (the event that stops an escalation, for example the message being read) to determine how the escalation policy behaves.
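To make the two settings concrete, here is a minimal sketch of an escalation group in action. It is a simplified simulation rather than a real product's policy engine: `EscalationPolicy`, `escalate`, and the `read_after` map are hypothetical names, reading the message is used as the Escalation Factor, and each responder is only given one interval to read the alert before the next person is notified.

```python
from dataclasses import dataclass


@dataclass
class EscalationPolicy:
    responders: list        # ordered escalation group, first responder first
    interval_minutes: int   # Escalation Interval: wait before moving on


def escalate(policy, read_after):
    """Walk the escalation group in order and return (notified, stopped_by).

    `read_after` maps a responder to the number of minutes they take to read
    the alert; responders missing from the map never read it. Reading the
    alert within the interval is the Escalation Factor that stops escalation.
    """
    notified = []
    for person in policy.responders:
        notified.append(person)
        delay = read_after.get(person)
        if delay is not None and delay <= policy.interval_minutes:
            return notified, person
    return notified, None   # nobody read the alert in time


policy = EscalationPolicy(["alice", "bob", "carol"], interval_minutes=10)
print(escalate(policy, {"alice": 25, "bob": 4}))
```

Here alice takes too long, so the alert escalates to bob, who reads it within the interval and stops the chain before carol is disturbed.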
Set up a failover
In the event an alert is sent to an escalation group and does not reach anyone in it, you need a failover policy in place that notifies either the team leader or the boss so that they can take the right actions. This can be as simple as sending an email with details of the unanswered alerts. In a post mortem of the incident, this kind of failover reporting will be useful for tracking what exactly happened with the alert and why it was left unacknowledged.
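A failover report of that kind can be very small. The sketch below only composes the email using Python's standard `email.message.EmailMessage` class and does not send it; the function name `build_failover_email`, the subject prefix, and the addresses are assumptions made for illustration.

```python
from email.message import EmailMessage


def build_failover_email(alert, notified, team_lead):
    """Compose (but do not send) a report about an alert nobody acknowledged."""
    msg = EmailMessage()
    msg["To"] = team_lead
    msg["Subject"] = f"UNACKNOWLEDGED: {alert}"
    lines = [f"The alert '{alert}' was not acknowledged by anyone."]
    lines.append("Escalation order tried:")
    # Record who was notified, in order, for the post mortem.
    lines += [f"  {i}. {name}" for i, name in enumerate(notified, start=1)]
    msg.set_content("\n".join(lines))
    return msg


report = build_failover_email("db-01 is down", ["alice", "bob"], "lead@example.com")
print(report["Subject"])
print(report.get_content())
```

Keeping the escalation order in the body means the post mortem can start from the report itself rather than from memory.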
Alert across communication channels
When alerts are set up, it is imperative that redundancies are put in place in case the original alert, sent through the preferred mode of communication, fails to get delivered. Ideally you want a system that can send the same alert to a team member’s smartphone via push notification, SMS, an automated phone call describing the alert, or email.
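One simple way to model this redundancy is an ordered list of channels that are tried in turn until one of them reports a successful delivery. The sketch below uses stand-in functions instead of real push, SMS, or voice services; `deliver` and the stub senders are hypothetical, and the simulated failures are hard-coded purely to show the fallback.

```python
def deliver(alert, channels):
    """Try each (name, send) channel in order.

    Returns the list of channels attempted and the name of the one that
    delivered the alert, or None if every channel failed.
    """
    attempted = []
    for name, send in channels:
        attempted.append(name)
        try:
            if send(alert):
                return attempted, name
        except Exception:
            # A broken channel must not block the remaining ones.
            continue
    return attempted, None


def push(alert):
    return False                      # simulated: device offline


def sms(alert):
    raise ConnectionError("gateway")  # simulated: SMS gateway down


def voice_call(alert):
    return True                       # simulated: call answered


channels = [("push", push), ("sms", sms), ("voice", voice_call),
            ("email", lambda alert: True)]
attempted, delivered_by = deliver("db-01 is down", channels)
print(attempted, delivered_by)
```

Email sits last in this ordering because, as noted earlier, it is the channel most likely to be overlooked.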
Get an incident management tool
While you can catalog an incident using your ITSM ticketing tools, there is very little you can do to better manage the incident by simply using tickets. Furthermore, with most ticketing tools’ workflows, you are only able to receive a text or email when a ticket is created. This limitation inhibits a virtuous workflow and hinders the alerting process.
The answer, then, is to integrate ticketing tools into an incident alert management tool that lets you convert tickets into smart alerts that can be sent out to responder teams and management whenever there is an incident.
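As a rough sketch of what converting a ticket into an alert might look like, the example below maps a ticket onto an alert with a priority flag and a recipient list. Everything here is an assumption for illustration: the ticket is a plain dictionary with made-up field names and IDs, and the rule that management is only included on high-priority tickets is one possible policy, not a requirement of any tool.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    title: str
    high_priority: bool
    recipients: list


def ticket_to_alert(ticket, responders, management):
    """Turn a ticket (a plain dict here) into an alert for the right audience."""
    # Assumed convention: "P1" or "critical" marks a high-priority ticket.
    high = ticket.get("priority", "").lower() in {"p1", "critical"}
    recipients = list(responders)
    if high:
        recipients += management  # assumed policy: loop in management on severe incidents
    return Alert(f"[{ticket['id']}] {ticket['summary']}", high, recipients)


ticket = {"id": "INC-1", "summary": "VPN down", "priority": "P1"}
print(ticket_to_alert(ticket, ["oncall"], ["cto"]))
```

A real integration would read these fields from the ticketing tool's own trigger or webhook payload instead of a hand-built dictionary.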
The incident alert management tool needs to be more than just an alerting service. When you are shopping for a solution, use the following checklist of must-haves to get the most out of an incident alert management tool:
- Ability to integrate into ticketing tools
- An on-call scheduler
- An alert escalation policy
- Failover options if escalation fails
- Secure messaging to aid team communication
- High and low priority alerting
- The ability to track incoming and outgoing alerts and messages
- Reporting – to summarize and gain insights into historical data