incident management

Why proper incident management is key to proper IT management

Proper IT management requires proper incident management. Otherwise, you court Murphy’s law at your peril. In the IT world, if a server can fail, a cache overload or traffic overload the network – it will. And the consequences are significant.

Many IT organizations face database, hardware, and software downtime, lasting short periods to shutting down the business for days. According to a January 2016 article in Network Computing on the high price of IT downtime, organizations face:

“an average of five downtime events each month, with each downtime event being expensive indeed: from $1 million a year for a typical midsize company to more than $60 million for a large enterprise.”

The major cause of this downtime is equipment failures which account for almost 40% of downtime. The second most frequent cause of downtime is human error which accounts for 25% of downtime. Cybersecurity accounts for only about 10% of this downtime. Yet in each of these cases, traditional workflows use emails to alert those in charge of downed networks. The use of email alerts assumes – falsely – that an email will get the attention of a data center manager. Yet data managers are faced with 100s of other emails per day. Clearly, an email doesn’t break through the noise and get noticed in this instance.

Best practices for effective incident management during downtime

While effective use of network monitoring tools is required to minimize the impact of downtime, using emails to provide effective response means you are expecting the person responding to the incident is sitting at their computer or hovering over their iPhone. And what happens when the servers go down at 3am? One hopes even the most devoted of employees is asleep at that hour.

Furthermore, traditional pagers are inadequate as they go off and then go silent. Pagers, when used either as an alternative to email or in addition, don’t always escalate and they don’t persistently get the attention of the necessary individual. Instead, you need data security control tools coupled with proper incident management applications. This means, that when incidents do occur the appropriate individuals are alerted and the alerts don’t stop until the requisite action happens.

Impact of having solutions in place

Mitigating downtime requires good workflows, human response and – most importantly – proper alarms to alert relevant individuals when things go wrong. Proper incident notification is crucial to effect management of IT downtime. And there’s more than just the cost savings. There’s also the savings to reputation. If a company frequently experiences downtime to its IT infrastructure, then it is courting a besmirched reputation for lacking reliability. When a company has a bad reputation, business is more difficult and costly to conduct. Much of the writing on customer service notes that it is more difficult to retain customers and important stakeholders when a company’s reputation is damaged. This, in turn, makes the costs of doing business significantly higher.

Conclusion

Of great importance in this is that while you cannot avoid every incident, you can ensure proper incident management. In their attempts to provide proper alerts when trouble raises its ugly head and things go south, heads of IT need to ensure there are proper alerts that rise above the clutter.

Want to learn more about how alerts helped one IT team drive down response time? Download our whitepaper.

OnPage Corporation

Share
Published by
OnPage Corporation

Recent Posts

PagerDuty vs Opsgenie vs OnPage (2025): Which On-Call & Alerting Tool Is Right for Your Team?

Disclosure: This comparison is based on my experience working closely with on-call workflows, incident alerting…

2 weeks ago

Top Incident Alerting and On-Call Management Software (2026 Buyer’s Guide)

Disclosure: This comparison is written by our product marketing team that works closely with IT…

2 weeks ago

AI Reliability, Part 2: When the Datacenter Becomes the Bottleneck

In Part 1, we talked about all the hidden complexity inside AI systems: the pipelines,…

3 weeks ago

OnPage Introduces Multi-Language Mobile App Localization on iOS & Android

As organizations continue to adopt OnPage across regions and operational environments, providing an experience that…

1 month ago

AI Infrastructure Is Creating a New Wave of Incidents, And Why Enterprises Need a Modern On-Call Strategy

Over the past couple of months, my entire world has felt flooded with AI breakthroughs.…

1 month ago

Manual Call Forwarding vs. Schedule-Based Call Routing: What’s the Better Way to Handle On-Call Support?

When your team shares one support number, someone has to decide who gets the calls…

2 months ago