Enterprise incident management
At its core, enterprise incident management is the practice of applying incident management across a specific team or business. IT often faces incidents that have the potential to disrupt or derail the team or the company. To successfully combat these critical incidents, teams need a deliberate process in place for resolving issues quickly.
Enterprise incident management brings processes such as ticketing, alerting, escalations, reporting and documentation together across the company as a whole, giving it a sustainable and repeatable way to achieve process excellence.
Purpose of Enterprise Incident Management
The main goal of the enterprise incident management process is to restore normal service operations to the enterprise as quickly as possible. By doing so, the company minimizes the adverse impact of outages on the business and ensures that the optimal level of service quality is maintained. This optimal level of service is the level of service operation defined within service level agreement (SLA) limits.
What is the enterprise incident management process?
1. Create a service level agreement (SLA)
The company creates an SLA between itself and its customer that defines incident priorities, escalation paths and response times.
2. An incident is identified and logged
When an incident occurs, it is identified and logged in a ticketing system so a record is kept. Ideally, the ticket will be updated along the way as the team works to resolve the issue.
3. Templates are used to categorize the issue
The ticket is categorized according to type. For example, the ticket might be defined as a server issue or a networking issue.
4. The issue is prioritized based on severity and impact on the business
High-priority issues are handled ahead of other issues because of the significant financial impact they have on the business. Low-priority issues have minimal financial impact and are therefore typically resolved after high- and medium-priority issues.
5. The issue is escalated if more technical expertise is required to resolve it
If the team which receives the alert needs to call in assistance from other groups in the organization or is unable to resolve the issue on their own, the issue is escalated to bring further expertise to the issue.
6. Investigation and diagnosis of the issue
By using messaging between team members and runbooks, IT professionals are able to rapidly investigate and diagnose the issue that is impeding the proper functioning of the company’s technology.
7. Resolution and recovery
Once the issue has been diagnosed, it can be resolved and service can return to its expected level of performance.
8. Incident closure
The incident is closed. This typically happens through the ticketing system.
9. Customer survey or internal post-mortem
By bringing in a step for reflection on the process, teams are able to review the processes and steps they took to resolve the issue and see what can be done better next time.
Each of these steps is important in the creation of a clear incident management process. Skipping steps in an attempt to resolve the issue more quickly can easily overwhelm IT teams and lead to missed SLA targets.
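As an illustration of the categorization, prioritization and escalation steps, here is a minimal Python sketch. It is not taken from any particular ticketing product: the `Ticket` fields, the three impact labels, the `needs_outside_expertise` flag and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from financial impact to priority (assumed labels).
IMPACT_TO_PRIORITY = {"significant": "high", "moderate": "medium", "minimal": "low"}
# Lower number means the ticket is worked on sooner.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Ticket:
    summary: str
    category: str                  # e.g. "server" or "networking"
    financial_impact: str          # "significant", "moderate" or "minimal"
    needs_outside_expertise: bool = False


def priority_of(ticket: Ticket) -> str:
    """Derive the priority from the ticket's financial impact."""
    return IMPACT_TO_PRIORITY[ticket.financial_impact]


def triage(tickets: List[Ticket]) -> List[Ticket]:
    """Order tickets so high priority comes first, then medium, then low."""
    return sorted(tickets, key=lambda t: PRIORITY_ORDER[priority_of(t)])


def to_escalate(tickets: List[Ticket]) -> List[Ticket]:
    """Select tickets the receiving team cannot resolve on its own."""
    return [t for t in tickets if t.needs_outside_expertise]
```

In practice the priority levels and escalation paths would come from the SLA created in step 1 rather than from hard-coded tables like these.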
OnPage is the perfect tool for Enterprise Incident Management
OnPage can be implemented enterprise-wide. Consolidate all enterprise alerts onto one incident management system hosted in secure, SSAE-16 compliant facilities across the USA. Handle enterprise-wide communication through the built-in team messaging.
- Fragmented teams are no longer a problem! The intuitive built-in messaging allows the entire ticket details to be forwarded. Get full event visibility!
- Add notes, a conference bridge number, attachments and predefined message templates to the event alert.
- OnPage “Alert-Until-Read” ensures that critical alerts are never missed.
- Follow the audit-trail to ensure a notification was read and replied to.
- The fault-proof scheduler defaults to “always full”: if a person is removed from an on-call shift by mistake with no replacement, the entire team is alerted to ensure the alert is delivered.
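The “Alert-Until-Read” and “always full” behaviors described in the list above can be pictured with a short Python sketch. This is not OnPage's API or implementation: the function names, the `send` and `is_read` callbacks and the `max_attempts` limit are hypothetical and exist only to show the idea.

```python
from typing import Callable, Iterable, List


def recipients_for_alert(on_call: Iterable[str], team: Iterable[str]) -> List[str]:
    """Return the on-call people, or the whole team if the shift is empty."""
    on_call_list = list(on_call)
    return on_call_list if on_call_list else list(team)


def alert_until_read(
    send: Callable[[str], None],
    is_read: Callable[[str], bool],
    recipient: str,
    max_attempts: int = 5,  # assumed cap so the sketch always terminates
) -> int:
    """Resend an alert until it is marked read; return the number of sends."""
    attempts = 0
    while attempts < max_attempts:
        send(recipient)
        attempts += 1
        # A real system would wait between attempts; omitted here for brevity.
        if is_read(recipient):
            break
    return attempts
```

For example, `recipients_for_alert([], ["ana", "ben"])` falls back to the full team because nobody is scheduled on call.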
OnPage provides powerful integrations with mission-critical systems through the industry’s easiest integration framework.
OnPage has written several relevant whitepapers that can assist you in understanding the complexities of an effective IT on-call policy.