Incident management is the process of identifying and correcting IT incidents that threaten or interrupt a business’s services. A component of IT service management (ITSM), incident management aims to keep services running or, if they go offline, to restore them as quickly as possible while minimizing the impact on the business.
An “incident,” according to the Information Technology Infrastructure Library (ITIL), is “an unplanned disruption, or impending disruption, to an IT service.” Under this broad definition, anything from degrading network quality to running out of disk space to a cyberattack would qualify as an incident. The process of detecting and responding to security-related incidents is called security incident management.
There are numerous ways to approach incident management, and policies, tools and service-level agreements (SLAs) vary across organizations. In general, IT teams try to prevent incidents through regular software updates, event monitoring and other practices. They also keep an incident response plan in place so they can resolve incidents quickly and identify root causes to prevent future occurrences.
Incident management is important because service interruptions can be extremely costly. Downtime can cost hundreds of thousands of dollars per hour, and that figure does not include regulatory fines or customer attrition.
In the following sections, we’ll look at the phases and best practices of incident management and how it can help organizations reduce harmful downtime.