Incident management process, to transform crises into opportunities for continuous improvement

By Jennifer Montérémal • Published: July 4, 2025

From a blocked printer output to an application that's out of service, your IT system is subject to numerous incidents of varying degrees of criticality. That's why it's so important to set up an effective incident management process.

But how can you ensure the performance of your incident management procedure? What resolution steps should be defined? Is it possible to provide a satisfactory solution for the user, in line with your SLA, and within a reasonable timeframe?

In this article, Appvizer explains the principles and steps to follow, based on the ITIL framework, and reminds you of the benefits to be gained from this method of working.

What is IT incident management?

Definition of incident management

Most IT incidents are managed in accordance with the ITIL (Information Technology Infrastructure Library) framework.

This framework, developed in the 1980s by the UK government's Central Computer and Telecommunications Agency (later absorbed into the Office of Government Commerce), is a set of documents listing best practices for managing IT services. The aim is to provide methodological support for professionals, with a view to continuous improvement.

ITIL covers several themes (organization of the information system, configuration management, change management, etc.), including incident management, specified as follows:

An incident is defined as any event which is not part of the standard operation of a service and which causes, or may cause, an interruption or reduction in the quality of that service.

Different types of incident

The above definition encompasses different types of incident:

Software or application incidents. Examples include:

program error slowing down the user;
application slowdown, etc.

Hardware incidents. Examples include:

printer output blocked;
hard disk nearly full, etc.

Service requests. Examples include:

forgotten password;
request for special documentation, etc.

Incident management vs. problem management

Incident management is often confused with problem management. Yet they involve different procedures.

According to ITIL, problem management is used to:

Minimize the negative impact on business of incidents and problems caused by errors in the IT infrastructure, and prevent the recurrence of incidents induced by these errors.

➡️ In other words, problem management is more proactive, while incident management is more reactive.

Nevertheless, the two processes work in parallel, with problem management operating through the identification of recurring incidents.
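To make the link between the two processes concrete, here is a minimal Python sketch of how recurring incidents could be flagged as candidates for problem management. The incident log, the category labels and the threshold of three occurrences are assumptions made up for this example, not ITIL requirements.

```python
from collections import Counter

# Hypothetical incident log: (incident_id, category) pairs invented for this example.
incidents = [
    ("INC-001", "printer"),
    ("INC-002", "vpn"),
    ("INC-003", "printer"),
    ("INC-004", "disk-full"),
    ("INC-005", "printer"),
    ("INC-006", "vpn"),
]


def recurring_categories(records, threshold=3):
    """Return the categories seen at least `threshold` times, most frequent first."""
    counts = Counter(category for _, category in records)
    return [category for category, n in counts.most_common() if n >= threshold]


# Categories returned here are candidates for a problem management review.
problem_candidates = recurring_categories(incidents)
print(problem_candidates)
```

In practice the grouping key would more likely be a combination of service, component and symptom than a single category label.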

Why is incident management important?

A standardized process for managing your incidents generates numerous benefits for your company 🤩:

it reduces, more quickly, the sometimes critical impact of incidents on the company and its business;
it greatly simplifies the procedure by avoiding, for example, back-and-forth emails;
it identifies recurring incidents, enabling the deployment of the problem management process mentioned above;
it improves the quality of the business knowledge base, thanks to the creation of incident handling databases;
it provides transparency within the organization regarding incident resolution;
it increases user and customer satisfaction, as well as the productivity of everyone in the company.

☝️ Keep in mind that an incident management process goes beyond simply resolving an IT problem. It provides solid support for the company's business functions, reducing the number of slowdowns or stoppages that impact on sales.

Example of a 7-step IT incident management procedure

#1 Identifying and recording the incident

To begin with, you need to identify the incident, specifying:

its name and identification number;
the identity of the person responsible;
the date;
and above all its characteristics (nature, severity and impact on operations).

👉 E.g.: a server breakdown affecting several departments will be considered a major incident, while a connection problem at a single workstation will be considered less critical.

It's up to the responsible department to record these details on the chosen medium (software, spreadsheet, form, etc.) and report the incident to the support teams in charge of handling it according to procedure.
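Whatever the medium, the record boils down to a handful of fields. As an illustration only, here is how such a record might look in Python. The class name, field names, severity labels and sample values are all assumptions for this sketch, not a format prescribed by ITIL or by any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Severity(Enum):
    # Hypothetical severity scale; use the one defined by your organization.
    MINOR = "minor"
    MODERATE = "moderate"
    MAJOR = "major"


@dataclass
class IncidentRecord:
    incident_id: str      # identification number
    name: str             # short name of the incident
    owner: str            # person responsible for the incident
    nature: str           # e.g. "hardware" or "software"
    severity: Severity
    impact: str           # effect on operations
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def summary(self) -> str:
        """One-line description suitable for a report to the support team."""
        return f"[{self.incident_id}] {self.name} ({self.severity.value}): {self.impact}"


# Sample values invented for illustration.
record = IncidentRecord(
    incident_id="INC-0001",
    name="File server down",
    owner="A. Martin",
    nature="hardware",
    severity=Severity.MAJOR,
    impact="several departments cannot access shared files",
)
print(record.summary())
```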

#2 Incident classification and analysis

The incident is then classified according to an order of priority defined in advance and specific to your organization, based for example on the impact on the business and the urgency of the situation.

👉 E.g.: a network failure could be classified as a "connectivity" incident, with a "high" severity level if it paralyzes the entire company.

At the same time, an initial analysis is carried out to determine the possible causes of the incident. Diagnostic tools or even previous experience can be mobilized for this assessment.

☝️ Note that if this is a service request, you must follow the procedure associated with that service.
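A common way to turn impact and urgency into a priority is a simple matrix. The Python sketch below is one possible version: the three-level scale, the additive scoring and the P1 to P4 labels are assumptions chosen for the example, and your own thresholds may well differ.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> str:
    """Combine impact and urgency into a priority label (P1 is handled first)."""
    score = impact + urgency  # ranges from 2 (low/low) to 6 (high/high)
    if score >= 6:
        return "P1"
    if score == 5:
        return "P2"
    if score == 4:
        return "P3"
    return "P4"


# A network failure that paralyzes the whole company: high impact, high urgency.
print(priority(Level.HIGH, Level.HIGH))
# A connection problem at a single workstation: low impact, medium urgency.
print(priority(Level.LOW, Level.MEDIUM))
```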

#3 Gathering evidence

Then it's time to gather as much evidence as possible. The objective? Understand what happened, when, how and why.

We're talking here, for example, about:

system or application logs;
screenshots or videos;
error messages displayed;
network data or metrics from monitoring tools;
any other element that can support the technical analysis.

☝️ Don't neglect this stage, as it determines the quality of the diagnosis to come, and therefore the speed of resolution.
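For logs in particular, a useful first pass is to isolate what was written around the time of the incident. Here is a small Python sketch of that idea. The log format (an ISO timestamp followed by a space and the message), the sample lines and the five-minute window are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical log lines in "ISO-timestamp LEVEL message" form, invented for this example.
LOG_LINES = [
    "2025-07-04T09:58:12 INFO backup job finished",
    "2025-07-04T10:01:47 ERROR database connection refused",
    "2025-07-04T10:02:03 ERROR request timeout on /orders",
    "2025-07-04T10:45:30 INFO nightly report generated",
]


def lines_around(lines, incident_time, window_minutes=5):
    """Keep the log lines stamped within `window_minutes` of the incident time."""
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        stamp, _, _message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a parseable timestamp
        if abs(when - incident_time) <= window:
            selected.append(line)
    return selected


evidence = lines_around(LOG_LINES, datetime(2025, 7, 4, 10, 0, 0))
for line in evidence:
    print(line)
```

The extracted lines can then be attached to the incident record alongside screenshots and monitoring data.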

#4 Incident investigation and diagnosis

All information relating to the incident is analyzed, with the aim of resolving it and restoring the service as quickly as possible. The teams in charge of this work use a variety of methodologies, from log analysis to real-time testing.

👉 E.g.: if a server goes down, the team will consult event logs for critical errors, or use monitoring tools to check hardware performance.

Be aware that the first level of support is sometimes unable to resolve the incident. This triggers an escalation: resolution is transferred to the next level of support.
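The escalation logic itself is simple to express. The Python sketch below walks through support levels in order until one of them can resolve the incident. The three level names and the `can_resolve` callback are hypothetical placeholders: in real life this decision is made by people, not by a function.

```python
# Hypothetical support levels, ordered from first line to experts.
SUPPORT_LEVELS = ["L1 service desk", "L2 technical support", "L3 expert team"]


def resolving_level(incident, can_resolve):
    """Return the first level able to resolve the incident, or None if none can.

    `can_resolve(level, incident)` stands in for the human decision made at each level.
    """
    for level in SUPPORT_LEVELS:
        if can_resolve(level, incident):
            return level
        # Otherwise the incident is escalated to the next level in the list.
    return None


# Example: only L2 and above know how to handle a server outage.
handled_by = resolving_level(
    "server down",
    lambda level, incident: not level.startswith("L1"),
)
print(handled_by)
```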

#5 Incident resolution and return to service

Incident resolution takes various forms:

the incident is repaired immediately. It has been resolved and operations are back to normal;
a workaround has been found. Incident management must above all restore services quickly: if the workaround is not perfect but makes the situation "acceptable", the process has done its job.

☝️ Note that if several incidents have unknown underlying causes but seem to share the same origin, it is recommended to initiate a problem management process. Remember that incident and problem management flows often intersect.

#6 Verifying resolution

Once the solution has been applied, it's time to check that everything is working as it should, by verifying that:

the service is up and running;
users can resume their activities without any inconvenience;
no side effects have been generated.

This step is crucial to validate the effectiveness of the corrective action. It also avoids "boomerang" incidents, those that return without warning.
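This verification can be thought of as a checklist in which every item must pass before the incident is closed. Here is a minimal Python sketch of that idea. The check names mirror the three points above, but the lambdas are stand-ins: real checks would probe the service, ask users for confirmation or inspect monitoring data.

```python
def verify_resolution(checks):
    """Run each named check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that crashes is treated as a failed check
        if not ok:
            failed.append(name)
    return failed


# Hypothetical checks with hard-coded results, for illustration only.
checks = {
    "service is up": lambda: True,
    "users can work normally": lambda: True,
    "no side effects": lambda: False,
}

failed = verify_resolution(checks)
can_close = not failed
print(failed, can_close)
```

Here the incident stays open because one check failed, which is exactly the situation that produces "boomerang" incidents when it goes unnoticed.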

#7 Closing the incident

To close an incident properly, the teams in charge of the process take a number of actions:

they record all the details of the incident and the time spent on it. ☝️ This documentation is used to create a searchable history to improve incident management protocols;
they inform the user of the resolution;
they ensure that all solution details are clear and legible.

This level of detail reduces the risk of conflict between different stakeholders.
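As an illustration, the closing step can be sketched in Python as a small function that computes the time spent, compares it with an SLA target and refuses to close without solution notes. The function name, the returned fields and the four-hour target are assumptions for this example. Your SLA defines the real values.

```python
from datetime import datetime, timedelta


def closure_report(opened_at, resolved_at, sla_target, solution_notes):
    """Build a closing summary; refuse to close without usable documentation."""
    if resolved_at < opened_at:
        raise ValueError("resolution time precedes opening time")
    notes = solution_notes.strip()
    if not notes:
        raise ValueError("solution notes are required to close an incident")
    time_spent = resolved_at - opened_at
    return {
        "time_spent": time_spent,
        "within_sla": time_spent <= sla_target,
        "solution_notes": notes,
    }


# Sample values invented for illustration.
report = closure_report(
    opened_at=datetime(2025, 7, 4, 9, 0),
    resolved_at=datetime(2025, 7, 4, 12, 30),
    sla_target=timedelta(hours=4),
    solution_notes="Restarted the print spooler and cleared the queue.",
)
print(report["time_spent"], report["within_sla"])
```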

What about the DevOps and SRE incident management process?

In a DevOps or SRE environment, incident management takes on a whole new dimension. It's no longer just about fixing things fast: it's about ensuring the ongoing resilience of systems, while maintaining a high level of performance.

Here, you don't "wait for incidents to happen". You anticipate them, you document them, and above all, you learn from them. In other words, every bug becomes an opportunity for improvement.

👉 More concretely, the DevOps/SRE process relies on specific tools and practices:

proactive monitoring via dashboards and intelligent alerts;
the use of observability tools (logs, traces, metrics, etc.) to diagnose problems in real time;
asynchronous communication channels (Slack, Teams, PagerDuty, etc.) to coordinate response;
runbooks for fast, stress-free action;
conducting post-incident reviews to prevent mistakes from happening again.

In this context, why is it important to put in place a solid incident management process? Because in a cloud-native environment, interruptions are costly in terms of time, money and reputation. What's more, systems have become increasingly complex and interconnected.
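To give an idea of what proactive alerting can look like, here is a deliberately simple Python sketch: it raises an alert when the error rate over the most recent requests exceeds a threshold. The class name, the window size and the threshold are assumptions for this example. Real monitoring platforms offer far richer rules.

```python
from collections import deque


class ErrorRateAlert:
    """Signal when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, is_error: bool) -> bool:
        """Store one request outcome and return True if an alert should fire."""
        self.outcomes.append(is_error)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge the rate
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate > self.threshold


# Ten healthy requests, then a burst of failures.
alert = ErrorRateAlert(window=10, threshold=0.2)
warmup = [alert.record(False) for _ in range(10)]
burst = [alert.record(True) for _ in range(3)]
print(burst)
```

Waiting for a full window before alerting avoids paging someone over the very first failed request, at the cost of a short delay at startup.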

The human factor: a strategic issue in incident management

In most digital environments, incidents are not caused by technical failures alone. The human factor is a major cause: according to several studies, it is involved in over 80% of IT incidents. A configuration error, a click on a malicious link, an incorrectly followed procedure... human error remains one of the most fragile links in the operational chain.

As a result, you need to integrate this parameter into your incident management process. It's not simply a question of correcting an error, but of understanding why it happened and how to prevent it from happening again.

👉 Implementing a human and systemic approach makes it possible to:

strengthen the culture of prevention;
encourage transparent reporting of errors;
provide targeted, ongoing training;
establish a climate of mutual trust.

Technology can fail, but it's often a human being who raises the alarm... or ignores it. By treating your people as key players, you can transform incident management into a lever for continuous improvement and resilience.

Which tools are needed for incident management?

You've got a clearer picture of incident management, but perhaps you're wondering how to put all these recommendations into practice? Can you see yourself applying your incident management procedure using an Excel spreadsheet or a conventional project management tool?

Fortunately, specific software has been developed to support your teams at every stage of the incident management procedure.

To help you, take a look at our selection ✔️:

Jira. Developed by Atlassian, the Jira ticketing tool standardizes the processing of tickets opened following the reporting of an incident.
😀 Why Jira?

create tickets with a precise level of information (descriptions, severity level, etc.) and follow all the processes required to manage them;
easily classify and prioritize bugs, and assign them to the right employee or department;
integrate your tickets into a ready-made workflow, or customize one to suit your needs and processes.

NinjaOne. NinjaOne is a complete IT asset management solution for SMEs, mid-sized companies and large corporations.
😀 Why NinjaOne?

centrally and proactively supervise your entire IT infrastructure to detect incidents as early as possible;
automatically and reliably apply the necessary patches to all your terminals;
store all standardized, structured documentation relating to your processes within the platform.

Octopus. Octopus is ITSM (Information Technology Service Management) software.
😀 Why Octopus?

benefit from a tool developed in line with ITIL best practices: your teams can apply them naturally without needing to master them perfectly beforehand;
easily manage requests from your users, whether incidents or service requests;
improve preventive action thanks to a database that manages all aspects of your information systems' configuration.

Splunk Enterprise Security. Splunk Enterprise Security is a SIEM (security information and event management) solution designed to help you strengthen the security of your IT systems and manage incidents.
😀 Why Splunk Enterprise Security?

benefit from an analytics-driven solution that streamlines cybersecurity-related tasks;
get real-time information thanks to customized dashboards and views;
detect incidents faster and take preventive action.

What should you remember about IT incident management?

Incident management, as standardized by ITIL, is a procedure that you should quickly integrate into your information system, as it promises to provide a clear and rapid response in the event of a setback.

What's more, it gradually leads to a reduction in the number of incidents by feeding your problem management processes, and thus your preventive actions.

And the good news is that everyone benefits from implementing such a working method:

technical teams work more efficiently and transparently;
users are less affected by bugs and more satisfied with your product;
the company incurs fewer losses in the event of a critical incident.

Finally, it's worth remembering that good incident management goes hand in hand with the use of relevant tools, which support your process and save your teams precious time.

Jennifer Montérémal is Editorial Manager at Appvizer, where she helps micro and small to midsize businesses (SMBs) improve their processes and choose the right tools. A specialist in making digital transformation accessible, she has authored several hundred pieces of content (guides, comparison articles, white papers, social media posts). Her motto? Turning complex topics into clear, concrete, and immediately actionable advice for decision-makers. Fun fact: before demystifying business trends and software, Jennifer used to decipher… medieval records. Trained as a medievalist, she has kept the same rigor and analytical mindset to deliver information in a reliable, intelligible way.