The Clash of Incident and Problem Management


The critical distinction between Incident Management and Problem Management lies in their conflicting goals. Incident Management is concerned with restoring service as quickly as possible and maintaining SLA targets, while Problem Management is concerned with finding root causes and eliminating errors from the infrastructure. Root cause investigation often requires extended periods of unplanned downtime, which is exactly what Incident Management is trying to avoid.

The goal of Problem Management is to minimize the adverse effect on the business of incidents and problems, caused by errors in the infrastructure, and to proactively prevent the occurrence of incidents, problems, and errors.

Many organizations have difficulty getting Incident and Problem Management activities under control. The underlying cause usually boils down to having the two processes combined, resulting in confusion over which goal to pursue at any given time. By separating the processes, expectations become very clear. When practicing Incident Management, restoring service and maintaining SLA targets take precedence. When practicing Problem Management, analyzing root causes and eliminating errors take precedence. This simple clarity of purpose results in significant organizational efficiencies, improves the end-user and customer perception of IT, and contributes to high IT staff morale.

There are times when an underlying problem can cause more unplanned downtime, accumulated across its incidents, than the agreed service levels can tolerate. Incident Management will identify this type of problem by conducting trend analysis on its collected incident data. Once this type of situation is identified, Problem Management will open a problem record and work towards problem resolution.
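The trend analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Incident` fields, the idea of grouping by configuration item, and the `downtime_budget_minutes` threshold are hypothetical and not taken from any particular service management tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    config_item: str       # hypothetical: component the incident was logged against
    downtime_minutes: int  # unplanned downtime attributed to this incident


def problem_candidates(incidents, downtime_budget_minutes):
    """Return config items whose accumulated downtime exceeds the budget."""
    totals = defaultdict(int)
    for incident in incidents:
        totals[incident.config_item] += incident.downtime_minutes
    # Keep only the items that have used up more downtime than allowed.
    return {ci: total for ci, total in totals.items()
            if total > downtime_budget_minutes}


# Illustrative data only.
incidents = [
    Incident("INC-1", "mail-gateway", 30),
    Incident("INC-2", "mail-gateway", 45),
    Incident("INC-3", "file-server", 20),
    Incident("INC-4", "mail-gateway", 25),
]

candidates = problem_candidates(incidents, downtime_budget_minutes=60)
print(candidates)
```

Each item returned here would be a candidate for a new problem record. No single incident in the sample breaches the budget on its own; only the accumulated total does, which is the situation the paragraph above describes.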

Problem Management also acts as an escalation point for Service Desk and Incident Management by providing specialized technical resources. In this capacity, Problem Management resources work under the Incident Management goal of quickly restoring service. They do not perform root cause investigation while working as an escalation resource for an incident.

Although not absolutely necessary, it makes sense that Problem Management take the leading role in ensuring that Service Desk, Incident Management, and Problem Management all work together to keep disruption of service to the business to a minimum over time. This does not mean that Problem Management takes over difficult incidents. If Problem Management needs to take over, then a problem record should be created.

Problem Management staff determine the appropriate action based on the type of record they are working against. If they are recording their activities in an incident record, they operate under the goal of restoring service. If they are recording their activities in a problem record, they operate under the goal of eliminating errors from the infrastructure.

Incidents and problems are intimately related and should always be linked. This means that you can open any problem record and quickly identify every incident that was created as a result of that problem. Likewise, you can open any incident and quickly identify the problem that caused it if a problem record was created.
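One way to picture this two-way link is a pair of records that reference each other. The sketch below is a minimal illustration; the class names, field names, and the `link` helper are assumptions made for the example and do not reflect any specific tool's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    problem_id: str
    summary: str
    # Every incident created as a result of this problem.
    incident_ids: List[str] = field(default_factory=list)


@dataclass
class IncidentRecord:
    incident_id: str
    summary: str
    # Stays None when no problem record was created for this incident.
    problem_id: Optional[str] = None


def link(incident: IncidentRecord, problem: ProblemRecord) -> None:
    """Record the relationship on both sides so either record leads to the other."""
    incident.problem_id = problem.problem_id
    if incident.incident_id not in problem.incident_ids:
        problem.incident_ids.append(incident.incident_id)


# Illustrative data only.
problem = ProblemRecord("PRB-1", "Mail gateway runs out of memory")
first = IncidentRecord("INC-1", "Users cannot send mail")
second = IncidentRecord("INC-2", "Outbound mail delayed")
unrelated = IncidentRecord("INC-3", "Printer offline")

link(first, problem)
link(second, problem)

print(problem.incident_ids)  # incidents caused by the problem
print(first.problem_id)      # problem behind the incident
print(unrelated.problem_id)  # no problem record was created
```

Storing the reference on both records is a simplification for readability. A real system would more likely keep the relationship in one place and query it from either direction.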

Every problem should be classified, just as incidents are, with category, impact, urgency, and priority. Categories help assign problems to the appropriate staff.
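The classification fields can be sketched as a small data structure. The text above does not say how priority is set, so this example assumes a common convention in which priority is derived from impact and urgency on a three-point scale. The scale, the formula, and the `ASSIGNMENT_GROUPS` mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from category to the staff group that handles it.
ASSIGNMENT_GROUPS = {
    "network": "Network Engineering",
    "database": "Database Administration",
}


@dataclass
class Classification:
    category: str
    impact: int   # assumed scale: 1 = high, 2 = medium, 3 = low
    urgency: int  # assumed scale: 1 = high, 2 = medium, 3 = low

    def __post_init__(self):
        for name in ("impact", "urgency"):
            if getattr(self, name) not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3")

    @property
    def priority(self) -> int:
        # Assumed convention: 1 is the most pressing, 5 the least.
        return self.impact + self.urgency - 1

    @property
    def assignment_group(self) -> str:
        # Unknown categories fall back to a general queue (also hypothetical).
        return ASSIGNMENT_GROUPS.get(self.category, "General Support")


classification = Classification(category="network", impact=1, urgency=2)
print(classification.priority)
print(classification.assignment_group)
```

Because the same structure serves both incident and problem records, the two processes can share one set of categories and one priority scheme.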