It’s not uncommon for me to field questions from my customers about the relationship between ITIL’s Problem Management process and Root-Cause Analysis. I am in a unique position to discuss this as I’ve spent much of my professional career training and consulting in the ITIL® framework and in various Root-Cause Analysis systems. It pains me to say this, but I have to be upfront – there are many excellent professionals who work in IT Service Management who are completely clueless about root-cause analysis. I don’t say this to be offensive – I’ve just found IT organizations to be historically short-sighted when it comes to true root-cause analysis.
What does ITIL teach us about Root-Cause Analysis?
ITIL does a great job of sharing best practices within IT Service Management. Remember, ITIL is a self-proclaimed non-prescriptive framework. When it comes to root-cause analysis, ITIL touches (very lightly) on the subject within the domain of Problem Management. While ITIL does mention a handful of techniques and even discusses getting to the root cause of incidents and problems, it does not give organizations a prescriptive path toward executing true root-cause analysis. You could argue this is by design, given the framework’s non-prescriptive nature, or you could argue that root-cause analysis is broadly misunderstood even by some of the best industry professionals. It is fair to say that both are true.
Common questions that really need some consideration
While ITIL’s discussion of root-cause analysis does point you toward some methods and techniques, it is far from exhaustive and leaves much of the guidance to the organization. This is more beneficial than harmful, because some key questions can only be answered with organizational context. Let’s discuss a few of these considerations.
Q. Is root-cause analysis the domain of Problem Management?
A. True root-cause analysis quickly moves beyond technical factors to organizational conditions. In some organizations, addressing (or at least tracking) these conditions may be appropriate for Problem Management. Other organizations will need to identify the stakeholders who have the power and influence to address broader organizational issues. These stakeholders often sit outside of Problem Management and even outside of the IT organization.
Q. When should we conduct root-cause analysis?
A. Many factors shape the answer to this question, and there can be many triggers. I address it in more detail, including the common triggers, in another post. For now, consider that we need to understand what root-cause analysis is (and isn’t) and how deep we should go. Root-cause analysis can become a ceaseless pursuit with varying results and varying levels of action. In the IT world, most organizations are comfortable pursuing deeper levels of causes until those causes fall outside the control of their stakeholders. Deciding when to stop root-cause analysis is therefore a key determination.
Q. What do customers or stakeholders expect when they ask for a root cause?
A. It is very common for customers or other stakeholders to ask for a root-cause analysis (often referred to as an RCA report), a post-mortem, an incident report, an after-action review, etc. It is critical to understand their expectations upfront. Often, stakeholders simply want to understand the immediate technical cause of a failure and the actions you will take to prevent similar failures in the future; they aren’t interested in anything deeper. Setting expectations at the beginning allows you to focus on the critical elements and then use organizational considerations to determine how deep your analysis needs to go.
There is much, much more that can be said about root-cause analysis and the many other dynamics at play. Let us know if you have specific questions and we’ll write more to help address them. As always, please contact us if your organization could benefit from ITIL, DevOps, or Root-Cause Analysis training or consulting.