When Configuration Management Fails
Most practitioners who hold a basic understanding of IT Service Management quickly recognize the value of practices like Configuration Management. After all, it is one of those ITIL practices that you simply can’t ignore. Understanding how our assets are configured to work with one another is useful to some of the most common and important practices and processes in our organizations. While obtaining good configuration data can be a daunting task, the maintenance of said configuration information often proves the Achilles heel of the practice.
Isn’t that why we have tools?
One common attempt to address the challenge of maintaining good configuration data is to focus heavily on tools. While every major ITSM suite can provide a relatively robust configuration management database and will surely integrate with an automated discovery tool (or provide one out of the box), the tool is only one part of the equation. Tools without supporting processes are generally not highly effective. Consider that tools must account for inputs, outputs, dependencies, information sources, etc. We must account for these in the underlying processes.
Since I argue that tools without process are not highly effective, I should also state that the processes feeding configuration management (including changes, additions, and deletions) must be holistic for configuration management to be most effective. When we miss small pieces of the whole, the configuration data that we produce is compromised. The result is less than complete and accurate configuration data.
Accuracy versus Detail – Know Your Goal
What’s the right approach? Should my goal be to extract every detail of every system, component, device, element, etc.? Many failed configuration management efforts are built on the idea that if we can capture it, we should. This leads to a question of scope. What is the right scope for configuration management? Once we consider our use cases, goals, legislative and regulatory requirements, the scope question often gets reduced to more data versus better data. Now is a good time to ask the question, “What can I capture accurately and maintain accurately?” Remember, good configuration information must be reliable.
If the configuration information we capture and provide to other processes and practices is not reliable, downstream activities are impacted. I could tell you dozens of real-world stories about the effects of bad configuration data. These accounts often involve wasted time, extended customer outages, major financial losses, etc. Instead of rehashing someone else’s painful and unfortunate experiences, I’d simply ask you to reflect on a time when you relied on configuration information that was inaccurate. Most people have, in some way, experienced first-hand the effects of bad configuration information. Have you ever tried assembling bookshelves (or other furniture) with incorrect drawings and poor instructions? Accuracy and detail are both important, but as soon as we capture detail at a level where the data becomes unreliable, decisions based on that data will be compromised.
What to do When Configuration Management Fails
So, what do you do when you suspect that configuration management data is incorrect? Here are a few quick tips to implement when you find yourself in that situation.
First, don’t be afraid to question the integrity of configuration data. We should immediately recognize that our accuracy is only as good as the tools, processes, workflows, and practices that feed our configuration data. If there is a compromise along the way, it potentially affects the resultant data. I’ve been involved in troubleshooting efforts where a system diagram was considered to be unquestionable truth and later turned out to be incorrect. Get in the habit of asking these questions about the integrity of the configuration data:
- Is this information correct?
- How do we know this is correct?
- What is the history of changes related to this configuration?
- When was the last update to this configuration?
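Some of these questions can be asked of the data itself. What follows is only a minimal sketch, not a feature of any particular ITSM suite: the ConfigItem record, its field names, and the 90-day threshold are all hypothetical, chosen to show how the last two questions might be turned into a routine check.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ConfigItem:
    """Hypothetical configuration record; field names are assumptions."""
    name: str
    last_updated: datetime
    change_ids: list = field(default_factory=list)  # related change records


def integrity_concerns(item, now, max_age=timedelta(days=90)):
    """Return reasons to question this item's data (empty list if none).

    The 90-day default is an arbitrary illustration, not a recommendation.
    """
    concerns = []
    # "When was the last update to this configuration?"
    if now - item.last_updated > max_age:
        concerns.append("stale")
    # "What is the history of changes related to this configuration?"
    if not item.change_ids:
        concerns.append("no change history")
    return concerns


now = datetime(2024, 6, 1)
old_item = ConfigItem("db-server-01", datetime(2024, 1, 1))
fresh_item = ConfigItem("web-server-01", datetime(2024, 5, 20), ["CHG-1"])

for item in (old_item, fresh_item):
    print(item.name, integrity_concerns(item, now))
```

A check like this does not tell you whether the information is correct. It only tells you where to start asking.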
Second, when it is possible, use alternate sources to provide another view of configuration data. Yes, you are taught that there should be a single source of truth. But remember, that single source of truth is only as accurate as what feeds it. Occasionally, something will stop feeding it. You should know when it does, but if you don’t, it’s a good idea to check. Often, a customer, vendor, or partner will have their own configuration data about a system under your control. It’s a good idea to ask them to share what they have with you.
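Comparing two views of the same system can be as simple as listing the attributes on which they disagree. The sketch below assumes each source can be reduced to a flat dictionary of attribute names and values; the function name, the sample attributes, and the "<missing>" marker are hypothetical and only illustrate the idea.

```python
MISSING = "<missing>"  # marker for an attribute that one source does not have


def compare_sources(primary, alternate):
    """Return {attribute: (primary_value, alternate_value)} for each mismatch."""
    differences = {}
    for key in sorted(primary.keys() | alternate.keys()):
        primary_value = primary.get(key, MISSING)
        alternate_value = alternate.get(key, MISSING)
        if primary_value != alternate_value:
            differences[key] = (primary_value, alternate_value)
    return differences


# Hypothetical example: our own record versus what a vendor has on file.
our_record = {"hostname": "app-01", "os_version": "8.6", "owner": "payments"}
vendor_record = {"hostname": "app-01", "os_version": "8.9", "support_tier": "gold"}

for attribute, values in compare_sources(our_record, vendor_record).items():
    print(attribute, values)
```

A mismatch does not say which source is wrong. It says that at least one of them needs to be questioned.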
Third, recognize failures as opportunities to understand what goes wrong and why things go wrong. Don’t just fix the symptoms and shrug your shoulders afterwards; look for the underlying causes of bad configuration information. Create a plan to address those underlying causes.
At the end of the day, you will likely experience some issues with your configuration data. Rest assured that most organizations have experienced very similar issues with their own configuration data. That doesn’t make the failure acceptable; it just means that these issues are common.
Configuration management must be a well-designed and purpose-driven practice. Make sure you have a good understanding of what you are trying to capture and the reason behind those efforts. Avoid some of the common mistakes and be ready to react when your configuration data fails you.