I know DevOps is the latest and greatest way to do software development and there is a lot that I like about it, but there is a one big problem with being a DevOps team.
We’ve started to learn to plan for interruptions. There are going to production issues that need to be dealt with. We are going to get client escalations. Things are going to come across our plate without any warning and we need to be able to respond to them.
The problem is interruptions keep you from getting done what you were planning to get done. Features don’t get delivered and the team can start to get frustrated. Interruptions can really make the life of a devops team get harder.
Do you know what another word for interruptions is? Bugs.
These interruptions are coming from clients that are having issues. Sometime they are just because we have changed a workflow and they don’t understand it, but to me anything that is impacting the ability of our clients to solve their problems is a bug. These interruptions are bugs.
So let’s rephrase that statement at the top of this article. Bugs are a big problem with being on a devops team. You know I think they are a big problem with being on any team aren’t they? So what do we do about it? Do we need to spend more time finding the bugs up front so they aren’t coming in as interruptions during the next development cycle? Sometimes yes – but sometimes these are things we probably never would have found no matter how much time we spent on it. So what other solutions do we have?
Well we could just account for it in our plans and add a 25% interruption buffer. This doesn’t seem to be the greatest option, but there are times when it makes sense (like when we just changed a major workflow for hundreds of clients). In general though I don’t think this is going to allow us to most effectively deliver value. So what else?
Perhaps we need to make it so that the interruptions don’t hurt as much. If we can pinpoint exactly where the problem is coming from and find the solution more quickly, interruptions becomes less painful. This mean being committed to good telemetry and good coding practices that let you quickly and easily find issues.
This issue of interruptions and bugs is not a simple one. It is one that has been with us from many years and will be with us for many more. There isn’t just one simple, quick fix answer. I’ve shared a few ideas we’ve been using or working towards on our team. What ideas to do you have? Share in the comments below!