I like it simple. And I have a somewhat shaky relationship with maturity models, as you might have gathered from some of my other posts. I see the power of using a maturity model or assessment to help people understand where they are, but it’s also so prone to the Dunning-Kruger effect. So what else can we do to understand our maturity in IT delivery as it relates to DevOps?
I am running executive training that helps program leads and delivery leads understand how to run programs that use Agile and DevOps methods. As leads they are unlikely to run stand-ups or configure Jenkins, but they need to understand enough to support their teams and help them stay the course. Hence we created a training that focuses on principles and on what things look like from the executive perspective. We call it Modern Engineering Bootcamp. In the introduction I usually say, cheekily, that the aim is to make executives dangerous for their teams, because they will be able to call out the things their teams don’t do well. They will be able to call BS when their teams believe they are doing well but really are not.
We had exactly this moment when a participant spoke up during the discussion on configuration management: “Until just then I believed we had our code in configuration management and only after asking a few more questions inspired by this course have I just found out that our Pega configuration is only stored in Pega not in a proper configuration management system. We will change this ASAP to improve our DevOps setup and to be able to automate further”.
This got me thinking: can I come up with a small set of self-assessment questions that helps people make the same kind of discoveries where they need to?
Thinking about it for a while, I went through my past projects and realised that I asked my teams certain questions, and asked them to do certain things, whenever I got a little bit worried that we were missing something. It’s time to share these questions with you (and some of the stories behind them):
Let’s blow away the environment and rebuild it from our stored configuration
When I was still a young developer I was working on a mainframe project. We had a development environment that we used to check our code before deploying to the mainframe. I noticed that the environment was cluttered with lots of things that seemed to be left over from earlier iterations, so I decided to blow the environment away and rebuild it from what we had stored in configuration. Today we might call this “infrastructure as code”. When I tried to rebuild it, it turned out that not only were some files missing (they were not in SCM because they had never been migrated when we moved to a newer SCM), but the mainframe CICS configuration was not correct either. We spent two weeks fixing this, and while my boss was initially not happy that I had caused two weeks of extra work, we were lucky to find this in development and not at some later stage in production. Some of the missing files only existed in live environments and not in any SCM system…phew… By the way, nowadays this question usually means “let’s rebuild a cloud environment from scratch”, which should cause even fewer headaches.
When you ask your team to do this you are really looking for two things:
- Does your team look worried or push back against this request? Explore what the concerns are; it’s likely they know where the skeletons in the closet are.
- If they are confident and kick this off, measure how long it takes and whether you are happy with the timing.
Keep repeating this regularly to improve speed and reliability.
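If you want to put a number on “how long it takes”, a small wrapper around whatever command rebuilds your environment is enough. The following is a minimal sketch, not a prescription: `timed_run` is a helper name I made up, and `REBUILD_COMMAND` is a harmless placeholder that you would swap for your own teardown-and-rebuild script.

```python
import subprocess
import sys
import time


def timed_run(command):
    """Run a command and return (succeeded, seconds_elapsed)."""
    start = time.monotonic()
    result = subprocess.run(command, check=False)
    elapsed = time.monotonic() - start
    return result.returncode == 0, elapsed


# Hypothetical placeholder: replace with your real rebuild command.
REBUILD_COMMAND = [sys.executable, "-c", "print('rebuilding environment...')"]

if __name__ == "__main__":
    ok, seconds = timed_run(REBUILD_COMMAND)
    print(f"rebuild {'succeeded' if ok else 'FAILED'} in {seconds:.1f}s")
```

Writing the result down after every run gives you the trend you need to judge whether speed and reliability are actually improving.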
Let’s delete the application config and redeploy it
In my second project we ran some pretty fancy Control-M schedules to deploy applications and update configuration as required. Jenkins was still to be invented, I think (or at least I hadn’t heard of it). Because we ran two releases in parallel and had lots of queues and databases to connect, we used a massive Excel sheet to store our environment configuration. By the time I joined, the project had already been under way for a long time. I was worried that some configuration was missing from the Excel sheet, so I decided to build a new environment and deploy the configuration only from the Excel sheet. No surprise: I found a handful of configurations that were not stored. Luckily I had learned from my previous project not to blow away an environment that is in use, but rather to build a new one in parallel to find these problems.
You use this question the same way as the first one: see whether your team is comfortable, and run the exercise regularly.
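The comparison at the heart of this exercise can be sketched in a few lines. This is an illustration under assumptions: it treats both the stored configuration (the Excel sheet, in my story) and the configuration found in the running environment as simple key/value mappings, and the function name and example keys are invented.

```python
def config_drift(stored, live):
    """Compare stored configuration with what an environment actually uses."""
    stored_keys, live_keys = set(stored), set(live)
    return {
        # Settings in the environment that nobody recorded.
        "unrecorded": sorted(live_keys - stored_keys),
        # Settings that are recorded but absent from the environment.
        "undeployed": sorted(stored_keys - live_keys),
        # Settings present in both, but with different values.
        "different": sorted(
            key for key in stored_keys & live_keys if stored[key] != live[key]
        ),
    }


# Made-up example data, for illustration only.
stored = {"db.host": "db01", "queue.orders": "ORDERS.IN"}
live = {"db.host": "db02", "queue.orders": "ORDERS.IN", "queue.audit": "AUDIT.IN"}
print(config_drift(stored, live))
```

Anything that shows up under “unrecorded” is exactly the kind of setting that would be lost if the environment had to be rebuilt.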
Let’s redeploy the application even though nothing has changed
So your team told you the application deployment is automated and reliable? Great, ask them to schedule a deployment of the latest package into your dev or test environment every morning before the team comes to work. Given that they are deploying the same thing, there is very little risk, and it’s automated, so no one needs to be around for it. This is what I did on my third larger project. When the team pushed back initially, I asked for it anyway. And guess what… for the first few weeks someone had to come to the office early, as there was always one small hiccup or another, until it finally ran smoothly. The only thing that proves automation and keeps it clean is running it outside of business hours, when no one touches the system, and doing it very frequently, even (and especially) when no changes to the application are required. It keeps proving to everyone that the deployments are reliable, and that if problems occur after deploying a new version of the software, they are likely to be found in the code and not in the deployment mechanism. Running it every day also allows you to measure performance and keep improving it. To those of you deploying multiple times a day: well done, your deployments must already be reliable for you to do it so often. For everyone else: start doing it frequently to become reliable and take the worry away.
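Once the daily deployment is running, you can keep a simple record of each run and summarise it. The sketch below assumes each run is recorded as a pair of (succeeded, duration in seconds), oldest first; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean


def summarise_runs(runs):
    """Summarise scheduled deployment runs given as (succeeded, seconds)."""
    # Count how many of the most recent runs succeeded without a break.
    streak = 0
    for succeeded, _ in reversed(runs):
        if not succeeded:
            break
        streak += 1
    durations = [seconds for succeeded, seconds in runs if succeeded]
    return {
        "success_rate": sum(1 for succeeded, _ in runs if succeeded) / len(runs),
        "current_streak": streak,
        "mean_seconds": mean(durations) if durations else None,
    }


# Made-up sample: one early hiccup, then three clean mornings.
runs = [(False, 300.0), (True, 240.0), (True, 200.0), (True, 220.0)]
print(summarise_runs(runs))
```

A growing streak and a shrinking mean duration are the evidence that the deployment mechanism itself can be trusted.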
Let’s rerun the test suite and then again
In one of my later projects we were struggling to make test automation work. We had a few goes at getting a test automation framework going. At some stage the test team celebrated that they had a regression suite that worked. After a successful run I asked them to quickly rerun it, and it turned out that we couldn’t: the data had to be completely refreshed first, which would have blown away the progress made by the manual testing that was running at the same time. We had some more work to do. We had other performance-related issues too that made it difficult to grow our regression test suite. But that’s a story for another time.
So with these four questions I think I have given you a simple diagnostic for some of the basic principles underpinning DevOps. See how your team reacts when you ask them. If your team can’t perform these tasks, then start working on improving the situation until all four of them become second nature and the questions don’t even raise eyebrows anymore.
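The check I asked for is easy to automate: run the same suite back to back, with no manual data refresh in between, and see whether every run passes. A minimal sketch follows; the helper names are invented and `SUITE_COMMAND` is a stand-in for whatever starts your regression suite.

```python
import subprocess
import sys


def rerun_check(command, runs=2):
    """Run the same command back to back and return the exit codes."""
    return [subprocess.run(command, check=False).returncode for _ in range(runs)]


def is_repeatable(exit_codes):
    """A suite is repeatable only if every consecutive run passed."""
    return all(code == 0 for code in exit_codes)


# Hypothetical placeholder: replace with your real test suite command.
SUITE_COMMAND = [sys.executable, "-c", "print('running regression suite...')"]

if __name__ == "__main__":
    codes = rerun_check(SUITE_COMMAND)
    print("repeatable" if is_repeatable(codes) else f"not repeatable: {codes}")
```

If the second run fails where the first one passed, the suite depends on state that it does not set up itself, which is exactly what we discovered.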