The DevOps Silver Bullet

If you work in the DevOps space long enough, you will have been offered many “Silver Bullets” over the years. Everything from specific tools, to simply putting the Dev and Ops teams together, to practices like Continuous Delivery. Of course the truth is, there is no “Silver Bullet”. Last year at the DevOps Enterprise Summit I sat together with some of the brightest in this space, and we spoke about what the potential “Silver Bullet” could be. It was surprising how quickly we all agreed on the one thing that predicts success for DevOps in an organisation. So let me reveal the “Silver Bullet” we cast in that room; it is unfortunately one that requires a lot of discipline and work: Continuous Improvement.

When we surveyed the room to identify the one characteristic of companies that end up being successful with their DevOps transformation, we all agreed that the ability to drive a mature continuous improvement program is the best indicator of success.

But what does this mean?

For starters it means that these companies know what they are optimizing for. There is plenty of anecdotal evidence that optimizing for speed will improve quality and reduce cost in the long run (as both of those otherwise influence speed negatively). On the flipside, trying to optimize for cost does not improve quality or speed. Neither does a focus on quality, which often introduces additional expensive and time-consuming steps. If you want to be fast you cannot afford rework, and you will remove any unnecessary steps and automate where possible. Hence speed is the prime directive for your continuous improvement program.

In my view it’s all about visibility and transparency. To improve, we need to know where we should start improving. After all, systems thinking and the theory of constraints have taught us that we only improve when we improve the bottleneck and nowhere else. Hence the first activity should be a value stream mapping exercise where representatives from across IT and the business come together to visualize the overall process of IT delivery (and run).

I like to use the 7 wastes of software engineering when doing value stream mapping to highlight areas of interest. This map with its “hotspots” creates a target-rich environment of bottlenecks you can improve.

The key for improvement is to use what basically comes down to the scientific method: make a prediction of what will improve when you make a specific change, make the change, then measure to see whether you have in fact improved. Too often people don’t apply the right rigor in their continuous improvement program. As a result, changes won’t get implemented, or if they do get implemented no one can say whether they were successful. Avoid this by establishing the right rigor up front.
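To make that rigor concrete, the predict-change-measure loop can be sketched in a few lines of code. The metric (lead time in days), the numbers and the function name are illustrative assumptions, not a prescription:

```python
# Treat each improvement as an experiment: predict, change, measure, compare.
# Lead time in days is used as the metric here; all numbers are illustrative.

def evaluate_improvement(baseline, predicted, measured):
    """Compare the measured result against the baseline and the prediction."""
    return {
        "improved": measured < baseline,          # did anything get better?
        "met_prediction": measured <= predicted,  # did it meet our hypothesis?
        "gain": baseline - measured,              # absolute improvement
    }

# Hypothesis: automated regression tests cut lead time from 10 to 6 days.
result = evaluate_improvement(baseline=10, predicted=6, measured=7)
# Improved (7 < 10), but the prediction was missed (7 > 6); so we keep the
# change, record the result, and form the next hypothesis.
```

The point is not the code but the discipline: every change gets a written prediction up front and a measurement afterwards, so "did it work?" always has an answer.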

The other challenge with continuous improvement is that it unfortunately is not linear. It follows a slightly adjusted J-Curve. Initially you will find easy wins and things will look splendid. But then things get harder and often regress a little. You could jump ship here, but that would be a bad choice. If you stick with it you will see real improvements above and beyond the easy wins later.



As a result, the goal of continuous improvement needs to be to find many smaller J-Curves rather than working on one huge Transformation-type J-Curve. When I saw this concept explained at the DevOps Enterprise Summit by Damon Edwards, it was quite a revelation. Such a simple graphical representation that explains so well why many DevOps adoptions struggle. It is too easy to assume a straight line and then get distracted when the initial downturn occurs after the easy wins.

I can tell you from experience that the curve is real: no matter how much experience I gather, there are always context-specific challenges that cause that dip and require us to stay strong and stick with it. You need grit to be successful in the DevOps world.

So the Silver Bullet I promised you does exist, but unfortunately it requires discipline and hard work. It was very encouraging to see that some of the most experienced enterprise coaches and change agents agree that a rigorous continuous improvement culture is the best predictor of success. Look around and see whether you are in a place with such a culture, and if not, what can you do tomorrow to start changing the culture for the better?

6 Questions to Test whether You are Working with your System Integrator in a DevOps way

If you have been following my blog, you will know that I am disappointed by how little the cultural relationship between companies and their systems integrators is being discussed in blogs, articles and conference talks. As I work for an SI, I find this surprising. Most large organisations work with SIs, so why are we not talking about it? If we are serious about DevOps we should also have a DevOps culture with our SIs, shouldn’t we?

When I speak to CIOs and have a discussion about DevOps and how to improve going forward, I often get a comment at some stage – “Mirco you seem to get this. Why is it then that not all projects with your company leverage the principles you talk about?”.

A good question, and one that a few years ago I didn’t have an answer to, which made me a bit unsure of how to respond. I have spent a lot of time analyzing this in the years since. And the truth is that often the relationship does not allow us to work in the way most of us would like to work.

The other week I had a workshop with lawyers from my own company and from a firm that represents our clients to discuss the best way to structure contracts. Finally we all seem to understand that there is a lot of room for improvement. We need to do more of this so that we can create constructs that work for all parties. I am looking forward to continuing to work with them – and how often do you hear someone say that about lawyers 😉

Coming back from yet another conference where this topic was suspiciously absent, I thought I would write down this checklist for you to test whether you have the right engagement culture with your systems integrator – one that enables working to the benefit of both organisations:

  • Are you using average daily rate (ADR) as indicator of productivity, value for money, etc.?
    +1 if you said No. You can read more here as to why ADR is a really bad measure all things being equal.
  • Do you have a mechanism in place that allows your SI to share benefits with you when they improve through automation or other practices?
    +1 if you said Yes. You can’t really expect the SI to invest in new practices if there is no upside for them. And yes, there is the “morally right thing to do” argument, but let’s be fair: we all have economic targets, and not discussing this with your SI to find a mutually agreeable answer is just making it a bit too easy for yourself, I think.
  • Do you give your SI the “wiggle room” to improve and experiment and do you manage the process together?
    +1 if you said Yes. You want to know how much time the SI spends on improving things, on experimenting with new tools or practices. If they have just enough budget from you to do exactly what you ask them to do, then start asking for this innovation budget and manage it with them.
  • Do you celebrate or at least acknowledge failure of experiments?
    +1 if you said Yes. If you have innovation budget, are you okay when the SI comes back and one of the improvements didn’t work? Or are you just accepting successful experiments? I think you see which answer aligns with a DevOps culture.
  • Do you know what success looks like for your SI?
    +1 if you said Yes. Understanding what the goals are that your SI needs to achieve is important. Not just financially but also for the people that work for the SI. Career progression and other aspects of HR should be aligned to make the relationship successful.
  • Do you deal with your SI directly?
    +1 if you said Yes. If there is another party like your procurement team or an external party involved then it’s likely that messages get misunderstood. And there is no guarantee the procurement teams know the best practices for DevOps vendor management. Are you discussing any potential hindrance in the contracting space directly with your SI counterpart?

A lot is being said about moving from vendor relationship to partnerships in the DevOps world. I hope this little self-test helped you find a few things you can work on with your systems integrator. I am living on the other side and often have to be creative to do the right thing for my customers. It is encouraging to me to see that many companies are at least aware of these challenges. If we can have open discussions about the items above, we will accelerate the adoption of DevOps together. I promise on the side of the SIs you will find partners that want to go the way with you. Find the right partner, be open about the aspects I described above and identify a common strategy going forward. I am looking forward to this journey together. Let’s go!

Impressions from DOES 2016

And now it’s over again, the annual DevOps family gathering a.k.a. the DevOps Enterprise Summit. Another year goes by, and we were able to check in with some of our favorite DevOps leaders and got to know some new family members. The event was full of energy, and as I do every year, I will try to summarize what I have seen.

First of all, some overall trends of things I heard coming up again and again:

  • Attracting people – the DevOps space continues to be a hot spot and we are all competing for rare talent. I think the transformational nature makes it harder to find the right people who have both the right technical skills and the right mindset to embrace continuous change along the journey.
  • Platforms as enabler and answer to the team structure question – At the first DOES the discussion about “DevOps teams” was still heated; should you or should you not have a dedicated team. Having an internal platform team to run and operate the DevOps platform seems to be the most common solution. The idea that the platform provides self-service capabilities to the product teams and uses this to abstract away the org structure problem was mentioned several times.
  • Open Source / Open IP – more companies are now talking about open sourcing some of their tooling, including Accenture. This is a good sign for an industry that has focused on internal IP for too long. I think DevOps has done great things to open IT up for sharing and to provide an ecosystem where we all work together on the big challenges ahead of us.

Let’s look at some of the highlight talks below:

Heather Mickman from Target

We got to check in with Heather Mickman from Target to see how she has progressed. Her talk was widely seen as one of the best of the conference. Some gems from it:

  • How speaking externally about what Target has been doing has enabled them to attract talent
  • How they moved more work in-house to control the culture and outcomes better
  • How they built their own platform to manage public and private cloud platforms
  • Key metrics she uses are the number of incidents per deployment and the onboarding time
  • Heather pretty much addressed all three main themes I mentioned above

Scott Prugh from CSG

Another favourite from previous years provided an update on their journey and what a more Ops-focused view looks like. The numbers he mentioned are still impressive, with 10x quality and half the time to market achieved through the adoption of DevOps. Their deployment quality is close to perfect, with near-zero incidents post deployment (the same metric that Heather mentioned). He also highlighted the self-service platform as a key enabler. Another aspect I liked was his focus on automated reporting and making work visible. His colleague Erica then brought The Phoenix Project to life by comparing her world to the book. I love this.

Ben and Susanna from American Airlines

I am writing this summary while waiting for my 17-hour-delayed AA flight, so I assume there is still some room for improvement on the DevOps front 😉 Their talk focused on the “opportunities” provided by the merger of two airlines: what to do with two very different stacks initially, and how to slowly merge them. They also highlighted the common challenges with test automation and with measuring success in DevOps. It feels better, but how do we really measure it?

Gene and John’s fireside chat

I mean, what can you say about this one… it was fascinating and a geek-out for all of us. So many threads to follow, it felt like Alice in Wonderland for DevOps folks. When you watch the replay you will feel the urge to buy books and keep googling things. Hold your fire and buy the Beyond the Phoenix Project audiobook when it comes out. I surely will!

Mark Schwartz on Business Value

Mark Schwartz was back and spoke about his book – a great exploration of the concept of business value. He did not provide the answer, but he offered some interesting things to consider:

  • ROI misses the point – profit is a fiction; flexibility, agility and options are not reflected
  • ROI does not easily work for deriving decisions – it is too far away or too much work
  • Not each item in the backlog can feasibly be assigned a value
  • It is so important to have a conversation about business value to decide how Agile teams will use it to derive priorities

Keith Pleas from Accenture

Another exploratory talk, this time about the automation of automation. We focus our attention on automating applications, but are we using the same ideas for our own DevOps architecture? Like Gene and John’s talk, this one had many breadcrumb trails to follow. Accenture has also open sourced its DevOps platform, which you can find here:

The main themes of Open IP, Platform teams and attracting talent were hit on by Keith.

There were so many more great talks; check them out when the recordings are available. A few more quick highlights:

  • Pivotal’s talk added the product orientation as organisational mechanism to the discussion on platform teams.
  • My good friend Sam Guckenheimer from Microsoft had the guts to do a live demo on stage, which worked out really well and showed some very interesting insights into Microsoft’s developer platform.
  • Carmen DeArdo from Nationwide had one of the best slides in the conference in my view. I really like the cycle transformation picture, what do you think?
  • Topo Pal from CapitalOne had some of the best nuggets at the conference:
    • “It takes an army to manage a pipeline”
    • 16 gates of quality or as he calls it, 10 commandments in Hex 😉
  • We had a really good introduction to Site Reliability Engineering by David Blank-Edelman, covering error budgets, blameless post-mortems and much more. He also noted that “you can’t fire your way to reliability” and that maturity models should be there to determine the right help, not to punish someone.

The best thing, of course, is the hallway talks: the opportunity to catch up with old friends and to make new ones. Another great event gone by…

See you all at DOES17, Nov 13-15 2017, back in San Francisco. I will be there and look forward to meeting you all. Come join us at the family gathering next year!

Why Develop-Operate-Transition projects need to get the DevOps treatment

The DevOps movement has focused the IT industry on breaking down silos and increasing collaboration across the different IT functions. There are popular commercial constructs that did a great job in the past but are not appropriate for DevOps-aligned delivery models. A while ago I talked about the focus on Average Daily Rate; in this post I want to discuss how to change the popular Develop-Operate-Transition (DOT) construct.

Let’s look at the intent behind the DOT construct. The profile of work for a specific system usually changes over time and, idealised, looks like this:

  • During Development the team needs to be large and deal with complex requirements, new problems need to be solved as the solution evolves
  • During Operate the changes are smaller, the application is relatively stable and changes are not very complex
  • At some stage the application stabilises and changes are rare and of low complexity
  • And then the lifecycle finishes when the application is decommissioned (and yes, we are not really good at decommissioning applications; somehow we hang onto old systems for way too long. But for the sake of argument let’s assume we do decommission systems)

As an organisation it is quite common to respond to this with a rather obvious answer:

  • During development we engage a team of highly skilled IT workers who can deal with the complexity of building a new system from scratch and we will pay premium rates for this service
  • During Operate we are looking for a ‘commodity’ partner as the work is less complex now and cost-effective labour can be leveraged to reduce the cost profile
  • As the application further stabilises or usage reduces we prefer to take control of the system to use our in-house capabilities

So far so obvious.

If we look at this construct from a DevOps perspective, it becomes clear that it is sub-optimal: we have two handover points, and in the worst case these are between different organisations with different skills and cultures. I have seen examples where applications stopped working once one vendor left the building because some intangible knowledge did not get transitioned to the new vendor. It is also understandable that the Develop vendor focuses on the aspects required to deliver the initial version and less on how to keep it running and how to change it after go-live, while the Operate vendor would care a lot about those aspects and rather compromise on scope. Now, we could try to write really detailed contracts to prevent this from happening, but I doubt that we can cover it completely in a contract – or at least the contract would become way too extensive and complicated.

What is the alternative you ask? Let’s look at a slight variation:


Here the company is involved from the beginning and builds up some level of internal IP as the solution is being built out. In a time where IT is critical for business success, I think it is important to build some internal IP about the systems that support your business. In this new type of arrangement, the partner provides significant additional capabilities in the beginning, yet the early involvement of both the company itself and the application outsourcing partner ensures all aspects are considered and everyone understands the trade-offs being made during delivery of the initial project.

Once the implementation is complete, part of the team continues on to operate the system in production, making necessary changes and smaller enhancements as required. If and when the company chooses to take the application completely back in-house, it can do so, as the existing people can continue and the capability can be augmented in-house as required. While there will still be some handover activities, the continuity of people makes the process a lot less risky across the different transitions.

Of course, having one partner for both implementation and operation is an even better proposition, as this further reduces the friction. I have now worked on a couple of deals like that and really like the model, as it allows for long-term planning and partnership between the two organisations.

Most people I have spoken to find this quite an intuitive model, so hopefully we will see more of these engagements in the future.

How to Structure Agile Contracts with System Integrators

As you know, I work for a systems integrator and spend a lot of my time responding to proposals for projects. I also spend time as a consultant with CIOs and IT leadership, helping them define strategies and guide DevOps/Agile transformations. An important part of this is defining successful partnerships. When you look around, it is quite difficult to find guidance on how to better structure the relationship between vendor and company. In this post I want to provide three things to look out for when engaging a systems integrator or other IT delivery partner. Engagements should consider these elements to arrive at a mutually beneficial commercial construct.

Focus on Dayrates is dangerous

We all know that more automation is better, so why is it that many companies evaluate the ‘productivity’ of a vendor by their dayrates? Normally organisations are roughly organised in a pyramid shape (but the model works for other structures as well).

It is quite easy to do the math when it comes to more automation. The activities we automate are usually either low-skilled or at least highly repeatable, and are usually performed by people with lower cost to the company. If we automate more tasks, our ‘pyramid’ becomes smaller at the bottom. What does this do to the average dayrate? Well, of course it brings it up. The overall cost goes down, but the average dayrate goes up.


You should therefore look for contracts that work on overall cost, not dayrates. A drive for lower dayrates incentivises manual activities rather than automation. Beyond dayrates, it is also beneficial to incentivise automation even further by sharing its upside (e.g. gain sharing on savings from automation, so that the vendor makes automation investments by themselves).
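The arithmetic is easy to verify with a toy pyramid. The rates and headcounts below are made-up assumptions purely to illustrate the effect:

```python
# Each level of the pyramid: (daily rate, headcount). Automation removes
# low-rate, repeatable work, shrinking the bottom of the pyramid.
before = [(400, 2), (250, 6), (150, 20)]   # e.g. architects, seniors, juniors
after  = [(400, 2), (250, 6), (150, 8)]    # twelve junior roles automated away

def total_daily_cost(pyramid):
    return sum(rate * heads for rate, heads in pyramid)

def average_dayrate(pyramid):
    total_heads = sum(heads for _, heads in pyramid)
    return total_daily_cost(pyramid) / total_heads

# Total cost drops from 5300$ to 3500$ per day,
# while the average dayrate rises from about 189$ to about 219$.
```

So a contract judged on average dayrate would punish the vendor for exactly the automation that saved the client 1800$ a day; a contract judged on total cost rewards it.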

Deliverables are overvalued

To this date, many organisations structure contracts around deliverables. This is not in line with modern delivery. In Agile or iterative projects we are potentially never fully done with a deliverable, and we certainly shouldn’t encourage payments for things like design documents. We should focus on the functionality that goes live (and is documented) and should structure the release schedule so that frequent releases coincide with regular payments for the vendor. There are many ways to ‘measure’ functionality that goes live – story points, value points, function points, etc. Each of them is better than deliverable-based payments.

Here is an example payment schedule:

  • We have 300 story points to be delivered in 3 iterations and 1 release to production, at a total price of 1000$
  • 10%/40%/30%/20% payment schedule (first payment at kick-off, the second as stories are done in iterations, the third once stories are released to production, and the last after a short period of warranty)
  • 10% = 100$ on Signing contract
  • Iteration 1 (50 pts done): 50/300 * 0.4 * 1000 ≈ 67$
  • Iteration 2 (100 pts done): 100/300 * 0.4 * 1000 ≈ 133$
  • Iteration 3 (150 pts done): 150/300 * 0.4 * 1000 = 200$
  • Hardening & Go-live: 30% = 300$
  • Warranty complete: 20% = 200$
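The schedule above can be computed mechanically. A small sketch, using the 10/40/30/20 split and the point totals from the example:

```python
TOTAL_PRICE = 1000   # total contract price in $
TOTAL_POINTS = 300   # story points across all iterations

def iteration_payment(points_done):
    # 40% of the total price is spread across iterations, pro rata by points.
    return points_done / TOTAL_POINTS * 0.40 * TOTAL_PRICE

payments = {
    "signing":  0.10 * TOTAL_PRICE,      # 100$ at contract signing
    "iter_1":   iteration_payment(50),   # ~67$
    "iter_2":   iteration_payment(100),  # ~133$
    "iter_3":   iteration_payment(150),  # 200$
    "go_live":  0.30 * TOTAL_PRICE,      # 300$ at hardening & go-live
    "warranty": 0.20 * TOTAL_PRICE,      # 200$ when warranty completes
}
# The pieces add back up to the full contract price of 1000$.
```

Because the iteration payments are pro rata by story points delivered, the vendor is paid for working software, not for documents, and the schedule scales automatically if an iteration delivers more or fewer points than planned.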

BlackBox development is a thing of the past

In the past it was considered a strength of a vendor to take care of things for you in a more or less “blackbox” model: you trusted them to use their methodology, their tools and their premises to deliver a solution for you. Nowadays, understanding your systems and your business well is an important factor in remaining relevant in the market. Therefore you should ask your vendor to work closely with people in your company, so that you keep key intellectual property in house and bring the best of both parties together: your business knowledge and knowledge of your application architecture, combined with the delivery capabilities of the systems integrator. A strong partner will be able to help you deliver beyond your internal capability and should be able to do so in a transparent fashion. It will also reduce your risk of nasty surprises. And last but not least, in Agile one of the most important things for the delivery team is to work closely with the business. That is just not possible if vendor and company are not working together closely and transparently. A contract should reflect the commitment from both sides to work together as it relates to making people, technology and premises available to each other.

One caveat to this guidance is that for applications that are due for retirement you can opt for a more traditional contractual model, but for systems critical to your business you should be strategic about choosing your delivery partner in line with the above.

I already posted some related posts in the past, feel free to read further on:

Thoughts on State of DevOps report 2016

And there it is – the most recent State of DevOps report, the 2016 version. If you have read my previous blog posts about these kinds of reports, you will expect a full summary. Sorry to disappoint. This time I will focus on the things that stood out to me – we all know DevOps is a good thing, and this report can give you ammunition if you need to convince someone else, but I don’t see the point in reiterating that. Let’s focus on the things that we didn’t know or that surprise us.

It is great to see that the report continues to highlight the importance of people and processes in addition to automation. That is very much in line with my practical experience at clients. High-performance organisations have a higher Employee Net Promoter Score (ENPS), which makes sense to me. I think there is an argument to be made that you can use ENPS to identify teams or projects with problems. I would love to test this hypothesis as an alternative to self-assessments or other more involved tools that might cost more but might not be more accurate, and are harder to deploy.

Another key finding is the impact of building quality into your pipeline rather than having it as a separate activity (e.g. no testing as a dedicated phase – I wrote about modern testing here) – but the numbers didn’t really convince me on second look. It’s difficult to get this right, especially as the report has to work with guesstimates from people at all levels of the organisation. But I agree with the sentiment, and there is anecdotal evidence that it holds true. I would love to have more reliable data on this from real studies of work in organisations; it could be very powerful.

This year is the first time DevOpsSec is reflected in the report, and the results are positive, which is great. I have always argued that with the level of automation and logging in DevOps, security should be a lot easier. The report has some very useful advice on how to integrate security throughout the lifecycle on page 28.

We continue to see a good proportion of respondents coming from DevOps teams, which is not surprising, as that is the organisational form most larger organisations choose for practical reasons (at least as a transition state) – and it flies in the face of the claim that a “DevOps team” is an anti-pattern. Glad to see reality reflected.

On the results side, the report offers some pretty impressive numbers on what high performers can do vs low performers. That’s great info, but I would like to see this compared between companies of similar size and complexity – otherwise we compare the proverbial DevOps unicorns with large enterprises, and that is not really a fair comparison, as the difference is then not just in DevOps. The more detailed data shows, in my view, the limitations of the comparison and some “kinks” in the data that are not easy to explain. I am glad they printed this data, as it shows that the researchers don’t massage data to fit their purpose.

I really like how the researchers tried to find evidence for the positive benefits of trunk-based development, but I am not convinced this has been fully achieved yet. The same goes for visualising work – I see the point, but the report does not give me more reason and ammunition than I had before.

Similarly, the ROI calculation is a good start, but nothing revolutionary. It’s worth a read, but you will likely not find much new here – reduction in downtime, reduction in outages, increase in value through faster delivery.

Overall a good report, but not much that is revolutionary or new. Great to see the trending over the years and that the data remains consistent. Looking forward to next year’s version. And yes, I am writing this against the high expectations set in previous years; it’s difficult to have revolutionary news each year…

How to Fail at Test Automation

(This post was first published on

Let me start by admitting that I am not a test automation expert. I have done some work with test automation and have supervised teams who practiced it, but when it comes to the intricacies of it, I have to call a friend. It is from such friends that I have learned why so many test automation efforts fail. Talking to people about test automation validates my impression that this is the DevOps-related practice that people most often fail at.

Let me share the four reasons why test automation fails, in the hope that it will help you avoid these mistakes in your test automation efforts.

Before I go into the four reasons, allow me one more thought: “test automation” is actually a bad choice of words. You are not automating testing, you are automating quality assessments. What do I mean by that? It is a mistake to think of test automation as automating what you would otherwise do manually. You are finding ways to assess the quality of your product in automated ways, and you will execute them far more often than you would manual testing. This conceptual difference explains to a large degree the four reasons for test automation failure below.

Reason 1: Underestimating the Impact on Infrastructure and the Ecosystem

There is a physical limit to how much pressure a number of manual testers can put on your systems. Automation will put very different stress on your system: what you otherwise do once a week manually you might now do 100 times a day. Add into the mix an integrated environment, which means external systems need to respond that frequently, too. So you really have to consider two different aspects: can the infrastructure in your environments support 100 times the volume it currently supports, and are your external systems set up to support this volume? Of course, you can always choose to reduce the stress on external systems by limiting the real-time interactions and stubbing out a certain percentage of transactions or using virtual services.

Reason 2: Underestimating the Data Hunger

Very often test automation is used in the same system where manual testing takes place. Test automation is data-hungry, as it needs data for each test execution run and, remember, these runs are much more frequent than manual testing. This means you cannot simply refresh all test data whenever you want to run test automation; you would have to wait until manual testing reaches a logical refresh point. That is obviously not good enough; instead, you need to be able to run your test automation at any time. There are a few different strategies you can use (and you will likely use a combination):

  • Finish the test in the same state of data that you started with;
  • Create the data as part of the test execution;
  • Identify a partial set of data across all involved applications that you can safely replace each time; or
  • Leverage a large base of data sets to feed into your automation to last until the next logical refresh point.
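The first two strategies can be illustrated with a minimal sketch. An in-memory dictionary stands in for the data store, and `create_customer` / `delete_customer` are assumed helper names; in a real suite they would call your application's APIs:

```python
import uuid

# In-memory stand-in for the system's test data store.
CUSTOMERS = {}

def create_customer(customer_id):
    CUSTOMERS[customer_id] = {"orders": []}

def delete_customer(customer_id):
    CUSTOMERS.pop(customer_id, None)

def run_isolated_test(test_body):
    # Create the data as part of the test execution, and finish in the
    # same state of data that we started with.
    customer_id = f"test-{uuid.uuid4()}"
    create_customer(customer_id)
    try:
        return test_body(customer_id)
    finally:
        delete_customer(customer_id)  # leave no residue for the next run

snapshot = dict(CUSTOMERS)
seen = run_isolated_test(lambda cid: cid in CUSTOMERS)
# The test saw its own data, and the store is unchanged afterwards.
```

Because each run creates and removes its own data, the automation no longer depends on a shared refresh cycle and can run at any time, even while manual testing is in flight.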

Reason 3: Not Thinking About the System

Test automation is often an orchestration exercise, as the overall business process under test flows across many different applications. If you require steps in multiple systems, your automation will depend on orchestrating all of them. By building automation for just one system you might get stuck if your test automation solution cannot be orchestrated across different solutions. Also, some walled-garden test automation tools do not play well together, so think about your overall system of applications and the business processes first, before heavily investing in one specific solution for one application.

Reason 4: Not Integrating it into the Software Development Life Cycle

Test automation is not a separate task; to be successful it needs to be part of your development efforts. From the people I have spoken to, there is general agreement that a separate test automation team usually doesn’t work, for several reasons:

  • They are “too far away” from the application teams to influence “ability to automate testing,” which you want to build into your architecture to be able to test the services below the user interface;
  • Tests often are not integrated in the continuous delivery pipeline, which means key performance constraints are not considered (tests should be really fast to run with each deployment);
  • Tests often are not executed often enough, which means they become brittle and less reliable. Tests need to be treated at the same level as code, with the same rigor. This is much easier when the team has to run them to claim success for any new feature, and much harder when a separate team does the automation. It also takes much longer to understand where a problem lies.

Of course, absence of failure does not mean success. But at least I was able to share the common mistakes I have seen and, as they say, “Learning from others’ mistakes is cheaper.” Perhaps these thoughts can help you avoid some mistakes in your test automation journey. I do have some positive guidance on test automation, too, but will leave this for another post.

And in case you have found your own ways of failing, please share them in the comments to help others avoid them in the future. Failures are part of life and even more so part of DevOps life (trust me, I have some scars to show). We should learn to share those and not just the rosy “conference-ready” side of our stories.

Test automation is, for me, the practice that requires more attention and more focus. Between some open-source solutions and very expensive proprietary solutions, I am not convinced we in the IT industry have mastered it.

One bonus thought: If you cannot automate testing, automate the recording of your testing.

If you cannot automate testing, find a way to record the screen with each test by default. Once you identify a defect, you can use the recording to provide much richer context, making it a lot easier to find and solve the problem. Verbal descriptions of errors are very often lacking and don’t provide the full context of what was done. I keep being surprised by how long triage takes because of the lack of context and detail in defect descriptions. There is really no excuse for not doing this. Record first, discard if successful, attach it to the defect record if you find a problem.
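That record-first flow can be sketched in a few lines. The recorder here is a fake stand-in (the `start`/`stop` interface is my assumption); in practice you would wire in whatever screen-capture tool your team uses:

```python
import os
import tempfile

class FakeRecorder:
    # Stand-in that just creates a file; a real implementation would start
    # and stop an actual screen capture.
    def start(self, path):
        self.path = path
        open(path, "w").close()

    def stop(self):
        pass

def run_with_recording(test_fn, recorder):
    path = os.path.join(tempfile.mkdtemp(), "test-run.mp4")
    recorder.start(path)             # record first, unconditionally
    try:
        test_fn()
    except AssertionError:
        recorder.stop()
        return path                  # failed: keep it, attach to the defect
    recorder.stop()
    os.remove(path)                  # passed: discard the recording
    return None

def passing_test():
    assert True

def failing_test():
    assert False, "defect found"

kept_on_pass = run_with_recording(passing_test, FakeRecorder())
kept_on_fail = run_with_recording(failing_test, FakeRecorder())
# kept_on_pass is None; kept_on_fail points at the recording to attach.
```

The key design choice is that recording is the default and discarding is the exception, so no one has to remember to hit "record" right before the test that happens to fail.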