DevOps Enterprise Summit 2017 – A summary

DOES17

It’s over again. The annual DevOps family reunion in San Francisco. This year was extra special because I feel I learnt even more than in previous years and because I was able to hand out preview copies of my upcoming book. I am really looking forward to hearing what those of you who got a copy think. Reach out and let me know. I will talk a little bit about the book in an upcoming blog post.

 

The summit offered a much greater variety of topics than in previous years (at least in the talks I attended), which I found very refreshing. Technology talks, culture talks, security, case studies, agile – so many different perspectives on IT transformation. Congratulations to the program committee for getting such a good mix and balance.

So let’s recap what I have learned from the summit.

KeyBank presentation

This was clearly one of my highlights. Hearing from a bank which is using microservices and the Netflix OSS stack to stabilise environments was great. To then hear that they were able to outperform expectations and deliver faster and with increased scope just shows what is possible when you get this right. Well done team, and I look forward to hearing more in the coming years. They mentioned one really interesting idea that I will take away from the conference: “It is not a reason not to automate something if you don’t do it frequently. In fact you should automate in those cases, as you don’t get a lot of practice at it.” I will remember this.

John Allspaw

Of course my expectations were high when I saw that John would speak. And boy did he deliver. A fascinating talk about how we cannot see the systems we work on directly, but rather work with models in our heads and manipulate the invisible system via a keyhole – the screen. This makes you think. It requires us to move from treating incidents as motivators for policy towards treating them as messages from the invisible system that we should use to update our mental models. Incidents show where the models are misaligned. This is tricky to operationalise and speaks to us as individuals.

We should then look at incidents as unplanned investments where the cost is already fixed for us – so how do we maximise the ROI on them? Commonly post-mortems value the action items at the end, but more important is the updated mental model we should have at the end. There are questions to ask beyond “What went wrong?” and “How did it break?” We should talk about what made it not nearly as bad as it could have been, and how we can continue to learn about the invisible systems. This talk created a lot of conversation over drinks. Mind blown!

Scott Prugh and Erica Morrison from CSG

The continuation of the CSG story, which is familiar to many of us. Good to hear they continue to challenge the status quo and push forward. The metric of the conference was “how much sleep do I get when we make changes”, which moved from very little to a lot. It also showed the need to shift from a more dev-focused DevOps to a more ops-focused or balanced view, and what that does to incidents in production. And of course we will never forget Scott with a sledgehammer destroying mode 1 once and for all…

Columbia’s Scott Nasello

A story of just getting on with it and doing the right thing to improve the situation. Not a transformation with funding etc. A good reminder of what can be done if you really want to do something. The stories around configuration management as the foundation for everything – from emailing scripts to proper SCM – sound very close to some of the things I have experienced. And then there was the innovative approach of swapping people out regularly to create a constant beginner’s mindset that allows you to question things and learn new things. A really interesting approach.

And of course the coolest random fact: it takes only 29 dominoes to take down the Empire State Building. Yes, really!

Damon Edwards

I liked this presentation just like all the other ones that Damon has done. A good reminder that Ops is more than deployments and pipelines. I liked the insight that ticket-driven queues are a sign of silos in your org, and that tickets should be for exceptions, not for actual work. He went on to define Ops as a service – definition of an automated procedure, execution of it, and governance – a framework I will surely use in the future. Thanks Damon.

Amazon

This was a good industry story of the need for immediacy and how this will continue to increase. They had to learn how to integrate across multiple teams and how you need to have teams that look after the end-to-end business service. I also learned a new word: “hyperconvenience”.

Disney – Jason Cox

Phew – finally I could attend a full talk from Jason – yay! Of course the videos were awesome, and a Star Wars trailer always gets my full attention. I could very much relate to the analogy of “corporate” services being perceived as the empire, even though all you want to do is help the teams, and to how they had to overcome this perception. I also really liked the technology rotation program for managers to continually challenge the status quo and build empathy across the business. And of course I wish I could call my training program “Jedi Engineering Training Academy” – best name ever for a training program 😉

Pivotal – Cornelia Davis

A really good semi-technical talk about cloud native applications – I will definitely buy her book. She spoke about:

  • Dynamic load balancing
  • Statelessness in the architecture
  • Application lifecycle – events have rippling effects – you need to detect them and deal with them automatically
  • Versioned services and deployment in parallel not replacement of services
  • Dynamically updated router for service discovery or a dedicated server to manage it
  • Data APIs and caching is important to decouple from database
  • Or a database per microservice and event-driven data propagation, commonly using Kafka as a unified log and universal source of truth
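The last bullet can be made concrete with a small sketch. This is my own illustration, not from the talk: a shared append-only log (standing in for Kafka) acts as the source of truth, and each service keeps its own database up to date by consuming events. All service and event names are invented for the example.

```python
# Minimal illustration of event-driven data propagation between microservices.
# A shared append-only log stands in for Kafka; each service owns its own
# database and updates it by consuming events from the log.

class UnifiedLog:
    """Append-only event log acting as the universal source of truth."""
    def __init__(self):
        self.events = []
        self.consumers = []

    def append(self, event):
        self.events.append(event)
        for consumer in self.consumers:  # fan the event out to subscribers
            consumer(event)

class OrderService:
    """Owns its own database; publishes order events to the log."""
    def __init__(self, log):
        self.db = {}
        self.log = log

    def place_order(self, order_id, customer, amount):
        self.db[order_id] = {"customer": customer, "amount": amount}
        self.log.append({"type": "order_placed", "order_id": order_id,
                         "customer": customer, "amount": amount})

class BillingService:
    """Keeps a local view of balances, built purely from consumed events."""
    def __init__(self, log):
        self.balances = {}
        log.consumers.append(self.on_event)

    def on_event(self, event):
        if event["type"] == "order_placed":
            self.balances[event["customer"]] = (
                self.balances.get(event["customer"], 0) + event["amount"])

log = UnifiedLog()
orders = OrderService(log)
billing = BillingService(log)
orders.place_order("o-1", "alice", 40)
orders.place_order("o-2", "alice", 60)
print(billing.balances["alice"])  # 100
```

The point of the pattern: the billing service never queries the order service’s database, it rebuilds its own view entirely from the log, which is what decouples the two services.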

Nicole Forsgren

The queen of DevOps data did not disappoint. Nicole went through four years of learnings, most importantly that throughput and stability move together and are not a trade-off.

You should use MTTR, lead time to change and deployment frequency as good measures to understand improvements. And when you improve DevOps performance, it is likely to improve organisational performance. Nicole also shares my scepticism about maturity models, which age too quickly due to changes in capabilities. I think they can still be useful in the right hands, but one has to be careful. In a room full of techies she challenged us with “Tech plus”: it takes IT combined with other things to make companies successful.

Her litmus test for DevOps success: “Can you deploy on demand, independently and during business hours?” And if you don’t know where to start, take her advice and look at architecture, continuous integration and a lightweight change approval process as good starting points.
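To make those measures concrete, here is a rough sketch of how you might compute them from deployment and incident records. The data and numbers are invented for illustration:

```python
# Sketch of the three measures Nicole recommends, computed from hypothetical
# deployment and incident records (all timestamps in hours).

deployments = [
    # (commit_time, deploy_time)
    (0, 12), (24, 30), (48, 52), (72, 75),
]
incidents = [
    # (detected_time, restored_time)
    (30, 31), (60, 64),
]

observation_window_days = 4.0

deployment_frequency = len(deployments) / observation_window_days  # per day
lead_times = [deploy - commit for commit, deploy in deployments]
mean_lead_time = sum(lead_times) / len(lead_times)                 # hours
restore_times = [restored - detected for detected, restored in incidents]
mttr = sum(restore_times) / len(restore_times)                     # hours

print(deployment_frequency)  # 1.0 deployment per day
print(mean_lead_time)        # 6.25 hours from commit to deploy
print(mttr)                  # 2.5 hours to restore service
```

Tracking these over time, rather than as one-off snapshots, is what lets you see whether an improvement actually moved the needle.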

 

Unfortunately I could not attend the third day of the conference, but I will surely catch up on the videos later. I will certainly be back next year and look forward to hearing what everyone else learned this year.

Thanks Gene, thanks organising team, thanks DevOps family – looking forward to seeing all you brothers, sisters and cousins at the next family gathering with Papa Gene 😉

Paternity Post #0 – Getting ready

You might have wondered why there have been so few updates on my blog recently. The answer is twofold: a) my creative juices have gone into finishing off my book (DevOps for the Modern Enterprise), and b) earlier this year my son was born, which is the best possible reason to spend less time in front of the PC. As things are settling down I will start to write more frequently again, which brings me to today’s post: I have decided to write some blogs about my upcoming paternity leave, so you will see some less technical posts in between the technical ones.

The reason I decided to do this is to encourage more fathers to take paternity leave and to give them an honest first-hand description of how it plays out. I have learned over the last few months that many fathers have not taken paternity leave, for many different reasons: career, company policies, being unsure what to expect as a full-time dad. So I decided to write about my experience.

I am about two weeks away from taking around three months off as a full-time dad. My son is 9 months old and he is a handful. I have heaps of respect for the work my wife is doing to keep him on a schedule and look after him. The long nights of getting him back to sleep and looking after him during the day sound tough… soon I will know first-hand how it is, something that so far has been limited to weekends when I don’t have to work.

You might wonder whether I am worried about what will happen at work when I am gone for such a long time and the truthful answer is “a little”. Of course there is that little voice in my head telling me that lots of important things will happen while I am gone and that I should really be there for it. But then I think of my incredible team that will cover for me when I am off and I know they will do a fantastic job. Over the last few months they already picked up a lot of the scope of work that I would usually deal with and have done great. My bosses are extremely supportive of me taking my paternity time and understand that things will be a little bit different for a period of time.

Until recently I thought that work (and potential career impacts) are the only things to worry about, but then I started to hear from mums and dads on how tough it can be to be the full time parent, the days where you only speak “baby language” and miss a meaningful conversation. The days when all hell breaks loose and you struggle to keep the baby fed and clean. The days when finding the time for a shower and a warm meal is a real challenge. The days when you had hardly any sleep and the little one wants his normal routine while you can hardly keep your eyes open. Phew…

Perhaps this gig is going to be tougher than I thought… but then I look at the little man and see him smile, and I know I will enjoy the time no matter how hard it might be. Work will go on and I am sure I will catch up when I am back. My team will do great and I will see my little man grow over the last three months of his first year in this world. I will keep you all posted on how my paternity leave is playing out with regular blog posts.

From Factory to Labs – is that the better metaphor?

As you probably know, this blog was partly inspired by my frustration with managers and leaders who compare IT delivery with factories. This year at Agile Australia I was very positively surprised that the factory metaphor came up in a few talks. I am really glad we are finally talking about the problems that stem from management using manufacturing thinking for IT delivery. Given I have spoken about this before, I don’t want to revisit the reasons here and would rather spend a bit of time on an alternative model that was put forward at the conference by Dom Price from Atlassian: it’s not a factory, it’s a lab.

Look at this slide from the talk for a summary of why the Labs model is more appropriate.

[Photo: slide from Dom Price’s talk]

There is a lot I like about the Labs metaphor that could inspire better management – the inherent uncertainty around IT delivery, the data-driven nature supported by the scientific method, treating failure as a normal occurrence whose impact we try to minimise instead of assuming we could prevent it. That being said, I feel the Labs model might take it a step too far, as there is a level of predictability required by management and by business stakeholders. A delivery roadmap highlighting the features to be delivered often underpins the business case. I might be too far away from scientific labs and the right examples might exist, but my impression is that those roadmaps are less common in labs than we would want in IT. My experience with labs has been that timelines are full of unknowns, more than we would accept in IT delivery.

At this point there are three mental models that I am aware of: the factory, the design studio and the lab. I believe the first one is the dangerous one to use as inspiration for management principles; for the last two I am hopeful that, combined, they might make for the right inspiration for management going forward. I have to think a bit more about this on the back of Agile Australia. Stay tuned, as I will be coming back to this topic.

Impressions from Agile Australia 2017

There it was again: the annual gathering of Agilists in Australia, this year in Sydney. The Accenture team turned up in force and we put together a nice little booth as well. Our planning poker cards were popular with the audience (and of course with our own team too). The booth and the client drinks on the first day gave us the opportunity to talk to people who are adopting Agile in their organisations – many new faces, but also many familiar faces we have been working with for years. It’s always good to catch up and get the latest update on someone’s Agile journey. A lot of work goes into organising a conference – thank you to SlatteryIT for producing a great conference each year. Our team put a lot of effort into our part of it: the booth, the presentation and manning the booth. Thanks team!

Of course there were many interesting sessions and choosing the best ones for each time slot proved as difficult as ever – and truth be told, not every session was a winner for me. I will focus on a few really good takeaways from the conference sessions. Of course the most ‘juicy’ information is always exchanged on the ‘hallway track’ – if you go make sure you spend time outside the session rooms talking to people.

Dom Price from Atlassian – he spoke about creating an organisation that enables knowledge workers to do their best work. It was great hearing from someone else about the problem with using factory or manufacturing principles for IT work. During the session I was waiting for the chance to take a photo of his team health framework, and then he dropped that it is all freely available online at https://www.atlassian.com/team-playbook Go check it out!

Joshua Arnold on Cost of delay – in the deep dive we talked about uncertainty profiles and what it does for the cost of delay calculation. I found that a very interesting concept and jumped online to learn more. This blog post stood out for me if you want to learn more:  http://xprocess.blogspot.hk/2016/04/cost-of-delay-profiles.html

Barry O’Reilly spoke about the Lean Enterprise – overall a great and entertaining talk. The one thing that stood out to me was the “delivery gap”, which just shows how bad companies are at evaluating themselves – and, for that matter, how bad people are at evaluating themselves (remember the Dunning-Kruger effect).

 

Sami Honkonen on Responsive Organisations – he had some great examples from his podcast on why incentives don’t work (they make you focus on the incentive, not the work at hand) and why the military is not command and control anymore (something I wrote about here).

Jez Humble – I spoke to Jez after the deep dive because he mentioned something I absolutely agree with: universities are teaching outdated management models. I am very passionate about the problem of using wrong, legacy mental models, something I am speaking about at LAST Conference in June 2017 and am writing a book about.

How to choose an IT application for your architecture

Near-religious wars have been fought over which IT product to choose for a project or business function. Should you use Salesforce, SAP or IBM? I am not a product person, but I have learned over time that just looking at the functionality is not sufficient anymore. It is very unlikely that an organisation will use the product as-is, and the application architecture the product is part of will continue to evolve. The concept of an end-state architecture is just not valid anymore. Each component needs to be evaluated on the basis of how easy it is to evolve and replace, which is why architecture and engineering play a much larger role than in the past. This puts a very different view on product choice. Of course the choice is always contextual, and for each company and each area of business the decision might be different. What I can do, though, is provide a Technology Decision Framework that helps you think more broadly about technology choices. I wrote about DevOps tooling a while ago and you will see similar thinking in this post.

My TDF (Technology Decision framework) is based on three dimensions for you to evaluate:

  1. Functionality
  2. Architecture Maturity
  3. Engineering Capabilities

Functionality
As I mentioned in the introduction, very often the functionality provided by the software package has been the key decision factor. The closer the functionality aligns with the process you want to support, the better a choice it is. Determining whether a software package is suitable, or whether you should rather build a custom system (which hopefully leverages open-source libraries and modules so you don’t start from scratch), requires you to take a good hard look at your organisation. Two factors are important in this decision: your flexibility in the process you are trying to support, and your engineering capabilities. If you are not very flexible with the process you are trying to support and the process is bespoke, then leveraging a software product will likely require a lot of customisations, which are very expensive. If you don’t have a strong engineering capability, either in-house or through one of your strategic partners, then perhaps leveraging a software package is the better choice. You need to understand where you stand on the continuum from flexible process and low engineering capability (= package) to a bespoke process and high engineering capability (= custom solution).

If you land on the side of a software package, then create an inventory of the required functionality, either as requirements or user stories, and evaluate the candidate packages against it. Ideally you want real business users to be involved in this. The idea is that a package gives you a lot right out of the box, and it shouldn’t be too much hassle to get a demo installed in your environment for this purpose. If it is a hassle, then that’s a warning sign for you.

Architecture maturity
Architecture maturity is important to support your application. The stronger your IT capability, the more of these capabilities you can build yourself and hence rely on custom applications; otherwise the product needs to provide them out of the box. Four aspects you can start the assessment with are:

  • Autoscaling
    When your application becomes successful and is used more, you need to scale the functions that are under stress. The architecture should support the flexible scaling of different parts of the application, and it should do this intelligently (e.g. not just scale the whole application, but rather the functions that require additional scale).
  • Self-Healing
    When something goes wrong, the application should be able to identify this and run countermeasures. This might mean the traditional restarting of servers/applications or spinning up a new version of the application/server.
  • Monitoring
    You want to understand what is going on with your application. Which elements are being used, which parts are creating value for your business? To do this the application should allow you to monitor as many aspects as possible and make that data available externally for your monitoring solution.
  • Capability for change
    You want to understand what it takes to make customisations: how modular is the architecture when you need to make changes? If there are a lot of common components, this will hinder you from making independent changes and will likely increase your batch size due to dependencies on those common modules.

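As a toy illustration of the autoscaling point above, here is a sketch that scales only the functions under stress rather than the whole application. Function names, loads and the target threshold are all made up:

```python
# Toy autoscaler: scales individual functions based on their own load
# instead of scaling the whole application. Numbers are invented.

def desired_instances(load_per_instance, current, target_load=100):
    """Scale so that no instance handles more than target_load requests."""
    total_load = load_per_instance * current
    needed = -(-total_load // target_load)  # ceiling division
    return max(1, int(needed))

app = {"search":    {"instances": 2, "load": 180},  # load per instance
       "checkout":  {"instances": 2, "load": 40},
       "reporting": {"instances": 1, "load": 10}}

for name, fn in app.items():
    fn["instances"] = desired_instances(fn["load"], fn["instances"])

print(app["search"]["instances"])    # 4: only the hot function scaled up
print(app["checkout"]["instances"])  # 1: cold functions scale back down
```

The design choice this illustrates: scaling decisions are made per function from that function’s own load, so a spike in search never forces you to pay for extra checkout or reporting capacity.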
Engineering Capabilities
Engineering capabilities increase in importance the more you believe the application will have to evolve in the future, which in turn is often driven by the strategic importance of the application for your customer interactions. Good engineering capabilities allow you to quickly change things and to scale up delivery to support increasing volumes of change. The better skilled your IT department is, the more it will be able to leverage these capabilities – if you don’t have strong IT capabilities, then you will focus more on the in-built architecture features. Here are a few things to look out for:

  • All Code and configuration should be extractable
    • You want to be able to use enterprise-wide configuration management to manage dependencies between systems; to do that, the exact configuration of an application must be extractable and quickly restorable. Inbuilt or proprietary solutions don’t usually allow you to integrate with other applications, breaking the ability to have a defined state across your enterprise systems
    • You should be able to recreate the application in its exact state from the external source control system in case it is required; this means no configuration should be exclusive to the COTS product
    • The ease with which the extract and import can be done will give you an indication of how well this can be integrated into your delivery lifecycle.
    • The extracts should be text based so that SCM systems can compare different versions, analyse differences and support merge activities as required
  • The application is built with automation in mind and provides hooks (e.g. APIs) to fully automate the lifecycle.
    • This includes code quality checks, unit testing, compilation and packaging. None of these activities should have to rely on using a graphical user interface.
    • The same is true for the deployment and configuration of the application in the target environments; there should be no need for a person to log into the environment for deployment and configuration purposes.
    • Build and Deployment times are short (e.g. definitely less than hours, ideally less than minutes)
  • The application is modular
    • This reduces the build and deployment times
    • It also allows for smaller scale production deployments and overall smaller batch sizes by reducing the transaction cost of changes
    • It minimises the chance of concurrent development and developers having to work on the same code
    • This in turn reduces the risk of complicated merge activities
  • The application is cloud ready
    • First of all, it’s not monolithic, so that the required components can be scaled up and down as needed rather than the whole application
    • Licensing is flexible and supports cloud use cases
    • Mechanisms are built into the system so that application monitoring is possible at a granular level

I hope this differentiated look at things will help you make the right choice, a choice that you won’t regret down the line. Next time you have to make a choice use this framework to help you decide.

The DevOps Silver Bullet

If you have worked in the DevOps space long enough, you will have been offered many “Silver Bullets” over the years: everything from a specific tool, to just putting the Dev and Ops teams together, to specific practices like Continuous Delivery. Of course the truth is, there is no “Silver Bullet”. Last year at the DevOps Enterprise Summit I sat together with some of the brightest in this space and we spoke about what the potential “Silver Bullet” could be. It was surprising how quickly we all agreed on the one thing that predicts success for DevOps in an organisation. So let me reveal the “Silver Bullet” we cast in that room – unfortunately, it is one that requires a lot of discipline and work: Continuous Improvement.

When we surveyed the room to see what the one characteristic is of companies that end up being successful with their DevOps transformation, we all agreed that the ability to drive a mature continuous improvement program is the best indicator of success.

But what does this mean?

For starters it means that these companies know what they are optimising for. There is plenty of anecdotal evidence that optimising for speed will improve quality and reduce cost in the long run (as both of those otherwise influence speed negatively). On the flipside, trying to optimise for cost does not improve quality or speed. Neither does a focus on quality, which often introduces additional expensive and time-consuming steps. If you want to be fast you cannot afford rework, and you will remove any unnecessary steps and automate where possible. Hence speed is the prime directive for your continuous improvement program.

In my view it’s all about visibility and transparency. To improve, we need to know where we should start improving. After all, systems thinking and the theory of constraints have taught us that we only improve when we improve the bottleneck and nowhere else. Hence the first activity should be a value stream mapping exercise, where representatives from across IT and the business come together to visualise the overall process of IT delivery (and run).

I like to use the 7 wastes of software engineering (https://www.scrumalliance.org/community/articles/2013/september/how-to-manage-the-7-wastes%E2%80%9D-of-agile-software-deve) when doing value stream mapping to highlight areas of interest. The resulting map with its “hotspots” creates a target-rich environment of bottlenecks you can improve.
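Reading a value stream map largely comes down to simple arithmetic: compare process time with wait time and find the biggest queue. The sketch below uses invented numbers to show how the bottleneck falls out of the map:

```python
# Hypothetical value stream: (step, process time, wait time after the step),
# all in hours. The numbers are made up for illustration.
value_stream = [
    ("analysis",    8,  40),
    ("development", 24, 16),
    ("testing",     16, 120),  # long queue waiting for a test environment
    ("deployment",  2,  72),   # waiting for the change approval board
]

process_time = sum(p for _, p, _ in value_stream)
wait_time = sum(w for _, _, w in value_stream)
lead_time = process_time + wait_time
efficiency = process_time / lead_time

# The biggest queue is the bottleneck; per the theory of constraints,
# improving anywhere else changes nothing end to end.
bottleneck = max(value_stream, key=lambda step: step[2])

print(lead_time)             # 298 hours end to end
print(round(efficiency, 2))  # 0.17: only 17% of the time is actual work
print(bottleneck[0])         # testing: the biggest queue, start here
```

Even with made-up numbers the pattern is typical: most of the lead time is waiting, not working, which is why the map is so effective at redirecting improvement effort.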

The key to improvement is to use what basically comes down to the scientific method: make a prediction of what will improve when you make a specific change, make the change, then measure to see whether you have in fact improved it. Too often people don’t have the right rigour in their continuous improvement program. As a result, changes won’t get implemented, or if they do get implemented, no one can say whether they were successful. Avoid this by having the right rigour up front.
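That predict-change-measure loop can be sketched as a tiny experiment record. The hypothesis and numbers below are invented for illustration:

```python
# Continuous improvement as the scientific method: predict, change, measure.
# The experiment data is invented for illustration.

def evaluate_experiment(hypothesis, baseline, predicted, observed):
    """Compare the observed result against the prediction and the baseline."""
    improved = observed < baseline       # lower is better (e.g. cycle time)
    met_prediction = observed <= predicted
    return {"hypothesis": hypothesis,
            "improved": improved,
            "met_prediction": met_prediction}

result = evaluate_experiment(
    hypothesis="Automating regression tests cuts cycle time from 10 to 6 days",
    baseline=10.0,   # days before the change
    predicted=6.0,   # what we said would happen
    observed=7.0,    # what we measured after the change
)

print(result["improved"])        # True: better than before
print(result["met_prediction"])  # False: but not as good as predicted
```

The value is in recording the prediction before the change: without it, a result like the one above gets spun as a success or a failure after the fact, and the mental model never gets tested.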

The other challenge with continuous improvement is that it unfortunately is not linear. It follows a slightly adjusted J-curve. Initially you will find easy wins and things will look splendid. But then things get harder and often regress a little. You could jump ship here, but that would be a bad choice. If you stick with it, you will see real improvements above and beyond the easy wins later.

[Diagram: the J-curve of continuous improvement]

 

As a result, the goal of continuous improvement needs to be to find many smaller J-curves rather than working on one huge transformation-type J-curve. When I saw this concept explained at the DevOps Enterprise Summit by Damon Edwards, it was quite a revelation: such a simple graphical representation that explains so well why many DevOps adoptions struggle. It is too easy to assume a straight line and then get distracted when the initial downturn occurs after the easy wins.

I can tell you from experience that the curve is real, and that no matter how much experience I gather, there are always context-specific challenges that bring that dip and require us to stay strong and stick with it. You need grit to be successful in the DevOps world.

So the Silver Bullet I promised you does exist, but unfortunately it requires discipline and hard work. It was very encouraging to see that some of the most experienced enterprise coaches and change agents agree that a rigorous continuous improvement culture is what predicts success. Look around and see whether you are in a place with such a culture, and if not, ask what you can do tomorrow to start changing the culture for the better.

6 Questions to Test whether You are Working with your System Integrator in a DevOps way

If you have been following my blog, you will know that I am disappointed by how little the cultural relationship between companies and their systems integrators is discussed in blogs, articles and conference talks. As someone working for an SI, I find this surprising. Most large organisations work with SIs, so why are we not talking about it? If we are serious about DevOps we should also have a DevOps culture with our SIs, shouldn’t we?

When I speak to CIOs and have a discussion about DevOps and how to improve going forward, I often get a comment at some stage – “Mirco you seem to get this. Why is it then that not all projects with your company leverage the principles you talk about?”.

A good question, and one that a few years ago I didn’t have an answer to, which made me a bit unsure how to respond. I have spent a lot of time analysing it in the years since. And the truth is that often the relationship does not allow us to work in the way most of us would like to work.

The other week I had a workshop with lawyers from my company and lawyers from a firm that represents our clients to discuss the best way to structure contracts. Finally we all seem to understand that there is a lot of room for improvement. We need to do more of this so that we can create constructs that work for all parties. I am looking forward to continuing to work with them – and how often do you hear someone say that about lawyers 😉

Coming back from yet another conference where this topic was suspiciously absent, I thought I would write down this checklist for you to test whether you have the right engagement culture with your system integrator, one that enables working to the benefit of both organisations:

  • Are you using average daily rate (ADR) as indicator of productivity, value for money, etc.?
    +1 if you said No. You can read more here about why ADR is a really bad measure, all things being equal.
  • Do you have a mechanism in place that allows your SI to share benefits with you when they improve through automation or other practices?
    +1 if you said Yes. You can’t really expect the SI to invest in new practices if there is no upside for them. And yes, there is the “morally right thing to do” argument, but let’s be fair: we all have economic targets, and not discussing this with your SI to find a mutually agreeable answer is just making it a bit too easy for yourself, I think.
  • Do you give your SI the “wiggle room” to improve and experiment and do you manage the process together?
    +1 if you said Yes. You want to know how much time the SI spends on improving things, on experimenting with new tools or practices. If they have just enough budget from you to do exactly what you ask them to do, then start asking for this innovation budget and manage it with them.
  • Do you celebrate or at least acknowledge failure of experiments?
    +1 if you said Yes. If you have innovation budget, are you okay when the SI comes back and one of the improvements didn’t work? Or are you just accepting successful experiments? I think you see which answer aligns with a DevOps culture.
  • Do you know what success looks like for your SI?
    +1 if you said Yes. Understanding what the goals are that your SI needs to achieve is important. Not just financially but also for the people that work for the SI. Career progression and other aspects of HR should be aligned to make the relationship successful.
  • Do you deal with your SI directly?
    +1 if you said Yes. If there is another party like your procurement team or an external party involved then it’s likely that messages get misunderstood. And there is no guarantee the procurement teams know the best practices for DevOps vendor management. Are you discussing any potential hindrance in the contracting space directly with your SI counterpart?

A lot is being said about moving from vendor relationship to partnerships in the DevOps world. I hope this little self-test helped you find a few things you can work on with your systems integrator. I am living on the other side and often have to be creative to do the right thing for my customers. It is encouraging to me to see that many companies are at least aware of these challenges. If we can have open discussions about the items above, we will accelerate the adoption of DevOps together. I promise on the side of the SIs you will find partners that want to go the way with you. Find the right partner, be open about the aspects I described above and identify a common strategy going forward. I am looking forward to this journey together. Let’s go!