
How to choose an IT application for your architecture

Near-religious wars have been fought over which IT product to choose for a project or business function. Should you use Salesforce, SAP or IBM? I am not a product person, but I have learned over time that looking at functionality alone is no longer sufficient. It is very unlikely that an organisation will use a product as-is, and the application architecture the product is part of will continue to evolve. The concept of an end-state architecture is simply no longer valid. Each component needs to be evaluated on how easy it is to evolve and replace, which is why architecture and engineering play a much larger role than in the past. This puts a very different spin on product choice. Of course the choice is always contextual, and for each company and each area of business the decision might be different. What I can do, though, is provide a Technology Decision Framework that helps you think more broadly about technology choices. I wrote about DevOps tooling a while ago, and you will see similar thinking in this post.

My Technology Decision Framework (TDF) is based on three dimensions for you to evaluate:

  1. Functionality
  2. Architecture Maturity
  3. Engineering Capabilities
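
To make the three dimensions concrete, here is a minimal sketch of a TDF scorecard in Python. The candidate names, the 1-5 scores and the weights are all made up for illustration; you would replace them with your own assessment:

```python
# Hypothetical TDF scorecard: weighted average of 1-5 scores across the
# three dimensions. Weights and scores below are invented examples.

WEIGHTS = {
    "functionality": 0.4,
    "architecture_maturity": 0.3,
    "engineering_capabilities": 0.3,
}

def tdf_score(scores, weights=WEIGHTS):
    """Weighted average of 1-5 scores across the three TDF dimensions."""
    return sum(weights[dim] * scores[dim] for dim in weights)

candidates = {
    "package_a": {"functionality": 4, "architecture_maturity": 2, "engineering_capabilities": 3},
    "package_b": {"functionality": 3, "architecture_maturity": 4, "engineering_capabilities": 4},
}

# Rank candidates by their weighted score, best first.
ranked = sorted(candidates, key=lambda c: tdf_score(candidates[c]), reverse=True)
```

The point of the sketch is not the arithmetic but the discipline: writing down explicit weights forces you to discuss how much each dimension matters in your context.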

Functionality
As I mentioned in the introduction, the functionality provided by a software package has very often been the key decision factor. The closer the functionality aligns with the process you want to support, the better a choice it would be. To determine whether a software package is suitable, or whether you should rather build a custom system (which hopefully leverages open-source libraries and modules so you don't start from scratch), you need to take a good hard look at your organisation. Two factors will be important in this decision: how flexible you can be with the process you are trying to support, and your engineering capabilities. If you are not very flexible and your process is bespoke, then leveraging a software product will likely require a lot of customisations, which are very expensive. If you don't have a strong engineering capability, either in-house or through one of your strategic partners, then perhaps leveraging a software package is the better choice. You need to understand where you stand on the continuum from flexible process and low engineering capability (= package) to bespoke process and high engineering capability (= custom solution).

If you land on the side of a software package, then create an inventory of the required functionality, either as requirements or user stories, and evaluate the candidate packages against it. Ideally you want real business users to be involved in this. The idea is that a package gives you a lot right out of the box, and it shouldn't be too much hassle to get a demo installed in your environment for this purpose. If it is a hassle, then that's a warning sign for you.

Architecture maturity
Architecture maturity is important to support your application. The better your IT capability is, the more of these capabilities you can build yourself and hence rely on custom applications; otherwise the product needs to provide them out of the box. Four aspects you can start the assessment with are:

  • Autoscaling
    When your application becomes successful and is being used more, you need to scale the functions that are under stress. The architecture should support flexible scaling of different parts of the application, and it should do this intelligently (e.g. not just scaling the whole application, but the specific functions that require additional scale).
  • Self-Healing
    When something goes wrong, the application should be able to identify this and run countermeasures. This might mean the traditional restarting of servers/applications or spinning up a new version of the application/server.
  • Monitoring
    You want to understand what is going on with your application. Which elements are being used, which parts are creating value for your business? To do this the application should allow you to monitor as many aspects as possible and make that data available externally for your monitoring solution.
  • Capability for change
    You want to understand what it takes to make customisations and how modular the architecture is to change. If there are a lot of common components, this will hinder you from making independent changes and will likely increase your batch size due to dependencies on those common modules.
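
As a toy illustration of the self-healing idea, here is a hedged Python sketch. The probe and restart functions are stand-ins for whatever health check and countermeasure your platform actually provides:

```python
# Toy self-healing loop: probe a component, and if the probe fails, run a
# countermeasure (here a simulated restart) instead of waiting for a human.
# probe() and restart() are stand-ins, not a real platform API.

def self_heal(probe, restart, max_attempts=3):
    """Return True once the probe passes, restarting between failed attempts."""
    for attempt in range(max_attempts):
        if probe():
            return True
        restart()  # countermeasure: e.g. respawn the unhealthy instance
    return probe()

# Simulated component that becomes healthy after the second restart.
state = {"healthy": False, "restarts": 0}

def probe():
    return state["healthy"]

def restart():
    state["restarts"] += 1
    state["healthy"] = state["restarts"] >= 2
```

A real implementation would of course live in your orchestration layer, but the shape is the same: detect, counteract, verify.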

Engineering Capabilities
Engineering capabilities increase in importance the more you believe the application will have to evolve in the future, which in turn is often driven by the strategic importance of the application for your customer interactions. Good engineering capabilities allow you to change things quickly and to scale up delivery to support increasing volumes of change. The better skilled your IT department is, the more it will be able to leverage these capabilities; if you don't have strong IT capabilities, then you will focus more on the built-in architecture features. Here are a few things to look out for:

  • All Code and configuration should be extractable
    • You want to be able to use enterprise-wide configuration management to manage dependencies between systems. To do that, the exact configuration of an application must be extractable and quickly restorable. Inbuilt or proprietary solutions don't usually allow you to integrate with other applications, breaking the ability to have a defined state across your enterprise systems
    • You should be able to recreate the application in its exact state from the external source control system if required; this means no configuration should be exclusive to the COTS product
    • The ease with which the extract and import can be done will give you an indication of how well this can be integrated into your delivery lifecycle.
    • The extracts should be text-based so that SCM systems can compare different versions, analyse differences and support merge activities as required
  • The application is built with automation in mind and provides hooks (e.g. APIs) to fully automate the lifecycle.
    • This includes code quality checks, unit testing, compilation and packaging. None of these activities should have to rely on using a graphical user interface.
    • The same is true for the deployment and configuration of the application in the target environments; there should be no need for a person to log into the environment for deployment and configuration purposes.
    • Build and Deployment times are short (e.g. definitely less than hours, ideally less than minutes)
  • The application is modular
    • This reduces the build and deployment times
    • It also allows for smaller scale production deployments and overall smaller batch sizes by reducing the transaction cost of changes
    • It minimises the chance of concurrent development and developers having to work on the same code
    • This in turn reduces the risk of complicated merge activities
  • The application is cloud ready
    • First of all, it's not monolithic, so that the required components can be scaled up and down as needed rather than the whole application
    • Licensing is flexible and supports cloud use cases
    • Mechanisms are built into the system so that application monitoring is possible at a granular level
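
To illustrate the point about text-based extracts, here is a small Python sketch. The configuration settings are invented, but it shows why deterministic, text-based exports make SCM diffs meaningful:

```python
# Sketch: export configuration as sorted, text-based JSON so an ordinary
# SCM can diff two versions. The settings shown are made up for illustration.
import difflib
import json

def export_config(settings):
    """Serialise config deterministically so diffs between versions are meaningful."""
    return json.dumps(settings, indent=2, sort_keys=True).splitlines()

v1 = export_config({"pool_size": 10, "timeout_s": 30})
v2 = export_config({"pool_size": 20, "timeout_s": 30})

# Keep only the actual +/- change lines, dropping the file headers.
changes = [
    line
    for line in difflib.unified_diff(v1, v2, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

If your COTS product can only show you configuration through a GUI, you lose exactly this: the ability to see, review and merge changes like any other code.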

I hope this differentiated look helps you make the right choice, one that you won't regret down the line. Next time you have to make a choice, use this framework to help you decide.

The DevOps Silver Bullet

If you have worked in the DevOps space long enough, you will have been offered many “Silver Bullets” over the years: everything from a specific tool, to simply putting the Dev and Ops teams together, to specific practices like Continuous Delivery. Of course the truth is, there is no “Silver Bullet”. Last year at the DevOps Enterprise Summit I sat together with some of the brightest in this space and we spoke about what the potential “Silver Bullet” could be. It was surprising how quickly we all agreed on the one thing that predicts success for DevOps in an organisation. So let me reveal the “Silver Bullet” we cast in that room; unfortunately, it is one that requires a lot of discipline and work: Continuous Improvement.

When we surveyed the room to see what the one characteristic is of companies that end up being successful with their DevOps transformation, we all agreed that the ability to drive a mature continuous improvement program is the best indicator of success.

But what does this mean?

For starters it means that these companies know what they are optimizing for. There is plenty of anecdotal evidence that optimizing for speed will improve quality and reduce cost in the long run (as both of those otherwise influence speed negatively). On the flip side, trying to optimize for cost does not improve quality or speed. Neither does a focus on quality, which often introduces additional expensive and time-consuming steps. If you want to be fast you cannot afford rework and will remove any unnecessary steps and automate where possible. Hence speed is the prime directive for your continuous improvement program.

In my view it's all about visibility and transparency. To improve, we need to know where we should start improving. After all, systems thinking and the theory of constraints have taught us that we only improve when we improve the bottleneck and nowhere else. Hence the first activity should be a value stream mapping exercise, where representatives from across IT and the business come together to visualize the overall process of IT delivery (and run).

I like to use the 7 wastes of software engineering (https://www.scrumalliance.org/community/articles/2013/september/how-to-manage-the-7-wastes%E2%80%9D-of-agile-software-deve) when doing value stream mapping to highlight areas of interest. The resulting map with its “hotspots” creates a target-rich environment of bottlenecks you can improve.
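
As a rough illustration of what a value stream map surfaces, here is a Python sketch. The steps and the touch/wait hours are invented numbers, and the "bottleneck" is simply the step where work waits longest relative to work done:

```python
# Invented value stream: for each step, touch_h is hands-on work and
# wait_h is queue time. Waiting is the waste we want to make visible.

steps = [
    ("analysis", {"touch_h": 8,  "wait_h": 16}),
    ("build",    {"touch_h": 40, "wait_h": 8}),
    ("test",     {"touch_h": 16, "wait_h": 120}),
    ("deploy",   {"touch_h": 2,  "wait_h": 40}),
]

def bottleneck(vsm):
    """Step with the worst wait-to-touch ratio."""
    return max(vsm, key=lambda s: s[1]["wait_h"] / s[1]["touch_h"])[0]

# End-to-end lead time and the fraction of it that is actual work.
lead_time = sum(s[1]["touch_h"] + s[1]["wait_h"] for s in steps)
efficiency = sum(s[1]["touch_h"] for s in steps) / lead_time
```

Even with made-up numbers, a map like this makes the conversation concrete: in this example most of the lead time is queueing, not working.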

The key to improvement is to use what basically comes down to the scientific method: make a prediction of what will improve when you make a specific change, make the change, then measure to see whether you have in fact improved it. Too often people don't have the right rigor in their continuous improvement program. As a result, changes won't get implemented, or if they do, no one can say whether they were successful. Avoid this by establishing the right rigor up front.
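
The scientific-method loop above can be sketched in a few lines; the metric name and the numbers are purely illustrative:

```python
# Every improvement is an experiment: an up-front prediction plus a
# measured outcome. Metric and values below are invented examples.

def evaluate_experiment(metric, baseline, predicted, measured):
    """Compare the measured outcome against the up-front prediction (lower is better)."""
    improved = measured < baseline
    met_prediction = measured <= predicted
    return {"metric": metric, "improved": improved, "met_prediction": met_prediction}

# Prediction made *before* the change: trunk-based development will cut
# average merge time from 6h to 2h. Measured afterwards: 3.5h.
result = evaluate_experiment("merge_time_h", baseline=6.0, predicted=2.0, measured=3.5)
```

Note that the example improved things but missed its prediction; recording both facts is exactly the rigor that most improvement programs skip.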

The other challenge with continuous improvement is that it is unfortunately not linear; it follows a slightly adjusted J-curve. Initially you will find easy wins and things will look splendid. But then things get harder and often regress a little. You could jump ship here, but that would be a bad choice. If you stick with it, you will see real improvements above and beyond the easy wins later.


As a result, the goal of continuous improvement needs to be to find many smaller J-curves rather than working on one huge transformation-type J-curve. When I saw this concept explained at the DevOps Enterprise Summit by Damon Edwards, it was quite a revelation: such a simple graphical representation that explains so well why many DevOps adoptions struggle. It is too easy to assume a straight line and then get distracted when the initial downturn occurs after the easy wins.

I can tell you from experience that the curve is real and that no matter how much experience I gather, there are always context-specific challenges that produce that dip and require us to stay strong and stick with it. You need grit to be successful in the DevOps world.

So the Silver Bullet I promised you does exist, but unfortunately it requires discipline and hard work. It was very encouraging to see that some of the most experienced enterprise coaches and change agents agree that a rigorous continuous improvement culture is what predicts success. Look around and see whether you are in a place with such a culture, and if not, ask what you can do tomorrow to start changing the culture for the better.

6 Questions to Test whether You are Working with your System Integrator in a DevOps way

If you have been following my blog, you will know that I am disappointed by how little the cultural relationship between companies and their systems integrators is discussed in blogs, articles and conference talks. As I work for an SI, I find this surprising. Most large organisations work with SIs, so why are we not talking about it? If we are serious about DevOps, we should also have a DevOps culture with our SIs, shouldn't we?

When I speak to CIOs and discuss DevOps and how to improve going forward, I often get a comment at some stage: “Mirco, you seem to get this. Why is it then that not all projects with your company leverage the principles you talk about?”

A good question, and one that a few years ago I didn't have an answer to, which made me a bit unsure how to respond. I have spent a lot of time analyzing it in the years since. And the truth is that often the relationship does not allow us to work in the way most of us would like to work.

The other week I had a workshop with lawyers from both my company and a firm that represents our clients, to discuss the best way to structure contracts. Finally we all seem to understand that there is a lot of room for improvement. We need to do more of this so that we can create constructs that work for all parties. I am looking forward to continuing to work with them – and how often do you hear someone say that about lawyers 😉

Coming back from yet another conference where this topic was suspiciously absent, I thought I would write down this checklist for you to test whether you have the right engagement culture with your systems integrator, one that enables working to the benefit of both organisations:

  • Are you using average daily rate (ADR) as indicator of productivity, value for money, etc.?
    +1 if you said No. You can read more here about why ADR is a really bad measure, all things being equal.
  • Do you have a mechanism in place that allows your SI to share benefits with you when they improve through automation or other practices?
    +1 if you said Yes. You can't really expect the SI to invest in new practices if there is no upside for them. And yes, there is the “morally right thing to do” argument, but let's be fair: we all have economic targets, and not discussing this with your SI to find a mutually agreeable answer is just making it a bit too easy for yourself, I think.
  • Do you give your SI the “wiggle room” to improve and experiment and do you manage the process together?
    +1 if you said Yes. You want to know how much time the SI spends on improving things and on experimenting with new tools or practices. If they have just enough budget from you to do exactly what you ask of them, then start asking for this innovation budget and manage it with them.
  • Do you celebrate or at least acknowledge failure of experiments?
    +1 if you said Yes. If you have innovation budget, are you okay when the SI comes back and one of the improvements didn’t work? Or are you just accepting successful experiments? I think you see which answer aligns with a DevOps culture.
  • Do you know what success looks like for your SI?
    +1 if you said Yes. Understanding what the goals are that your SI needs to achieve is important. Not just financially but also for the people that work for the SI. Career progression and other aspects of HR should be aligned to make the relationship successful.
  • Do you deal with your SI directly?
    +1 if you said Yes. If there is another party involved, like your procurement team or an external party, then it's likely that messages get misunderstood. And there is no guarantee the procurement teams know the best practices for DevOps vendor management. Are you discussing any potential hindrances in the contracting space directly with your SI counterpart?

A lot is being said about moving from vendor relationships to partnerships in the DevOps world. I hope this little self-test helped you find a few things you can work on with your systems integrator. I live on the other side and often have to be creative to do the right thing for my customers. It is encouraging to see that many companies are at least aware of these challenges. If we can have open discussions about the items above, we will accelerate the adoption of DevOps together. I promise that on the SI side you will find partners who want to walk this path with you. Find the right partner, be open about the aspects I described above and identify a common strategy going forward. I am looking forward to this journey together. Let's go!

Impressions from DOES 2016

And now it's over again, the annual DevOps family gathering, a.k.a. the DevOps Enterprise Summit. Another year goes by, and we were able to check in with some of our favorite DevOps leaders and got to know some new family members. The event was full of energy, and as every year, I will try to summarize what I have seen.

First of all, some overall trends of things I heard coming up again and again:

  • Attracting people – the DevOps space continues to be a hot spot, and we are all competing for rare talent. I think the transformational nature makes it harder to find people who have both the right technical skills and the right mindset to live with continuous change along the journey
  • Platforms as enabler and answer to the team structure question – At the first DOES the discussion about “DevOps teams” was still heated; should you or should you not have a dedicated team. Having an internal platform team to run and operate the DevOps platform seems to be the most common solution. The idea that the platform provides self-service capabilities to the product teams and uses this to abstract away the org structure problem was mentioned several times.
  • Open Source / Open IP – more companies are now talking about open sourcing some of their tooling, including Accenture. This is a good sign for an industry that has for too long focused on internal IP. I think DevOps has done great things to open IT up for sharing, providing an ecosystem where we all work together on the big challenges ahead of us

Let’s look at some of the highlight talks below:

Heather Mickman from Target

We got to check in with Heather Mickman from Target to see how she has progressed. Hers was widely seen as one of the best talks of the conference. Some gems from this talk:

  • How speaking externally about what Target has been doing has enabled them to attract talent
  • How they moved more work in-house to control the culture and outcomes better
  • How they built their own platform to manage public and private cloud platforms
  • The key metrics she uses: number of incidents per deployment, and onboarding time
  • Heather pretty much addressed all three main themes I mentioned above

Scott Prugh from CSG

Another favourite of previous years provided an update on their journey and what a more Ops-focused view looks like. The numbers he mentioned are still impressive: 10x quality and half the time to market, achieved through the adoption of DevOps. Their deployment quality is close to perfect, with near zero incidents post-deployment (the same metric that Heather mentioned). He also highlighted the self-service platform as a key enabler. Another aspect I liked was his focus on automated reporting and making work visible. His colleague Erica then brought The Phoenix Project to life by comparing her world to the book. I love this.

Ben and Susanna from American Airlines

I am writing this summary while waiting for my 17-hour-delayed AA flight, so I assume there is still some room for improvement on the DevOps front 😉 Their talk focused on the “opportunities” provided by the merger of two airlines: what to do with two initially very different stacks, and how to slowly merge them. They also highlighted the common challenges with test automation and with measuring DevOps success. It feels better, but how do we really measure it?

Gene and John’s fireside chat

I mean, what can you say about this one… it was fascinating and a geek-out for all of us. So many threads to follow, it felt like Alice in Wonderland for DevOps folks. When you watch the replay you will feel the urge to buy books and keep googling things. Hold your fire and buy the Beyond The Phoenix Project audiobook when it comes out. I surely will!

Mark Schwartz on Business Value

Mark Schwartz was back and spoke about his book, a great exploration of the concept of business value. He did not provide the answer, but offered some interesting things to consider:

  • ROI misses the point – profit is a fiction, and flexibility, agility and options are not reflected
  • ROI does not easily work for deriving decisions; it is too far away or too much work
  • Not every item in the backlog can feasibly be assigned a value
  • It is important to have a conversation about business value and to decide how Agile teams will use it to derive priorities

Keith Pleas from Accenture

Another exploratory talk, this one about the Automation of Automation: we focus our attention on automating applications, but are we using the same ideas for our own DevOps architecture? Like Gene and John's talk, this one offered many breadcrumb trails to follow. Accenture has also open sourced its DevOps platform, which you can find here: http://accenture.github.io/adop-docker-compose/

The main themes of Open IP, Platform teams and attracting talent were hit on by Keith.

There were so many more great talks, check them out when the recordings are available. I will choose a few more quick highlights below:

  • Pivotal’s talk added the product orientation as organisational mechanism to the discussion on platform teams.
  • My good friend Sam Guckenheimer from Microsoft had the guts to do a live demo on stage, which worked out really well and showed some very interesting insights into Microsoft's developer platform.
  • Carmen DeArdo from Nationwide had one of the best slides of the conference, in my view. I really like the cycle transformation picture, what do you think?
  • Topo Pal from CapitalOne had some of the best nuggets at the conference:
    • “It takes an army to manage a pipeline”
    • 16 gates of quality or as he calls it, 10 commandments in Hex 😉
  • We got a really good introduction to Site Reliability Engineering from David Blank-Edelman, covering the concepts of error budgets, blameless post-mortems and much more. He also noted that “You can't fire your way to reliability” and that maturity models should be there to determine the right help, not to punish someone.

Best thing of course are the hallway talks, the opportunity to talk to old friends and to make new friends. Another great event gone by…

See you all at DOES17, Nov 13-15 2017, back in San Francisco. I will be there and look forward to meeting you all. Come join us at the family gathering next year!

Why Develop-Operate-Transition projects need to get the DevOps treatment

The DevOps movement has focused the IT industry on breaking down silos and increasing collaboration across the different IT functions. There are popular commercial constructs that did a great job in the past but are no longer appropriate for DevOps-aligned delivery models. A while ago I talked about the focus on Average Daily Rate; in this post I want to discuss how to change the popular Develop-Operate-Transition (DOT) construct.

Let's look at the intent behind the DOT construct. The profile of work for a specific system usually changes over time and, idealised, looks like this:

  • During Development the team needs to be large and deal with complex requirements, new problems need to be solved as the solution evolves
  • During Operate the changes are smaller, the application is relatively stable and changes are not very complex
  • At some stage the application stabilises and changes are rare and of low complexity
  • And then the lifecycle finishes when the application is decommissioned (and yes, we are not really good at decommissioning applications; somehow we hang onto old systems for way too long. But for the sake of argument let's assume we do decommission systems)

As an organisation it is quite common to respond to this with a rather obvious answer:

  • During development we engage a team of highly skilled IT workers who can deal with the complexity of building a new system from scratch and we will pay premium rates for this service
  • During Operate we are looking for a ‘commodity’ partner as the work is less complex now and cost-effective labour can be leveraged to reduce the cost profile
  • As the application further stabilises or usage reduces we prefer to take control of the system to use our in-house capabilities

So far so obvious.

If we look at this construct from a DevOps perspective, it becomes clear that it is sub-optimal: we have two handover points, and in the worst case these are between different organisations with different skills and cultures. I have seen examples where applications stopped working once one vendor left the building because some intangible knowledge did not get transitioned to the new vendor. It is also understandable if the Develop vendor focuses on the aspects required to deliver the initial version and less on how to keep it running and change it after go-live, while the Operate vendor would care a lot about those aspects and rather compromise on scope. We could try to write really detailed contracts to prevent this from happening, but I doubt we can cover it completely; at the least, the contract would become way too extensive and complicated.

What is the alternative you ask? Let’s look at a slight variation:

(Diagram: the adjusted DOT model, with the company and the operate partner involved from the start)

Here the company is involved from the beginning and builds up some level of internal IP as the solution is built out. In a time where IT is critical to business success, I think it is important to build some internal IP about the systems that support your business. In this arrangement the partner provides significant additional capabilities at the beginning, yet the early involvement of both the company itself and the application outsourcing partner makes sure all things are considered and everyone is across the trade-offs made during delivery of the initial project. Once the implementation is complete, part of the team continues on to operate the system in production, making any necessary changes and smaller enhancements. If and when the company chooses to take the application completely back in-house, it can do so, as the existing people can continue and the capability can be augmented in-house as required. While there will still be some handover activities, the continuous involvement of people makes the process a lot less risky and provides continuity across the different transitions.

Of course, having one partner for both implementation and operation is an even better proposition, as this will further reduce the friction. I have now worked on a couple of deals like that and really like the model, as it allows for long-term planning and partnership between the two organisations.

Most people I have spoken to find this quite an intuitive model, so hopefully we will see more of these engagements in the future.

How to Fail at Test Automation

(This post was first published on DevOps.com)

Let me start by admitting that I am not a test automation expert. I have done some work with test automation and have supervised teams who practiced it, but when it comes to the intricacies of it, I have to call a friend. It is from such friends that I have learned why so many test automation efforts fail. Talking to people about test automation validates my impression that this is the DevOps-related practice that people most often fail at.

Let me share the four reasons why test automation fails, in the hope that it will help you avoid these mistakes in your test automation efforts.

Before I go into the four reasons, allow me one more thought: “test automation” is actually a bad choice of words. You are not automating testing, you are automating quality assessment. What do I mean by that? It is a mistake to think of test automation as automating what you would otherwise do manually. You are finding ways to assess the quality of your product in automated fashion, and you will execute these assessments far more often than you would run manual tests. This conceptual difference explains to a large degree the four reasons for test automation failure below.

Reason 1: Underestimating the Impact on Infrastructure and the Ecosystem

There is a physical limit to how much pressure a number of manual testers can put on your systems. Automation will put very different stress on your system: what you otherwise do once a week manually you might now do 100 times a day. Add to the mix an integrated environment, which means external systems need to respond that frequently, too. So you really have to consider two different aspects: can your infrastructure in your environments support 100 times the volume it currently supports, and are your external systems set up to support this volume? Of course, you can always choose to reduce the stress on external systems by limiting real-time interactions and stubbing out a certain percentage of transactions or using virtual services.
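
Stubbing out a percentage of transactions can be sketched like this; the 10% ratio and the canned responses are assumptions for illustration, not recommendations:

```python
# Route only a sample of calls to the real external system and serve the
# rest from a stub. The ratio, responses and seed are invented examples.
import random

def make_router(real_call, stub_call, real_ratio=0.1, rng=random.Random(42)):
    """Send roughly `real_ratio` of calls to the real system, the rest to a stub."""
    def route(request):
        if rng.random() < real_ratio:
            return real_call(request)
        return stub_call(request)
    return route

# Count where the traffic goes; the lambdas stand in for real integrations.
calls = {"real": 0, "stub": 0}
route = make_router(
    real_call=lambda r: calls.__setitem__("real", calls["real"] + 1) or "real-response",
    stub_call=lambda r: calls.__setitem__("stub", calls["stub"] + 1) or "stub-response",
)
for i in range(1000):
    route(i)
```

The useful property is that the external system sees only a fraction of the automated volume while you still get some end-to-end coverage.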

Reason 2: Underestimating the Data Hunger

Very often test automation is used in the same system where manual testing takes place. Test automation is data-hungry, as it needs data for each test run and, remember, these runs are much more frequent than manual testing. This means you cannot simply refresh all test data whenever you want to run automation; you would have to wait until manual testing reaches a logical refresh point. That is not good enough; you need to be able to run your test automation at any time. There are a few different strategies you can use (and you will likely use a combination):

  • Finish the test in the same state of data that you started with;
  • Create the data as part of the test execution;
  • Identify a partial set of data across all involved applications that you can safely replace each time; or
  • Leverage a large base of data sets to feed into your automation to last until the next logical refresh point.

Reason 3: Not Thinking About the System

Test automation is often an orchestration exercise, as the overall business process under test flows across many different applications. If the process requires steps in multiple systems, then your automation depends on orchestrating all of them. By building automation for just one system, you might get stuck if your test automation solution cannot be orchestrated across different solutions. Also, some walled-garden test automation tools might not play well together, so think about your overall system of applications and the business processes first, before heavily investing in one specific solution for one application.
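
The orchestration concern can be sketched as follows; the system names and steps are invented to show the fail-fast sequencing across applications:

```python
# One business process spans several systems; the orchestrator only moves
# on when the previous step succeeded. Systems and steps are invented.

def run_process(steps):
    """Run ordered steps across systems; stop at the first failure."""
    completed = []
    for system, action in steps:
        if not action():
            return completed, f"failed in {system}"
        completed.append(system)
    return completed, "ok"

process = [
    ("crm",     lambda: True),   # e.g. create the order
    ("billing", lambda: True),   # e.g. raise the invoice
    ("erp",     lambda: False),  # e.g. book stock -- simulated failure
]
completed, status = run_process(process)
```

If your chosen tool cannot be driven like this from an outer orchestrator, automating one system in isolation will not get you an automated business process.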

Reason 4: Not Integrating it into the Software Development Life Cycle

Test automation is not a separate task; to be successful it needs to be part of your development efforts. From the people I have spoken to there is general agreement that a separate test automation team usually doesn’t work for several reasons:

  • They are “too far away” from the application teams to influence testability, which you want to build into your architecture so that you can test the services below the user interface;
  • Tests often are not integrated in the continuous delivery pipeline, which means key performance constraints are not considered (tests should be really fast to run with each deployment);
  • Tests often are not executed often enough, which means they become brittle and less reliable. Tests need to be treated like code, with the same rigor. This is much easier when the team has to run them to claim success for any new feature, and much harder when a separate team does the automation. It will also take much longer to understand where the problem lies.

Of course, absence of failure does not mean success. But at least I was able to share the common mistakes I have seen and, as they say, “Learning from others’ mistakes is cheaper.” Perhaps these thoughts can help you avoid some mistakes in your test automation journey. I do have some positive guidance on test automation, too, but will leave this for another post.

And in case you have found your own ways of failing, please share them in the comments to help others avoid them in the future. Failures are part of life, and even more so part of DevOps life (trust me, I have some scars to show). We should learn to share those and not just the rosy, “conference-ready” side of our stories.

Test automation is, for me, the practice that requires more attention and more focus. Between some open-source solutions and very expensive proprietary solutions, I am not convinced we in the IT industry have mastered it yet.

One bonus thought: If you cannot automate testing, automate the recording of your testing.

If you cannot automate testing, find a way to record the screen during each test by default. Once you identify a defect, you can use the recording to provide much richer context, making it a lot easier to find and solve the problem. Verbal descriptions of errors are very often lacking and don't provide the full context of what was done. I keep being surprised how long triage takes because of the lack of context and detail in defect descriptions. There is really no excuse for not doing this. Record first, discard if successful, attach it to the defect record if you find a problem.
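
The "record first, discard if successful" rule can be sketched like this; writing a small text file stands in for actual screen capture:

```python
# Wrap each test so a recording artifact exists while it runs, is deleted
# on success and kept on failure. The text file simulates a screen capture.
import os
import tempfile

def run_with_recording(test, artifact_dir):
    """Run a test; keep the 'recording' only if the test fails."""
    path = os.path.join(artifact_dir, "recording.txt")
    with open(path, "w") as f:
        f.write("frames of the test session would go here\n")
    try:
        passed = test()
    except Exception:
        passed = False
    if passed:
        os.remove(path)  # discard if successful
        return None
    return path          # attach this to the defect record

tmp = tempfile.mkdtemp()
kept = run_with_recording(lambda: False, tmp)  # failing test: recording kept
gone = run_with_recording(lambda: True, tmp)   # passing test: discarded
```

A real recorder would capture the screen, but the policy is the point: by default every run is recorded, and only failures leave an artifact behind.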

Is there such a thing as Hybrid Agile?

I recently wrote an article about Hybrid Agile for InfoQ because the term has been misused too often. Feel free to read the whole article. Here is my conclusion:

After many discussions, people convinced me that “Hybrid-Agile” is what is otherwise called Water-Scrum-Fall and after some deliberation I admit that this makes sense to me. Of course the term Water-Scrum-Fall or similar phrases are often used with contempt or smiled upon, but when you look closer, this is reality in many places and for good reasons.