Category Archives: Uncategorized

The modern IT architect needs to manage 3 architectures to enable Agile and DevOps ways of working

3 architecturesI have been a technology architect for a long time and have worked with many different technologies. And there is something satisfying about coming up with “the architecture solution” for a business problem. The ideal end-state that once implemented will be perfect.

Unfortunately I had to come to the realization that this is not true. There is no end-state architecture anymore and there never was. All those diagrams I drew with the name end-state in it – they are all obsolete by now.

Knowing that architecture will continue to evolve (just look at the evolution of the architecture of Amazon (or many other internet companies) over the years) means as architects we need to think differently about architecture. We need to build architectures that before even implemented are already considering how parts will be replaced in the future. No stone will remain unturned over time no matter how good they seem at the moment. So rather than spending time on defining the end-state we need to spend more time understanding and defining the right principles of architecture for our organisations and manage the evolution of the architecture – how technical debt is paid down and systems are being decoupled for the eventual replacement.

This would be difficult if we had to deal just with the architecture of business systems. Reality is that in the current IT world we have to deal with three different architectures: the business systems architecture, the IT tools architecture and the QA and Data architecture.

Let’s quickly define these:

  • Business Systems Architecture – this is usually well defined in organisations. How do your CRM, ERP and billing systems work together to achieve business outcomes
  • IT Tools architecture – this is the architecture of all your tools that make IT delivery possible: configuration management, container management, deployment, defect management, Agile lifecycle, etc
  • QA and Data architecture – how do we validate that systems are working correctly both in production and in new development and how is data flowing across systems and environments

All three of these architectures need to be managed with the same rigor and focus on continuous evolution as the business systems architecture. This will make the job of architects a little bit more complicated. At the moment I see many organisations not having architects focused on all three architectures as they are not perceived as being of similar importance.

Let me give you some examples from my past to highlight why that is foolish:

  • One of my clients was already pretty mature in their automation so that all deployments to production were fully automated. Unfortunately their deployment architecture was basically a single Jenkins server that was manually maintained. When this server was wiped out by mistake it took weeks to get the capability back to deploy to production – in the mean time very risky manual deployments had to be performed by people who had not done this in months
  • Another client of mine had built a test automation framework that was too tightly coupled so that it took a lot of effort to replace one of the components and maintenance had become so expensive that they had stopped using it – ultimately there was too much technical debt in the tests and the QA and data architecture

The answer of course is that all three architectures need to be managed by architects in similar ways (e.g. failover and availability need to be considers for IT tools and QA tools too) and that the principles of decoupling and continuous evolution need to be aspects of all three.

The architect function is one that will see a lot of change as we come to terms with managing three interconnected architectures and the evolving nature of architecture. But I think it will make the job more interesting and will allow architectures to climb down from the proverbial ivory tower to engage with the Agile delivery teams in a more meaningful way.

DevOps Enterprise Summit Las Vegas 2018 – The best yet?

DOES18Its already over again – the annual get together of the brightest DevOps minds (well the brightest who could make it to Vegas). And in this instance I want to make sure that what happens in Vegas, does not stay in Vegas by sharing my highlights with all of you. It was a great event with a slightly higher focus on operations than last time.

The four trends that I picked up on:

  • Self service operations are considered an good answer to the “DevOps Team” problem
  • The prevalence of Dunning Kruger (Link) when it comes to self-assessments -> We are “DevOps”, we use the “cloud”,…
  • Minimum Viable Compliance as a new term
  • DevOps for AI – I did not see much AI for DevOps yet, perhaps next time

This year the conference focused much more on operations which is great, for next year I hope that we bring in some of the end-to-end business stories. How have we used DevOps practices to drive business – thinks like instrumentalization of software features to understand the business impact.

The top 3 talks

Andrew Clay Shafer

Andrew spoke in his typical eccentric style (with tie on his head) about Digital Transformations and doing DevOps. He made it clear that there is no real end to this transformation and compared it with getting fit (something he took on successfully since the beginning of the year). All the external help you can get will not make you fit, it’s the work you put in yourself. The same is true for this DevOps/Digital transformation. He also made a good point that some message can be dangerous if they are given before the recipients are ready.

J Paul Reed

I admit that I went into “5 dirty words about CI” expecting a talk about Continuous Integration like many others , something John humorously took on in the beginning. The talk focused on Continuous Improvement instead and stood out for me. Key learnings:

  • Root causes are a social construct for the point where we stop looking further, more appropriately we should call those “proximate causes”
  • A great story from Amazon, who used an outage caused by “human error” to look for the weaknesses in the systems rather than finding fault with the person who made a mistake
  • That incidents are not deterministic – there are many parallel universes where the incident might not have happened in the same circumstances, our systems are too complex to be deterministic. The Swiss Cheese model to analyse incidents was a great take away for me.
  • Human error is not the cause, it’s the effect. It’s the start of the investigation.

Damon Edwards

Okay, okay he usually is in my top list of talks because I like his style and approach. This time he told a great case study of how all the new fancy technologies and techniques did not prevent the opps problems. Which was a) funny and b) educational to move us away from all the techno-optimism. He then described the self-service operations model which I prefer myself as well, so this was good to see discussed, which is also called out in the latest puppet state of DevOps report.

 

Some other nuggets from other talks

Courtney Kissler

  • First mention of “Minimum Viable Compliance”
  • I loved the phrases “Geriatric stack” and “PTSD caused by legacy apps”
  • Great story on how compliance can calculate negative ROI to justify investment

Scott Prugh

  • Great case study where he showed the results over the years
  • Showing how Agile and DevOps work together to achieve huge results
  • Aligning teams to Value streams not projects and even using the platform as a product with product owner
  • Some interesting aspects on lean portfolio management that is run through several “tanks” like sharktank
  • I loved the metric – “% of things releases outside of release cycle”

Jeffrey Snover

  • Super inspiring to hear from someone who stood up for what he believed in and that careers can be a bit like snakes and ladders (it took him 5 years to come back from a demotion)
  • Insightful to hear about organisations and/or leaders having a list of people who are moving the company forward and of course those are the people that get well rewarded
  • How transitions transformations can push your career – introducing the terms stair job vs elevator jobs
  • Loved the point that when you are really good at something that doesn’t matter anymore that it will be bad for you (either as organisation or in your career) – great example from DEC which was excellent at something that didn’t matter anymore a and missed the move from vertical integrated to horizontal systems
  • He made a point that he believes the pendelum is swinging back to vertical integrated (like Azure and AWS)
  • “Build what differentiates you, buy what doesn’t”

Thorsten Volk

  • Reiterated the point I have heard a few times now that AI is still similar to the 90s just more accessible and powerful
  • Understand what AI and human understanding is good for and when to use what
  • Start with a narrow problem and extend it once you have a useful answer
  • Treat AI as code – Parameters, training set, data transformation pipeline, etc
  • Use public data – there is heaps that you can use to teach your algorithms
  • AI is a virtual reality as you can only see what is in the data and that data can be biased

Mik Kersten

  • He introduced his new framework to help with the transition from project to product which is described in his new book, which was available at the conference for the first time (I will review it in a blog post in due course)

Rob England

  • How some of the new material in the DevOps world has forgotten the old and sometimes is even reinventing it
  • Interesting anecdote that the SRE book comes to the conclusion that key metric is unplanned downtime – something the ITSM community has already know
  • How DevOps has not covered everything that ITSM covered like user training/Desktop management,… – there is some benefit in review the more rigorous material from ITIL
  • Gave us hope that ITIL 4 will be more relevant and easier to consume vs ITIL 3 being a bit of mixed bag

Jez Humble, Nicole Forsgren

  • Only 22% of people who say that they use the cloud are following the 5 characteristics of NIST – most hands went down at “OnDemand Self Service”

Cornelia Davies

  • Difference between functional and imperative programming
  • Why functional programming allows us to do get systems to do more for us and are less error prone because there is no state embedded
  • The term “Repave environments” – refreshing every part of the environment which we should do regularly
  • Introduced the concept of “Sidecars” – a container next to another container in a Kubernetes pod that deals with cross cutting concerns like security

 

Another brilliant conference is over and I am already looking forward to next year.

Something is missing in the Agile manifesto

As Agilists we keep using the Agile manifesto and are pretty much beyond questioning the exact words on the page. One key one is:

Working Software over comprehensive documentation

And over the years we had the arguments that this does not mean that there is no documentation, what exactly the definition of working software is and many other aspects of this. Yet I failed to pick up on a very important nuance for all those years.

The other day I was running an Agile training and someone with less of an IT background called out to me that she disagrees with the statement “working software over comprehensive documentation”. I was shocked initially until she explained that working software means terribly little when no one uses it (for example due to poor documentation or change communication) or if the software does not solve a business problem.

Wow – how had I not noticed this clear miss before. Of course, we are not about just building software and the Agile manifesto came from people who were thinking about creating software in a better way. But more than decade after I started to run Agile trainings, why had I never noticed this clear gap in the manifesto for our modern context.

A few weeks earlier I was at an Agile conference in Germany and someone spoke about Agility rather than Agile and pointed out how much of the language in the Agile principles behind the manifest are software specific (e.g. “The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.”) How have we not updated this material to be more relevant by changing all the wording to be inclusive of business and other parts of the organisation that need to enable success? See below for references to software development in more than half of the principles.

principles

Of course, the manifesto has its relevance as historic document, but whenever we talk about it, it is worthwhile calling out that we do have a broader understanding of this now which goes way beyond IT and software development. I for one will call this out more clearly whenever I run training going forward. We are using Agile to solve business problems and creating better software is only a small part of the solution. Thank you Jane for calling out a blind spot that I have had for years!

UPDATE: Thanks to Phil i found this tweet which discusses a solution:

tweet

 

It’s time to make Application Management Sexy Again

app maintenanceA few weeks ago I was in group of people and someone made a statement that application management (AM) is commodity and most people in the group agreed. I didn’t say anything but I think application management is one of the most exciting places to be today. There are so many technology trends that apply to AM: Agile, DevOps, Artificial intelligence, etc.

When I say application management I mean some level of operations (e.g. level 3 support and code fixes) and some level of application development/maintenance (smaller changes that don’t require projects). The nature of those activities are that they are frequent and smaller, which means we have a lot of opportunity to learn and improve. It’s the ideal testbed to showcase how successful modern engineering practices can be assuming the volume of work is large enough to justify investment.

Let’s look at two dimensions of this: the code changes and the platform services

The product team making the actual code changes should have responsibilities for making changes for both new scope and production defects. This team needs to reserve some capacity for urgent production defects, but the bulk of production defects get prioritised against new scope and are just forming part of normal delivery. This is possible because the release frequency is high (at least monthly) and the team can leverage platform services like CI/CD, feature flags, automated environment provisioning and monitoring services. The team collaborates with the platform services team in many ways and focuses on the application they own. Both Kanban and Scrum are possible methodologies this team can use, making this a great Agile working space.

The platform team provides the services which make life easier for the product teams. They also provide the first levels of support in a centralised fashion to protect the product team from too many distractions. There are two goals for this team: to provide platform services to the product team (like CI/CD, environment provisioning, monitoring) and to keep improving the customer experience for end-users. Fast ticket resolution is not the right measure for this aspect, but rather how to reduce the tickets overall. Advanced monitoring of performance, functionality and infrastructure is being used to find problems before the customer finds it. Each user created ticket is an opportunity to improve the monitoring for next time. Chatbots and self service are used to provide the fastest service for standard requests and AI/ML is learning from tickets to make more and more requests become standard and to route tickets to the best team/person. Analytics on all aspects of the platform allows to find patterns and reconcile them against issues. The platform team collaborates with the product team to make sure the application related services keep being improved. Over time this team becomes a small team of automation experts rather than a large team of operators for manual tickets.

Seriously if you look at the above, how are you not super excited about this space?If you do this right you get to use a lot of super modern engineering practices to improve the delivery capability of your organisation. The amount of money/capacity that frees up once you have implemented the first improvements can then be used to make more and more meaningful changes to the products and services. You can do this with a focus on the creative tasks rather than the fire fighting and repetitive manual work that consumed application management teams in the past.

I am working with a couple of organisations now that are as excited as I am about this space and have a similar vision. I am very much looking forward to see what results we will achieve together.

The simple Agile question

A few weeks ago I published a blog post on simple DevOps questions you can ask your team to find out how mature your adoption of DevOps practices are (you find it here). There is a case to be made that you should have a similar question for your Agile adoption. The problem here is that there are so many different flavors of Agile and that not one is better than the others (no matter what all those zealots out there are saying). The large amount of variations makes it hard to analyse the maturity of Agile unless you understand the context of the organisation and can evaluate the delivery success of the team. But I found a question that helps me differentiate between the “false” or “hybrid” Agile flavors and “real” Agile. I call it my minimum definition of Agile for it to be something I can engage with and think that it can be successful. Here we go, Mirco’s minimum Agile definition:

“One Team, produces a working and tested version of software within each strict timebox of equal length.”

You might think this is a really low bar for an Agile definition, but it rules out a lot of the work that I see being done under the name “Agile”. It rules out all of the following patterns that are suboptimal if not dangerously risky for delivery success:

  • Separate design, build and/or test teams – often this is a remnant of existing vendor relationships or strong “fiefdoms” and the Agile change has not been able to break down the barrier. For me this is an absolute showstopper for Agile success. Pause and see how you can enable a joint team.
  • Design sprints, followed by build sprints, followed by test sprints – Well let’s call this what it is, Waterfall delivery 😉
  • Teams being changed and swapped – Many of the Agile benefits only manifest once the team has been working together for a while, if you need to keep changing people or ramp-up or down then there is something wrong with the work preparation part
  • Agile teams building lots of stuff but are unable to validate it end-to-end – This is perhaps an already more advanced problem but I keep seeing it a lot these days. Agile teams deliver software but cannot test it sufficiently and while the velocity is high, once the product gets end-to-end tested it falls apart. There are three root causes that I commonly see: unavailability of test environments, lack of maturity in DevOps practices and lack of up-front-planning of end-to-end scenarios.
  • Sprints that have different lengths or are changed at last-minute – Agile is more successful because it is more rigorous and forces you to address problems early and often. If you weaken the conditions you are stepping on a slippery slope that will likely impact you more and more. Step off the slope and fix the problems rather than changing the sprint timebox

I have been happy with this definition for now as it weeds out the most common bad patterns, perhaps it helps you too. Let me know if you have your own question or definition that you use.

The Three dimensions of change for successful IT transformations as outlined in “DevOps for the Modern Enterprise”

DevOps and Agile transformations continue to be challenging for organisations due to the complexity of the problem. It seems that most people agree on what good looks like but struggle to find their way there. Successful organisations like Netflix or Spotify provide examples but their journey cannot be simply copied and organisations which try encounter a lot of challenges as their context organisationally and technology-wise is different. The transformation is different for every organisation, but some patterns are common, so what can one do.

In my book “DevOps for the Modern Enterprise” I tried to write down my experience from working with many large organisations. While there is no one solution, certain patterns that prove successful can be seen and more importantly a certain way of thinking has been common for organisations who made progress. And of course I have been part of many things that did not work out and can share those experiences too to help others avoid costly mistakes. No need to pay for those lessons yourself.

Three themes run throughout the book:

  • The transition from managing IT work similar to manufacturing work towards managing a creative process enabled by knowledge workers
  • Dan Pink’s three ways to motivate knowledge workers: Autonomy, Mastery & Purpose
  • The power of small batches – why it is important to work in small batches and how to enable them

These three themes run like threads throughout all parts of the book. I was motivated by the fact that as a developer myself, I found it very frustrating how for a period of time over-industrialisation created a view of IT work as just a production process that should be optimised for cost and productivity. Something that simply is not appropriate in this day and age.

3 dimensions

The book is structured in three dimensions of change which need to be addressed for a transformation to be successful:

  • The ecosystem of the organisations including vendors and software partners
  • The People Dimension and the organisational structure
  • The Technology Architecture of practices and technologies to enable engineering

And in the center of it all is a rigorous continuous improvement process that leverages the scientific method to plan and evaluate the experiments that make up the transformation over time.

Let me give a little summary of each section:

Creating the right ecosystem

The ecosystem is the environment in which the transformation needs to take hold to make real change. This part of the book looks at how to setup the transformation for success and how to identify the right applications in a legacy environment that should be tackled first. It also explores the roles of technology vendors and systems integrators and helps organisations choose the right ones that can help during the transformation.

The People and Organisational Dimension

Nothing happens without people in technology, so how can we structure our processes and our organisation in a way that maximises Autonomy, Purpose and Mastery for our knowledge workers?

I explore how Agile helps to provide all three things to our people, how some organisational structures have shown to be good stepping stones for success and how the organisation has to move from testing to quality engineering.  I also provide some ideas on how to focus more on the people in the team when managing teams based on my own experience of managing teams.

Technology and Architecture Aspects

The last part of the book is the most obvious one that many expect from a DevOps book. What technologies and technical practices should you use and how can you be successful with them. I revisit Continuous Delivery and more advanced delivery patterns. I take a long hard look at the DevOps tools out there and how you can choose the most suitable ones for your organisation while avoiding any specific vendor recommendations as they would age way too quickly. Of course architecture evolution towards Microservices and leveraging the cloud successfully are covered as well.

Organisations who consider all three dimensions are stacking their chances for success. Each chapter of the book comes with practical exercises that I run with clients and that the reader can run for her/himself in her or his organisation. I wanted to make sure that the book is not just for reading but also for taking action and taking the next step of the journey.

DevOps for the Modern Enterprise is widely available from 4 April 2018:

book

Amazon: Link
iTunes: Link
Bookdepository: Link

Mirco’s self assessment questions of DevOps Maturity

Doubt and Solution - Solutions and Ideas ConceptI like it simple. And I have a somewhat shaky relationship with maturity models as might have gathered from some of my other posts. I see the power of using a maturity model or assessment to help people understand where they are, but it’s also so prone to Dunning-Kruger (Link). So what else can we do to understand our maturity in IT delivery as it relates to DevOps.

I am running executive training that helps program leads and delivery leads understand how to run programs that use Agile and DevOps methods. As a lead they are unlikely to run stand-ups or configure Jenkins but need to understand enough to support their teams and help stay the course. Hence we created a training that focuses on principles and how things look like from the executive perspective. We call it Modern Engineering Bootcamp. I cheekily usually say in the introduction that the aim is to make executives dangerous for their teams, because they will be able to call things out that teams don’t do well. They will be able to call BS when their teams believe they are doing well but really are not. We had this moment when a participant during the discussion on configuration management spoke up: “Until just then I believed we had our code in configuration management and only after asking a few more questions inspired by this course have I just found out that our Pega configuration is only stored in Pega not in a proper configuration management system. We will change this ASAP to improve our DevOps setup and to be able to automate further”.

This got me thinking, can I come up with a small set of self assessment questions that helps people make the same kind of discoveries where they need to.

Thinking about it for a while, I went through my past projects and realised that I asked my teams certain questions and asked them to do certain things when I got a little bit worried we are missing something, it’s time to share these questions with you (and some of the stories behind it):

Let’s blow away the environment and rebuild it from our stored configuration

When I was still a young developer I was working on a mainframe project. We had a development environment that we used to check our code before deploying to the mainframe. I noticed that the environment was cluttered up with lots of things that seemed to be leftover from earlier iterations and decided to blow the environment away and rebuild it from what we had stored in configuration. We might call this now “infrastructure as code”. It turned out that when I tried to rebuild it, that not only some files were missing (they were not in SCM as they were never migrated when moving to a newer SCM) but also the mainframe CICS configuration was not correct. We spent 2 weeks fixing this and while my boss was initially not happy that I had caused two weeks of extra work, we were lucky to find this in development and not at some later stage in production. Some of the missing files only existed in live environments and not in any SCM system…phew… By the way nowadays this question usually means lets rebuild a cloud environment from scratch, which should cause even less headaches.

When you ask your team to do this you are really looking for two things:

  1. Does your team look worried or push back against this request? Explore what the concerns are, it’s likely they know where the skeletons are in the closet.
  2. If they are confident and kick this off, measure how long it takes and whether you are happy with the timing.

Keep repeating this regularly to improve speed and reliability.

Let’s delete the application config and redeploy it

In my second project we ran some pretty fancy Control-M schedules to deploy applications and update configuration as required. Jenkins was still to be invented I think (or I hadn’t heard of it at least). Because we ran two releases in parallel and had lots of queues and databases to connect, we used a massive excel sheet to store our environment configuration. By the time I joined the project the project was already under way for a long time. I was worried that some configuration was missed in the excel sheet and decided to build a new environment and deploy the configuration only from the excel sheet. No surprise I found a handful of configurations that were not stored. Luckily I had learned from my previous project not to blow away an environment in use but rather build a new one in parallel to find these problems.

You are using this question the same way as the first one. See whether your team is comfortable and regularly run the exercise.

Let’s redeploy the application even though nothing has changed

So your team told you the application deployment is automated and reliable? Great, ask them to schedule a deployment of the latest package every morning into your dev or test environment before the team comes to work. Given they are deploying the same thing there is very little risk and it’s automated so no one needs to be around for it. On my third larger project this is what I did. And when the team pushed back initially I asked for it anyway. And guess what… for the first few weeks someone had to come to office early as there was always a small hiccup or another until it was fully smooth. The only thing that proves automation and keeps it clean is running it outside of business hours when no one touches the system and doing it very frequently even (and especially) when no changes to the application are required. It keeps proving to everyone that the deployments are reliable and that if after deploying a new version of the software problems occur, they are likely to be found in the code not in the deployment mechanism. Running it every day also allows you to measure performance and keep improving it. Those of you deploying multiple times a day, well done yours must be reliable already to do it so often, for everyone else: Start doing it frequently to become reliable and take the worry away.

Let’s rerun the test suite and then again

In one of my later projects we were struggling to make test automation work. We had a few goes at getting a test automation framework going. At some stage the test team celebrated that they had a regression suite that works. After a successful run I asked them to quickly rerun it and it turned out that we couldn’t as the data had to be completely refreshed and would have blow away the progress made from manual testing that was running at the same time. We had some more work to do. We had other performance related issues too that made it difficult to grow our regression test suite. But that’s a story for another time.

 

So with these four questions I think I provide you a simple diagnostic for some of the basic principles underpinning DevOps. See how your team reacts when you ask these. If your team cant perform these tasks then start working on improving the situation until all four of these become second nature and the questions don’t even raise eyebrows anymore.