(This post was first published on DevOps.com)
Let me start by admitting that I am not a test automation expert. I have done some work with test automation and have supervised teams who practiced it, but when it comes to the intricacies of it, I have to call a friend. It is from such friends that I have learned why so many test automation efforts fail. Talking to people about test automation validates my impression that this is the DevOps-related practice that people most often fail at.
Let me share the four reasons why test automation fails, in the hope that it will help you avoid these mistakes in your test automation efforts.
Before I go into the four reasons, allow me one more thought: “test automation” is actually a poor choice of words. You are not automating testing; you are automating quality assessments. What do I mean by that? It is a mistake to think of test automation as automating what you would otherwise do manually. You are finding ways to assess the quality of your product automatically, and you will run those assessments far more often than you would run manual tests. This conceptual difference explains, to a large degree, the four reasons for test automation failure below.
Reason 1: Underestimating the Impact on Infrastructure and the Ecosystem
There is a physical limit to how much pressure a group of manual testers can put on your systems. Automation puts a very different kind of stress on them: what you would otherwise do manually once a week, you might now do 100 times a day. Add an integrated environment into the mix, and the external systems need to respond just as frequently. So you have to consider two different aspects: Can the infrastructure in your environments support 100 times the volume it currently supports, and are your external systems set up to support this volume? Of course, you can always reduce the stress on external systems by limiting the real-time interactions: stub out a certain percentage of transactions or use virtual services.
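As a rough illustration of stubbing out a percentage of transactions, here is a minimal Python sketch. Everything in it is hypothetical: the `PartiallyStubbedService` class, the `real_every` parameter, and the two credit-check functions are invented for this example and stand in for whatever external system and virtual service you actually use. It sends every tenth call to the "real" system and answers the rest from a canned stub.

```python
from typing import Callable


class PartiallyStubbedService:
    """Sends only every Nth call to the real external system; the rest hit a stub."""

    def __init__(self, real_call: Callable[[str], str],
                 stub_call: Callable[[str], str], real_every: int) -> None:
        if real_every < 1:
            raise ValueError("real_every must be at least 1")
        self._real_call = real_call
        self._stub_call = stub_call
        self._real_every = real_every
        self.total_calls = 0
        self.real_calls = 0

    def call(self, request: str) -> str:
        self.total_calls += 1
        # Deterministic routing (no random sampling) keeps test runs repeatable.
        if self.total_calls % self._real_every == 0:
            self.real_calls += 1
            return self._real_call(request)
        return self._stub_call(request)


def real_credit_check(customer_id: str) -> str:
    # Hypothetical stand-in for a network call to an external system.
    return f"real:{customer_id}"


def stubbed_credit_check(customer_id: str) -> str:
    # Canned response that never leaves the test environment.
    return f"stub:{customer_id}"


service = PartiallyStubbedService(real_credit_check, stubbed_credit_check, real_every=10)
responses = [service.call(f"customer-{i}") for i in range(100)]
print(f"{service.real_calls} of {service.total_calls} calls reached the real system")
```

Routing on a counter rather than at random is a design choice made here so that the same test run always hits the external system with the same requests; a real setup might prefer to choose by test tag or by environment instead.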
Reason 2: Underestimating the Data Hunger
Very often test automation runs in the same system where manual testing takes place. Test automation is data-hungry: it needs data for each test run and, remember, it runs much more frequently than manual testing. Because the manual testers depend on the same data, you cannot simply refresh all test data whenever you want to run the automation; you would have to wait until manual testing reaches a logical refresh point. This obviously is not good enough; you need to be able to run your test automation at any time. There are a few different strategies you can use (and you will likely use a combination):
- Finish the test in the same state of data that you started with;
- Create the data as part of the test execution;
- Identify a partial set of data across all involved applications that you can safely replace each time; or
- Leverage a large base of data sets to feed into your automation to last until the next logical refresh point.
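The first two strategies can be combined in a single test fixture. The following Python sketch is only an illustration under assumptions: it uses an in-memory SQLite database as a stand-in for a shared test environment, and the `customers` table, the `temporary_customer` helper, and the `manual-1` row are all made up for the example. The test creates its own record, uses it, and removes it again, so the data manual testers rely on is untouched.

```python
import sqlite3
import uuid
from contextlib import contextmanager
from typing import Iterator


def make_shared_db() -> sqlite3.Connection:
    # Stand-in for the shared environment that manual testers also use.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers VALUES ('manual-1', 'Used by manual testers')")
    conn.commit()
    return conn


@contextmanager
def temporary_customer(conn: sqlite3.Connection, name: str) -> Iterator[str]:
    """Create a customer for one test and delete it afterwards, even on failure."""
    customer_id = f"auto-{uuid.uuid4()}"  # unique id avoids clashes between runs
    conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))
    conn.commit()
    try:
        yield customer_id
    finally:
        # Leave the data in the same state the test started with.
        conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        conn.commit()


def count_customers(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


db = make_shared_db()
rows_before = count_customers(db)
with temporary_customer(db, "Automation run") as customer_id:
    rows_during = count_customers(db)  # the test would exercise the system here
rows_after = count_customers(db)
print(rows_before, rows_during, rows_after)
```

In a real landscape the data usually spans several applications, so the cleanup step is rarely a single `DELETE`; the point of the sketch is only the shape of "create, use, restore".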
Reason 3: Not Thinking About the System
Test automation is often an orchestration exercise, because the business process under test flows across many different applications. If the manual version of a test requires steps in multiple systems, then your automation depends on orchestrating all of those steps. If you build automation for just one system, you might get stuck when that test automation solution cannot be orchestrated together with the others. Also, some walled-garden test automation tools might not play well together, so think about your overall system of applications and the business processes first, before investing heavily in one specific solution for one application.
Reason 4: Not Integrating it into the Software Development Life Cycle
Test automation is not a separate task; to be successful it needs to be part of your development efforts. From the people I have spoken to there is general agreement that a separate test automation team usually doesn’t work for several reasons:
- They are “too far away” from the application teams to influence “ability to automate testing,” which you want to build into your architecture to be able to test the services below the user interface;
- Tests often are not integrated in the continuous delivery pipeline, which means key performance constraints are not considered (tests should be really fast to run with each deployment);
- Tests often are not executed often enough, which means they become more brittle and less reliable. Tests need to be treated at the same level as code, and a failing test needs to be taken as seriously as a defect in the code. This is much easier when the team has to run the tests to claim success for any new feature, and much harder when a separate team does the automation. With a separate team, it also takes much longer to understand where a problem lies.
Of course, absence of failure does not mean success. But at least I was able to share the common mistakes I have seen and, as they say, “Learning from others’ mistakes is cheaper.” Perhaps these thoughts can help you avoid some mistakes in your test automation journey. I do have some positive guidance on test automation, too, but will leave this for another post.
And in case you have found your own ways of failing, please share them in the comments to help others avoid them in the future. Failures are part of life and even more so part of DevOps life (trust me, I have some scars to show). We should learn to share those and not just the rosy “conference-ready” side of our stories.
For me, test automation is the practice that requires more attention and more focus than any other. Between some open-source solutions and very expensive proprietary solutions, I am not convinced we in the IT industry have mastered it.
One bonus thought: If you cannot automate testing, automate the recording of your testing.
If you cannot automate testing, find a way to record the screen with each test by default. Once you identify a defect, you can use the recording to provide much richer context, which makes it a lot easier to find the problem and solve it. Written descriptions of errors are very often lacking and don’t capture everything that was done. I keep being surprised by how long triage takes because of the lack of context and detail in the defect description. There is really no excuse for not doing this. Record first, discard the recording if the test succeeds, and attach it to the defect record if you find a problem.
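The "record first, discard if successful, attach on failure" rule can be sketched as a small wrapper around each test. This Python example is a sketch under assumptions: it does not record a screen at all, it only creates a placeholder file where a real recorder would write its video, and the `recorded_test` helper, the `.rec` suffix, and the `defect_dir` folder are hypothetical names chosen for illustration.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def recorded_test(test_name: str, defect_dir: str) -> Iterator[str]:
    """Keep the recording only when the wrapped test raises an exception."""
    fd, recording_path = tempfile.mkstemp(prefix=f"{test_name}-", suffix=".rec")
    os.close(fd)  # a real screen recorder would write its output to this path
    try:
        yield recording_path
    except Exception:
        # Failure: move the recording next to the defect so it can be attached.
        shutil.move(recording_path, os.path.join(defect_dir, f"{test_name}.rec"))
        raise
    else:
        # Success: nothing to triage, so discard the recording.
        os.remove(recording_path)


defect_dir = tempfile.mkdtemp()

with recorded_test("login_ok", defect_dir) as path:
    passing_path = path  # this test passes, so its recording is discarded

try:
    with recorded_test("checkout_broken", defect_dir) as path:
        raise AssertionError("total did not match")
except AssertionError:
    pass  # the failure is still reported; the recording has been kept

kept = sorted(os.listdir(defect_dir))
print(kept)
```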
Agreed. Two clarifications to add:
Run tests in parallel. In my experience, teams are surprised to find they can’t run their tests because the tests run serially and don’t finish. The tests need to be run in parallel, and the results need to be reported out. When you have a lot of tests, you need reports that encourage taking action. This is related to points 1-3.
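A minimal Python sketch of this idea, assuming the tests are independent of each other: it runs a handful of checks in a thread pool and produces a report that lists only the failures, since those are what someone has to act on. The three `check_*` functions and the `run_in_parallel` helper are invented for the example; a real suite would more likely rely on its test runner's own parallel mode.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def check_login() -> None:
    assert 1 + 1 == 2


def check_checkout() -> None:
    # Fails on purpose so the report has something to show.
    assert sum([10, 5]) == 20, "total should be 20"


def check_search() -> None:
    assert "book" in "bookstore"


def run_one(check: Callable[[], None]) -> Tuple[str, str, str]:
    """Run a single check and return (name, status, detail)."""
    try:
        check()
        return (check.__name__, "passed", "")
    except AssertionError as exc:
        return (check.__name__, "failed", str(exc))


def run_in_parallel(checks: List[Callable[[], None]],
                    workers: int = 4) -> List[Tuple[str, str, str]]:
    # pool.map returns results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, checks))


results = run_in_parallel([check_login, check_checkout, check_search])
failures = [(name, detail) for name, status, detail in results if status == "failed"]

print(f"{len(results) - len(failures)} passed, {len(failures)} failed")
for name, detail in failures:
    print(f"ACTION NEEDED: {name}: {detail}")
```

Parallel runs only work when tests do not share mutable data, which is why this point is closely tied to the data strategies under Reason 2.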
Fight flaky tests with mock data. Flaky tests that flip between green and red states are a liability, not an asset. Flaky automated tests should be hunted down and fixed or removed. Flaky tests are often the result of changing data. The solution is hermetically sealed tests that use mock data. This is related to point 2.
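One way to picture a hermetically sealed test is the Python sketch below, which uses the standard library's `unittest.mock`. The `order_total` function, the `get_order_lines` method, and the order data are hypothetical; the assumption is that the code under test receives its data source as a parameter, so the test can hand it fixed mock data instead of a live system whose data changes between runs.

```python
from unittest.mock import Mock


def order_total(order_id: str, price_service) -> float:
    """Sum the line totals of an order, using whatever service is passed in."""
    lines = price_service.get_order_lines(order_id)
    return sum(line["quantity"] * line["unit_price"] for line in lines)


# The mock replaces the live service, so the data is the same on every run.
fake_prices = Mock()
fake_prices.get_order_lines.return_value = [
    {"quantity": 2, "unit_price": 10.0},
    {"quantity": 1, "unit_price": 5.5},
]

total = order_total("order-42", fake_prices)

# The test checks both the result and that the dependency was used as expected.
assert total == 25.5
fake_prices.get_order_lines.assert_called_once_with("order-42")
```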