Test automation metrics – what do you report on?

Metrics

One of the fun things about test automation is that, since you do not have to run all the tests manually, you can spend some extra time coming up with test metrics. Test metrics are tricky to do well in any situation, but where there is an abundance of them, as in a test automation setup, choosing which ones to use becomes the key first step. What are the metrics to look at? Code coverage? Number of tests passed versus number of tests failed? Duration of the tests over time? Number passed now versus number passed in previous runs? Newly automated tests added since the last run? You can keep dreaming up new metrics, but which ones will actually make sense and be representative? And of course, how do you ensure you do not spend ages ploughing through your data to gather these metrics manually?

Image borrowed from khanmjk.

If you just take a test automation tool off the shelf, it probably has an immense number of options to measure and report on. The risk is always that you start generating reports and metrics that are not quite representative or, even worse, give a tainted view of the actual situation. So how do you make sure you don’t end up with a jungle of metrics?

Audience

The first thing you need to know is who the audience of your metrics is. There is a huge difference in what different levels in an organisation consider useful metrics. One manager may be mainly interested in the time spent automating versus the time won by automating, e.g. the extra time now available for testing other stuff, the stuff that matters, while a test manager might be more interested in which functional areas of the application are covered and to what extent.

Type of metrics

I will not attempt to dream up the perfect metric; for every environment and situation one metric might be better than another. It all depends on the context, the people you are reporting to, the targets of each particular business area, etc.

What I do want to touch upon is the awesome power you have with metrics coming out of automation. Since your tests can run rapidly and often, there are lots of runs that can be measured. In other words, you can gather a lot of data, and a lot of historical data. A metric like the number of tests passed versus the number failed is generally reported as a snapshot of a single test run. Why limit the metric to a snapshot when you have living data at hand?
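To make the difference between a snapshot and living data a bit more tangible, here is a minimal Python sketch. The record layout, the names `SuiteRun` and `pass_rate_history`, and all numbers are made up for illustration; a real setup would read the results from wherever your automation tool stores them.

```python
from dataclasses import dataclass


@dataclass
class SuiteRun:
    """One execution of the automated suite (hypothetical record layout)."""
    label: str
    passed: int
    failed: int


def pass_rate(run: SuiteRun) -> float:
    """Fraction of executed tests that passed in a single run."""
    total = run.passed + run.failed
    return run.passed / total if total else 0.0


def pass_rate_history(runs: list[SuiteRun]) -> list[tuple[str, float]]:
    """Turn a series of runs into (label, pass rate) points, oldest first."""
    return [(run.label, pass_rate(run)) for run in runs]


# Made-up example data: three consecutive runs of the same suite.
runs = [
    SuiteRun("run-1", passed=80, failed=20),
    SuiteRun("run-2", passed=90, failed=30),
    SuiteRun("run-3", passed=114, failed=6),
]
history = pass_rate_history(runs)

# The snapshot is just the last point; the history is what a trend is drawn from.
snapshot = history[-1]
```

The snapshot only says where the latest run ended up, while the full history also shows the dip in the second run.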

The strongest metric to show to any manager is a trend line. Do you need to report on the number of tests passed versus failed, or the number of tests added to the automation suite? Need to report metrics on code coverage? All of these metrics can be turned into a trend line. Show the “upwards trend” and managers are generally happy without even knowing what they are looking at.

There are of course some pitfalls. The main one I have run into is a downwards-sloping trend line. It looks like a bad trend even when it is a perfectly healthy one; the sight of a line going down generally makes managers nervous, because they expect things to always go up.

Be prepared to explain a downwards trend, because sometimes you cannot escape a downwards or flattening trend line!

Graph examples

Below are two graphs, both based on the same data, each with a trend line fitted to that data. Looked at side by side, however, the two charts each tell a slightly different story because of the style of trend line chosen.

Upwards trend

Making the numbers seem a bit more positive than they really are by using an exponential trend line.

The exponential trend line paints a strong picture, however when using it, be prepared to explain the fact that despite the lack of growth at about two thirds of the graph, the trend is still upwards. This is a difficult story to tell.

Linear trendline

The linear trend line gives an indication of the overall trend. When it is close to flat-lining you know you have a problem; when it is too steep, however, you may also have a problem!

The linear trend line is one usually understood well by most people, at least in my experience. It shows the gradual, overall progress being made on your metrics. Since it is a straight line, quite often questions about what happened in a “dip” period can be prevented.
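For those who want to see how the same data can produce two different stories, here is a rough Python sketch of both styles of trend line. It is not how any particular charting tool does it; it assumes a plain least-squares fit for the linear line and, for the exponential line, one common approach of fitting a straight line to the logarithm of the values. The function names and the numbers in `passed_per_run` are made up for illustration.

```python
import math


def linear_fit(ys: list[float]) -> tuple[float, float]:
    """Least-squares line y = a + b*x through (0, ys[0]), (1, ys[1]), ...

    Returns (a, b). Needs at least two data points.
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def exponential_fit(ys: list[float]) -> tuple[float, float]:
    """Fit y = a * exp(b*x) by fitting a line to log(y); all values must be > 0."""
    log_a, b = linear_fit([math.log(y) for y in ys])
    return math.exp(log_a), b


# Made-up data: tests passed per run, with a flat period roughly two thirds in.
passed_per_run = [10, 14, 19, 25, 32, 33, 33, 34, 40, 47]

a_lin, b_lin = linear_fit(passed_per_run)
a_exp, b_exp = exponential_fit(passed_per_run)

# Trend values per run, ready to be plotted next to the raw data.
run_numbers = range(len(passed_per_run))
linear_trend = [a_lin + b_lin * x for x in run_numbers]
exponential_trend = [a_exp * math.exp(b_exp * x) for x in run_numbers]
```

Both trend lists come from exactly the same numbers; only the shape of the curve you choose to draw through them differs, which is why the choice deserves some thought.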

If you have set up your automation properly there is an abundance of data, which also opens up the possibility of combining data: for example, setting off the trend of passed/failed tests against the trend of new tests added or, even more interestingly, against new functionality added to the system under test.
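As a small illustration of combining data, the Python sketch below puts the pass rate of each run next to the number of tests added since the previous run. The function name `combine_trends` and all numbers are invented for the example; it also assumes the suite size is simply passed plus failed.

```python
def combine_trends(passed: list[int], failed: list[int]) -> list[dict]:
    """Pair each run's pass rate with the tests added since the previous run.

    'added' is the change in suite size, so it is negative when tests
    were removed between two runs.
    """
    rows = []
    previous_total = None
    for run_number, (p, f) in enumerate(zip(passed, failed), start=1):
        total = p + f
        added = 0 if previous_total is None else total - previous_total
        rows.append({
            "run": run_number,
            "total": total,
            "added": added,
            "pass_rate": p / total if total else 0.0,
        })
        previous_total = total
    return rows


# Made-up numbers: the pass rate dips in run 3, when 40 new tests arrive.
passed = [90, 95, 110, 135]
failed = [10, 5, 30, 5]
rows = combine_trends(passed, failed)
for row in rows:
    print(row)
```

Seen on its own, the dip in the third run looks alarming; seen next to the 40 tests added in that same run, it becomes a lot easier to explain.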

Be aware!

One big warning though: when playing around with the numbers you may be tempted to make them look nicer than they are, or to focus only on the good things. However tempting this may be, don’t prettify your numbers or graphs; make sure they always paint a true picture. If you manipulate the graphs, you are not only trying to fool your manager, but also yourself. Metrics should be useful for you as well as for the managers.

In a follow up post I am currently working on I will give some more clear examples of mashing up data into a useful automation report and how to interpret/present the data given specific contexts.

Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: “There are three kinds of lies: lies, damned lies and statistics.”
– Mark Twain’s Own Autobiography: The Chapters from the North American Review

–Edit–

A follow up on this post can be found here: Test automation metrics – mashing up non-test data

Are we building shelf-ware or a useful test automation tool?

Frustration and astonishment inspired this post. There is currently a big regression testing cycle going on within the organization. Over the past 4 months we have worked hard with the testers to establish a sizable base of automated tests; however, the moment regression started everyone seemed to drop the automation tools and revert to what they have always done: open Excel and tick the check-boxes of the scripted tests.

Considering that we have already set up a solid base, with a custom fixture enabling the tests (or checks, if you will) to do exactly what the tester wants them to do and what a tester would do manually whilst following the prescribed scripts, and that a fair share of these prescribed scripts have been written out in FitNesse, what is stopping them from using this setup?

Are we automating for the sake of automating?

While working on this extremely flexible setup, with FitNesse and with Selenium WebDriver and White as the drivers, I have started wondering more and more why we are automating in this organization. The people responsible for testing do not seem to be picking up on the concept of test automation. They all state loudly that it is needed and that it is great that we are doing it, but when regression starts they immediately go back to manual checks. I say manual checks on purpose, since the majority of testing here is done fully scripted; most of these scripts do not leave anything to the tester’s imagination, resulting in these tests being checks rather than tests. Checks we can execute automatically, repeatedly and consistently with tools such as FitNesse.

How do you make testers aware that a lot of the scripted tests should not be done manually?

Let me be clear on this: I am a firm believer in both manual and automated testing. They both have their value and should be used together. Automated testing is not here to take away the manual testing; rather, it is here to support the testers in their work. Automated testing should be complementary to manual testing. Thus far in this organization, I have seen manual testing happening and I have seen (and experienced) a lot of effort being put into writing out the automated tests in FitNesse. However, there has not been clear cooperation between the two, despite the people writing the automated tests being the same individuals who are responsible for executing the manual tests (which they have rewritten into FitNesse in order to build automated tests).

We have tried coaching on the job, we have tried dojos, but alas, I still see a hell of a lot of manual checks happening instead of FitNesse doing these checks for them. What is it that makes people not realize the potential of an automation tool? Thus far I have come up with several possible causes:

  • In our test-dojos we mainly focused on how to write tests in FitNesse rather than focusing on what you can achieve with test automation. This has led me to the idea that we rapidly need to organize another workshop or dojo in which the focus should be on what the advantages of automated tests are.
  • Another reason could be that test automation was not initiated by this team; it was put upon this team as a responsibility. The team we are currently creating this fixture for is a typical end-of-the-line, bottom-of-the-testing-chain team: everything they get to test is thrown over a wall and left to them to see if it works appropriately. Most of them do not seem to have consciously chosen to be testers; instead they have accidentally rolled into the software testing field. Some of them have adapted very well to this and clearly show affinity and aptitude for testing. Others, however, would in my opinion be better off choosing a different occupation. It is exactly the latter group that needs to be pulling the test automation effort currently going on.
There are more reasons I could go into, but I believe these two to be the main issues at hand which can actually be addressed.

So what will make people use automation tools properly?

The moment I can answer this one with a general rule of thumb I will sell it to the highest bidder. Within this organization, however, there doesn’t seem to be a simple solution just yet. As I have written before, there is not yet one sole ambassador for test automation in this organisation. Even if there were, we would still need to cause a shift in the general mindset of the testers. Rather than just walking through their predefined set of instructions in Excel, they need to consider for themselves what has already been covered by the automated tests and ask: how can I supplement these tests with manual testing?

We will need to find a way to get the testers to step out of their comfort-zone and learn how to utilize tools other than Excel and MS Word. Maybe organizing a testing competition will work, see who can cover the most tests in the shortest time and with the highest accuracy?

I am not a great believer in measuring things in testing, but maybe inventing some nice measurements will help the testers see the light. For example “How often can you test the same flow with different input in a certain timeframe?”.

Did we build shelf-ware or did we add value to the testing chain?

At the moment I often ask myself whether I am building shelf-ware or actually am building a useful automation tool (trying to stay away from terms like framework, since that might only increase the distance between the tool and the testers). Whenever I play around with the FitNesse/WebDriver/White setup we currently have running I see an incredibly versatile test automation tool which can be used to make life a lot easier for those who have to test the software regularly and repeatedly (not just testers, but also developers, product owners etc. can easily use this setup).

It is completely environment agnostic; if needed we can run (and have in the past run) the same tests we run in a test environment in production as well. It is easy to build new test cases/scripts or scenarios (I seem to have lost track of what would be the safe option to choose here; they all have their own subconscious connotations) since it is a wiki. All tests are human readable: if you can read an Excel sheet, reading the tests in FitNesse with Slim, the way we built it, should be child’s play.

Despite all these great advantages, the people that should be using it are not.

Reading all this back makes me consider one more thing; we started off building this setup with these tools based on a request from higher management. The tool selection was done by the managers (or team leads if you will) and not by the team themselves. Did we miss out on the one thing the IT industry has taught us? Did we build something we all want, but not what our customer wants and needs? I hope not, for one thing, I am quite sure this is what they need, an easy to use tool to automate all tedious, repetitive check work.

Question that remains: is this what our customer, or to be more exact, our customers’ end user, the tester, wants?