Test automation metrics – what do you report on?

Metrics

One of the fun things about test automation is that, since you no longer have to run all the tests manually, you can spend some extra time coming up with test metrics. Test metrics are tricky to do well in any situation, but in a situation with an abundance of possible metrics, such as a test automation setup, choosing the right ones becomes the key first step. What are the metrics to look at? Code coverage? Number of tests passed versus number of tests failed? Duration of the tests over time? Number passed now versus number passed in previous runs? Newly automated tests added since the last run? You can keep dreaming up new metrics, but which ones will actually make sense and be representative? And of course, how do you ensure you do not spend ages ploughing through your data to gather these metrics manually?
If you take a test automation tool off the shelf, it probably offers an immense number of options to measure and report on, but the risk is that you start generating reports and metrics that are not quite representative or, even worse, give a tainted view of the actual situation. So how do you make sure you don’t end up with a jungle of metrics?

Audience

The first thing you need to know is who the audience of your metrics is. There is a huge difference in what different levels of an organisation consider useful metrics. One manager may be mainly interested in the time spent automating versus the time won by automating, e.g. the extra time now available for testing other stuff, the stuff that matters, while a test manager might be more interested in which functional areas of the application are covered and to what extent.

Type of metrics

I will not attempt to dream up the perfect metric; in every environment and situation a different metric might be best. It all depends on the context, the people you are reporting to, the targets of each particular business area, and so on.

What I do want to touch upon is the awesome power you have with metrics coming out of automation. Since your tests can run rapidly and often, there are lots of runs that can be measured. In other words, you can gather a lot of data, a lot of historical data. When reporting on metrics like the number of tests passed versus the number failed, you generally report a snapshot of a single test run. Why limit the metric to a snapshot when you have living data at hand?

The strongest thing to show any manager is a trend line. Need to report on the number of tests passed versus failed, or the number of tests added to the automation suite? Need to report metrics on code coverage? All of these metrics can be turned into a trend line. Show an “upwards trend” and managers are generally happy, often without even knowing what they are looking at.

There are of course some pitfalls; the main mistake I have made was presenting a downward-sloping trend line. That looks like a bad trend, and even though it can be a perfectly fine one, the sight of a trend line going down generally makes managers nervous: they expect things to always go up.

Be prepared to explain a downward trend, because sometimes you cannot escape a downward or flattening trend line!

Graph examples

Below are two charts, both based on the same data and each with a trend line fitted to that same data. Each chart, however, tells you a slightly different story due to the style of trend line chosen.

Upwards trend

Making the numbers seem a bit more positive than they really are by using an exponential trend line.

The exponential trend line paints a strong picture; when using it, however, be prepared to explain why, despite the lack of growth at about two thirds of the graph, the trend still points upwards. This is a difficult story to tell.
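For reference, the exponential trend line that most charting tools plot fits a curve of the following form; this is a minimal sketch of the underlying math:

```latex
y = a\,e^{bx},
\qquad \text{fitted by least squares on} \qquad
\ln y = \ln a + b\,x .
```

Because the fit assumes a constant relative growth rate b, a few strong early increases keep pulling the curve upwards even through flat stretches, which is exactly why this chart needs explaining.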

Linear trendline

The linear trend line gives an indication of the overall trend: when it is close to flat-lining you know you have a problem, but when it is too steep you may also have a problem!

The linear trend line is usually well understood by most people, at least in my experience. It shows the gradual, overall progress being made on your metrics. Since it is a straight line, questions about what happened in a “dip” period can quite often be prevented.
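As an illustration, here is a minimal sketch of how such a linear trend can be computed with an ordinary least-squares fit, in plain Java; the pass counts are hypothetical and the run index serves as the x-axis:

```java
// Minimal sketch: ordinary least-squares trend line over test runs.
// The pass counts are hypothetical; the run index serves as x.
public class PassRateTrend {
    public static void main(String[] args) {
        double[] passed = {40, 42, 41, 45, 44, 48, 47, 51};

        int n = passed.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int x = 0; x < n; x++) {
            sumX  += x;
            sumY  += passed[x];
            sumXY += x * passed[x];
            sumXX += (double) x * x;
        }
        // slope b and intercept a of the fitted line y = a + b * x
        double b = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double a = (sumY - b * sumX) / n;

        System.out.printf("trend: y = %.2f + %.2f * run%n", a, b);
        // b > 0: upwards trend; b near 0: flat-lining; b < 0: be ready to explain!
    }
}
```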

Since there is an abundance of data, provided you have set up your automation properly, there is also the possibility of combining data: for example, setting the trend of passed/failed against the trend of new tests added or, even more interestingly, against new functionality added to the system under test.

Be aware!

One big warning though: when playing around with the numbers you may be tempted to make them look nicer than they are, or to focus only on the good things. However tempting this may be, do not prettify your numbers or graphs; make sure they always paint a true picture. If you manipulate the graphs, you are not only trying to fool your manager but also yourself. Metrics should be useful for you as well as for the managers.

In a follow-up post I am currently working on, I will give some clearer examples of mashing up data into a useful automation report, and of how to interpret and present the data in specific contexts.

Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: “There are three kinds of lies: lies, damned lies and statistics.”
– Mark Twain’s Own Autobiography: The Chapters from the North American Review

–Edit–

A follow-up to this post can be found here: Test automation metrics – mashing up non-test data

Throw-away test automation

I quite often tell clients that their approach to test automation is not sustainable. This got me thinking: does test automation always have to be sustainable and reusable?

This all depends, I guess, on the goal you are trying to meet. If your goal is long-term cost efficiency, shortening the timelines of regression testing and thereby getting to a more rapid release cycle, then yes, you will need to focus on the reusability of your automation suite.

However, there are plenty of instances where you want to automate something to make life easier right here, right now. Most testers, I hope, know the feeling of having to go through tedious, repetitive work: setting up data for a test, going through the login flow of an application to get to the feature you want to test, and so on. For actions like that, you can very well use automation. In fact, you can quite often use the simplest form of automation, record/playback, without having to adjust your scripts for maintainability or reusability.

Tools like Selenium IDE and AutoIt are excellent when you need to quickly automate something to make life easy right here, right now. Funnily enough, a lot of testers do not realize these tools can be used this way. When talking with colleagues about test automation, they quite often think of big test automation packages, like QTP or Rational Robot. Sometimes they ask how much you need to know about software development and writing code to automate things. In most of these conversations I let myself get sucked into the tool talk and indeed discuss the difficulties of setting up a test automation framework.
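To make this concrete, here is a minimal sketch of such a throw-away script, written against Selenium WebDriver in Java; the URL and element locators are hypothetical placeholders, and nothing about it is built for reuse, which is exactly the point:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// Throw-away script: log in and land on the feature under test, so the
// tedious clicking does not have to be repeated by hand every time.
// The URL and element locators are hypothetical placeholders.
public class QuickLogin {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://test.example.com/login");
        driver.findElement(By.id("username")).sendKeys("tester01");
        driver.findElement(By.id("password")).sendKeys("secret");
        driver.findElement(By.id("loginButton")).click();
        // Deliberately leave the browser open, pointed at the screen
        // we actually want to test by hand.
    }
}
```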

In future conversations I am going to try to explain to my colleagues and fellow testers that automation does not need to be a big operation; it does not need to be reusable and maintainable, at least, depending on your goals. As long as your goal is to make life easy here and now, there is no need to build something awesome.

For a lot of things, a simple script, either hand-written or simply recorded, can be more than enough to reach your goal. When you are done with your task you can throw the script away, but preferably be a bit smart about it and dump it somewhere in a repository; you might have to do this task again.

All my automated tests are passing, what’s wrong?

The other day I had an interesting discussion with some developers in a scrum team I am part of. We were discussing the use of test automation within the team and how to deal with changing requirements, code and environments, all of which lead to failing tests.

They gave the clear impression that they were very worried about tests failing. I asked them what would be wrong with tests failing in a regression set during sprints, which made them look at me with a question on their faces: why would a tester want tests to fail?
If anything, I would expect automated tests to fail, at least partially.

While automating in sprint, I assume things to be in a certain state. For example, I assume that when I hit the search button nothing happens; in the next sprint I really hope this test will break, meaning the search button now leads me to some form of result page. That way all tests, just like the rest of the code, are continuously in flux and get constantly updated and refactored.
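A minimal sketch of what such a deliberately short-lived test could look like, using JUnit and Selenium WebDriver in Java; the URL and locator are hypothetical placeholders:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SearchButtonTest {

    // Pins down THIS sprint's behaviour: clicking search does nothing yet.
    // The moment the result page is implemented this test should fail,
    // which is the signal to refactor it to assert the new behaviour.
    @Test
    public void searchButtonDoesNothingYet() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://test.example.com/app"); // hypothetical URL
            String urlBefore = driver.getCurrentUrl();
            driver.findElement(By.id("search")).click(); // hypothetical locator
            assertEquals("search is not wired up yet in this sprint",
                    urlBefore, driver.getCurrentUrl());
        } finally {
            driver.quit();
        }
    }
}
```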

This of course fully applies to automating tests within a sprint; when automating for regression or end-to-end testing, however, I would rather expect my tests to pass. Or at least the majority of regression tests should keep passing consistently.

When is automation useful?

Recently I helped write a proposal aimed at investigating whether automating the regression testing of a system is worth the investment. Looking at the time currently spent on regression testing, I started wondering when it is worth automating the hell out of everything and when it is just eating money to have tests automated because it is hip.

I guess it all depends on a combination of things, such as the time spent on regression testing, the effort and investment needed to automate things, the test automation knowledge in house (e.g. does an external party need to come in and do it for you?), how often you need to run through regression on the platform, whether all of regression can be automated or just a part of it, and so on.

So it seems you have to be a bit of a mathematician to see whether test automation is worth your while, doesn’t it?
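For what it is worth, the arithmetic itself is simple enough; here is a minimal back-of-the-envelope sketch with entirely hypothetical numbers:

```java
// Back-of-the-envelope break-even estimate for automating regression testing.
// All numbers are hypothetical placeholders; plug in your own.
public class AutomationBreakEven {
    public static void main(String[] args) {
        double buildCost         = 40.0; // person-days to build the suite
        double manualRunCost     = 5.0;  // person-days per manual regression run
        double automatedRunCost  = 0.5;  // person-days to run and analyse one automated run
        double maintenancePerRun = 1.0;  // person-days of script maintenance per run

        double savingPerRun  = manualRunCost - (automatedRunCost + maintenancePerRun);
        double breakEvenRuns = buildCost / savingPerRun;

        System.out.printf("Saving per run: %.1f person-days%n", savingPerRun);
        System.out.printf("Break-even after %.0f regression runs%n",
                Math.ceil(breakEvenRuns));
    }
}
```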

Let’s get realistic then instead.

I truly believe there is one very effective way to see whether test automation is your piece of cake: set a clear goal for what you want to achieve with test automation and, based on that, do a Proof of Concept. In other words, try it out. This assumes you have the knowledge in house to write the test automation code.

Start off with a little research to figure out which tool will most likely work best for you; there are plenty of forums out there, or simply ask some fellow testers. Take the tool, or for all I care the demo version of some tool, that you hope will work for you, and try it out. Automate a few tests, just to see how fast it can be done. Do not try to make it all pretty, perfect and reusable immediately. First focus on what is important: will it work? Does it help me reach my goal?

If it does, cool; now you can start for real, refactoring the PoC code or simply throwing it out and starting from scratch.

This of course becomes a slightly different story if you do not have the knowledge in-house to write the code yourself, or if you simply have no clue where to start with test automation.

If this is the case, please do not adjust the approach all that much. In my view it is still best to let someone prove to you that their approach and their choice of tooling are indeed best for your application and organisation. Keep in mind that at the end of the line you still need to be able to use and maintain the tool and the automated tests yourself, in house; having a contractor constantly running around for that makes the test automation effort extremely tedious and expensive.

Automating a legacy system

In the past I have worked a lot with fairly new, web-based applications. Here and there I would run into an “old” desktop application, but the majority of my professional career has been spent working with web-based technologies.

In my current assignment, however, I have hit a nice new technology, at least new to me: COBOL over TN3270.

When I started on this assignment I had one big fear: being forced to use one of the bloated commercial tools out there that claim to support TN3270 off the shelf. Oh, did I say bloated? That sounds like I am biased against the so-called big commercial tools.

This sense of bias is partially true: I am in favor of using small drivers like WebDriver or White where possible, rather than a full environment such as QTP or Rational Functional Tester. The big tools quite probably have their own unique selling points; for a small IT organisation, however, they are a huge investment and quite often overkill compared to what is actually needed.

When we started looking into tools we almost instantly dismissed the “big ones”, not so much due to their cost, but mainly due to the amount of customization still needed to make them work “out of the box” with TN3270 and the rest of the environment we are automating. The vendors we talked to all claimed that their tool was excellent for an environment like this; when asked whether the tool would be pluggable out of the box, however, the answer was consistently no, you will need to do some customization (or better yet, one of their consultants would need to do it for you). The same amount of customization needs to be done with a small driver as well, so why spend a fortune on something that is no better than a small, cheaper or even free tool?

For the Proof of Concept we decided to try two different drivers: Jagacy and s3270.

Jagacy is, in their own words:

“… our award winning 3270 screen-scraping library written entirely in Java. It supports SSL, TN3270E, and over thirty languages. It also includes a 3270 emulator designed to help create screen-scraping applications. Developers can also develop their own custom terminal emulators (with automated logon and logoff). Jagacy 3270 screen-scraping is faster, easier to use, and more intuitive than HLLAPI. It excels in creating applications reliably and quickly.”

I cannot but agree with the latter part especially: it is indeed faster and easier than HLLAPI. Whether it is more intuitive, I am not quite convinced (yet).

s3270 is a lot simpler than Jagacy:

“[s3270] opens a telnet connection to an IBM host, then allows a script to control the host login session. It is derived from x3270(1), an X-windows IBM 3270 emulator. It implements RFCs 2355 (TN3270E), 1576 (TN3270) and 1646 (LU name selection), and supports IND$FILE file transfer.”

In other words, s3270 can be used as a layer between an application and a legacy TN3270 system.

Talking directly to s3270 is not the most intuitive thing: it requires the code to start a process, keep track of the state of that process and feed input into the process stream. All nice and doable; however, this has already been invented by others, and they quite likely did an adequate job of making it work, considering TN3270 is a fairly old protocol.
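To give an idea of what talking to s3270 directly looks like, here is a minimal sketch in Java; the host name is a hypothetical placeholder, and error handling is left out for brevity:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;

// Minimal sketch of driving s3270 directly from Java.
// Assumes the s3270 binary is on the PATH; the host name is a
// hypothetical placeholder and error handling is left out.
public class S3270Sketch {

    public static void main(String[] args) throws Exception {
        Process proc = new ProcessBuilder("s3270").start();
        PrintWriter in = new PrintWriter(proc.getOutputStream(), true);
        BufferedReader out = new BufferedReader(
                new InputStreamReader(proc.getInputStream()));

        send(in, out, "Connect(mainframe.example.com)");
        send(in, out, "Wait(InputField)");   // block until the keyboard unlocks
        send(in, out, "String(\"MYUSER\")"); // type into the current input field
        send(in, out, "Enter");
        send(in, out, "Ascii");              // dump the screen contents as text
        send(in, out, "Quit");
    }

    // Sends one action and echoes the response: zero or more "data:" lines,
    // a status line, and a final "ok" or "error".
    private static void send(PrintWriter in, BufferedReader out, String action)
            throws Exception {
        in.println(action);
        String line;
        while ((line = out.readLine()) != null) {
            System.out.println(line);
            if (line.equals("ok") || line.equals("error")) {
                break;
            }
        }
    }
}
```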

In order to make life easier for us while using s3270 for the PoC, we searched around a bit and found a wonderful open source tool, h3270. Put simply, h3270 is a program that allows you to communicate with TN3270 hosts from within your browser. This open source tool helped us quickly hook our test code up to s3270. We did not need the browser functionality, however; we are thankfully using the object-oriented abstractions within h3270 for our own purposes.

One of the wonders of working with TN3270 is that a whole new world of states opens up. In a web browser, for example, something is either there and visible or it is hidden from the user. Keyboards are never locked, and when opening a new page the browser will tell your driver when it is done loading. With TN3270 it is not that simple: there is such a thing as a locked keyboard, it works with function keys that disappeared from our physical keyboards years ago, the state of a screen is unknown most of the time, and so on.

On the other hand, a terminal emulator is still nothing other than a form of sorts, just a bit more difficult to read and write.

Basically, one thing we once more saw proven is that in test automation the basics are always similar; it is just the way you need to talk to the system that may differ.