Saturday 4 February 2017

Beware tests that don't test what you think they do

While migrating tests from one framework to another, we came across a test suite that taught us a valuable lesson: treat your automation with a level of distrust unless you have valid reasons, or proof, that it is doing what you expect.

This test suite was seemingly perfect: it was reliable, ran against all the supported releases and had a run time of 5 minutes.  Its name was simply the name of the technology that it tested.  The only problem was that if it was going to be used as part of our CI pipeline then it would need to be migrated to our new test framework.

Most of the components of the test were easily reusable; all that needed to be migrated was the code that provisioned the test system and invoked the test programs.  This took a few days of effort and I was left to review the newly migrated test suite.  Everything looked fine.  The only issue I had was that the test method names were a little undescriptive.  I spoke to the engineers and asked that they run the test in a debug mode to understand exactly what it did.  I didn't want them to do any static analysis of the source, but to see what the test did at runtime.

A day later I met with the engineers to see how they were progressing, and they were hitting problems.  As far as they could see, the test never invoked any of the APIs that they would expect given the name of the test suite.  I asked them to show me and I had to concur.  It did appear that the test didn't use the technology we were expecting.

I gave the engineers a list of diagnostics and further tests to double-check this finding.  Once this was done it was clear that the test was simply not testing the function we thought it did.

The problem with this test is clear.  There had been an assumption that the test 'did what it said on the cover', and since it always passed it was considered a great test asset.  In fact it was probably the worst test we had.  It built a false level of confidence in the team and could have let regressions into the field.

Naturally we have deleted all evidence of this test suite from our source code repositories and test archives.  However, even this action was not without complaints.  A lot of the team felt that even though the test clearly didn't do what we expected, it did do 'something', and so we should keep it.  This is the wrong thing to do!

I agree that the test did execute some of the SuT function.  However, the code it did execute was only exercised, not tested.  If there were a regression in that code, would the test report the problem or not?
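
To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical names; the original suite was not written in Python) of a test that merely exercises a function next to one that actually tests it:

    import unittest

    # Hypothetical system under test -- a stand-in for the real technology
    # the suite was named after.
    def normalise(text: str) -> str:
        return text.strip().lower()

    class ExerciseOnlyTest(unittest.TestCase):
        def test_normalise(self):
            # Exercises the code: normalise() runs, but because nothing is
            # asserted, a regression can never make this test fail.
            normalise("  Hello World  ")

    class ActuallyTestsTest(unittest.TestCase):
        def test_normalise(self):
            # Tests the code: the assertion ties the call to an expected
            # outcome, so a regression shows up as a failure.
            self.assertEqual(normalise("  Hello World  "), "hello world")

    if __name__ == "__main__":
        unittest.main()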

Because the test always passed, it wasn't until we decided to migrate it that this problem was uncovered.  Tests that fail regularly (whether due to a regression or a test defect) at least get eyeballs on them, and that scrutiny implicitly validates that the test does something useful.

So what did we learn from this:

  • Code coverage is great at ensuring that a test is at least executing the code you think it is.
  • If the test is executing the code you expect and it regularly passes, it is worth checking that it is actually testing the code and not just exercising it (see the sketch after this list).
  • We needed more tests in this area - we have now added them and ensured that they actually do what we expect.
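
One cheap way to make that check (again a sketch in Python with hypothetical names, not our actual tooling) is to run the test against a deliberately broken implementation and confirm that it fails; if it still passes, it is only exercising the code:

    import unittest

    def normalise(text: str) -> str:
        """Hypothetical system under test (same stand-in as above)."""
        return text.strip().lower()

    def broken_normalise(text: str) -> str:
        """A deliberately wrong implementation, used only for the sanity check."""
        return text

    def run_suite(func) -> unittest.TestResult:
        """Run the test case against a given implementation and return the result."""

        class NormaliseTest(unittest.TestCase):
            def test_normalise(self):
                self.assertEqual(func("  Hello World  "), "hello world")

        suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseTest)
        return unittest.TextTestRunner(verbosity=0).run(suite)

    if __name__ == "__main__":
        # The test should pass against the real code...
        assert run_suite(normalise).wasSuccessful()
        # ...and fail against the broken one.  If it still passed here,
        # the test would only be exercising the code, not testing it.
        assert not run_suite(broken_normalise).wasSuccessful()
        print("Sanity check passed: the test really does detect a regression.")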
