Wednesday, November 30, 2011

Manual Testing Radiator

Two days ago the Finnish Association for Software Testing - which I happen to chair - held an Exploratory Testing Dinner. The idea is to eat and talk testing, with some volunteering to talk on a topic, and then letting others join in.

One of the three talks this time was by Petri Kuikka from F-Secure. He's my testing idol, someone I respect and admire, having worked with him for a few years at F-Secure. He shared a story about a manual testing radiator that I just need to share with others, even before they manage to open-source the system as they've planned.

In the years I worked at F-Secure, I brought in the borrowed idea of a testing dashboard, adapted from James Bach's materials. I introduced area-architecture-based reporting on two dimensions, Quality and Coverage. We used traffic lights for the former and numbers 0-3 for the latter. In some projects, I used a third entity, a best-before commitment, which was basically an arrow showing whether I'd need to change my mind about quality due to changes by tomorrow / in a little longer / relatively stable. To collect the values, I used to gather people into rooms for thumbs up / down voting, making each other's opinions visible and learning to report feelings-based data in a consistent way. I learned that the power of a crowd outweighs any metrics-based reporting in delivering the message.
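The two-dimensional reporting could be sketched as a tiny data model. This is only an illustration of the idea under assumptions of my own - the area names, value labels and function names are invented, not anything from the actual dashboards:

```python
# Sketch of the two-dimensional area dashboard described above.
# All names and example areas are illustrative assumptions.

QUALITY = ("red", "yellow", "green")           # traffic light per area
COVERAGE = (0, 1, 2, 3)                        # 0 = untouched .. 3 = deep
BEST_BEFORE = ("tomorrow", "a little longer", "relatively stable")

def area_status(area, quality, coverage, best_before):
    """Record one area's crowd-voted status, validating the scales."""
    if quality not in QUALITY:
        raise ValueError(f"unknown quality: {quality}")
    if coverage not in COVERAGE:
        raise ValueError(f"unknown coverage: {coverage}")
    if best_before not in BEST_BEFORE:
        raise ValueError(f"unknown best-before: {best_before}")
    return {"area": area, "quality": quality,
            "coverage": coverage, "best_before": best_before}

# One dashboard is simply a list of such rows, one per architecture area.
dashboard = [
    area_status("Installer", "green", 2, "relatively stable"),
    area_status("Update engine", "yellow", 1, "tomorrow"),
]
```

The point of the structure is that both scales are deliberately coarse - the value comes from the room agreeing on them, not from precision.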

The problem I never tackled, but Petri and other current F-Secure employees have, is keeping this style of reporting up to date when you have continuous integration, with changes introduced by many teams, for both separate and common areas.

For the continuous integration build system, they have a status radiator for automated tests, fed directly from the tool of their choice. If a build fails, the radiator shows which part is failing, and when all is well (as far as test automation can tell), the whole picture of parts remains blue.

The manual testing radiator does the same, but with Facebook-like thumb-voting icons. If a tester clicks thumbs up, that's counted as a positive vote. Positive votes remain valid only for a limited timeframe: with continuous changes, if you haven't tested an area for a while, it's likely you really know little about the area anymore. When several people vote thumbs up, the counts are summed and shown. Similarly, there's the possibility to cast two kinds of negative votes. With the thumbs-down icon, the tester chooses whether the status is yellow ("talk before proceeding, concerns, bug report ID") or red ("significant problem, here's the bug report ID"). If even one negative vote is cast, the positive vote count goes back to zero. All the votes are saved in a database; the radiator is just the visible part of it.
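The voting rules lend themselves to a short sketch. This is my reading of the behavior described above, with assumed names and an assumed expiry window - not their actual implementation:

```python
from datetime import datetime, timedelta

# Assumed expiry window; the real radiator's timeframe was not stated.
VOTE_TTL = timedelta(days=7)

class AreaStatus:
    """Sketch of the thumb-voting rules for one area (names assumed)."""

    def __init__(self):
        self.positive = []     # timestamps of thumbs-up votes
        self.negative = None   # None, or ("yellow"|"red", bug_id)

    def thumbs_up(self, when=None):
        """Register a positive vote at the given (or current) time."""
        self.positive.append(when or datetime.now())

    def thumbs_down(self, severity, bug_id=None):
        """Register a negative vote: yellow = concerns, red = significant."""
        if severity not in ("yellow", "red"):
            raise ValueError(f"unknown severity: {severity}")
        self.negative = (severity, bug_id)
        self.positive = []     # even one negative vote resets the count

    def positive_count(self, now=None):
        """Sum the positive votes that are still within the time window."""
        now = now or datetime.now()
        return sum(1 for t in self.positive if now - t <= VOTE_TTL)
```

The interesting design choice is the asymmetry: positive signals silently fade with time, while a negative signal overrides everything until someone deals with it.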

Another type of radiator they use shows progress on statuses at regular intervals. As in the game of Battleship, they mark down where they feel they are with product releases and end-of-sprint versions. This is to show that "not tested" does not mean "not working" - it means we really don't know. With Scrum and automation, the amount of exploratory testing is still significant, and guided by time considerations.

Petri talked about having the automated and manual testing radiators side by side, in visible locations at the office. I would imagine that causes a positive buzz about the results exploratory testing can deliver that are hard for automation - assuming there's stuff that exploration finds in areas that automation claims are working fine.

Friday, November 18, 2011

Falling into a trap - talking of Exploratory Testing

I attended the public defense of Juha Itkonen's dissertation today. His doctoral dissertation, titled "Empirical studies on exploratory software testing", is a topic particularly close to my heart, since it was the topic I started with but failed to keep striving for in the academic sense. Juha started in the same research group after me, took the relevant topic, and made all the effort to research and publish what he could in the limited timeframe (of nearly 10 years).

When Juha's opponent started, he went first for the definition of exploratory testing and for distinguishing exploratory testing from non-exploratory testing. This, I find, is a trap, and an easy one to fall into.

Juha explained some of the basic aspects of what he has summarized about exploratory testing, but struggled to articulate the difference between exploratory and non-exploratory. He pointed out that learning is essential in exploratory testing, as is changing direction based on learning. The opponent countered that people who test based on test cases typically also learn, and add new tests as needed based on what they learned. Juha went on to explain that if you automate a script and remove the human aspect, that is clearly not exploratory. And the opponent pointed out that a human could look at the results and learn.

What Juha did not address in his defense, or in the papers I browsed through, is the non-linear order of the tasks. He still works on the older definition of simultaneous activities, and thus, in my opinion, misses the point of the tester being in control of what happens first, what next, what again, and how long each activity endures to achieve a goal.

A relevant challenge (to me) is that all testing is exploratory. There's always a degree of freedom, tester's control and learning when you do testing. Making an experienced tester follow a script, even if such a script existed, is not something that happens - they add, remove, repeat and do whatever needs to be done, including hiding the fact that they explore. All real-world approaches are exploratory testing to a degree.

I still have a problem to solve: if all testing is exploratory, what words should I use to describe the wasteful, document-oriented approaches that take a lot of effort while creating little value? There are plenty of real-world examples where we plan too much, and plan the wrong things, because we do the planning at the time we know the least. What's even worse, due to our belief in planning, we reserve too little time for the hands-on testing work and related learning, and with the schedule pressure, fail to learn even when the opportunity supposedly is there.

Low-quality exploratory testing is still exploratory testing. Getting to the value comes from skill. With skill, you are likely to get better results from some planning plus enough exploration-while-testing than from skipping the planning. And yet, even with skill, you could spend too much time on planning and preparing, due to the logistics of how the software arrives to you.

Ending with an idea from today's opponent: the great explorers (Amundsen at the South Pole - an example coming from a Norwegian professor) planned a lot. What made the difference between those who succeeded and those who did not was not whether they planned, but that they planned holistically, thinking of all kinds of aspects, and kept their minds open for learning along the way - making better choices for the situation at hand rather than sticking to the plan, like carrying back stone samples at the cost of one's life.

Saturday, November 5, 2011

Management: 80 % feels better than 50 %

I never feel too good about going back to Quality Center for the tests we just did, but this time we managed to use it in a way that did not cause us too much trouble.

Our "test cases" in the test plan, are types of data. A typical test case on average seems to have 5 subdata attached to it. All in all, a lot of our planning was related to understanding what kinds of data we handle in production, what is essentially different and how we get our hands on that kind of subset.

We had two types of template tests in use for the steps:
  • Longer, specific process flow descriptions: basically we rewrote, in detail, the changes introduced in the specification into a minimum number of workflows, and ended up with about 40 steps and 5 main flows.
  • Short reminders of common process flow parts: these had 5 steps and basically just mentioned the main parts the system needs to go through - little advice for the testers.
Our idea was that for the first tests we run, we support the incoming testers with the longer flows. When they get familiar (they have the business background, so it doesn't take that much), we let them do things in an exploratory fashion, with the feeling of being in a service role towards their own colleagues - it's an internal system.

Before we started testing, I took a look at the documentation, the numbers and the allocated time, and made a prediction: we'd get to 50 % measured against the plan in the timeframe. After the first of our five weeks, we were at 4 %, and I was sure we'd fall far short of even my pessimistic prediction.

We also needed to split each test in the Plan into 5 instances of the test in the Lab, to make sure we would not end up showing no progress or being unable to share the work in our team, as a single thing to test was potentially a week's worth of work. We added prioritization at this point, and split things so that one of the original test cases could actually be started in week one and completed in week 5 - if it would ever be completed.

We did our fine-tuning while testing. We moved from the detailed cases (which created more detailed documentation of what and how we were checking) to the general ones, and with face-to-face time talking about risks, we cut down each actual test run in a different way - building together an understanding of the kinds of things we had already seen, and accepting the risk that something may not work but we won't bother, since there's stuff more relevant to know about right now.

With note-taking, I encouraged the testers to write down the stuff they need, but taught that a blank means nothing relevant to add. They ran the tests in the QC Lab and made their notes there in the steps. The notes we write and need are nowhere near the examples that circulate around session-based test management.

The funny part came from talking with some of the non-testing managers, when they compared my "50 % is where we will get" to the end result of 80 % as the metric was pulled out of Quality Center. It was (and still is) difficult to explain that we actually ended up pretty much at the level I had assumed - we needed to cut about half of what we had planned - we had just decided to cut varying amounts out of each test, rather than cutting whole tests, so the two numbers they assumed comparable were not.

Yet another great exploratory testing frame with the right amount of preparation and note-taking. And yet, with all the explaining of what really happened, management wants to think there was a detailed plan, that we executed as planned, and that the metric of coverage against plan is relevant.

Friday, November 4, 2011

Ending thoughts of an acceptance test

I spent over a year in my current organization getting to the point where my team completed a project's acceptance test for the first time. Now, a little over two years in, a second project's acceptance test is ending.

I'm a test manager in quite a number of projects, which are long enough that I feel most of my time goes into waiting to actually get into the testing. Two more will complete in the upcoming months, and another, longer project is just getting started.

I wanted to share a few bits on the now-about-to-be-finished acceptance testing.

Having reached this point, I can reflect back a month and admit that I believed our contractor would not be able to find the relevant problems. I had reviewed some of their results, and noted that
  • system testing defects were mostly minor, and there were not that many of them
  • during system testing, it took about 4 days on average to plan & execute a test, and 1 day to fix a bug when found
  • during integration testing, it took about 2 days on average to plan & execute a test, and 0.7 days to fix a bug
On setting up the project, I had tried convincing the steering group to let the customer-side testers participate in the contractor's testing phases in a testing role, but was refused. We wanted to try out how things go when we allocate the responsibility over to the contractor side. So I wasn't convinced there would not be a number of "change requests" - bugs by another name, the kind the contractor can't or doesn't care to find because they can't be read directly from the specifications.

Now that we're almost done with our testing, I can outline our results. We found a small yet relevant portion of problems during the project. I had estimated that if I could count the bugs to fix on the fingers of one hand, we'd be able to do this in the allocated one-month timeframe, and that's where we ended up. Another batch was must-have change requests - bugs by just another name. And about 80 % of the issues we logged reflected either bugs in current production (mostly ones we don't fix, although it's not so straightforward) or problems in setting up the test scenario for the chain of synchronized data. I just counted that we used, on average, 4.6 days per issue found. If I take out the ones that did not result in changes, it's 20 days per relevant bug. I have no right to complain about the contractor's efficiency. And I had better do them justice, and make our management clearly aware of this too.

So, I guess I have to admit it: the contractor succeeded, even though I remained sceptical up until the very end. I had a really good working relationship with the customer-side test manager, and at this point I'm convinced she is a key element in this experience of success, and in another one still going on - over schedule, with relevant concerns about whether the testing we do on the customer side will ever be enough.

I just wanted to share, amongst all the not-so-fortunate stories I have, that this one went really well. The only glitch during the project came at the time when our own project manager needed planning support to take the testing through. After that, she too was brilliant, and allowed me to work on multiple projects in a less managerial role, more as a contributor to the actual testing.