Software testing
(Article from 2013.)

Testing is often put forward as a way to find bugs at an early stage. It requires little effort and pays off hugely: no need to retract shipped copies or publish patches. It is the best way of approaching the goal of bug-free software. This holds when testing is compared to formal verification, the more scientific approach (testing being the engineering one). With verification, a huge analytic effort produces a result that can be hard to grasp. By contrast, testing reveals bugs that can be fixed immediately upon detection. Also, testing tests the real thing. Verification, on the other hand, requires a model, which may itself be wrong; even on success, it only proves the model correct, not the actual application.

The key aspect of testing is to actually do it. That alone is a big advantage over not testing at all. Beyond that, it is uncertain whether more refined methods produce better results. But "more refined methods" can be a way of "just doing testing"! Consider the volume factor: if a simple test method can be employed massively, it is probably preferable to a more refined method that only covers patches of the test field. If several test methods are employed, each test should have an explicit purpose and a distinct scope. In practice, one would make a list of tests and have them all invoked automatically and sequentially by a script or shell function (a sketch of such a driver appears at the end of this article). And don't forget the README file!

For example, one test could enforce that every line of source code is executed at least once. Most likely this will require several runs of the program. Modular code helps here: each function should be called at least once, with its return value fetched and examined; each procedure should be invoked and brought to conclusion; and each interface should be covered in full, including optional parameters. Even then it isn't guaranteed that every line will be reached, but it is a good start. Below that level, it gets more fine-grained, as the control logic - iteration and branching - must be taken into account. To help humans visualize the execution, a directed graph, with cycles where the code loops, could illustrate the execution logic and flow. But unless a tool can generate that automatically, we are again facing the risk of creating a model that may not reflect the real program. The test that every line of code executes sensibly is meant to track down bugs that are not syntactical - the code compiles - but that, once executed, either bring the execution to a halt or, worse, produce a bogus result further down the road.

Another way to test focuses on input data. This method is in line with the notion of a piece of software as a black box that maps input values to output values. Automatic, brute-force testing with random input data is a fine way of doing it. The input data must be valid, but it need not make sense: with volume and time, what makes sense will be tested as well. And bogus input data should never be allowed to break the program anyway.
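To make the random-input approach concrete, here is a minimal sketch in Python. The function under test, parse_age, and its contract are invented for the illustration; the pattern is the point: generate a large volume of arbitrary input, mostly valid and occasionally bogus, feed it to the code, and assert that it never breaks and that whatever it accepts respects its stated bounds.

import random
import string

def parse_age(text):
    """Hypothetical function under test: turn a string into an age in years.

    Assumed contract for this sketch: accepts a decimal string and
    returns an int in the range 0..150; raises ValueError otherwise.
    """
    value = int(text)              # raises ValueError on non-numeric input
    if not 0 <= value <= 150:
        raise ValueError("age out of range: %d" % value)
    return value

def random_input():
    """Mostly numeric strings (not always in range), sometimes arbitrary junk."""
    if random.random() < 0.8:
        return str(random.randint(-5, 200))
    length = random.randint(0, 10)
    return "".join(random.choice(string.printable) for _ in range(length))

def brute_force_test(runs=100000):
    """The function must never fail except with the documented ValueError,
    and every value it does accept must lie within the stated range."""
    for _ in range(runs):
        data = random_input()
        try:
            age = parse_age(data)
        except ValueError:
            continue               # rejecting bogus input is acceptable
        assert 0 <= age <= 150, "accepted %r from input %r" % (age, data)

if __name__ == "__main__":
    brute_force_test()
    print("random-input test passed")

The assertions are crude, but that is the volume factor at work: a hundred thousand random inputs reach corners that a handful of hand-picked values never do.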
If a more refined approach is desired, testing can instead be based on input cases that are qualitatively distinct, set up manually if need be. For example, with the data of a student database, such cases could be the empty set (zero students), a single student, all students, only female students, only students of a certain subject, and so on. Cases that might strike you as unrealistic or even impossible should not be avoided as long as they are valid. On the contrary, those border cases can reveal shortcomings that sensible input cannot. Indeed, the purpose of testing is to break the examined application - and thus reveal the bug that caused the failure.
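To make the qualitative cases concrete as well, here is a sketch in the same vein, built around the student database example. The Student record and the function count_by_subject are made up for the purpose; what matters is the list of cases, each qualitatively distinct, border cases included.

from collections import namedtuple

# Hypothetical data model and function under test.
Student = namedtuple("Student", "name sex subject")

def count_by_subject(students, subject):
    """Return how many of the given students study the given subject."""
    return sum(1 for s in students if s.subject == subject)

ADA = Student("Ada", "F", "math")
BO = Student("Bo", "M", "physics")
CY = Student("Cy", "M", "math")

# Qualitatively distinct input cases, each with the expected count for "math".
CASES = [
    ("empty set (zero students)", [], 0),
    ("a single student",          [ADA], 1),
    ("no student of the subject", [BO], 0),
    ("only female students",      [ADA, Student("Eva", "F", "math")], 2),
    ("all students",              [ADA, BO, CY], 2),
    ("duplicate entries (unrealistic, but valid)", [ADA, ADA], 2),
]

def run_cases():
    for label, students, expected in CASES:
        got = count_by_subject(students, "math")
        assert got == expected, "%s: expected %d, got %d" % (label, expected, got)
    print("%d qualitative cases passed" % len(CASES))

if __name__ == "__main__":
    run_cases()

Each case carries a label stating which qualitative situation it covers - an explicit purpose and a distinct scope, as advised above.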
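Finally, back to the driver mentioned near the beginning: in practice, the whole list of tests is invoked by one script, sequentially and without manual steps. A minimal sketch, assuming the tests live in separate Python files (the file names here are made up):

import subprocess
import sys

# Hypothetical list of test programs, one per test purpose.
TESTS = [
    "test_every_line.py",    # every line of source code executed at least once
    "test_random_input.py",  # brute-force testing with random input data
    "test_cases.py",         # qualitatively distinct input cases
]

def main():
    failed = []
    for test in TESTS:
        print("running", test)
        if subprocess.call([sys.executable, test]) != 0:
            failed.append(test)
    if failed:
        print("FAILED:", ", ".join(failed))
        sys.exit(1)
    print("all %d tests passed" % len(TESTS))

if __name__ == "__main__":
    main()

The README then only needs to point at this one script.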