Objects First


10. Testing and Verifying Programs

A program is useless unless it is:

reliable: it performs as it was expected to (formally, we say it runs in accordance with its specifications), and

robust: it doesn't terminate prematurely! Colloquially, programs which terminate prematurely are said to crash.

10.1 Program Proving

If we could prove that programs are correct in the same way that we prove a theorem in algebra or geometry, then that modern excuse for all problems,
"It was an error in our computer!",
would have to be consigned to the scrap-heap.

Unfortunately, although considerable progress has been made in the area of formal methods (strategies for designing programs which are correct by construction, i.e. they are constructed from the problem specification in a way that guarantees that the final program is correct), these methods are still time-consuming and labour-intensive (and therefore costly) and, because they are carried out by humans, still subject to the same errors that programmers make. As a consequence, although they can be usefully and economically applied in a widening range of areas as techniques (and particularly support tools) improve, we mostly rely on testing programs after they are written to determine whether they are reliable and robust.

How do we test a program?

Our experience with the first assignment has already taught us one way not to test programs!
Don't rely on random test data input by a user!
The reason for this is very clear and simple: a user inputting test data may neglect to enter data which corresponds to some very important case (e.g. the case where we don't have sufficient funds in our bank account to cover a withdrawal). Since a complex program will contain many such distinct cases which need to be checked as part of the testing process, the probability of a tester forgetting a vital case rises dramatically with the size of the program under test.

Furthermore, every time the program is changed - whether it be in the initial debugging phase or later (perhaps months or years later) when it needs maintenance - all the tests should be re-run to ensure that a change to fix one problem hasn't inadvertently created errors elsewhere! In a complex program with thousands of cases to be tested, a human tester is simply not going to have time, let alone the patience, to enter all the test data needed to perform a thorough test again.

10.2 Test Programs

Thus, in order to ensure that all required tests are carried out, these tests should be applied mechanically by a program. Not only does this ensure that no important test is left out, it is also quick and efficient: a few hours of computer time to run a few million tests is cheap compared with a programmer's time, let alone the cost of an error in a production system!

Where does the test data go?

Depending on the nature of the tests which need to be carried out, the tests can be 'driven' by program statements, by tables or by files.

In what follows, I shall assume that we are testing sets of functions which are the methods of a class. However, the strategies outlined here can readily be extended to testing other functions and whole programs. A whole program can simply be viewed as a function which takes various user data (from keyboard input, files, databases, etc.) and produces some output (onto a terminal screen, into files, databases, etc.).

In every case, a basic test driver program is written which applies the necessary tests to the functions of the class which you are testing.

Program Statements

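These tests can be applied by separate program statements (shown here inside a test function; account.h is an assumed header declaring the Account type and its functions):
#include <assert.h>
#include "account.h"   /* assumed: declares Account, ConsAccount, Deposit, Withdraw */

#define A1  10.00
#define W1  5.00
#define W2  20.00

void TestWithdrawals( void )           /* hypothetical name */
{
  Account account;
  double balance;

  /* Create a bank account */
  account = ConsAccount();
  /* Make a deposit */
  balance = Deposit( account, A1 );
  assert( balance == A1 );
  /* Test withdrawals */
  balance = Withdraw( account, W1 );   /* Should succeed */
  assert( balance == (A1-W1) );
  balance = Withdraw( account, W2 );   /* Should produce error message */
  assert( balance == (A1-W1) );        /* Balance should not change */
}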

If there are only a small number of cases, or the cases are complex (involving complex rules based on many attributes), then this approach is fine. However, it may require considerable effort on the part of the programmer, so table or file data is to be preferred where it's practical.

Tables

When the cases to be tested involve a large number of values of individual attributes, it is usually better to put the values to be tested in tables inside the program, or in files which are read by it.


Note:

  1. The structure test_table has the values for the test to be applied and the expected answers (expected_balance and error_expected).
  2. The size of the test_cases array is defined by the initialisation data (using [] with no size for the array dimension): this enables additional tests to be added as necessary with no changes to the rest of the test function!
  3. N_W_TESTS is computed with the sizeof operator, by dividing the size of the array by the size of the struct which is its element type.
  4. The tests are run in the for(i=0;i<N_W_TESTS;i++) loop. The same code is used for each test, so additional tests can be added with no changes to this loop.
  5. The value returned by the Withdraw function is checked against a value in the test array, enabling the test function to detect and count errors automatically. Thus this function produces no output when there are no errors; it simply returns 0 to its calling function. This has an important consequence for testing efficiency: the tester does not have to wade through reams and reams of test output. The program only needs to produce error reports when errors occur; otherwise it is pleasantly silent!
  6. A function AlarmRaised is assumed to be able to detect whether the error required by the specifications was raised: this is dependent on operating system capabilities and thus, being non-standard, is beyond the scope of this course.
  7. (For pedants only!) It's assumed here that the tester wants to count balance and alarm errors as separate errors, each adding to the total error count. This is a practical assumption, because the really critical thing about the error count is whether it's 0 or something else. Slightly more complex code could count them differently (for example, counting a test that fails both checks as a single error): this is left as a programming exercise for those concerned!

    Style Reminders

  8. Note that all the constants used here are symbols! This makes it easy to write the expected results as expressions, such as A1-W1.
  9. Strictly, comparing two doubles for equality will not always work. The test should have been:
    if ( fabs(balance - test_cases[i].expected_balance) > EPS ) ...
    
    with a suitable value for EPS (fabs, declared in <math.h>, is the floating-point absolute value; abs works only on ints). The example code was left in the simpler (but incorrect) form for clarity.

Files

Just as test values can be placed in tables, they can also be placed in files. Usually this testing strategy is essentially identical to the table one - the data that would be stored in the testing table is stored in files instead. Thus in the example above, we would create a test file:
 5.00 15.00 F
10.00 15.00 T
In this format, F is read as FALSE and T as TRUE.


Note:

  1. The main advantage of this approach over the table method is that a file is external to the program and thus may be modified in a single step with a text editor.
  2. The test loop is essentially identical to the one using tables - as might be expected.
  3. The read_test function is somewhat more complex: this extra complexity is the trade-off for the convenience of being able to modify the test cases without recompiling the program.
  4. read_test returns FALSE when the test file is exhausted, so the number of tests performed is readily altered by adding entries to, or deleting entries from, the 'driver' file.



© John Morris, 1997