I’ve never been good about writing tests for my code, or rather about writing tests as I’m coding. Whenever I do sit down to write tests I feel like it is taking too much time and providing only limited benefit. The only real benefit I have found from testing is that trying to write testable code almost always improves my design and implementation. So why do I find actually writing tests to be so difficult and frustrating? My guess is that the answer has two parts. First, I’m wasting my effort on low-value tests. Because I haven’t dedicated enough time to learning how to test, it is easy for me to fall into testing trivial functionality in isolation, when such trivialities would be better left to be exercised by more sophisticated tests. A good solution may be to focus on testing isolated units of functionality, rather than isolated units of implementation. Second, I haven’t written enough tests to truly become proficient with any testing framework other than pytest; this leads me to spend so much time trying to figure out the “right” way or the “best” way to implement my tests that I get stuck on trivialities like fixtures instead of focusing on designing and implementing the actual tests.
To kick off my education in writing better tests I’m going to read and understand the EUnit tests written by Fred Hebert for the ppool application in Learn You Some Erlang. I think this is a good test suite to look at because it falls nicely in the space of the sort of tests I should be writing all the time: it isn’t a tutorial on EUnit, but it was (I assume) written with novices in mind. It also isn’t an extremely rigorous test suite built for an important project, but rather just something used to get stuff done.
What gets tested?
- A pool server can be started and registered under a name.
My first reaction was that this test is unnecessary: checking that the server can be started seems like the kind of thing that is implicitly covered by all the other tests that rely on a running server. However, because the server must be registered under a name provided by the user, a test that checks the registration of the new server seems useful. Checking that the name is unregistered when the pool stops is also an important aspect of this test.
- Pool processes run the function provided at start time with the arguments provided at call time.
This test relies on the ppool_nagger module implemented to demonstrate the pool application. Since the pool requires an mfa() term to run, we have to give it a module. My inclination in the past has been to implement the API in the test module and pass the test module to the code under test. That can get pretty messy, but it does reduce the number of files that need to be kept in sync. I’m not sure what the best approach is for a “serious” application, but using a small demo module seems very reasonable for this quick example.
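For illustration, here is a minimal sketch of the kind of demo module this approach needs. echo_worker is a hypothetical, stripped-down stand-in for ppool_nagger that just pretends to work for a while and then reports back to the test process:

-module(echo_worker).
-export([start_link/3]).

%% The pool appends per-call arguments to the {M, F, A} given at pool start
%% (as simple_one_for_one supervisors do), so {echo_worker, start_link, []}
%% plus [Tag, Delay, ReportTo] becomes echo_worker:start_link(Tag, Delay, ReportTo).
start_link(Tag, Delay, ReportTo) ->
    Pid = spawn_link(fun() ->
              timer:sleep(Delay),          %% simulate Delay ms of work
              ReportTo ! {self(), Tag}     %% tell the test process we ran
          end),
    {ok, Pid}.

Assuming that argument-appending behavior, a pool would be started with ppool:start_pool(Name, Size, {echo_worker, start_link, []}) and given per-task arguments like [some_tag, 100, self()].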
- The pool accepts new requests to run workers only until the specified pool size is reached; after that, new requests are refused.
These two tests are a good example of testable code. Providing a synchronous API as well as an asynchronous API makes testing this simple. Of course the two APIs don’t just exist for testing, but writing this kind of test while designing would force you to consider what kind of feedback you are providing to your clients.
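As a hedged sketch of what that synchronous check looks like, assuming the ppool application has already been started by a setup fixture, that ppool:run/2 returns {ok, Pid} or noalloc when the pool is full (as I understand the LYSE implementation), that the nagger is started as ppool_nagger:start_link(Task, Delay, Max, SendTo), and that this lives in an EUnit test module:

refuse_when_full_test() ->
    ppool:start_pool(limit_pool, 2, {ppool_nagger, start_link, []}),
    {ok, _} = ppool:run(limit_pool, [first, 300, 1, self()]),
    {ok, _} = ppool:run(limit_pool, [second, 300, 1, self()]),
    %% both slots are taken, so the third request is refused
    ?assertEqual(noalloc, ppool:run(limit_pool, [third, 300, 1, self()])),
    ppool:stop_pool(limit_pool).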
- After a pool worker process in a fully allocated pool exits, a new process may be allocated.
Here the synchronous API is used again, but delays are added so some pool workers will finish before the third task is allocated.
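A sketch of that idea, under the same assumptions as the sketch above: the first two naggers only run for about 100 ms, so after a short wait a new allocation should succeed.

reallocate_after_exit_test() ->
    ppool:start_pool(realloc_pool, 2, {ppool_nagger, start_link, []}),
    {ok, _} = ppool:run(realloc_pool, [a, 100, 1, self()]),
    {ok, _} = ppool:run(realloc_pool, [b, 100, 1, self()]),
    ?assertEqual(noalloc, ppool:run(realloc_pool, [c, 100, 1, self()])),
    timer:sleep(200), %% give the first two workers time to finish and exit
    ?assertMatch({ok, _}, ppool:run(realloc_pool, [d, 100, 1, self()])),
    ppool:stop_pool(realloc_pool).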
- Requests to start workers can be queued if there is no worker available.
Yet again, the synchronous API makes testing this much simpler. In test_async_queue the synchronous ppool:run/2 function is used to assert that the first two asynchronous requests ran immediately because the pool was not full. Then, knowing the precondition for the test is satisfied (the pool is full), the final asynchronous call can be executed. If it succeeds, we know that it was queued, then popped off the queue and executed once one of the previous two workers exited.
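A sketch of that pattern, under the same assumptions as above (plus ppool:async_queue/2 returning ok immediately, and the nagger sending {Pid, Task} to SendTo when it fires): the noalloc reply from the synchronous call establishes that the pool is full, and the message from the third worker proves the queued request eventually ran.

queued_request_runs_test() ->
    ppool:start_pool(queue_pool, 2, {ppool_nagger, start_link, []}),
    %% fill the pool asynchronously with two workers that run for ~300 ms
    ok = ppool:async_queue(queue_pool, [first, 300, 1, self()]),
    ok = ppool:async_queue(queue_pool, [second, 300, 1, self()]),
    timer:sleep(50),
    %% a synchronous run now fails, confirming both async requests were
    %% allocated immediately and the pool is full
    ?assertEqual(noalloc, ppool:run(queue_pool, [blocked, 100, 1, self()])),
    %% the third request must be queued; it runs once a slot frees up
    ok = ppool:async_queue(queue_pool, [third, 100, 1, self()]),
    receive
        {_Pid, third} -> ok
    after 1000 ->
        erlang:error(queued_task_never_ran)
    end,
    ppool:stop_pool(queue_pool).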
The second part of this test doesn’t use the ppool:run/2 call to check the test preconditions, but those were already established as working in the previous test. Tasks are added through a mixture of the synchronous and asynchronous APIs, but not much can be said about whether they are working correctly or not. For example, the first synchronous call should return immediately, but the final one should block. I’m not sure how you could test that, other than by checking the time before and after the call and comparing it to the time required for the first synchronous call to return.
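One hedged way to check the blocking behaviour is to time the call, for example with timer:tc/1, and assert that it could not have returned before the running worker exited; same assumptions as the earlier sketches.

sync_queue_blocks_test() ->
    ppool:start_pool(block_pool, 1, {ppool_nagger, start_link, []}),
    {ok, _} = ppool:run(block_pool, [busy, 300, 1, self()]),
    {Micros, _Result} =
        timer:tc(fun() -> ppool:sync_queue(block_pool, [late, 1, 1, self()]) end),
    %% the call can only return after the ~300 ms worker has exited
    ?assert(Micros >= 250000),
    ppool:stop_pool(block_pool).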
Alternatively, you could submit jobs that would block the ppool longer than the default timeout for gen_server:call/2, then assert that the final ppool:sync_queue/2 call exits with reason timeout. This actually raises a great question about what the correct behavior should be under these circumstances: should the job run anyway? Since that corner case wasn’t tested, there was no need to define what should happen. The trade-off here is that the tests were quick and easy to develop; adding coverage for this edge case would definitely slow down development, and for this project it doesn’t really matter.
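A sketch of that alternative, again under the same assumptions, plus assuming ppool:sync_queue/2 is a plain gen_server:call/2 with the default 5000 ms timeout; note that it makes the test suite noticeably slower, which is exactly the trade-off described above.

sync_queue_timeout_test() ->
    ppool:start_pool(timeout_pool, 1, {ppool_nagger, start_link, []}),
    %% occupy the only slot for ~6 s, longer than the default call timeout
    {ok, _} = ppool:run(timeout_pool, [busy, 6000, 1, self()]),
    ?assertExit({timeout, _},
                ppool:sync_queue(timeout_pool, [late, 1, 1, self()])),
    ppool:stop_pool(timeout_pool).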
- All workers terminate when the pool is stopped, even if they are trapping exits.
This is pretty straightforward (schedule something that takes a long time, then check that it is dead before that time is up), but it isn’t clear without reviewing the ppool_nagger module that it tests the “even if they are trapping exits” part of the requirement.
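For context, “trapping exits” just means the worker sets the trap_exit process flag, typically in its init, so that ordinary exit signals arrive as messages instead of killing it. A purely illustrative sketch (not a quote from ppool_nagger):

init(Args) ->
    %% exit signals now arrive as {'EXIT', From, Reason} messages, so only
    %% an untrappable exit(Pid, kill) can take the process down directly
    process_flag(trap_exit, true),
    {ok, Args}.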
- Dead children are never restarted. (This is grouped into the same fixture as 6, but seems logically distinct to me.)
It isn’t clear to me that this is actually tested. To do this, you would need to kill the child (exit(Pid, die)) and then verify that it wasn’t restarted. One way would be to fill the pool with long-running tasks, kill one, then after a short timeout (shorter than any task previously added to the queue) assert that {ok, Pid} = ppool:run(...).
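That check might look something like this sketch, under the same assumptions as the earlier examples; if the dead child had been restarted, the pool would still be full and the final run would return noalloc.

no_restart_test() ->
    ppool:start_pool(norestart_pool, 2, {ppool_nagger, start_link, []}),
    {ok, Pid1} = ppool:run(norestart_pool, [a, 5000, 1, self()]),
    {ok, _Pid2} = ppool:run(norestart_pool, [b, 5000, 1, self()]),
    exit(Pid1, die),  %% use exit(Pid1, kill) instead if the worker traps exits
    timer:sleep(100),
    %% the freed slot can be reused, so the dead child was not restarted into it
    ?assertMatch({ok, _}, ppool:run(norestart_pool, [c, 5000, 1, self()])),
    ppool:stop_pool(norestart_pool).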
- Only 'DOWN' signals from workers cause a new task to be dequeued; signals from any other process do not.
Here a bunch of 'DOWN' messages are sent with made-up references, and the test verifies that the expected output from all the tasks is still received. The timing in this test is pretty subtle though. To check that none of the workers were deallocated, work is added to the queue with delays arranged so that one short task is scheduled first, then two long tasks block the queue and prevent an almost-immediate task from being executed. A delay is introduced in the test process to let the first task finish before trying to kill the remaining tasks. After this, some time is given for signals to arrive at the test process, and the messages are checked to ensure that only the first task actually ran.
I think this test could be improved by changing the output from the workers so each one returned something unique. That way, in the ?_assertMatch(...) expression it would be clear that the only message we should expect to see originates from the first task.
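For example (assuming, as before, that each worker reports {Pid, Task} to the test process), the workers could be tagged short_task, long_one, long_two and almost_immediate, and the final assertion could collect whatever arrived and pin it down exactly:

flush_tasks() ->
    receive
        {_Pid, Task} -> [Task | flush_tasks()]
    after 0 ->
        []
    end.

%% ... start the pool, queue the four tagged workers, send the fake 'DOWN'
%% messages and wait, as in the original test, then finish with:
%%     [?_assertEqual([short_task], flush_tasks())]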
Some final thoughts
It is very common to see calls to timer:sleep/1 in Erlang tests. This is just the nature of testing concurrent programs; if you are trying to assert that some other process has done something, then you must give the other process time to do it. By taking advantage of massive concurrency to run the test suite (potentially) in parallel, these small delays (100-600 ms) probably don’t add up to much in the overall running time for the test suite. They bother me for another reason though: they create the potential for nondeterministic behavior in the tests. For every call to timer:sleep/1 there is a chance, however unlikely, that the event we are waiting for won’t happen in that time. For a test like start_and_test_name, inserting a delay to wait for a process to terminate seems unnecessary when we can instead rely on monitors to know when it is safe to proceed to the next part of the test. Instead of this:
start_and_test_name(Name) ->
    ppool:start_pool(Name, 1, {ppool_nagger, start_link, []}),
    A = whereis(Name),
    ppool:stop_pool(Name),
    timer:sleep(100),
    B = whereis(Name),
    [?_assert(undefined =/= A),
     ?_assertEqual(undefined, B)].
we could have written this:
start_and_test_name(Name) ->
    ppool:start_pool(Name, 1, {ppool_nagger, start_link, []}),
    A = whereis(Name),
    Ref = monitor(process, Name),
    ppool:stop_pool(Name),
    receive {'DOWN', Ref, process, _Pid, _Reason} -> ok end,
    B = whereis(Name),
    [?_assert(undefined =/= A),
     ?_assertEqual(undefined, B)].
To me that seems clearer. It will never be possible to eliminate all calls to timer:sleep/1, but wherever possible it is nice to make it clear in the code that we are waiting for a specific event before proceeding with the test.
How do you know how deep to test?
As mentioned in the discussion of test 5, we could test for an interesting corner case, but it doesn’t really matter for this project. That raises the question, “how far should you go?” My work has grappled with this frequently. I build a lot of “research software” where the software isn’t really the end product; the papers and reports are. There is rarely an expectation that code will be reused or even applied to different data sets in the future, let alone maintained, so deep tests often seem like overkill. Institutionally we have tried to define a tiered model of software quality where low-risk/low-exposure projects don’t need testing, and as the risk and exposure increase the required testing increases as well. I haven’t seen many projects move from the low to the high category, but I do wonder how difficult it would be to tack on testing as an afterthought to code that wasn’t designed with testability in mind from the beginning. Maybe that means the low-risk version of a project is best treated as an elaborate prototype and discarded when the project moves up to a higher tier.