Infinite Monkey Software Testing

Unpredictable testing for unpredictable users.

One of the difficulties in automated testing is how to test for randomness. The two concepts don’t seem to go well together – most tests start with a defined state, follow a series of actions, and test that the state has changed in the way that’s expected.

Users, of course, don’t follow a series of defined actions. For example, in Bipsync they might be in the middle of working on a note, and suddenly realise that they want to check another note that’s in a different context. But they may not remember where they left it, so they click through three different industries before they find the note they were looking for.

We do a large amount of functional testing using Selenium, but writing a test for every conceivable user flow would take a prohibitive amount of time, and the resulting suite would likely take several hours to run, especially for a large web application such as Bipsync.

Enter monkey testing. It’s not a new concept: it’s been around since at least 1998, when, according to the Wikipedia page, it was referenced in the Visual Test 6 Bible. The basic idea is that you throw pseudo-random input at your application and see what breaks.

We started looking at monkey testing after we found a bug that was triggered by a specific sequence of formatting options and keystrokes. It’s not something that you’d think to write a test for, may not be found in exploratory testing, and potentially affects our users, so it’s definitely something we want to get on top of before it’s seen in the wild.


Originally I looked at using a client-side library such as Gremlins.js, but decided against it as I wanted more control over the server side of things. Eventually I settled on using PHPUnit with Selenium, as we already do for functional testing. The main reason for this is that we already have a large library of actions (open note, create note, etc.) that we could use with the tests, and we’re all for code reuse.

We already use the Faker library to generate random input elsewhere in our test code, so we use this to generate random textual input in places that require it.

At the heart of our monkey test is the random loop function. This function loops a configurable number of times, randomly picks an action from a list, and fires it off. That action may then call the random loop function again, but with a different number of loops and a different list of actions. On every iteration of the loop we check for JavaScript errors, and if one is found we immediately fail the test and save the stack trace. We use TraceKit to capture the errors. As Selenium doesn’t have a lot of options for accessing the developer console, we use TraceKit to set an attribute on the <html> element of the page, which Selenium can quickly and easily check.
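
As a rough illustration of the loop described above, here is a minimal sketch in Python rather than our PHPUnit code. Every name in it (random_loop, MonkeyFailure, the action names, run_demo) is made up for the example, and the check_for_error callable is only a stand-in for reading the attribute that TraceKit sets on the <html> element.

```python
import random


class MonkeyFailure(Exception):
    """Raised as soon as the error check reports a problem."""


def random_loop(rng, actions, iterations, check_for_error, log):
    """Fire `iterations` randomly chosen actions, checking for errors after each."""
    for _ in range(iterations):
        name, action = rng.choice(actions)
        log.append(name)  # keep a human-readable trail of what was fired
        action()          # an action may itself call random_loop (see open_note)
        error = check_for_error()
        if error is not None:
            raise MonkeyFailure(error)  # fail immediately, keeping the error text


def run_demo(seed=1):
    """Hypothetical demo with stand-in actions instead of real Selenium calls."""
    rng = random.Random(seed)
    log = []

    def no_error():
        return None  # stand-in for inspecting the page for a recorded error

    def do_nothing():
        pass

    note_actions = [("format_text", do_nothing), ("type_text", do_nothing)]

    def open_note():
        # Nested loop: a different loop count and a different action list.
        random_loop(rng, note_actions, 3, no_error, log)

    base_actions = [("open_note", open_note), ("create_note", do_nothing)]
    random_loop(rng, base_actions, 5, no_error, log)
    return log
```

Calling run_demo() returns the list of action names in the order they were fired, which is roughly what ends up in a human-readable action log.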


Okay, so now we have a stack trace and a failed test. But without knowing how we got there in the first place, there’s very little we can do with that information. What we need is a way to replay the last actions that occurred and figure out what caused the error.

In our situation, every time that we drop back into the base loop we take a snapshot of the database, the current URL of the page, and which user is logged in. We also close the previous loop’s log files and open new ones. Every time we fire off a new action we log its name into one log file so that a human can review the steps leading up to the crash. Every time we ask for a random value (text, a boolean, a number, etc.) we’re actually requesting it from a Randomiser class that stores the value it’s returning in another log file.

Now when an error is fired we have a number of snapshots, plus every action and every decision that the monkey made. We can then give the monkey any number of contiguous snapshots and put it in replay mode. It restores the oldest database snapshot, then loads the Randomiser logs in order. Now, when it’s asked for a random value, rather than generating and logging a new one, the Randomiser merely returns whatever value is next in the file. This way we can be sure that the monkey follows the same path on every replay.
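
To make the record-and-replay idea concrete, here is a minimal sketch of a Randomiser-style class in Python. It is not our actual implementation: the method names (boolean, number, pick), the constructor arguments, and the one-JSON-value-per-line log format are all assumptions made for the example.

```python
import json
import random


class Randomiser:
    """Hands out random values, recording them to a log or replaying them from one."""

    def __init__(self, log_path, replay=False, seed=None):
        self.log_path = log_path
        self.replay = replay
        self._rng = random.Random(seed)
        self._queue = []
        if replay:
            # Load every previously recorded decision, in the order it was made.
            with open(log_path) as fh:
                self._queue = [json.loads(line) for line in fh if line.strip()]

    def _next(self, produce):
        if self.replay:
            # Replay mode: ignore the generator and return the next logged value.
            return self._queue.pop(0)
        value = produce()
        # Record mode: append the decision so that a later run can repeat it.
        with open(self.log_path, "a") as fh:
            fh.write(json.dumps(value) + "\n")
        return value

    def boolean(self):
        return self._next(lambda: self._rng.random() < 0.5)

    def number(self, low, high):
        return self._next(lambda: self._rng.randint(low, high))

    def pick(self, options):
        # Log the index rather than the item, so any list of options can be replayed.
        index = self._next(lambda: self._rng.randrange(len(options)))
        return options[index]
```

A run in record mode followed by a run with replay=True against the same log file should make the same decisions, provided the calls are made in the same order. In this sketch, asking for more values than were recorded raises an IndexError.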

From there we can distil the sequence down to the minimum set of actions needed to trigger the error, and with a repeatable set of steps we can raise a bug in our tracking system. If necessary we can attach the replay files to the bug so that anyone picking up the ticket can replay the steps, or alternatively we could write a functional test that fails until the bug is fixed. This has the added benefit of guarding against regressions.

So far, all of the work after the test has failed relies on a human to track down the bug and raise the ticket. In future we’d like to automate the rest of this, so that the monkey is able to replay its actions, work out if the bug is reproducible, distil it down to a bare minimum of repeatable actions, check our bug database to see if there’s already a ticket open for it, and raise a bug if not.
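
We haven’t built this yet, but the distilling step could be sketched along the following lines. This is one simple greedy approach, not something we run today: the minimise function and the still_fails callback are hypothetical, and in a real setup each call to still_fails would presumably mean restoring a snapshot and replaying the candidate list of actions.

```python
def minimise(actions, still_fails):
    """Greedily drop actions one at a time while the failure still reproduces."""
    current = list(actions)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            # Try the sequence with the i-th action removed.
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan against the shorter list
    return current


def demo_still_fails(sequence):
    """Made-up failure condition: "bold" followed at some point by "backspace"."""
    if "bold" not in sequence or "backspace" not in sequence:
        return False
    return sequence.index("bold") < sequence.index("backspace")
```

For example, minimise(["open_note", "bold", "type", "italic", "backspace", "save"], demo_still_fails) whittles the list down to the two actions that matter. Each check costs a full replay, so a smarter strategy may be needed for long sequences.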

The monkey has a bright future ahead of it, and to finish off, here’s a little clip of it in action.


No animals were harmed during these tests.