Automated Containerised Testing of Infrastructure-As-Code

For many companies, the days of throwing code “over the wall” to an ops team to install on physical hardware are over.

Today, services (including microservices) are fundamental parts of many applications – tools like queues, databases and filestores are essential to many products, and product infrastructure often changes just as regularly as the codebase. Factor in scaling, and infrastructure may even change more frequently than code.

No code runs in a vacuum

This convergence of infrastructure and code means that when considering change to a product, the team must not only consider the code itself but also the infrastructure it will run on.

No code runs in a vacuum – even “serverless” platforms such as AWS Lambda require some form of orchestration, deployment and testing.

A hugely positive aspect of devops has been that devs have learned from ops and vice versa – increasingly, ops teams are adopting development best practices such as storing their infrastructure as code. Infrastructure-as-code has a number of strengths, and one of those is testability.

If infrastructure is indeed a first-class part of modern applications, it follows that it should be tested. Having an automated safety net that runs on every change allows an organisation to continuously verify that a stated understanding about the product still holds true. This helps to protect against regressions.

This article will explore how automated tests may be used to verify expectations of infrastructure code.


Three scenarios of automated infrastructure testing

Here are three scenarios in which automated infrastructure testing may be employed. These scenarios are complementary and can fit into most workflows.

Scenario 1: Greenfield infrastructure testing

All code “rots” – that is, if it is not exercised regularly, it can become unpredictable and even dangerous. Infrastructure code is no different – the fact that it successfully provisioned a new system two months ago is no guarantee that it will still work when a big new client comes on board next week. It therefore makes sense to regularly assert that infrastructure code does what we think it does.

Greenfield testing starts with a clean state, provisions it using infrastructure code, and then asserts that the result matches expectations. The process may look something like the following:

Greenfield infrastructure testing diagram

When new commits are pushed, the Continuous Integration (CI) server detects the change and wakes up a CI runner. The runner then starts a lightweight container, which is provisioned and then tested. Tests can be used to verify things like:

  • Is the web server running on the expected ports?
  • Are the configuration files owned by the appropriate users?
  • Is the indexing service running?
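The checks above can be sketched in plain Python; a real suite would more likely use a purpose-built tool such as Serverspec, Testinfra or Goss. The helpers below are only a minimal, self-contained illustration – the hosts, ports and paths in a real test would point at the container under test.

```python
# Illustrative stand-ins for the assertions a greenfield infrastructure
# test makes against a freshly provisioned container.
import os
import pwd
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def file_owner(path: str) -> str:
    """Return the user name that owns the file at `path`."""
    return pwd.getpwuid(os.stat(path).st_uid).pw_name


if __name__ == "__main__":
    # In a real test these would target the provisioned container, e.g.
    #   assert port_is_listening(container_ip, 80)
    #   assert file_owner("/etc/nginx/nginx.conf") == "root"
    # Here we demonstrate against a socket we open ourselves.
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    print(port_is_listening("127.0.0.1", port))  # True
    server.close()
```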

The advantage of containerisation is that it is affordable, lightweight and representative of production infrastructure. Containers can run in parallel to test multiple branches simultaneously. When the test is finished, the container can simply be disposed of. This can save a fortune compared with starting and stopping virtual machines.

Scenario 2: Existing infrastructure testing

As well as creating brand-new infrastructure from scratch, infrastructure code is also used to update existing, running installations. Before running infrastructure code against client installations, it is a good idea to test that it can update existing infrastructure without incident.

One option is to maintain a cluster of nodes that is configured the same as a production installation but has no client data on it – it is only used for automated testing. This cluster is updated on every build, and then a series of verifications is run to establish that it is configured as expected:

Existing infrastructure testing diagram

If this update is successful, an organisation can have confidence that deploying this version of the infrastructure to a live client is unlikely to be problematic.
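One property worth asserting in this scenario is idempotence: applying the same infrastructure code to an already-provisioned node should report zero changes the second time around. The sketch below uses a toy `ensure_line` resource as a hypothetical stand-in for a real provisioning step:

```python
# A hedged sketch of an idempotence check. ensure_line is a toy
# resource: it guarantees a line is present in a config file and
# reports whether it had to change anything, much as provisioning
# tools report "changed" resources per run.
import tempfile


def ensure_line(path: str, line: str) -> int:
    """Ensure `line` is present in the file; return 1 if a change was made."""
    try:
        with open(path) as f:
            if line in f.read().splitlines():
                return 0  # already present: nothing to do
    except FileNotFoundError:
        pass  # file absent: appending below will create it
    with open(path, "a") as f:
        f.write(line + "\n")
    return 1


if __name__ == "__main__":
    cfg = tempfile.NamedTemporaryFile(delete=False).name
    first = ensure_line(cfg, "max_connections = 100")
    second = ensure_line(cfg, "max_connections = 100")
    print(first, second)  # 1 0 -- the second run changed nothing
```

An "existing infrastructure" test applies the new code to the test cluster and then asserts that a second application reports no changes.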

Parallel testing on commit

Infrastructure code can be tested in both “greenfield” and “existing infrastructure” scenarios simultaneously. This should be done on every commit:

Parallel testing on commit diagram

These tests can run in parallel and code that does not pass both tests should be rejected.
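As a hedged illustration, a pipeline definition along these lines runs both suites in parallel and fails the build if either fails. This uses GitLab CI syntax, but most CI servers support an equivalent; the job names and scripts are hypothetical:

```yaml
stages:
  - infrastructure-test

# Jobs in the same stage run in parallel.
greenfield-test:
  stage: infrastructure-test
  script:
    # provision a fresh container from scratch, then verify it
    - ./ci/provision-clean-container.sh
    - ./ci/run-infra-assertions.sh

existing-infrastructure-test:
  stage: infrastructure-test
  script:
    # apply the new code to the long-lived test cluster, then verify it
    - ./ci/update-test-cluster.sh
    - ./ci/run-infra-assertions.sh
```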

Scenario 3: Post-deployment testing

Once production infrastructure code has passed the automated tests outlined in scenarios 1 and 2, it is ready for production use.

Production infrastructure can be verified by adding a post-deployment verification step:

Post-deployment verification diagram

This provides a quick test of assumptions. If something goes awry, automated alarms can help an organisation to make rapid rollbacks – even automated rollbacks.
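A post-deployment verification step can be as simple as polling a health endpoint and signalling the pipeline on failure. A minimal sketch, assuming a hypothetical health URL and retry policy:

```python
# Hedged sketch of a post-deployment smoke test: poll a URL a few
# times and return False if it never answers with HTTP 200, so the
# pipeline can raise an alarm or trigger a rollback. The URL, attempt
# count and delay are illustrative assumptions.
import time
import urllib.error
import urllib.request


def health_check(url: str, attempts: int = 3, delay: float = 1.0) -> bool:
    """Return True as soon as `url` answers with HTTP 200."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: wait and retry
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # e.g. health_check("https://app.example.com/health")
    pass
```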

So this is a silver bullet?

No. Nothing in software is.

Just like with automated application software testing, infrastructure testing is in no way a substitute for manual QA, logging, monitoring and so forth. It is a complementary tool for testing expectations.

All software professionals should aim to keep moving on to dealing with a better class of problem rather than repeating the same mistakes.

Wrapping up

It is important to understand that one can never verify the absence of errors – that is logically impossible. What automated testing can help with is asserting that the configuration of a provisioned system matches expectations.

Infrastructure is just as critical to modern applications as code, and containerisation allows fast, cost-effective and disposable verification of expectations. The same tests can be used to verify test and production infrastructure to raise confidence in the deployment process.