Smoke-testing scenes, using Jenkins and Unity Test Framework

You may be familiar with Unity’s Test Runner window, where you can execute tests and see results. This is the user-facing part of the Unity Test Framework, which is a very extensible system for running tests of any kind. At Roll7 I recently set up the test runner to automatically run simple smoketests on every level of our (unannounced) game, and made jenkins report on failures. In this post I’ll outline how I did the former, and in part two I’ll cover the later.

Play mode tests, automatically generated for every level
Some of our [redacted] playmode and editmode tests, running on Jenkins

(I’m going to assume you have passing knowledge of how to write tests for the test runner)

(a lot of this post is based on this interesting talk about UTF from its creators at Unite 2019)

The UTF is built upon NUnit, a .net testing framework. That’s what provides all those [TestFixture] and [Test] attributes. One feature of NUnit that UTF also supports is [TestFixtureSource]. This attribute allows you to make a sort of “meta testfixture”, a template for how to make test fixtures for specific resources. If you’re familiar with parameterized tests, it’s like that but on a fixture level.

We’re going to make a TestFixtureSource provider that finds all level scenes in our project, and then the TestFixutreSource itself that loads a specific level and runs some generic smoke tests on it. The end result is that adding a new level will automatically add an entry for it to the play mode tests list.

There’s a few options for different source providers (see the NUnit docs for more), but we’re going to make an IEnumerable that finds all our level scenes. The results of this IEnumerable are what gets passed to our constructor – you could use any type here.

class AllRequiredLevelsProvider : IEnumerable<string>
{
    IEnumerator<string> IEnumerable<string>.GetEnumerator()
    {
        var allLevelGUIDs = AssetDatabase.FindAssets("t:Scene", new[] {"Assets/Scenes/Levels"} );
        foreach(var levelGUID in allLevelGUIDs)
        {
            var levelPath = AssetDatabase.GUIDToAssetPath(levelGUID);
            yield return levelPath;
        }
    }
    public IEnumerator GetEnumerator() => (this as IEnumerable<string>).GetEnumerator();
}

Our TestFixture looks like a regular fixture, except also with the source attribute linking to our provider. Its constructor takes a string that defines which level to load.

[TestFixtureSource(typeof(AllRequiredLevelsProvider))]
public class LevelSmokeTests
{
    private string m_levelToSmoke;
    public LevelSmokeTests(string levelToSmoke)
    {
        m_levelToSmoke = levelToSmoke;
    }

Now our fixture knows which level to test, but not how to load it. TestFixtures have a [SetUp] attribute which runs before each test, but loading the level fresh for each test would be slow and wasteful. Instead let’s use [OneTimeSetup] (👀 at the inconsistent capitalisation) and to load and unload our level for each fixture. This depends somewhat on your game implementation, but for now let’s go with UnityEngine.SceneManagement:

// class LevelSmokeTests {
    [OneTimeSetUp]
    public void LoadScene()
    {
        SceneManager.LoadScene(m_levelToSmoke);
    }

Finally, we need some tests that would work on any level we throw at it. The simplest approach is probably to just watch the console for errors as we load in, sit in the level, and then as we load out. Any console errors at any of these stages should fail the test.

UTF provides LogAsset to validate the output of the log, but at this time it only lets you prescribe what should appear. We don’t care about Debug.Log() output, but want to know if there was anything worse than that. Particularly, in our case, we’d like to fail for warnings as well as errors. Too many “benign” warnings can hide serious issues! So, here’s a little utility class called LogSeverityTracker, that helps check for clean consoles. Check the comments for usage.

Our tests can use the [Order] attribute to ensure they happen in sequence:

// class LevelSmokeTests {
    [Test, Order(1)]
    public void LoadsCleanly()
    {
        m_logTracker.AssertCleanLog();
    }

    [UnityTest, Order(2)]
    public IEnumerator RunsCleanly()
    {
        // wait some arbitrary time
        yield return new WaitForSeconds(5);
        m_logTracker.AssertCleanLog();
    }

    [UnityTest, Order(3)]
    public IEnumerator UnloadsCleanly()
    {
        // how you unload is game-dependent 
        yield return SceneManager.LoadSceneAsync("mainmenu");
        m_logTracker.AssertCleanLog();
    }

Now we’re at the point where you can hit Run All in the Test Runner and see each of your levels load in turn, wait a while, then unload. You’ll get failed tests for console warnings or errors, and newly-added levels will get automatically-generated test fixtures.

More tests are undoubtedly more useful than less. Depending on the complexity and setup of your game, the next steps might be to get the player to dumbly walk around for a little bit. You can get a surprising amount of info from a dumb walk!

In part 2, I’ll outline how I added all this to jenkins. It’s not egregiously hard, but it can be a bit cryptic at times.

Anatomy of a unit test

In my spare time I maintain a unit testing library built for Unity3D. It’s called UnTest, it’s open-sourced under the MIT license, and you can download it here.

In the Unity3D community forums announcing UnTest, _Shockwave asked for an example of some real-life unit tests written with this framework. UnTest is very xUnit-flavoured, so they follow a standard pattern, but I thought it would be a good excuse to talk about good unit testing practice.

Much of my unit testing approach is from Roy Osherove’s Art of Unit Testing, which is a very readable and practical book on unit testing. It’s aimed at .Net, so highly applicable for Unity3D development. The Art of Unit Testing website also has some recorded conference lectures from Osherove that are also worth watching. If you want to get better at writing unit tests, these are great resources.

The unit test I’m going to dissect is below. It’s a real-life test from a production behaviour tree system. It’s not really important here to understand what a behaviour tree or a selection node is, as much as the patterns and conventions I followed. Good unit tests are readable, maintainable and trustworthy. As we walk through the test, I’ll explain how these qualities apply, and how to maximise them.

    [Test]
    void UpdatePath_OneActionChild_AddsSelAndChildToPath() {

        var actionNode = FakeAction.CreateStub();
        m_selNode.AddChild(actionNode);

        m_selNode.UpdatePath(m_path, NodeResult.Fails, null);

        var selectionThenAction = new ITreeNode[] {
            m_selNode, actionNode };
        Assert.IsEqualSequence(m_path, selectionThenAction);
    }

To increase readability, the first thing to note is the context of the file you can’t see. It’s in a file called SelectionNodeTests.cs, so I instantly know this test applies to the SelectionNode class. There’s only one class in this file, with the same name, so there’s no chance of confusion.

The name of the function follows a consistent convention throughout the codebase: FunctionUndertest_Context_ExpectedResult. There are many naming conventions you could follow, this is the one I do. Context is how we set up the world before running the function. In this case, we’re adding a single action node to the selection node. ExpectedResult is how we want the function to behave; here we want the selection node and the action node to be added to the path.

It’s not important how long the name of this function is, since it’s never called from anywhere. The more readable and informative you can make the function name, the easier it will be to figure out what went wrong when it fails.

The unit test is split into three sections following the AAA pattern: Arrange, Act, Assert.

Arrange, where I set up the selection for testing:

    var actionNode = FakeAction.CreateStub();
    m_selNode.AddChild(actionNode);

All I need to do is create an action node and add it to the selection node.

Act, where I execute the function we want to test:

    m_selNode.UpdatePath(m_path, NodeResult.Fails, null);

Assert, where I check to see that end condition we expected has been fulfilled:

    var selectionThenAction = new ITreeNode[] {
        m_selNode, actionNode };
    Assert.IsEqualSequence(m_path, selectionThenAction);

Here I construct what I expect the path to be, and assert that it matches the actual path using an Assert library provided by UnTest.

By following this layout, all tests can be easily scanned. When time comes to fix or update a test, developers can dive in quickly.

You’ll notice I could compact the Assert section into one line:

    Assert.IsEqualSequence(m_path, new ITreeNode[] { m_selNode, actionNode });

The reason I keep it separate is to avoid “magic numbers”. It’s too easy to write code like, Assert.IsEqual(result, 5). The writer may know what this 5 means, but it would be much better for future readers to put it in a named variable and write Assert.IsEqual(result, hypotenuseLength).

Now this test is as readable as possible, how did I make it maintainable too? You’ll notice that by improving readability I’ve gone some way to also helping maintainability, as something that’s easier to read is also easier to understand, and therefore is easier to maintain. But there are other things I do as well.

Check out the first line:

    var actionNode = FakeAction.CreateStub();

I need an action to put into the selection node. I could use an existing IAction concrete class, but then any bugs in that concrete class might cause this test to fail. I’ll cover more why that’s bad later, but just pretend it sucks.

I could derive a new class from IAction, which I could keep simple enough to avoid bugs, but then I’d have to maintain that whenever the Action class interface changed. It’s much easier to use a “mocking framework” to do most of the hard work for me.

A mocking framework is a library that can be used to construct a new type at runtime that derives from Action and just does the right thing (among many other things). Then any changes are picked up for me automatically, and I have less code to maintain. If that sounds like magic, that’s because it is.

There’s a mocking framework behind that FakeAction.CreateStub() call, but since it’s such a common use case in this test suite I’ve wrapped it up in a helper function.

Any mocking framework that works with mono will work with Unity3D. I use Moq. The latest version is here. I’ve mirrored this in a unitypackage here for easy importing to Unity.

To further isolate myself from changes, I’m constructing the member variables m_selNode and m_path in a setup function (not shown). This function is run automatically before every test, and makes new SelectionNode and Path objects. This is not only handy, because they’re used in every test in the class, but also isolates the unit tests from changes to the constructor signatures. Other commonly-used functions can also be hidden behind helper functions, but it’s best not to hide your function-under-test for readability reasons.

The final thing I need to do is make the test “trustworthy”.

By going through the maintainable and readable steps, I’ve made sure this test depends on the minimum amount of game code. When this test fails, hopefully it will only be because the function under test, UpdatePath(), had an error.

The more game code you depend on, the closer your test slips along the spectrum from unit to integration test. Integration tests check how systems connect together, rather than single assumptions. They have their place in a testing plan, but here I’m trying to write a unit test. A great rule of thumb is that a single line of buggy code should cause failures in the minimum of unit tests, and ideally only one. If lots fail, that’s because the code isn’t isolated properly and you’ve ended up with some integration tests.

Some of my early unit tests, from F1 2011, created a whole world for the AI to move in and recorded the results, rather than mocking surrounding code like we have here. The end result was that a single bug in the world code could cause many many tests to fail. That makes it hard to track down the root cause of the bug, and meant I had probably written integration tests instead of unit tests.

When this test does fail, it will be deterministic. There’s no dependency here on databases, network services, or random number generators. There’s nothing worse than unit tests that fail once in a blue moon, because they erode developer trust in the test suite. That’s how you end up with swathes of tests commented out, and wasted engineering time.

Now you understand why I’ve written this real-life unit test in this way, and why it’s important your unit tests are readable, maintainable and trustworthy. Like any type of programming, writing good unit tests takes practice and perseverance. They’re truly the foundation of your project, giving you the freedom to restructure at will and the confidence that your game code is high quality. But like any foundation, if they’re not well engineered the whole edifice comes rapidly crumbling down. Take the time to follow up with the resources I linked above, and you will hopefully avoid that situation.