Running unity tests on a build server

(Read part 1, about creating parametric level smoke tests, here)

Now we have some useful tests, we want our Jenkins install to run them frequently. That way, we really quick feedback on failures. Here’s how we do this at Roll7.

Running tests from the command line

Let’s assume you already have a server to build your game. I won’t cover how to set up a unity Jenkins build server here, except to say that the Unity3d plugin is very helpful when you have lots of unity versions. (Although we couldn’t get the log parameter to be picked up unless we specified the path fully in our command line arguments!)

The command line to run a Unity Test Framework job is as follows:

-batchmode -projectPath "." -runTests -testResults playmodetests.xml -testPlatform playmode -logFile ${WORKSPACE}\playtestlog.txt

You can read more about command line params here, but let’s break that down:

  • -batchmode tells unity not to open a GUI, which is vital for a build server. We don’t want it to hang on a dialog! You can also use Application.isBatchMode to test for this flag in your code.
  • -projectPath "." just tells unity to load the project in our working directory
  • -runTests starts a Unity Test Framework job as soon as the editor loads. It’ll run some specified tests, spit out some output, and make sure test failures cause a non-zero return code.
  • -testPlatform playmode tells the UTF to run our playmode tests, which are the ones we care about for this blog post. You can also use editmode.
  • -testResults playmodetests.xml states where to spit out a report of the run, which will include failures as well as logs. The report is formatted as an nunit test result XML. Jenkins has a plugin that can fail a job based on this file, and present a nice reporting UI.
  • -logFile ${WORKSPACE}\playtest.txt specifies where to write the editor log – by default it won’t stream into the console. The ${WORKSPACE} is a jenkins environment variable, and we found specifying it was the only way to get the unity3d plugin to find the log.

…And that’s enough to do a test run.

Playmode tests in batch

This page of the docs mentions that the WaitForEndOfFrame coroutine is a bad idea in batch mode. This is because “systems like animation, physics and timeline might not work correctly in the Editor”.

There’s not much more detail than this!

In practice, we’ve found any playmode tests that depend on movement, positioning or animation fails pretty reliably. We get around this by explicitly marking tests we know can run on the server with the [Category("BuildServer")] attribute. We can then use the -testCategory "BuildServer" parameter to only run these tests.

This is pretty limiting! But there’s still plenty of value in just making sure your levels, enemies and weapons work without firing errors or warnings.

In the near future we’ll be experimenting with an ec2 instance that has a GPU, to let us run without batchmode, and also allow us to run playmode tests on finished builds more easily.

Gotchas and practicalities

  • Currently, we find a full playmode test run is exceedingly slow, and we don’t yet know why. What takes a minute or two locally takes tens of minutes on the ec2 instance. It’s not a small instance, either! So we’re only scheduling a test run for our twice-daily steam builds, instead of for every push.
  • WaitForEndOfFrame doesn’t work in batch mode, so beware of using that in your tests, or anything your tests depend on.
  • The vagaries of unity’s compile step mean that some sometimes you can get out-of-date OnValidate calls running on new data as you open the editor. Maybe this is fixable with properly defined assemblies, but we hackily get around it by doing a dummy build right at the start of the jenkins job. It goes as far as setting the correct platform, waits for all the compiles, and quits. Compile errors still cause failures, which is good.
  • If you want editmode and playmode tests in the same job, just run the editor twice. We do this with two different testResults xmls, and we can use a wildcard in the nunit jenkins plugin to pick up both.
  • To test your command line in dos, use the start command: start /wait "" "path to unity.exe" -batchmode ... The extra empty spaces are important if your unity path has spaces in it too. To see the last command’s return code in dos, use echo %ERRORLEVEL%.
  • These are running playmode tests in the editor on the build server. We haven’t yet got around to making playmode tests work in a build. That might end up as a follow-up post!

Smoke-testing scenes, using Jenkins and Unity Test Framework

You may be familiar with Unity’s Test Runner window, where you can execute tests and see results. This is the user-facing part of the Unity Test Framework, which is a very extensible system for running tests of any kind. At Roll7 I recently set up the test runner to automatically run simple smoketests on every level of our (unannounced) game, and made jenkins report on failures. In this post I’ll outline how I did the former, and in part two I’ll cover the later.

Play mode tests, automatically generated for every level
Some of our [redacted] playmode and editmode tests, running on Jenkins

(I’m going to assume you have passing knowledge of how to write tests for the test runner)

(a lot of this post is based on this interesting talk about UTF from its creators at Unite 2019)

The UTF is built upon NUnit, a .net testing framework. That’s what provides all those [TestFixture] and [Test] attributes. One feature of NUnit that UTF also supports is [TestFixtureSource]. This attribute allows you to make a sort of “meta testfixture”, a template for how to make test fixtures for specific resources. If you’re familiar with parameterized tests, it’s like that but on a fixture level.

We’re going to make a TestFixtureSource provider that finds all level scenes in our project, and then the TestFixutreSource itself that loads a specific level and runs some generic smoke tests on it. The end result is that adding a new level will automatically add an entry for it to the play mode tests list.

There’s a few options for different source providers (see the NUnit docs for more), but we’re going to make an IEnumerable that finds all our level scenes. The results of this IEnumerable are what gets passed to our constructor – you could use any type here.

class AllRequiredLevelsProvider : IEnumerable<string>
{
    IEnumerator<string> IEnumerable<string>.GetEnumerator()
    {
        var allLevelGUIDs = AssetDatabase.FindAssets("t:Scene", new[] {"Assets/Scenes/Levels"} );
        foreach(var levelGUID in allLevelGUIDs)
        {
            var levelPath = AssetDatabase.GUIDToAssetPath(levelGUID);
            yield return levelPath;
        }
    }
    public IEnumerator GetEnumerator() => (this as IEnumerable<string>).GetEnumerator();
}

Our TestFixture looks like a regular fixture, except also with the source attribute linking to our provider. Its constructor takes a string that defines which level to load.

[TestFixtureSource(typeof(AllRequiredLevelsProvider))]
public class LevelSmokeTests
{
    private string m_levelToSmoke;
    public LevelSmokeTests(string levelToSmoke)
    {
        m_levelToSmoke = levelToSmoke;
    }

Now our fixture knows which level to test, but not how to load it. TestFixtures have a [SetUp] attribute which runs before each test, but loading the level fresh for each test would be slow and wasteful. Instead let’s use [OneTimeSetup] (👀 at the inconsistent capitalisation) and to load and unload our level for each fixture. This depends somewhat on your game implementation, but for now let’s go with UnityEngine.SceneManagement:

// class LevelSmokeTests {
    [OneTimeSetUp]
    public void LoadScene()
    {
        SceneManager.LoadScene(m_levelToSmoke);
    }

Finally, we need some tests that would work on any level we throw at it. The simplest approach is probably to just watch the console for errors as we load in, sit in the level, and then as we load out. Any console errors at any of these stages should fail the test.

UTF provides LogAsset to validate the output of the log, but at this time it only lets you prescribe what should appear. We don’t care about Debug.Log() output, but want to know if there was anything worse than that. Particularly, in our case, we’d like to fail for warnings as well as errors. Too many “benign” warnings can hide serious issues! So, here’s a little utility class called LogSeverityTracker, that helps check for clean consoles. Check the comments for usage.

Our tests can use the [Order] attribute to ensure they happen in sequence:

// class LevelSmokeTests {
    [Test, Order(1)]
    public void LoadsCleanly()
    {
        m_logTracker.AssertCleanLog();
    }

    [UnityTest, Order(2)]
    public IEnumerator RunsCleanly()
    {
        // wait some arbitrary time
        yield return new WaitForSeconds(5);
        m_logTracker.AssertCleanLog();
    }

    [UnityTest, Order(3)]
    public IEnumerator UnloadsCleanly()
    {
        // how you unload is game-dependent 
        yield return SceneManager.LoadSceneAsync("mainmenu");
        m_logTracker.AssertCleanLog();
    }

Now we’re at the point where you can hit Run All in the Test Runner and see each of your levels load in turn, wait a while, then unload. You’ll get failed tests for console warnings or errors, and newly-added levels will get automatically-generated test fixtures.

More tests are undoubtedly more useful than less. Depending on the complexity and setup of your game, the next steps might be to get the player to dumbly walk around for a little bit. You can get a surprising amount of info from a dumb walk!

In part 2, I’ll outline how I added all this to jenkins. It’s not egregiously hard, but it can be a bit cryptic at times.