DecisionFlex considerations

For a while I’ve been working on an AI plugin for Unity3D called DecisionFlex. It’s a decision-making tool for your games, based on Utility Theory. It’s great for when you have multiple actions to choose between, and lots of factors to take into consideration. For instance:

  • A soldier choosing its next target, based on target range, type and the soldier’s ammo count
  • A Sims-like human choosing to eat, drink, work or exercise based on hunger, thirst, wealth and fitness
  • A bot choosing to prioritise picking up a health pack, based on the bot’s HP and distance to the nearest pack
  • Any time you might find yourself writing highly nested or cascading if-then statements

DecisionFlex is editor-based and data-driven, so you can construct new decisions and actions, and tweak considerations, without diving into code. The code to hook your game up to DecisionFlex is minimal, and you don’t need to understand complex equations.
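
To give a flavour of how utility-theory selection works under the hood, here’s a tiny generic sketch. To be clear, this is not DecisionFlex’s actual API; every name in it is invented for illustration.

    // A generic sketch of utility-based action selection, the idea behind
    // DecisionFlex. Illustrative only; not DecisionFlex's actual API.
    using System.Collections.Generic;
    using System.Linq;

    public class GameState
    {
        public float Hunger; // 0..1
        public float Thirst; // 0..1
    }

    public class UtilityAction
    {
        public string Name;

        // Considerations score the current game state between 0 and 1;
        // the scores multiply together to give the action's overall utility.
        public List<System.Func<GameState, float>> Considerations =
            new List<System.Func<GameState, float>>();

        public float Score(GameState state)
        {
            return Considerations.Aggregate(1f, (score, c) => score * c(state));
        }
    }

    public static class UtilitySelector
    {
        // Pick the action whose considerations give the highest combined score.
        public static UtilityAction Choose(List<UtilityAction> actions, GameState state)
        {
            return actions.OrderByDescending(a => a.Score(state)).First();
        }
    }

Here an eat action might score on hunger and a drink action on thirst; the point of a tool like this is letting you build and tune those consideration curves in data rather than code.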

DecisionFlex isn’t quite ready to ship yet, but I’m ready to start talking about it. You can find more information and a web demo here:

http://www.tenpn.com/decisionflex.html

How to build Unity3D scripts from the command line

I use emacs for day-to-day programming, including writing unity scripts. For a while I made changes, then tabbed over to the unity editor to watch them compile and check for errors. This was, as you can imagine, tedious. Anyone not coding in MonoDevelop has the same issue, and even the MonoDevelop folks (god have mercy upon their souls) have issues.

It turns out there are multiple ways to compile your unity scripts from the command line, in such a way that most editors can be told to do it and parse the output for errors, greatly speeding up iteration time. I don’t think there’s anything new here, I’m just collating information that seems spread out across the net like horcruxes.

The first way to build your script files is the easiest to get working, but is cumbersome and slow. We’ll just pass a range of cryptic command line arguments to our Unity executable, it’ll launch without an editor front end, compile the scripts, dump any errors to the console, and quit. Cue up some regex and your editor can pull in the errors. Here is the command:

    path/to/unity -batchmode -logFile -projectPath path/to/project/dir -quit

On OSX you can find the unity executable at /Applications/Unity/Unity.app/Contents/MacOS/Unity.

You can read more about unity command line arguments in the unity docs, but let’s go through the options here:

  • -batchmode: stops any front end from loading
  • -logFile: tells unity what to do with any output. Without this parameter it’s all thrown away, which is of no use to our text editors. The docs above will tell you this argument needs a file parameter, but it’s undocumented that if you omit the parameter, all output, including errors, gets spat out to the terminal. That’s just what we want!
  • -projectPath: the path to the root of your project, where you find the sln file and the Assets directory
  • -quit: tells unity to exit straight after compiling.

This works on every platform, and has the nice feature of pulling in new files before compiling, so we don’t need a separate step for that. However it is slow as a milk cart going uphill, because it has to load and boot a lot of the unity runtime before it can do the simple thing of finding mono and asking it to compile a solution file for us.

It is also annoying because it doesn’t work if the unity editor is currently open. Even when in a heavy coding session there’s normally a need to run the game or tweak inspector values at regular intervals, and opening and closing the editor is not going to help you stay in a nice flow. Having said that, this drawback might not matter on a continuous integration server, so it might be the way to go.

So since this is just asking mono to compile for us in a round-about way, how do we cut out the middle man? MonoDevelop’s command-line build tool is called mdtool, and it ships with Unity. On OSX you can find it at /Applications/Unity/MonoDevelop.app/Contents/MacOS/mdtool. Here’s the command to give it:

    path/to/mdtool build path/to/project/project.sln

Here we just need to give mdtool the command build, and the path to the sln file. This is super-quick compared to the Unity approach, and works with the editor still open, but unfortunately won’t pick up newly added files. You’ll still have to tab to the editor for those to be picked up. However that’s relatively uncommon, so isn’t too much of a bother.

However when I tried this with the mdtool that ships with unity, I got very strange errors. They used to be relatively compact, but nowadays there are segfaults and all kinds of stuff going on. I did some casual googling over a few weeks, and couldn’t find a solution. But there was a workaround: install the latest Xamarin Studio (a free mono dev environment based on MonoDevelop), and use its mdtool. On OSX that’s at /Applications/Xamarin\ Studio.app/Contents/MacOS/mdtool.

So there you go: with one of these two approaches, you should be able to compile your unity scripts on the command line, find the errors and connect them to your editor of choice. These should all work on windows as well, but if someone can confirm in the comments that would be great.

If you’re interested, my emacs unity helper functions, including some flycheck compilers using both mdtool and unity, can be found on github.

DAMN Behaviours and Context Steering

After my GDC talk, Treff on twitter sent me a link to a paper from the late 90s by a researcher called Julio K. Rosenblatt. It had some similar ideas to my context steering technique. I thought I’d discuss the differences and similarities here.

The paper describes DAMN, the Distributed Architecture for Mobile Navigation. The system asks modules (behaviours) to vote on how much they prefer each decision in a set of possible decisions. Each vote is weighted according to which behaviour it came from. Votes range from -1 (against) to 1 (for). Superficially this is similar to context steering, but it does not split the votes across an interest map and a danger map. Because of this, it suffers from the same lack of movement constraint that we see with steering behaviours. The paper gets around this by weighting avoidance behaviours much more highly, but this just ends up disabling some nice emergent behaviours, as we saw with the balanced vector problem:

Competing behaviours can cancel each other out, leading to stalemate
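
To make the scheme concrete, here’s a sketch of DAMN-style arbitration as I read the paper. The interface and the weighting scheme are my own invention for illustration, not the paper’s code:

    // A sketch of DAMN-style arbitration: each behaviour votes from -1
    // (against) to 1 (for) on every candidate decision, and the arbiter
    // sums the votes scaled by per-behaviour weights.
    public interface IVotingBehaviour
    {
        float Weight { get; }

        // Fill in one vote per candidate decision, each in [-1, 1].
        void Vote(float[] votes);
    }

    public static class DamnArbiter
    {
        public static int Decide(IVotingBehaviour[] behaviours, int decisionCount)
        {
            var totals = new float[decisionCount];
            var votes = new float[decisionCount];

            foreach (var behaviour in behaviours)
            {
                System.Array.Clear(votes, 0, votes.Length);
                behaviour.Vote(votes);
                for (int i = 0; i < decisionCount; ++i)
                    totals[i] += votes[i] * behaviour.Weight;
            }

            // The winning decision is the one with the highest weighted total.
            int best = 0;
            for (int i = 1; i < decisionCount; ++i)
                if (totals[i] > totals[best])
                    best = i;
            return best;
        }
    }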

The merging of votes doesn’t happen in decision space. From the diagram below, it seems like there’s some metadata about the curves used to write votes. Notice how a central curve is created from the two behaviours, rather than one small peak and one large peak. This is essentially a rasterised version of steering behaviours combined through weighted averages.

Diagram from the paper: two behaviours’ vote curves merging into a single central peak

I think this all adds up to a rather expensive way of implementing steering behaviours. This is somewhat understandable as this paper came out just as or just before steering behaviours were starting to become popular, so the author may have been deep into his research by the time he heard of them.

There are several interesting aspects to the paper. It mentions that the behaviours all update at different frequencies, and the arbiter may receive votes at any time. This is great for those behaviours that are either low-priority or don’t change a lot, and allows easy parallelisation.

DAMN uses multiple subsystems, each asking the behaviours different questions. A speed subsystem (or “arbiter”) works out how fast to go, a turn arbiter decides on direction, and, because this was originally for controlling robots, a “field of regard” arbiter works out where to point the cameras. In comparison, context behaviours tend to use the maps primarily for computing a heading; speed is then calculated as a secondary factor, normally from the highest magnitude of interest or danger encountered. Splitting up like this makes for better separation of concerns, at a possible redundancy cost depending on implementation. It’s an idea worth exploring.

The paper talks about structuring behaviours using a subsumption-style approach, with high-frequency basic behaviours providing a “first level of competence”, built upon with more complex, possibly lower-frequency behaviours later. I like this way of thinking about behaviours. You can build your higher-level behaviours to be allowed to fail, knowing you’ll be caught by the lower-level systems.

There are also some dense but potentially interesting passages that discuss methods of evaluating the utility of each decision. They look interesting but are a bit over my head. If anyone’s got any further information on what they were talking about, please share it in the comments.

In summary I don’t think there’s a lot of similarity between context behaviours and DAMN behaviours, beyond the superficial. Context behaviours could take heed of DAMN’s separation of concerns and the way polling is reversed, possibly making for better structuring of code. DAMN could do with adopting some of the simplicity of steering behaviours, or if required, the constraints and predictability of context behaviours.

Context Behaviours Know How To Share

Last time we saw how steering behaviour systems are very useful when either behaviour integrity isn’t important, or if there are a large number of entities to help hide any irregularities.

We saw this is actually an unavoidable feature of steering behaviours. The heart of steering behaviour systems, the merging of the decisions of several behaviours, is what makes them so straightforward to explain and implement. However, it is also a flaw. There is not enough information in the system for the decisions to be merged with integrity, and there never can be, as long as only a direction and magnitude are returned from each behaviour.

For many applications, this may not even be an issue. If the game needs a large collection of entities moving as a flock, the user isn’t necessarily interested if one entity occasionally makes a bad choice. The user sees the flock move as a whole, and isn’t looking at individual behaviours.

Flocking starlings

However if the application requires a small number of entities that interact individually with the player, like a racing game, then mistakes and collisions start to become very apparent. In fact they can be game-ruining if not dealt with properly.

The only way to fix these problems without replacing the system is to make the behaviours aware of each other, so they can return decisions that are sensible in the surrounding context. This leads to stateful, complex behaviours with increased coupling, and doesn’t scale well when adding new behaviours. Can’t this be fixed without losing simple, stateless, composable behaviours?

I’m going to explain my solution to this problem, which is a more advanced version of the steering system I wrote for a shipped AAA racing game. After replacing the previous behaviour system, there was a net loss of 4,000 lines of code and yet there was a massive boost in the playability and expressiveness of the AI opponents.

To enable a steering system to merge properly, behaviours need to return much more information. They need to give not a single decision, but a view of the world as it appears to them. This is the context in which the behaviour would make a decision, if it was acting alone. The context of each behaviour can be merged and then, with all the information available, the system can make a decision. A sensible decision that always respects every behaviour, never gets stuck, and still shows emergence.

A behaviour will represent its context by writing into a context map. A context map is a projection of the decision space of the entity onto a 1D array. If the application features entities moving on a 2D plane, a map could be represented by evenly spaced radials around the entity, each a direction the entity could travel in, and associated with a single slot in the array. If the application has race cars zooming around a track, each slot of the array is associated with a distinct position to the left or right of the racing line, representing where the car would like to place itself.

An entity’s view of a 2D plane projected into a 1D context map

The context behaviour system creates two of these context maps – one for danger, and one for interest. The behaviours will fill these out. A strong entry in the danger map means someone thinks going that way would be bad. A strong entry in the interest map means someone would love to go that direction. The system passes both maps to every behaviour, asking each to fill them in with its own context.

Context maps are not cumulative. When a behaviour wants to add strength to a slot, the new value is only written if it is stronger than the value already in the slot.
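
Here’s a minimal sketch of such a map for the entities-on-a-plane case, with evenly spaced radial slots. The class and method names are mine, invented for illustration:

    using UnityEngine;

    // A minimal context map for 2D movement: each slot is an evenly spaced
    // radial direction around the entity.
    public class ContextMap
    {
        public readonly float[] Slots;

        public ContextMap(int slotCount)
        {
            Slots = new float[slotCount];
        }

        // The world-space direction a slot represents.
        public Vector2 SlotDirection(int slot)
        {
            float angle = slot * 2f * Mathf.PI / Slots.Length;
            return new Vector2(Mathf.Cos(angle), Mathf.Sin(angle));
        }

        // The slot whose radial is closest to a given direction.
        public int SlotForDirection(Vector2 direction)
        {
            float angle = Mathf.Atan2(direction.y, direction.x);
            if (angle < 0f)
                angle += 2f * Mathf.PI;
            return Mathf.RoundToInt(angle * Slots.Length / (2f * Mathf.PI)) % Slots.Length;
        }

        // Maps are not cumulative: keep the strongest entry rather than summing.
        public void Write(int slot, float strength)
        {
            Slots[slot] = Mathf.Max(Slots[slot], strength);
        }
    }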

Behaviours themselves look similar to their steering behaviour counterparts. Consider a chasing behaviour that selects a target and returns a direction towards it. The context maps version would instead iterate through each target, evaluate how strongly it wanted to chase it, and add that strength to the interest map slot that points towards the target. Any criteria can be used here, just like steering behaviours. The behaviour might be more interested in dangerous targets, or have some complex utility expression tree for evaluating interest, but for this example the behaviour will be more interested the closer the target is.

A collision avoidance behaviour would work in a similar manner. It would iterate through all obstacles, decide how strongly it wants to avoid each, and write that strength into the danger map slot that points towards the obstacle. Again for this example, closer obstacles will be more dangerous.
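
Sketched in code, using the ContextMap from above (the inverse-distance scoring is just one arbitrary choice of criterion):

    using UnityEngine;

    public static class ChaseBehaviour
    {
        // Closer targets are more interesting.
        public static void Evaluate(Vector2 self, Vector2[] targets, ContextMap interest)
        {
            foreach (var target in targets)
            {
                Vector2 toTarget = target - self;
                float strength = 1f / (1f + toTarget.magnitude);
                interest.Write(interest.SlotForDirection(toTarget), strength);
            }
        }
    }

    public static class AvoidBehaviour
    {
        // Closer obstacles are more dangerous.
        public static void Evaluate(Vector2 self, Vector2[] obstacles, ContextMap danger)
        {
            foreach (var obstacle in obstacles)
            {
                Vector2 toObstacle = obstacle - self;
                float strength = 1f / (1f + toObstacle.magnitude);
                danger.Write(danger.SlotForDirection(toObstacle), strength);
            }
        }
    }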

Targets and obstacles write into the interest and danger maps respectively, with strength based on distance

These behaviours are stateless, small and easy to write. That advantage of steering behaviours has not been lost.

In practice, writing to a single slot is not very effective. The behaviour might not want to move directly towards an obstacle, but it might be good to avoid going anywhere near an obstacle as well. A similar thing applies for the chase behaviour – if the entity can’t move directly towards a target, it might be good to move in a direction that takes it a bit closer to it. For this reason when writing into the context maps it’s normally a good idea to write across a range of slots, with the strength ramping down the further the slot is from the target direction. There’s a lot of power and expression in how the strength in surrounding slots is created. Helper functions can help keep the behaviours small and clean while using this expressiveness.
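
A helper for that might look like this, assuming a linear ramp; the shape of the falloff is entirely up to you:

    using UnityEngine;

    public static class ContextMapExtensions
    {
        // Write a strength into a slot and ramp it down linearly across
        // the neighbouring slots, wrapping around the map.
        public static void WriteWithFalloff(this ContextMap map, int centreSlot,
                                            float strength, int spread)
        {
            int count = map.Slots.Length;
            for (int offset = -spread; offset <= spread; ++offset)
            {
                int slot = (centreSlot + offset + count) % count;
                float ramped = strength * (1f - Mathf.Abs(offset) / (float)(spread + 1));
                map.Write(slot, ramped); // still max-write, never cumulative
            }
        }
    }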

Once the danger and interest context maps are fully populated, the system can process them and come up with a single decision. The exact way the maps are processed depends on the application. For simple entities moving on a plane, a suitable algorithm might be as follows: find the slot with the lowest danger, or, as will more often be the case, the set of slots sharing the equal lowest danger. Look at the corresponding slots in the interest map and pick the slot with the highest interest. For a tiebreak, pick the slot that is closest to our current heading.
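
In code, that processing step might look something like this sketch, with the tiebreak done via a dot product against the current heading:

    using UnityEngine;

    public static class ContextSteering
    {
        // Mask the interest map to the least dangerous slots, then take the
        // most interesting of those, tiebreaking on the current heading.
        public static int ChooseSlot(ContextMap danger, ContextMap interest, Vector2 heading)
        {
            float lowestDanger = float.MaxValue;
            foreach (float entry in danger.Slots)
                lowestDanger = Mathf.Min(lowestDanger, entry);

            int bestSlot = 0;
            float bestInterest = float.MinValue;
            float bestAlignment = float.MinValue;
            for (int i = 0; i < interest.Slots.Length; ++i)
            {
                if (danger.Slots[i] > lowestDanger)
                    continue; // not one of the least dangerous slots

                float alignment = Vector2.Dot(interest.SlotDirection(i), heading.normalized);
                if (interest.Slots[i] > bestInterest ||
                    (interest.Slots[i] == bestInterest && alignment > bestAlignment))
                {
                    bestSlot = i;
                    bestInterest = interest.Slots[i];
                    bestAlignment = alignment;
                }
            }
            return bestSlot;
        }
    }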

The result of the system is simply the direction of that slot coupled with the interest strength. The entity interprets this as a direction to move in, and takes the strength as proportional to the speed to travel. Because of this, an entity that has nothing but low-interest things to do might move quite slowly, but an entity chasing something highly interesting, or fleeing something very dangerous, would move quickly.

Now that the whole system has been explained, consider the problem from the previous article. There are two potential targets to chase, but what we would consider the best choice is obscured by an obstacle. A naive steering behaviours implementation would lead to deadlock as the forces balanced out, or worse, oscillation. To avoid this, the chase behaviour (or some higher-level decision-making system) had to be aware of the collision avoidance system, so it could know to ignore the obscured target. The context of the collision avoidance system had bled into the chase behaviour, and coupling had increased.

In the context behaviour system, there is interest pointing towards both targets, with more towards the best target. There is danger in the same region pointing towards the obstacle. The system evaluates the danger map first, taking only the least dangerous slots from the interest map. This leaves it with only the interest from the weaker target available, and that direction is chosen. The behaviours have remained lightweight and isolated, but the end result was a very complex decision.

Obstacle danger obscures most interesting target, so the less interesting target is chosen

Not only that but if a higher-level decision-making system is only concerned with choosing unobscured targets, it can now be removed. I found a lot of higher-level decisions in F1 – who to block, who to draft – could be left to the context behaviour system to work out without increasing coupling.

There are several ways this system can be extended to give nice results. Since the context maps are essentially one-dimensional images, they can be blurred to smooth out narrow troughs and spikes. Last frame’s context maps can be kept, and this frame’s results blended with them to provide free hysteresis to every behaviour. To do a similar thing for steering behaviours would require custom stateful code in every behaviour! The processing of the maps is ripe for vectorisation or offloading to a GPU.
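
Both tricks are only a few lines each. A sketch, with an arbitrary three-tap kernel and blend factor:

    public static class ContextMapFilters
    {
        // A simple three-tap box blur over the circular map, smoothing
        // out narrow troughs and spikes.
        public static void BoxBlur(float[] slots)
        {
            var source = (float[])slots.Clone();
            int n = source.Length;
            for (int i = 0; i < n; ++i)
                slots[i] = (source[(i + n - 1) % n] + source[i] + source[(i + 1) % n]) / 3f;
        }

        // Blend this frame's map with last frame's for free hysteresis.
        public static void BlendWithLastFrame(float[] current, float[] previous,
                                              float keepAmount = 0.1f)
        {
            for (int i = 0; i < current.Length; ++i)
                current[i] = current[i] * (1f - keepAmount) + previous[i] * keepAmount;
        }
    }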

Implemented as-is, the results of the system will always be exactly the direction of one slot. In the diagrams above, the entity can only move in 8 directions. This can lead to juddery behaviour if there aren’t a lot of slots. Taking the image metaphor again, the fix for this is to implement a kind of sub-pixel rendering. By taking the strength of surrounding slots, and approximating the gradient, the between-slot direction that would have the best strength can be found.
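
One way to implement that, treating the winning slot and its neighbours as samples of a curve, is parabolic peak interpolation. The article only calls for approximating the gradient, so treat this particular formula as my own choice:

    using UnityEngine;

    public static class SubSlotInterpolation
    {
        // Fit a parabola through the winning slot and its two neighbours,
        // and return an offset in (-0.5, 0.5) towards the true peak.
        public static float PeakOffset(float[] slots, int winner)
        {
            int n = slots.Length;
            float left = slots[(winner + n - 1) % n];
            float middle = slots[winner];
            float right = slots[(winner + 1) % n];

            float denominator = left - 2f * middle + right;
            if (Mathf.Approximately(denominator, 0f))
                return 0f;
            return 0.5f * (left - right) / denominator;
        }
    }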

Since the behaviours are providing so much more information than in a steering behaviours system, this solution is, in isolation, unavoidably more processor-intensive than a steering behaviour equivalent. The exact complexity depends on the application, but the entities-on-a-plane example above is linear in the size of the context maps, and that’s probably typical.

However unlike steering behaviours, this system lends itself well to Level Of Detail (LOD) changes. The slot count of the maps can be changed from frame to frame, ramping down as the entity is further from the camera or player. This will compromise the quality of the movement, but the integrity of movement will still be preserved. If the system is structured so the behaviours are ignorant of the context map size, they don’t even have to know about LOD changes. The ability to have this kind of granular control over LOD is very rare.

By writing danger and interest into context maps, a context behaviour system can fix many of the problems that come from using steering behaviours, leading to small, stateless and decoupled behaviours that are still just as emergent and expressive.

Thanks to Matt Simper (@MSimperGames) and Michael Deardeuff (@mdeardeuff) for their help proof-reading this post.

Steering behaviours are doing it wrong

Update: you can now read part two of this series.

Steering behaviours have for a long time been a gateway drug of game AI. People like this (annoyingly pluralised) technique because its components are fun and easy to write, the framework code is very simple, requiring only some vector maths knowledge, and the end result is awesomely emergent.

For the uninitiated, a steering behaviours system is typically used to decide a direction and speed for an entity to move in, although it can be generalised as selecting a direction and strength in a continuous space of any dimensionality, not necessarily just spatial. The system contains many behaviours, each of which when queried returns a vector representing a direction and strength.

These vectors are then combined in some manner. The simplest combination strategy is averaging, but there are others, and they don’t really change the arguments I make here.

As an example, consider an entity moving through space avoiding obstacles and chasing a target. A collision avoidance behaviour may return a vector pointing away from nearby obstacles, and the chasing behaviour will return a vector pointing towards the target. If the obstacle is behind and the target in front, the entity can move towards the target unhindered. If an obstacle is to the left of the entity’s direction of travel, it will nudge its movement vector slightly to the right, moving it away from danger. Coding behaviour like this by hand would be much more complicated.
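
For concreteness, here’s a sketch of that setup, assuming straight averaging as the combination strategy; the behaviour shapes and the panic radius are illustrative choices:

    using UnityEngine;

    public static class SteeringBehaviours
    {
        // Stronger the further away the target is.
        public static Vector2 Chase(Vector2 self, Vector2 target)
        {
            return target - self;
        }

        // Stronger the nearer the obstacle is; zero outside the panic radius.
        public static Vector2 Avoid(Vector2 self, Vector2 obstacle, float panicRadius)
        {
            Vector2 away = self - obstacle;
            float distance = away.magnitude;
            if (distance > panicRadius)
                return Vector2.zero;
            return away.normalized * (panicRadius - distance);
        }

        // The simplest combination strategy: average all the votes.
        public static Vector2 Combine(params Vector2[] votes)
        {
            Vector2 sum = Vector2.zero;
            foreach (var vote in votes)
                sum += vote;
            return sum / votes.Length;
        }
    }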

Visual depiction of two steering behaviour scenarios described above

The strength of the returned vectors is proportional to how strongly the behaviour feels about this movement. For instance, when far from a target, the chase behaviour might return a long vector, to get the entity back into the hunt. When very near an obstacle, the collision avoidance behaviour might return a very long vector, to overwhelm other behaviours and get the entity to react quickly.

Behaviour results can be proportional to distance to target

This all sounds great, right? Steering behaviour systems can be very effective, as long as you’re using them in the right situations. They give coherent and pleasing results when given the numerical statistical advantage to hide their flaws. A massive flock of entities moves through obstacles in a convincing manner, but inspect one of those entities and you’ll find it sometimes behaves erratically, and without robust collision avoidance.

After all, the collision avoidance behaviour has no direct control over entity movement, and can just suggest directions to move in. If the chase behaviour also decides on a strong result, the two may fight and collision may be unavoidable.

When creating robust behaviours that must hold up under individual scrutiny, with a small number of entities, these problems become very visible. The small component-based behaviours and lightweight framework are attractive, but the system doesn’t scale down. You can code around the edge cases, but the previously-simple behaviours soon become complex and bloated.

Consider an example. If our chasing entity picks a target that’s directly behind an obstacle, there will come a point where the vectors from the chase behaviour and the collision avoidance behaviour will cancel each other out. The entity will stop dead, even if there’s another near-by and unobstructed target that could be picked. The chase behaviour doesn’t know about the obstruction, so will never pick the second target.

Competing behaviours can cancel each other out, leading to stalemate

To fix this, the first thing most people will try is to have the hunting behaviour path-find or ray-cast to the target. If it’s unreachable or obscured, the behaviour can pick another target. This is successful, and your system is more robust.

However not only has your hunting behaviour become an order of magnitude more expensive, it’s also become aware that such a thing as obstacles exist. The whole point of a steering behaviours system implementation is to separate concerns, to reduce code complexity and make the system easier to maintain. However we had to break that constraint and have lost those benefits as a result.

This is the design flaw of steering behaviours. Each behaviour produces a decision, and all decisions are merged. If one behaviour’s decision (to chase a particular target) conflicts with another’s (to avoid a certain obstacle), the most intelligent merge algorithm in the world will still fail. There’s no way for it to know that two results conflict, and if there was there’s no way for it to know how to resolve the conflict successfully.

To do that, the system needs not decisions, but contexts. It needs to understand how each behaviour sees the world; only then can it produce its own correct decision.

In a context-based chasing entity, the target behaviour would return a view of the world showing that there are several potential targets, and how strongly the behaviour wants to chase each one. The obstacle avoidance behaviour would return a view showing several obstacles, and how strongly the behaviour wants to avoid each one. When placed in the balanced target-behind-obstacle situation above, the obstacle and the best target cancel each other out, but all the other context remains, including the other potential targets. The system can recover and choose a direction that’s sensible and coherent.

If you think computing and merging world contexts means generalised compromises and messy data structures, you’d be wrong. And I’ll tell you why in my next blog post.

What? Don’t look at me like that.

Update: continue reading the second post in this series now.