The trouble with rendering a lot of different-looking characters is that they tend to require a lot of draw calls. Normally when making a crowd, the way around this is to combine several skinned meshes into a single skinned mesh and send that to the GPU as one mesh. For Ludus I wanted to draw a huge number of characters in the crowd, with enough visual variety that they looked like a crowd of individuals as opposed to multiple copies. I also wanted to control their animation independently to display things like a Mexican wave and other ‘multiple people doing the same thing’ types of motion.
I was thinking a bit about the kind of motions that the Chinese Olympic opening ceremony showed so well, with lots of people doing something with a slight delay to each other, but almost completely in sync. For example, everyone holding up a card at the same time, or doing some arbitrary motion. There’s something very appealing about those motions because, as synced up as they are, they are also subtly offset: however well trained the performers are, there are always some minor differences in reaction time etc.
In order to do this I figured I needed to be able to control the animations of the characters individually, but still send them to the GPU in as few draw calls as possible.
Unity will try to dynamically batch together any unskinned meshes that have fewer than 900 vertex attributes in total, counting positions, UV coordinates and normals. That leaves us with a budget of around 300 vertices, each with a position, a UV coordinate and a normal. To achieve this while keeping the overall shape, I first cut the body up into the major parts that I wanted to animate, then used an automatic polygon reduction tool to hit the target vertex count while maintaining the shape as much as possible.
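To make the budget arithmetic concrete, here is a minimal sketch in Python (used only for the math, not Unity code). The 900-attribute figure comes from the text above; the separate 300-vertex cap is my reading of Unity's dynamic batching documentation and should be treated as an assumption, as should the function names, which are purely illustrative.

```python
# Limits for dynamic batching. The 900 figure is quoted in the text;
# the 300-vertex cap is an assumption based on Unity's documentation.
MAX_VERTEX_ATTRIBUTES = 900
MAX_VERTICES = 300


def max_batchable_vertices(attributes_per_vertex: int) -> int:
    """Largest vertex count that still fits the budget.

    attributes_per_vertex counts every per-vertex stream the shader
    reads, e.g. position + normal + UV = 3.
    """
    if attributes_per_vertex < 1:
        raise ValueError("a vertex needs at least a position")
    return min(MAX_VERTICES, MAX_VERTEX_ATTRIBUTES // attributes_per_vertex)


def fits_dynamic_batch(vertex_count: int, attributes_per_vertex: int) -> bool:
    """True if a mesh of this size stays within the budget."""
    return vertex_count <= max_batchable_vertices(attributes_per_vertex)
```

With position, normal and one UV set (three attributes) this gives the 300-vertex budget mentioned above; adding more per-vertex data shrinks it, which is one more reason to keep the crowd shader as simple as possible.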
Now, by animating the mesh using only these major parts instead of a skinned mesh, I can get enough animation fidelity for the crowd agents without using bones at all. Since the crowd is far away, there’s no way of seeing the bad intersections etc., so even though the model parts are completely rigid, we don’t really perceive it as wrong from the distance we see them at.
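As a rough illustration of what animating rigid parts with a per-agent delay can look like, here is a small Python sketch of the math only. It is not the code used for Ludus: the function names, the choice of a simple arm raise, the rotation about the Z axis and the timing values are all hypothetical. In Unity the parts would presumably be moved through their transforms rather than by touching vertices.

```python
import math
import random


def rotate_about_pivot(vertex, pivot, angle_rad):
    """Rotate a 3D point around the Z axis passing through pivot."""
    x = vertex[0] - pivot[0]
    y = vertex[1] - pivot[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (pivot[0] + c * x - s * y, pivot[1] + s * x + c * y, vertex[2])


def arm_angle(time_s, delay_s, raise_duration_s=0.5, max_angle_rad=math.pi / 2):
    """Angle of a rigid arm part for an agent that starts delay_s late."""
    t = (time_s - delay_s) / raise_duration_s
    t = min(1.0, max(0.0, t))  # clamp: not started yet, or fully raised
    return max_angle_rad * t


def wave_delay(column, seconds_per_column=0.1, jitter_s=0.0, rng=None):
    """Start delay for an agent: a sweep across columns plus a small
    random 'reaction time' so the motion is not perfectly in sync."""
    rng = rng or random.Random()
    return column * seconds_per_column + rng.uniform(0.0, jitter_s)
```

Each agent gets its own delay once, for example `wave_delay(column, jitter_s=0.05)`, and every frame the arm part is rotated by `arm_angle(now, delay)` about the shoulder pivot. The whole part moves as one rigid piece, which is all the fidelity the crowd needs at a distance.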
Up close of course, the mesh is not sufficient, but as soon as it goes somewhat into the distance we stop paying attention to a lot of the details and start seeing the agents as a crowd.
When we look at a large crowd, the thing that affects us the most in terms of perceiving the characters as a group of individuals as opposed to a group of copies is colour. Shape and animation are both secondary to this. We can easily see how big an impact this has on our perception by comparing the two images below.
In order for dynamic batching to work in a complex scene we have to make the rendering instructions as simple as possible. This means excluding the crowd from shadow computation, and basically using vertex colours as much as possible. I wrote a simple unlit vertex-coloured shader, and then computed the per-vertex shading of the characters on the CPU at spawn time. This way we don’t need to compute anything substantial on the GPU, not even a dot product. So each character is vertex coloured using a random skin tone multiplied by the dot product of each vertex’s rest-pose normal with the light’s forward direction.
This might seem strange, since the lighting is effectively baked in even as the characters move around, but because of the distance and the multitude of characters, we don’t really perceive the ‘error’. In the end we save a huge number of draw calls using the dynamic batching method, can still maintain individual agents for animation purposes, and can have about as many of them as our CPU can handle.
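The spawn-time bake can be sketched roughly as follows, in Python for the math only. This is an illustration under assumptions rather than the actual Ludus code: the palette values, the ambient floor and the function names are made up, and I negate the light's forward vector on the assumption that it points in the direction the light travels.

```python
import math
import random

# Hypothetical palette; the text only says a random skin tone is used.
SKIN_TONES = [(1.0, 0.86, 0.72), (0.88, 0.67, 0.5), (0.55, 0.38, 0.26)]


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bake_vertex_colours(normals, light_forward, skin_tone, ambient=0.3):
    """Return one RGB colour per vertex with the lighting baked in.

    ambient is an assumed floor so faces turned away from the light
    do not go fully black; the text does not mention one.
    """
    to_light = normalize(tuple(-c for c in light_forward))
    colours = []
    for n in normals:
        lambert = max(0.0, dot(normalize(n), to_light))
        intensity = ambient + (1.0 - ambient) * lambert
        colours.append(tuple(c * intensity for c in skin_tone))
    return colours


def spawn_colours(normals, light_forward, rng=None):
    """Pick a random skin tone for one agent and bake its colours."""
    rng = rng or random.Random()
    return bake_vertex_colours(normals, light_forward, rng.choice(SKIN_TONES))
```

The result is computed once per agent and stored in the mesh's vertex colours, so the unlit shader only has to output the interpolated colour.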
And here’s the final test from Unity: 1200 individual crowd agents running in the editor on Intel HD 4000 integrated graphics.
And there you have it, dynamic draw call batching for use in crowds.