CSM in A7/A8

Alright! This is a very simple tutorial that extends the shader workshop's shadow tutorial with a very simple (but good-looking) implementation of Cascaded Shadow Mapping (CSM).

Quite frankly, if you own the Commercial or Professional edition of A7 or A8 and you haven't been through the shader workshop, you're not getting your money's worth. Most of the shader programmers on the forums love the way A7/A8 is set up for shaders, and whatever the pros and cons of each engine, many come back to A7 even if it's just because it's so good for shader programming.

You don't have to be a shader pro, but all it takes is a quick look to realise that HLSL is no harder (and is in some ways easier) than Lite-C. I strongly recommend that you equip yourself to at least understand how shaders work in general.

Enough of my ranting :)

You can download a sample containing everything you need for this tutorial as it should be once you've finished -- the link should be right at the bottom of this page under "Attachments". Open up "shadowmapping.c" and run it. If it doesn't work, it may be because your graphics card doesn't support Shader Model 3. If so, get into "Shadow.fx" and change the lines that say "vs_3_0" and "ps_3_0" to "vs_2_0" and "ps_2_0", respectively, and comment out "#define USE_PCF" near the beginning of the file so that a wussier version is used. A7/A8 is equipped to automatically fall back to a shader version that's compatible with the hardware it's running on IF you provide one or more fallback techniques (these are described in the manual). This tutorial doesn't cover that :)

Setting up

I'd like you to run SED and open up shadowmapping.c and Shadow.fx from A7's samples folder. If you can't find them, make sure you have the latest public beta version of A7 (at time of writing it's 7.84 -- don't worry, it's very stable, and if that's not the version on the main download page then it's only a matter of time before it is). You might want to copy these into another folder if you want to preserve the samples as they are (like I do). If so, you'll also need a copy of Depth.fx, blob.mdl, and small.hmp in this folder in order for it all to work.

You'll notice that shadowmapping.c is very simple and short -- only 88 or so lines, and 20% of it is just comments. Shadow.fx is a little trickier, but that's where the real magic's at. It shouldn't be too much trouble if you've been through the shader workshop, but even if you haven't a lot will be explained.

What we've got to start with

This is just a recap of what's in shadowmapping.c. If you already feel comfortable with how that works, or you really don't care and just want to get into changing it, skip ahead to Getting our hands dirty.

Some may notice that shadowmapping.c is different to what you'll find in shadowdemo.c in the shader workshop. What we've got here is simpler, cleaner, a more appropriate environment to show off shadows, and takes advantage of more recent features.

We basically have a view set up to act as the sun, and what it sees will be rendered into a 2048 by 2048 bitmap. Imagine that the sun is looking at the earth, and everything it can see is lit, while everything it can't see is in its shadow.

The most difficult part of shadowmapping.c is mtlDepth_event. Basically, in a shader we use a coordinate system to describe the position of points within a texture and on the screen. While there's more to it for the sake of depth and perspective, suffice it to say that screen-space coordinates range from -1 to 1, and texture space coordinates range from 0 to 1. So, if we need to convert a point's screen position to the corresponding position in a texture, we need to take both components of its screen position, halve them, and then add 0.5. That's what the matrix in mtlDepth_event will do. For those of you wondering what exactly lines 43 and 44 are doing, in order to line everything up perfectly, the coordinates also need to be offset by half a pixel in each axis.
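If the halve-and-add-0.5 business is easier to read as code, here's a minimal sketch in plain C (not Lite-C or HLSL, so it can be compiled outside the engine). The function names are made up for illustration and aren't part of shadowmapping.c, and the sketch only covers the x/y mapping and the half-pixel offset, not the depth and perspective parts of the real matrix.

```c
/* Hypothetical helper, not part of shadowmapping.c: maps one screen-space
   coordinate in [-1, 1] to a texture-space coordinate in [0, 1]. */
float screen_to_texture(float s)
{
    return s * 0.5f + 0.5f;   /* halve, then add 0.5 */
}

/* Same mapping plus the half-pixel offset for a square texture that is
   `size` pixels wide (2048 for the sample's shadow map). */
float screen_to_texture_aligned(float s, float size)
{
    return screen_to_texture(s) + 0.5f / size;
}
```

So the left edge of the screen (-1) lands on 0, the centre (0) lands on 0.5, and the right edge (1) lands on 1, before the half-pixel nudge is added.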

The views viewDepth and camera (the default view) are assigned materials to render the depth of everything, and to calculate shadows for everything, respectively.

The main function has very little to do except set up the level and put viewDepth in the right position.

Getting our hands dirty

Alright, let's get this over and done with. Shadow mapping has all sorts of issues. You only have so much resolution you can use. If it's distributed over a large area, each shadow looks blocky. If it's distributed over a very small area, you get very nice-looking shadows within that area, but nothing outside it. With CSM, we're going to have three views for the sun: a small one for crisp shadows close to the camera, a bigger one for the middle distance, and a huge one for everything further out.

First thing we're going to do is set up the views and render targets.

Right at the top of shadowmapping.c put:

BMAP *shadow1 = "#2048x2048x14";

BMAP *shadow2 = "#2048x2048x14";

BMAP *shadow3 = "#2048x2048x14";

float shadowMat1[16];

float shadowMat2[16];

float shadowMat3[16];

These will be the render targets for each view. 2048x2048 is the resolution, and the 14 at the end indicates that the render target is actually a 32-bit floating point texture. We also need to be able to keep track of the matrices we'll set up later on. That's what shadowMat1 and so on are for. Below them, add this material event, which hands those three matrices over to the shader:

function mtlShadow_event() {

    mat_effect1 = shadowMat1;

    mat_effect2 = shadowMat2;

    mat_effect3 = shadowMat3;

    return 0;

}

Then go into the definition of mtlShadow (it should begin at about line 37) and add this line somewhere inside it:

event = mtlShadow_event;

Also make sure you set its ENABLE_VIEW flag:

flags = ENABLE_VIEW | AUTORELOAD;

Forgetting that flag has messed me up twice. In the end, if everything compiles beautifully but you can't see any shadows, you probably forgot that flag. Of course, you won't now :)

The matrices shadowMat1..3 will be set elsewhere, but it's important that we have a material event making use of mat_effect1..3. We could reference shadowMat1..3 straight from the shader, if we wanted to, but these won't necessarily be updated in time for rendering each view, and things can get ugly. Trust me.

Now replace the declaration of viewDepth with these three views:

VIEW* viewDepth1 = {

    bmap = shadow1;

    material = mtlDepth;

    flags = SHOW | ISOMETRIC;

}

VIEW* viewDepth2 = {

    bmap = shadow2;

    material = mtlDepth;

    flags = SHOW | ISOMETRIC;

}

VIEW* viewDepth3 = {

    bmap = shadow3;

    material = mtlDepth;

    flags = SHOW | ISOMETRIC;

}

The differences here are that each view's render target refers to the ones we declared above, and that each view also has the ISOMETRIC flag. The sun is so far away that its rays are practically parallel to each other, and an isometric view has no perspective, so it models that well. You can often spot a game that doesn't do this, because its shadows shift around as the camera moves.

Alright, now we need to fiddle with mtlDepth_event. All we need to do is add these lines just before the return 0 at the end of the function:

if (render_view == viewDepth1)

    mat_set(shadowMat1, mtlShadow.matrix);

else if (render_view == viewDepth2)

    mat_set(shadowMat2, mtlShadow.matrix);

else

    mat_set(shadowMat3, mtlShadow.matrix);

So, ultimately mtlShadow.matrix becomes redundant. But the purpose of this tutorial is to get CSM up and running with as little editing as possible. Each material gets a bunch of matrices that you can play with yourself. We're using matrices 1-3 for our three shadow views, and those are set to shadowMat1..3 in a different material event we set up earlier. We also need to make sure we're not referencing the now-nonexistent viewDepth. Just above what we did here are two lines that call for viewDepth.bmap.width. These can be changed to viewDepth1.bmap.width, or use the other views, or just "2048", since that's the resolution of our shadows.

Okay, we're nearly done with shadowmapping.c. Now we just need to make some changes to main. Just after "camera.material = mtlShadow" (which should be around line 102 by now) we'll delete the line "viewDepth.stage = camera" and replace it with:

viewDepth3.stage = camera;

viewDepth2.stage = viewDepth3;

viewDepth1.stage = viewDepth2;

viewDepth1.arc = 30;

viewDepth2.arc = 110;

viewDepth3.arc = 160;

We're just making sure that the depth views are rendered before the camera. If they get rendered afterwards, the shadows will have a one-frame delay, which can be particularly troublesome as objects or the camera move around. We're also setting the width of each view. With ISOMETRIC views, the arc determines the width of the view according to an equation described in the manual. Basically, we're making sure viewDepth1 is small, viewDepth2 is bigger, and viewDepth3 is huge.
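To picture the render order those three stage assignments set up, here's a rough sketch in plain C. The View struct is a stand-in I've made up with only two fields (the real VIEW has far more), and it assumes that "stage" means "render this view next", which is what the chain above relies on.

```c
#include <stddef.h>

/* Hypothetical stand-in for the engine's VIEW struct: only the fields
   this sketch needs. */
typedef struct View {
    const char *name;
    struct View *stage;   /* the view rendered after this one */
} View;

/* Follows the stage chain from `first`, writing the names into `order`
   (at most `max` entries). Returns how many views were visited. */
int render_order(const View *first, const char **order, int max)
{
    int n = 0;
    while (first != NULL && n < max) {
        order[n++] = first->name;
        first = first->stage;
    }
    return n;
}
```

Starting from viewDepth1, the chain visits viewDepth2, then viewDepth3, and only then the camera, so all three depth-maps are fresh by the time the shadows are drawn.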

Put this before your while loop in main:

var viewOffset1, viewOffset2, viewOffset3;

VECTOR tempVec;

In order to make the most of our view space, we want to move each depth view to a position relative to the camera. We don't want to face the camera directly, because then half of that view space is going to be wasted behind the camera. We're going to calculate where to put each view. Now, many papers have fancy-pants algorithms for calculating the optimal size and position for each view, but we want to keep this simple. All we want to do is make sure the camera is within the edge of each view's frustum, facing towards its centre. Delete both the lines in the main loop that reference viewDepth (probably about 122 to 123 by now), and put all this there instead:

viewDepth1.pan = 180+sun_angle.pan;

viewDepth1.tilt = -sun_angle.tilt;

vec_set(viewDepth2.pan, viewDepth1.pan);

vec_set(viewDepth3.pan, viewDepth1.pan);

viewOffset1 = 2048 * tanv(viewDepth1.arc/2) * 0.9;

viewOffset2 = 2048 * tanv(viewDepth2.arc/2) * 0.9;

viewOffset3 = 2048 * tanv(viewDepth3.arc/2) * 0.9;

vec_for_angle(tempVec, camera.pan);

vec_scale(tempVec, viewOffset1);

vec_add(tempVec, camera.x);

vec_set(viewDepth1.x, sun_pos);

vec_normalize(viewDepth1.x, 1000);

viewOffset1 = (camera.z + 1000 - tempVec.z)/viewDepth1.z;

vec_scale(viewDepth1.x, viewOffset1);

vec_add(viewDepth1.x, tempVec);

vec_for_angle(tempVec, camera.pan);

vec_scale(tempVec, viewOffset2);

vec_add(tempVec, camera.x);

vec_set(viewDepth2.x, sun_pos);

vec_normalize(viewDepth2.x, 1000);

viewOffset2 = (camera.z + 1000 - tempVec.z)/viewDepth2.z;

vec_scale(viewDepth2.x, viewOffset2);

vec_add(viewDepth2.x, tempVec);

vec_for_angle(tempVec, camera.pan);

vec_scale(tempVec, viewOffset3);

vec_add(tempVec, camera.x);

vec_set(viewDepth3.x, sun_pos);

vec_normalize(viewDepth3.x, 1000);

viewOffset3 = (camera.z + 1000 - tempVec.z)/viewDepth3.z;

vec_scale(viewDepth3.x, viewOffset3);

vec_add(viewDepth3.x, tempVec);

If you are using this in an actual game, I'd put this code in a different function that gets called every frame, or a function that gets called once but has a while-loop to keep it updating every frame. It's good to keep functions as clean as possible, and main gets a little cluttered here.

So, the first four lines just make sure the views are all looking in the same direction as the sun. The next three lines calculate the minimum radius of each view's frustum. Then, for each view, we centre it in front of the camera to take advantage of as much area as possible. We also slide each view along the sun's direction, so that the alignment we've just calculated isn't ruined, until it is 1000 quants higher than the camera (to make sure most shadow-casting objects don't get clipped away by each view's near-plane).
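The "slide along the sun's direction" step is the least obvious part, so here's a minimal sketch of just that maths in plain C. Vec3, slide_to_height and place_depth_views are names I've invented for illustration, not engine functions. The sketch assumes you already have the point in front of the camera (tempVec in the code above) for each view, and it also shows how the three near-identical blocks could be folded into a loop.

```c
typedef struct { double x, y, z; } Vec3;

/* Hypothetical helper, not an engine function. Starting from `target`
   (the point in front of the camera), move along `sun_dir` until the
   result is `height` units above `cam_z`. sun_dir can have any length,
   but its z component must not be zero (sun on the horizon). */
Vec3 slide_to_height(Vec3 target, Vec3 sun_dir, double cam_z, double height)
{
    /* Same factor the tutorial stores back into viewOffset1..3. */
    double k = (cam_z + height - target.z) / sun_dir.z;
    Vec3 pos;
    pos.x = target.x + sun_dir.x * k;
    pos.y = target.y + sun_dir.y * k;
    pos.z = target.z + sun_dir.z * k;
    return pos;
}

/* Places all three depth views. targets[i] is the camera position plus
   the camera's forward vector scaled by that view's offset. */
void place_depth_views(const Vec3 targets[3], Vec3 sun_dir, double cam_z, Vec3 out[3])
{
    int i;
    for (i = 0; i < 3; i++)
        out[i] = slide_to_height(targets[i], sun_dir, cam_z, 1000.0);
}
```

Whatever the target point, the result always ends up exactly 1000 quants above the camera, and it stays on the line through the target that points at the sun.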

You may have noticed that we're dividing by a variable that could be zero in three places. We don't want to do that, ever. However, it can only happen when the sun's tilt is 0, and that'll be impossible by the end of this tutorial. At the moment, the sun's tilt is whatever the default is. Later on, when we make the sun's tilt dynamic, we'll clamp it to a safe range.

Okay, that's everything we need to do to shadowmapping.c. Now let's hit up Shadow.fx.

First thing we're going to do is comment out #define USE_PCF -- we're first aiming to get CSM up and running as fast as possible. We can worry about PCF later.

Now, we need to allow it to use the matEffect matrices we set up before. Just below "Application fed data" (or anywhere you want, really, as long as it's before the actual vertex and pixel shaders) we'll put:

float4x4 matEffect1;

float4x4 matEffect2;

float4x4 matEffect3;

Followed by:

texture shadow1_bmap;

texture shadow2_bmap;

texture shadow3_bmap;

Instead of "texture TargetMap;".

As such, we'll also want to get rid of the line with "DepthSampler" in it, and have the following instead:

sampler DepthSampler1 = sampler_state { Texture = <shadow1_bmap>; };

sampler DepthSampler2 = sampler_state { Texture = <shadow2_bmap>; };

sampler DepthSampler3 = sampler_state { Texture = <shadow3_bmap>; };

Remember the BMAPs we declared in shadowmapping.c with very similar names? Because the folks behind A7/A8 love us so much, we can use any global BMAP in a shader just by sticking "_bmap" on the end of its name. Lovely, eh? That means when you've finished this tutorial and you're super excited to get it into your game and get it working with all the shaders you've already got, you don't have to worry about using up the entSkins and mtlSkins.

You may notice that a lot of what we're doing is adapting what was previously done once to be done thrice instead. This trend continues in the vertex shader -- find the line with outDepth amongst the arguments for ShadowVS, and replace it with:

out float4 outDepth1: TEXCOORD2,

out float4 outDepth2: TEXCOORD3,

out float4 outDepth3: TEXCOORD4)

Similarly, find the other line with outDepth at the end of the ShadowVS function and replace it with:

outDepth1 = mul( mul(inPos,matWorld), matEffect1 );

outDepth2 = mul( mul(inPos,matWorld), matEffect2 );

outDepth3 = mul( mul(inPos,matWorld), matEffect3 );

See where this is going? We could re-write fDist to pick the right depth-map based on its given co-ordinates, but it might as well be in the ShadowPS function (it'll reduce redundancy when we're actually using PCF). Remove the whole fDist function, since its reference to DepthSampler will get in the way of compilation anyway.

Alright, now for the arguments for the pixel shader ShadowPS. Change inDepth: TEXCOORD2 and the line it's contained in to:

in float4 inDepth1: TEXCOORD2,

in float4 inDepth2: TEXCOORD3,

in float4 inDepth3: TEXCOORD4) : COLOR0

And now we're back to the more interesting stuff, where we actually change something meaningful. We need to figure out which depth-map to use for shadow calculations. If any of the x or y coordinates are out of the 0 to 1 range after being divided by the w component, then we don't want to use it. Put this in the beginning of ShadowPS:

float2 range1, range2, range3;

range1.x = max(inDepth1.x/inDepth1.w, inDepth1.y/inDepth1.w);

range1.y = min(inDepth1.x/inDepth1.w, inDepth1.y/inDepth1.w);

range2.x = max(inDepth2.x/inDepth2.w, inDepth2.y/inDepth2.w);

range2.y = min(inDepth2.x/inDepth2.w, inDepth2.y/inDepth2.w);

range3.x = max(inDepth3.x/inDepth3.w, inDepth3.y/inDepth3.w);

range3.y = min(inDepth3.x/inDepth3.w, inDepth3.y/inDepth3.w);

float fShadow;

Alright. Now we have all the information we need to choose which depth-map to look up.

Find the fShadow = ... just after the #else -- we're going to replace this whole line (which basically decides if the current pixel is in shadow or not), with this mess:

if (range1.x < 0.95 && range1.y > 0.05)

    fShadow = (tex2Dproj(DepthSampler1, inDepth1).r + 0.0001) < inDepth1.z ? fDark : fBright;

else if (range2.x < 0.95 && range2.y > 0.05)

    fShadow = (tex2Dproj(DepthSampler2, inDepth2).r + 0.001) < inDepth2.z ? fDark : fBright;

else if (range3.x < 0.95 && range3.y > 0.05)

    fShadow = (tex2Dproj(DepthSampler3, inDepth3).r + 0.002) < inDepth3.z ? fDark : fBright;

else

    fShadow = fBright;

As you can see by the if comparison, we're not going all the way to the edge of the range of a texture (1.0 and 0.0) -- that's because we're going to do basically the same thing when we get PCF (which'll soften everything) working, and we don't want to go over the edge of the image when we're doing that. It would actually be safe to go much closer to the edge, but I just like it like this. You can also see a factor for bias in there (the 0.0001 in the first case). It's extremely small, because with ISOMETRIC views the depth only ranges from 0.0 to 1.0, so that small value actually corresponds to a noticeable difference. Without the bias we'd have heaps of "surface acne" -- broken shadows on lit surfaces due to there being no smoothing between pixels in a depth-map. You can (and should) play with these values.
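Here's the same decision written out as a rough plain C sketch, in case the shader version is hard to follow in one go. The 0.05 and 0.95 limits are copied from the shader code above. The function names and the array layout are hypothetical, and the coordinates are assumed to have already been divided by w.

```c
/* Hypothetical mirror of the shader's range test: true when both
   projected coordinates sit inside the 0.05..0.95 safe zone. */
int in_cascade(float u, float v)
{
    float hi = u > v ? u : v;   /* range.x in the shader */
    float lo = u < v ? u : v;   /* range.y in the shader */
    return hi < 0.95f && lo > 0.05f;
}

/* uv[i][0] and uv[i][1] are the already-divided coordinates for
   depth-map i. Returns 0, 1 or 2 for the first (sharpest) map that
   covers the point, or -1 when none of them does. */
int pick_cascade(float uv[3][2])
{
    int i;
    for (i = 0; i < 3; i++)
        if (in_cascade(uv[i][0], uv[i][1]))
            return i;
    return -1;
}

/* The depth comparison with bias: the pixel is in shadow when the depth
   stored in the depth-map, plus the bias, is still nearer to the sun
   than the pixel itself. */
int in_shadow(float stored_depth, float pixel_depth, float bias)
{
    return stored_depth + bias < pixel_depth;
}
```

Because the maps are tried from smallest to largest, a point always gets the sharpest depth-map that covers it, and a result of -1 corresponds to the final else, where the pixel is simply left bright.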

Now would be a good time to test it. If it doesn't work... well... you might not have followed properly. Or I might have missed something. PM me on the forums if you really need to.

Let's make it look better

I told you I just wanted to give you quick and easy CSM, but of course you want better than that. Here are a couple of changes that'll very quickly clean things up. Both of them are done in Shadow.fx:

At the beginning of ShadowPS is the declaration of fDiffuse. Currently, it's not actually taking into account the fDark and fBright variables set at the beginning of the file. Instead, it's getting a lighting factor ranging from 0.0 to 1.0. That's actually quite useful, because the lerp function interpolates between two numbers using a factor ranging from 0.0 to 1.0, so change that whole line to this:

float fDiffuse = lerp(fDark, fBright, saturate(dot(-vecSunDir, normalize(inNormal))));

Now, this brings to light another problem (you'll see if you test it). The nice smooth diffuse lighting is hardly visible next to the harsh shadow. The solution is pretty straightforward. Replace the last line of ShadowPS with:

return tex2D(TexSampler,inTex) * min(fShadow, fDiffuse);

And now it should look much better. What's happening is: instead of multiplying a harsh shadow by smooth shading, we only let the shadow affect the result according to how much it is already lit.
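To see how those two changes combine, here's a minimal plain C sketch of the same arithmetic. The helper names are mine, and the 0.5 and 1.0 in the usage note are example values for fDark and fBright, not necessarily the ones in Shadow.fx.

```c
/* Plain C equivalents of the HLSL intrinsics used in the shader. */
float lerp_f(float a, float b, float t) { return a + (b - a) * t; }
float saturate_f(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }
float min_f(float a, float b) { return a < b ? a : b; }

/* n_dot_l stands for dot(-vecSunDir, normal); shadow is the fShadow
   value from the depth test (fDark or fBright). */
float final_light(float n_dot_l, float shadow, float dark, float bright)
{
    float diffuse = lerp_f(dark, bright, saturate_f(n_dot_l));
    return min_f(shadow, diffuse);   /* the darker of the two wins */
}
```

With dark = 0.5 and bright = 1.0, a surface facing the sun and not in shadow gets 1.0, a surface facing the sun but in shadow gets 0.5, and a surface facing away gets 0.5 whether or not it's in shadow, so the shadow edge no longer cuts across parts that were already dark.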

Finally, let's get it working with the sample's PCF to soften it up a bit. This will only work if your video card supports Shader Model 3 or higher. In ShadowPS you'll find an #ifdef/#else conditional. Basically, get rid of everything in between #ifdef and #else and replace it with this:

fShadow = 0.0;

if (range1.x < 0.95 && range1.y > 0.05)

    for (int i=0; i < 9; i++)

        fShadow += (tex2Dproj(DepthSampler1, inDepth1 + fPCF*fTaps_PCF[i]).r + 0.0001) < inDepth1.z ? fDark/9 : fBright/9;

else if (range2.x < 0.95 && range2.y > 0.05)

    for (int i=0; i < 9; i++)

        fShadow += (tex2Dproj(DepthSampler2, inDepth2 + fPCF*fTaps_PCF[i]).r + 0.001) < inDepth2.z ? fDark/9 : fBright/9;

else if (range3.x < 0.95 && range3.y > 0.05)

    for (int i=0; i < 9; i++)

        fShadow += (tex2Dproj(DepthSampler3, inDepth3 + fPCF*fTaps_PCF[i]).r + 0.002) < inDepth3.z ? fDark/9 : fBright/9;

else

    fShadow = fBright;

Of course, it'd take less code to have one for-loop outside the if-else chain, instead of three almost identical for-loops in each stage of the chain, but it feels wasteful to have the same if-else chain used over and over again within a loop if the results will always be the same. This way, the if-else chain should only be traversed once for each pixel. In the end, there's a good chance it wouldn't make much of a difference, since the shader compiler takes a lot of liberties in an attempt to optimise the code, and it could easily end up being the same thing to the graphics card.

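Now, as tempting as it is to just leave you with that, here's an explanation of what we changed. It's very, very similar to the conditional we were already using for choosing the right depth-map, with a few differences: fShadow starts at 0.0 and is built up over nine samples; each sample is taken at a slightly different position in the depth-map (offset by fPCF*fTaps_PCF[i]); and each sample adds a ninth of either fDark or fBright. The result is the average of nine shadow tests, which is what softens the edges.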
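The averaging inside each of those for-loops can be sketched in plain C like this. It's an illustration only: pcf_soft_shadow is a made-up name, and the nine depths are passed in as an array instead of being fetched with tex2Dproj at offset positions.

```c
/* Averages nine shadow tests. taps[i] is the depth read from the
   depth-map at the i-th offset position; pixel_depth is this pixel's
   depth as seen from the sun. */
float pcf_soft_shadow(const float taps[9], float pixel_depth, float bias,
                      float dark, float bright)
{
    float shade = 0.0f;
    int i;
    for (i = 0; i < 9; i++)
        shade += (taps[i] + bias < pixel_depth) ? dark / 9.0f : bright / 9.0f;
    return shade;
}
```

If all nine samples agree you get plain fDark or fBright back. Near a shadow edge only some of them are in shadow, so the result lands somewhere in between, and that gradient is the soft edge.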

Before we can test this, two more changes need to be made:

Near the top of the file, "fPCF" is declared and initialised. Since we're using ISOMETRIC views, our range is much lower, so change it to "0.0005".

Near the bottom of the file we have VertexShader = ... and PixelShader = .... These need to be changed to use Shader Model 3, since the loop combined with all the if/else conditionals used with PCF is too complex to be handled by Shader Model 2 or less. So:

VertexShader = compile vs_3_0 ShadowVS();

PixelShader  = compile ps_3_0 ShadowPS();

Now get the example running and, while the window is open, uncomment the line "#define USE_PCF" and then save Shadow.fx. The engine will automatically recompile the shader and you can toggle relatively quickly between using PCF for soft shadows or using simple unfiltered shadows.

Test the range

Now, this example has not been built to take advantage of CSM, so copy "terrain.hmp" from the sample folder into the folder you're using, go to shadowmapping.c and into main, and replace the level_load(...) line with this:

level_load("terrain.hmp");

int i;

for (i=0; i<20; i++)

    ent_create("blob.mdl",vector(random(2000) - 1000, random(2000) - 1000,0),NULL);

And add these lines to the while loop (somewhere, it doesn't really matter):

sun_angle.pan += 1.25 * time_step;

sun_angle.tilt += (key_y - key_h) * 5 * time_step;

sun_angle.tilt = clamp(sun_angle.tilt, 10, 170);

Shadows are way more exciting when they move (a lot of games with dynamic shadows have their light-sources swinging as if there is a strong wind, even in circumstances where it would be impossible. Why? Because it looks so cool!). Also, having control is nice. This lets you go pretty close to sunset/sunrise, or appreciate some noon shadows, or whatever.

The clamp is there because we're tracing a line to a flat plane. A line will never meet a plane if it is perpendicular to the plane's normal (that is, parallel to the plane) -- in this case, when the sun's tilt is 0 or 180. That's where the potential divide by zero I mentioned earlier comes from, and the clamp prevents it by never letting the tilt go all the way down to 0 or up to 180. Because each view tries to reach a point 1000 quants above the camera without leaving a fixed line, the views move very far away if the tilt gets too close to 0 or 180. This could have repercussions because of the clipping range of the views. Clearly, this is a limitation of my technique, not of shadow mapping itself or CSM. If you really need to go all the way to 0 degrees (or further) you can position the views a ton of other ways. Just make sure you consider the positions of all important shadow-casters.
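To put some numbers on "very far away", here's a tiny plain C sketch. slide_distance is a made-up name. It assumes the camera is looking level (so each view has to rise the full 1000 quants) and that sun_z is the vertical component of the unit vector pointing at the sun, which is the sine of the sun's tilt.

```c
/* How far a view has to travel along the sun's direction to gain `rise`
   units of height. sun_z is the vertical part of the unit sun direction
   (sine of the sun's tilt) and must not be zero. */
float slide_distance(float rise, float sun_z)
{
    return rise / sun_z;
}
```

With the sun straight overhead (sun_z = 1) the view travels 1000 quants. At a tilt of 30 degrees (sun_z = 0.5) it travels 2000, and at the clamp limit of 10 degrees (sun_z is roughly 0.1736) it's already around 5760. As the tilt heads towards 0 the distance grows without limit.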

Alright, moving on... You know what? A really big shadow-casting object would be cool for showing off our shadows. Find the line in main "ENTITY* ent = ent_create...". On the very next line let's put:

ent.scale_x = ent.scale_y = ent.scale_z = 20;

Run shadowmapping.c now and enjoy many more shadows, constantly moving around. Put a harsher angle on the shadows with the Y and H keys.

Of course, there are far more interesting scenes that can be made to demonstrate CSM, but this gives you more room to move around.

Here're a couple of shots:

We get really good coverage here -- everything in the scene gets shadows.

But they still look pretty good close-up, as well!

Debugging

Okay, so people asked for pics showing how it all works. I thought it would be really useful to have an example where I modified the colour depending on which depth-map we're calculating shadows from. So, go to Shadow.fx and save it as something else (ShadowDebug.fx, for example). Alternatively you can just use a #define, the same way PCF is done, but I'm too lazy :)

Right near the beginning of ShadowPS (it doesn't particularly matter where, as long as it's before #ifdef USE_PCF), put in a:

float4 colour;

Now go down to whatever's after #endif. It should just be "return ...", right? Well replace that with this:

colour = tex2D(TexSampler,inTex) * min(fShadow, fDiffuse);

if (range1.x < 0.95 && range1.y > 0.05)

    colour.r *= 2;

else if (range2.x < 0.95 && range2.y > 0.05)

    colour.g *= 2;

else if (range3.x < 0.95 && range3.y > 0.05)

    colour.b *= 2;

return colour;

To use this, go to shadowmapping.c and find mtlShadow. Change "effect = ..." to:

effect = "ShadowDebug.fx";

And now you should be able to run the example. The tint indicates which depth-map is being used. A red tint indicates viewDepth1, a green tint indicates viewDepth2, and a blue tint indicates viewDepth3. No tint indicates that there's no shadow at all, because it's outside the range of all the depth-maps.

So, what do you notice? Unless you move the camera REALLY far away from everything, viewDepth3 is nowhere to be found! This is pretty cool. Very cool, actually. Because it means we can make some adjustments and get even better resolution on our close-range and medium-range shadows.

In main we have viewDepth1.arc = 30 etc, right? That's where we've defined how much range each view covers. Fiddle with those values until you get something you're happy with. I quite like the results I'm getting with 15, 45 and 110. The shadows are really crisp close-up, but we still have some decent range:

This shows the results of changing the settings as I've just described.

What works best will depend on your game -- draw-distance, openness, and whether there are large shadow-casting objects far away.

Optimisation

Now! Let's talk about some optimisation. For now it's just very brief theory, but perhaps I'll come back and put it into practice for you. Here are some ideas:

Conclusion

Alright! So, CSM really isn't that hard. There are tons more things you can do if you're familiar with shaders. One of the most obvious: a big advantage of shadow mapping over stencil shadows is that it can respect alpha transparency. Currently, Depth.fx doesn't take the model's texture into account at all. You shouldn't have too hard a time setting it up to get the alpha value of the texture, and reject pixels based on their alpha value so that you can get nice shadows around plants and what-not. Some clues: AlphaFunc, AlphaRef, AlphaTestEnable, and ZWriteEnable.
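I'm not going to write the shader for you, but the idea behind the alpha rejection can be sketched in a few lines of plain C. This is only an illustration of the logic, not Depth.fx code: write_depth and its parameters are hypothetical, and in the real shader the render states named above would do this job for you.

```c
/* Returns the depth that ends up in a depth-map texel. A fragment only
   replaces the stored depth if its texture alpha reaches alpha_ref and
   it is nearer to the sun than what is already stored. */
float write_depth(float stored, float frag_depth, float frag_alpha, float alpha_ref)
{
    if (frag_alpha < alpha_ref)
        return stored;   /* rejected: a see-through texel casts no shadow */
    return frag_depth < stored ? frag_depth : stored;
}
```

A fully transparent part of a leaf texture leaves the depth-map untouched, so the ground behind it stays lit, while the opaque parts write their depth and cast a shadow as usual.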

Also, bias issues can be significantly reduced if the depth shader culls polygons the other way around -- look for CullMode. Of course, this has potential to cause other problems, but I'm not going into this right now. I'm tired.

This is my first tutorial. I'm sure a lot of it wasn't well-explained. The main purpose of it was really just to equip you with Cascaded Shadow Mapping and some understanding thereof. However, if something could really use some more explaining, find the thread on the GameStudio forums, or pm me if you really need to.