Implementing deferred rendering in XNA 4.0 is somewhat more challenging than it needs to be. The API in 4.0 has been “streamlined” mainly for use on Windows Phone, at the expense of certain capabilities.
One major constraint is that depth-stencil (DS) buffers are explicitly tied to render targets (RTs): it is impossible to create an independent DS buffer, because each one is always associated with a single RT. This seems reasonable for forward rendering, where you typically draw to a single render target with its own depth buffer. Even if you first render some intermediate targets, such as shadow maps or environment maps, each of those simply gets its own depth buffer too (a potential waste of video memory). In my deferred renderer, however, it means that I can no longer use the depth buffer after unbinding the GBuffer render target, because whichever target I bind next has its own depth buffer, which is either blank or otherwise irrelevant. This wasn’t a major issue until I threw a bunch of point lights at my renderer and saw a considerable amount of overdraw, and I know there are optimizations for deferred light volumes that rely on z-buffer and stencil operations (see below).
I considered two workarounds for the missing depth buffer during my light pass: reconstitute the z-buffer from the GBuffer depth texture, or copy the main GBuffer target and alias it as the light accumulation buffer. Both effectively amount to a full-screen shader pass that reads a texture, because XNA doesn’t expose any fast blit operation. The second wasn’t really viable, because my light accumulation buffer uses a different texture format (to allow HDR values), but it could be considered if necessary. I gnashed my teeth for a while until I realized that my directional/ambient light pass was already doing a full-screen shader evaluation that includes a depth-texture read, so reconstituting the z-buffer in that shader was practically free: I needed only to enable depth writes and output the depth value from the pixel shader. Point lights and other light volumes can then be rendered afterwards against the correct z-buffer values.
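The depth value that full-screen pass writes out has to be in post-projection form, not the linear form a GBuffer often stores. The sketch below shows that conversion, assuming (this is not stated above) that the GBuffer holds positive linear view-space distance and that the camera uses a right-handed D3D-style perspective projection like the one XNA’s Matrix.CreatePerspectiveFieldOfView builds, where clip z = z_view * f/(n - f) + n*f/(n - f) and clip w = -z_view. The function names are hypothetical, and it is written in Python only so the arithmetic is easy to check; the pixel shader would perform the same math.

```python
def linear_to_device_depth(view_dist: float, near: float, far: float) -> float:
    """Map a positive view-space distance to [0, 1] post-projection depth.

    Assumes a right-handed D3D-style projection:
      clip.z = z_view * f / (n - f) + n * f / (n - f)
      clip.w = -z_view
    with z_view = -view_dist for points in front of the camera.
    """
    z_view = -view_dist
    clip_z = z_view * far / (near - far) + near * far / (near - far)
    clip_w = -z_view
    return clip_z / clip_w


def linear_to_device_depth_closed_form(view_dist: float, near: float,
                                       far: float) -> float:
    # The same mapping simplified: (f / (f - n)) * (1 - n / d).
    return far / (far - near) * (1.0 - near / view_dist)
```

A distance equal to the near plane maps to 0 and one equal to the far plane maps to 1, with most of the precision concentrated near the camera. If the GBuffer instead stores depth normalized by the far plane (d / f), it would need to be multiplied by f before this conversion.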
Light volume optimizations
- One optimization has very little downside: render only the light volume’s front faces, with the depth test set so that they are drawn only where they are closer than the scene z-buffer value (assuming the camera is outside the light volume). Ordinary depth rejection should then prevent the shader from being evaluated for any light-volume pixels that are completely hidden behind geometry. I came up with this optimization myself, but it’s probably documented somewhere already.
- The second optimization is a trade-off between draw calls, z-buffer fill rate, and pixel shader fill rate. First render the light volume’s back faces with color writes disabled, setting the stencil buffer to a reference value wherever the depth test fails (z-fail); then render the front faces as in the first optimization, but also require the stencil value to equal the reference value. A pixel passes both tests only when the scene surface lies behind the front face and in front of the back face, that is, inside the light volume, so no pixels are shaded where the scene geometry is outside the volume. The trade-off is that you must render each light volume twice: once to populate the stencil buffer, and again to evaluate the light shader. I learned of this optimization from the nVidia deferred rendering presentation at their “6800 Leagues” event.
- The stencil optimization also carries a secondary trade-off. I can draw all my lights as instances in a single draw call, but only one stencil reference value can be set on the device per draw, so instancing produces some unwanted overdraw where near and distant light volumes overlap in screen space. You can therefore either make two draw calls per light, each with a unique stencil reference value, and get optimal pixel shader fill rate, or draw all lights in just two draw calls and waste some shader fill rate. There may be some mitigation involving stencil increment instead of replace, but I haven’t thought too hard about it.
- For now I’m opting for the additional draw calls and reduced shader fill rate, since my total number of draw calls per frame is relatively low (under 100 for geometry, plus perhaps another 50-100 for text and other HUD elements via SpriteBatch); another 50-100 draw calls for all the lights in a scene shouldn’t kill performance.
- I’m currently using a cube as my point light volume, although a sphere would be an even tighter fit and would reduce pixel shader fill rate further. I’ll probably implement this soon, but I must make sure that the geometry approximating the sphere has its faces, not just its vertices, at or beyond the light’s maximum radius; otherwise pixels may drop out around the “silhouette” of the light volume.
- Another gotcha in the XNA API: there is a stencil reference property both on the DS state object (DepthStencilState.ReferenceStencil) and on the device itself (GraphicsDevice.ReferenceStencil). Setting a DS state also sets the device property, so if you want dynamic stencil reference values without creating hundreds of nearly identical DS state objects, just modify the device’s stencil reference property after setting the DS state.
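To make the per-pixel logic of the two light-volume optimizations and the shared-reference overdraw concrete, here is a small CPU model. It is purely illustrative Python, not XNA code. Each light volume is reduced to the depths at which its front and back faces cross one pixel’s view ray, assuming a convex volume and a camera outside it, and a less-or-equal depth test is assumed. The names LightSpan, shade_front_only, and shaded_lights are hypothetical. With unique_refs=True it mimics two draw calls per light, each with its own stencil reference; with unique_refs=False it mimics the instanced case, where all stencil passes share one reference value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LightSpan:
    """Where one convex light volume crosses a single pixel's view ray.

    Assumes the camera is outside the volume, so the ray enters through a
    front face and leaves through a back face.
    """
    front: float  # depth of the front face along the view ray
    back: float   # depth of the back face along the view ray


def depth_passes(face_depth: float, scene_depth: float) -> bool:
    # Less-or-equal depth test against the scene z-buffer (depth writes off).
    return face_depth <= scene_depth


def shade_front_only(light: LightSpan, scene_depth: float) -> bool:
    # Optimization 1: front faces only; fragments hidden by geometry fail z.
    return depth_passes(light.front, scene_depth)


def shaded_lights(lights: List[LightSpan], scene_depth: float,
                  unique_refs: bool) -> List[int]:
    """Indices of lights whose shader runs at this pixel (optimization 2)."""
    stencil = 0  # this pixel's stencil value, cleared to 0
    shaded: List[int] = []

    def stencil_pass(light: LightSpan, ref: int) -> None:
        # Pass 1: back faces, color writes off, stencil = ref on z-fail.
        nonlocal stencil
        if not depth_passes(light.back, scene_depth):
            stencil = ref

    def light_pass(index: int, light: LightSpan, ref: int) -> None:
        # Pass 2: front faces, same depth test as optimization 1,
        # plus a stencil "equal to ref" test.
        if depth_passes(light.front, scene_depth) and stencil == ref:
            shaded.append(index)

    if unique_refs:
        # Two draw calls per light, each light with its own reference value.
        for i, light in enumerate(lights):
            stencil_pass(light, i + 1)
            light_pass(i, light, i + 1)
    else:
        # Instanced: one stencil draw and one light draw sharing ref = 1.
        for light in lights:
            stencil_pass(light, 1)
        for i, light in enumerate(lights):
            light_pass(i, light, 1)
    return shaded


# Usage
near, far = LightSpan(10.0, 20.0), LightSpan(40.0, 50.0)
print(shade_front_only(near, 30.0))                        # True
print(shaded_lights([near], 30.0, unique_refs=True))       # []
print(shaded_lights([near, far], 45.0, unique_refs=True))  # [1]
print(shaded_lights([near, far], 45.0, unique_refs=False)) # [0, 1]
```

With the scene surface behind the near light (depth 30), the front-face test alone would still run its shader, while the stencil version skips it. Where the near and far volumes overlap over a surface inside only the far light (depth 45), unique references shade just the far light, but a shared reference also runs the near light’s shader there, which is the overdraw described above. The model ignores one practical limit: with an 8-bit stencil buffer, unique reference values run out after 255 lights unless the stencil is cleared or the values are recycled.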
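For the sphere proxy, one way to honor the “faces, not vertices” rule is to measure how far the nearest face plane of a unit-radius mesh sits from its center and scale the mesh by the light radius divided by that distance. The sketch below does this for an icosahedron, a coarse sphere approximation chosen only for illustration; the helper names are hypothetical. For the icosahedron the nearest face lies only about 0.795 of the vertex radius from the center, so the mesh must be scaled to roughly 1.26 times the light radius. A cube whose half-extent equals the light radius already satisfies the rule, since its faces sit exactly at that radius.

```python
import itertools
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def min_face_distance(vertices, triangles):
    """Smallest distance from the origin to the plane of any triangle."""
    best = math.inf
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = cross(sub(b, a), sub(c, a))
        # abs() makes the result independent of triangle winding.
        best = min(best, abs(dot(n, a)) / math.sqrt(dot(n, n)))
    return best


def proxy_scale(vertices, triangles, light_radius):
    """Uniform scale that puts every face at or beyond light_radius."""
    return light_radius / min_face_distance(vertices, triangles)


def icosahedron():
    """Unit-circumradius icosahedron as (vertices, triangles)."""
    phi = (1 + math.sqrt(5)) / 2
    verts = []
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            verts += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]

    def is_edge(i, j):
        # With these coordinates every edge has length exactly 2.
        return math.isclose(math.dist(verts[i], verts[j]), 2.0)

    tris = [t for t in itertools.combinations(range(len(verts)), 3)
            if is_edge(t[0], t[1]) and is_edge(t[1], t[2]) and is_edge(t[0], t[2])]
    r = math.sqrt(1 + phi * phi)  # circumradius before normalization
    return [tuple(x / r for x in v) for v in verts], tris
```

The faces produced here have arbitrary winding, which is fine for measuring distances; a real light-volume mesh would need consistent winding so that front- and back-face culling behave as the stencil passes expect. Finer tessellations push the nearest-face distance closer to 1, so less scaling (and less wasted fill) is needed.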