RedBeard's Dev Blog

Archive for the ‘CubeFortress’ Category

Indie Production Process

Posted by redbeard on September 15, 2011

I’ve fielded a few questions about and discussed my production process recently, so I felt compelled to document it more permanently. I’ve developed this process after a number of projects both failed and completed, independent and professional.

Design

  • Paper Notebook: Before I start typing anything into a word processor, spreadsheet, or compiler IDE, I capture some concepts on paper, both high-level and low-level, related to design, art, technology, whatever. It’s easy to sketch out all sorts of random stuff on paper like UI, art concepts, gameplay or code logic flow charts, and of course just writing. I’m also fairly forgetful but can carry a paper pad around with me so I can capture whatever fleeting genius might strike me.
  • Design Doc: Once I have a good idea of the high-level concept and some details, I try to capture them in a more sensible and editable format; Word, OneNote, Google Docs are all good choices. The purpose of the design doc is to describe in detail all the ideal features and systems as they would be in a final product, but not necessarily in a technical manner. I don’t describe APIs or algorithms in this document unless they’re essential to the concept and are novel or original. The design doc is a living document and may change drastically over the course of development, so don’t get too mired in details when you’re still prototyping. Here’s my design doc for Cube Fortress.

Product Planning and Development

  • Production Spreadsheet: I use a spreadsheet to track all the production details of my project. This includes development tasks, art and other assets, bug tracking, publication submission details, free download codes, review links, sales numbers, and whatever else is related to the project. The largest chunk of manual work needed here is managing the task list, which is divided into milestones of “coherent functionality” such as a visual or technical prototype, basic gameplay, network multiplayer, final polish, title update, etc. Here’s my production spreadsheet for Cube Fortress.
  • Milestones: The idea of a milestone is to have something presentable and coherent at the end, without tons of unimplemented stub functionality or useless disconnected features. Milestones are composed of tasks, where each task is ideally 1-4 hours long so it can be completed in a single development session; examples of good tasks: cube rendering, player respawn, weapon firing, raycast collision detection, scoreboard; some tasks which would be too broad or nebulous: netcode, guns, HUD UI, Clone Angry Birds. I prefer to focus on and complete a single task at a time, since this gives a good tangible result and also allows for better bug tracking if you keep your code & assets in a source-control system like SVN.
  • Task Estimation: For scheduling production, it’s essential to capture good tasks, break them into reasonable sizes, and make good estimates of their time complexity. Time estimation is a skill which comes poorly to most programmers, because we tend to be optimistic about our ability to write perfect code quickly. In reality, new requirements will typically be discovered halfway through implementation and the result will have a few bugs which take significant time to track down and squash. Really, the best way to be good at estimating task times is to do it a bunch and learn through experience.

Posted in CubeFortress, XNA | Leave a Comment »

Multiplayer Milestone Complete

Posted by redbeard on September 14, 2011

I’ve completed my self-imposed “multiplayer milestone”, which means two things: the game has reached a state of coherent playability, and I can now start working on making the game deeper and broader. According to my task & time tracking spreadsheet, I’ve spent over 150 hours on this milestone (and 350 total on the project). My previous milestones were “experimental cube-world rendering” and “basic networked gameplay”. Here are some of the features and tasks which I completed for this one:

  • Winnable game mode “team treasure grab”
  • Explosive grenades and rocket launcher
  • Limited ammo magazines & weapon reloading
  • Teams: spawning, scoreboard, player balance
  • Robust object state replication and RPC over reliable UDP
  • Dedicated host/client separation, which makes the code more sane and opens the option of dedicated servers
  • Client-side movement prediction (essential to avoid laggy player movement)
  • Network bandwidth profiling (invaluable for tracking down overactive network objects)
  • Player feedback sounds & visual effects for various actions, and incoming damage indicators
  • Various UI functionality like compass, health, and weapon info
  • BRDF rendering experimentation
The most time-consuming task was probably the client-side player movement prediction. I referenced a few different papers/articles which helped get the concept clear in my head:
My next milestone involves several main categories of work:
  • AI helpers, such as builder & tunneler bots, sentry turrets, and patrol bots
  • Weapon variety: machine gun, sniper rifle, pistol, etc
  • Support items: medic kit, ammo box, repair wrench, etc
  • Player class build or economic systems for selecting from the various weapons & support items
  • Enhanced rendering effects like ambient illumination, motion-detector vision, and others
  • Performance improvements, specifically targeted at Xbox 360 (I’d like to run at 60 fps @ 720p, at least on clients if not the host)
  • Polish & style improvements: it would be nice to start using more aesthetically appealing art & sound assets
If you’d like to take the milestone build for a spin, download here. You’ll also need to install the XNA 4 runtime if you haven’t already. Consider it alpha-quality code, it’s probably full of bugs… but if it doesn’t crash, hopefully you have some fun with it!


Progress and a Name

Posted by redbeard on August 19, 2011

I decided on a name for my game, considering a number of factors including domain-name availability. I ended up going with CubeFortress, which is descriptive and somewhat evocative of gameplay. The site is just a placeholder for now, and I don’t anticipate making frequent updates there, perhaps with the exception of a forum sub-domain. I don’t claim to be very creative with names, but I think this one is better than some of the alternatives like Voxel Fortress, Voxel Defense, Team Cube, FightCraft, Cubic Carnage, and others.

I’ve also made some progress in various areas since my last update. The game can now be won or lost by a team, and a new round will start after a short delay of scoreboard gloating; the current game mode bases score on the treasure quantity held by all players on a team. I’ve also added some small features like crouching, more sounds for events like flying rockets and jetpack use, a compass to locate your base and other things, time delays for weapon firing & reloading, incoming damage indicators, and of course I’ve added the rocket launcher and grenade as selectable weapons.

On the graphics side of things, I’ve added some particle system code and tweaked it to look nicer with the deferred lighting, added more light glows for treasure drops and other things, made some optimizations to point light rendering as mentioned previously, and did a bunch of experimentation with various BRDF models for lighting. I’ve currently settled on Oren-Nayar for a diffuse term and the standard Blinn-Phong specular term, with a bit of a hacked Fresnel blending term between them, and a somewhat hacky skybox lookup for global directional light. The surface variables in the G-Buffer now include diffuse color, diffuse/specular blend factor, roughness, and specular metal-ness (which decides the specular color contribution). The differences between lighting models can be subtle, but Oren-Nayar should aid the ability to make surfaces look “rough” under dynamic lighting.
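For readers unfamiliar with the two lighting terms named above, here is a minimal CPU-side sketch of them in Python. It is not the game's shader: it uses the commonly published "qualitative" Oren-Nayar approximation and plain Blinn-Phong, the function names are made up for illustration, the diffuse term is left unscaled (no albedo or 1/π factor), and the Fresnel blend and skybox lookup are not shown because the post gives no details for them.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    if length < 1e-8:
        return (0.0, 0.0, 0.0)  # degenerate input: return a zero vector
    return (v[0] / length, v[1] / length, v[2] / length)

def oren_nayar_diffuse(n, l, v, sigma):
    """Qualitative Oren-Nayar diffuse factor, including the cos(theta_i) term.

    n, l, v are unit vectors (normal, direction to light, direction to viewer).
    sigma is the roughness in radians; sigma == 0 reduces to Lambert (n . l).
    """
    n_dot_l = dot(n, l)
    n_dot_v = dot(n, v)
    if n_dot_l <= 0.0:
        return 0.0  # light is behind the surface
    sigma2 = sigma * sigma
    a = 1.0 - 0.5 * sigma2 / (sigma2 + 0.33)
    b = 0.45 * sigma2 / (sigma2 + 0.09)
    theta_i = math.acos(min(n_dot_l, 1.0))
    theta_r = math.acos(max(min(n_dot_v, 1.0), 0.0))
    alpha = max(theta_i, theta_r)
    beta = min(theta_i, theta_r)
    # cos(phi_i - phi_r): compare l and v after projecting onto the tangent plane
    l_proj = normalize(tuple(l[i] - n[i] * n_dot_l for i in range(3)))
    v_proj = normalize(tuple(v[i] - n[i] * n_dot_v for i in range(3)))
    cos_phi = max(dot(l_proj, v_proj), 0.0)
    return n_dot_l * (a + b * cos_phi * math.sin(alpha) * math.tan(beta))

def blinn_phong_specular(n, l, v, power):
    """Standard Blinn-Phong specular factor using the half vector."""
    h = normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))
    return max(dot(n, h), 0.0) ** power
```

With sigma set to 0 the diffuse factor collapses to plain n·l, which makes a convenient sanity check when porting the math to a shader.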

I haven’t made many changes to network code lately, but I hope to make a couple of changes before exiting my current “milestone”. I plan to separate the host functionality so that the host-local client doesn’t share objects in memory with the actual host logic, which should make it more sane to add network code and also make a dedicated server more feasible. I also plan to implement better client player movement prediction so that slightly laggy but stable connections don’t cause position warping on remote clients.
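The post does not say how the prediction will be implemented. One common approach is to apply inputs locally right away, remember them by sequence number, and replay the unacknowledged ones whenever an authoritative state arrives from the host. The sketch below illustrates that idea in Python with a one-dimensional position; every name in it (MoveInput, PredictedClient, and so on) is hypothetical and not taken from the game.

```python
from dataclasses import dataclass, field

@dataclass
class MoveInput:
    sequence: int   # increasing ID so the host can acknowledge inputs
    velocity: float
    dt: float

def apply_input(position, move):
    # Shared movement rule; host and client must run the same logic.
    return position + move.velocity * move.dt

@dataclass
class PredictedClient:
    position: float = 0.0
    next_sequence: int = 0
    pending: list = field(default_factory=list)  # inputs not yet acknowledged

    def local_move(self, velocity, dt):
        """Apply an input immediately and keep it for possible replay."""
        move = MoveInput(self.next_sequence, velocity, dt)
        self.next_sequence += 1
        self.pending.append(move)
        self.position = apply_input(self.position, move)
        return move  # in a real client this would also be sent to the host

    def on_host_state(self, host_position, last_processed_sequence):
        """Snap to the authoritative state, then replay unacknowledged inputs."""
        self.pending = [m for m in self.pending
                        if m.sequence > last_processed_sequence]
        self.position = host_position
        for move in self.pending:
            self.position = apply_input(self.position, move)
```

When the host agrees with the client, the replay lands on the same position and nothing visibly changes; when it disagrees, the correction is applied at the acknowledged input rather than at the latest one.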


Deferred Rendering II: Point Light Optimization in XNA

Posted by redbeard on July 25, 2011

Implementing deferred rendering in XNA 4.0 is somewhat more challenging than it needs to be. The API in 4.0 has been “streamlined” mainly for use on Windows Phone, at the expense of certain capabilities.

One major constraint is the explicit linking of depth-stencil buffers with render-targets; it is impossible to create an independent DS buffer, they are always associated with a single RT. This seems reasonable for forward rendering, where you typically draw to only a single render-target with a strongly linked depth buffer, even if you render out some intermediate targets for shadow maps or environment maps before the main render (they all get their own depth buffers too, a potential waste of video memory). In my deferred renderer, however, it means that I can no longer use the depth-buffer after un-binding the GBuffer render-target, because whichever new target I bind has its own depth-buffer which is blank or otherwise irrelevant. This wasn’t a major issue until I threw a bunch of point lights at my renderer and saw a pretty considerable amount of overdraw, and I know that there are optimizations for deferred light volumes to be made using z-buffer and stencil operations (see below).

I considered two workarounds for the lack of depth buffer during my light pass: reconstitute the z-buffer from the GBuffer depth texture, or copy the GBuffer main target and alias it as the light accumulation buffer. Both of those effectively equate to a full-screen shader pass reading a texture, because XNA doesn’t expose any fast blit operation. The latter option wasn’t really an option because I use a different texture format for my light accumulation buffer (allowing for HDR values), but it could be considered if necessary. I gnashed my teeth for a while until I realized that my directional/ambient light pass was already doing a full-screen shader evaluation including a depth-texture read, and so reconstituting the z-buffer using that shader was practically free; I needed only to enable depth writes and to write out the depth value from the pixel shader. Point lights and other light volumes can then be rendered afterwards and use the correct z-buffer values.

Light volume optimizations

  • One optimization has very little downside: you render the light volume front-faces whenever they are closer than the scene z-buffer value (assuming the camera view is outside the light volume). Trivial depth rejection should prevent the shader from evaluating for any light volume pixels which are completely hidden behind geometry. This optimization is of my own devising, but it’s probably documented somewhere already.
  • The second optimization is a trade-off between draw calls, z-buffer fill rate, and pixel shader fill rate: render the light volume back-faces with color writes disabled while setting the stencil buffer to a reference value on z-fail, then render the light front-faces as in the first optimization except also testing for the stencil reference value; this avoids rendering any pixels for which the scene geometry was not inside the light volume. The trade-off is that you must render the light volume twice: once to populate the stencil buffer, and again to evaluate the light shader. I learned of this optimization from the nVidia deferred rendering presentation from their “6800 Leagues” event.
  • There is also a secondary trade-off for the stencil optimization. I have the capability to draw all my lights as instances with a single draw call, but since only a single stencil reference value can be set on the device per draw, this produces some unwanted overdraw in places where distant and near light volumes overlap in screen space. The result is that you can make 2 draw calls per light with a unique stencil reference value and get perfect pixel shader fill rate, or draw all lights in only 2 draw calls and waste some shader fill rate; there may be some mitigation involving stencil increment instead of replace, but I haven’t thought too hard about it.
  • For now I’m opting for the additional draw calls and reduced shader fill rate, since my total number of draw calls per frame is relatively low (under 100 for geometry, perhaps another 50-100 for text and other HUD elements via SpriteBatch); another 50-100 draw calls for all the lights in a scene shouldn’t kill the runtime.
  • I’m currently using a cube as my point light volume, although a sphere would be an even tighter fit and would reduce pixel shader fill rate even more. I’ll probably implement this soon, but must be careful to ensure that the geometry approximating the sphere has its faces tangent to or farther than the light maximum radius (not the vertices) or else pixels may drop out around the “silhouette” of the light volume.
  • Another gotcha in the XNA API: there is a stencil reference property both in the DS state object and on the device itself. Setting a DS state will set the device property, so if you want dynamic values for stencil reference without creating hundreds of very-similar DS state objects, just modify the device stencil reference property after setting the DS state.
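The point above about sphere geometry needing its faces (not its vertices) at or beyond the light radius can be handled with a single scale factor. The Python sketch below assumes a convex mesh centred on the origin and computes the factor that pushes the nearest face plane out to distance 1; the helper name and the octahedron test mesh are illustrative only.

```python
import math

def circumscribe_scale(vertices, triangles):
    """Scale factor that moves every face plane of a convex, origin-centred
    mesh to distance >= 1 from the origin, so the scaled mesh fully
    contains the unit sphere."""
    min_distance = math.inf
    for i0, i1, i2 in triangles:
        a, b, c = vertices[i0], vertices[i1], vertices[i2]
        e1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        e2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
        # Face normal from the cross product of two edges
        nx = e1[1] * e2[2] - e1[2] * e2[1]
        ny = e1[2] * e2[0] - e1[0] * e2[2]
        nz = e1[0] * e2[1] - e1[1] * e2[0]
        length = math.sqrt(nx * nx + ny * ny + nz * nz)
        # Distance from the origin to the plane containing this face
        distance = abs(nx * a[0] + ny * a[1] + nz * a[2]) / length
        min_distance = min(min_distance, distance)
    return 1.0 / min_distance

# Example mesh: an octahedron with its vertices on the unit sphere.
OCTAHEDRON_VERTICES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)]
OCTAHEDRON_TRIANGLES = [(x, y, z) for x in (0, 1) for y in (2, 3) for z in (4, 5)]
```

Multiplying the unit mesh by this factor and then by the light's maximum radius gives a volume whose silhouette never cuts inside the light. A finer tessellation brings the factor closer to 1 and wastes less fill rate.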
Results


CubeWars Is Networked

Posted by redbeard on July 11, 2011

I had my first successful multi-player network test last night, with a few players other than myself connecting over the internet to my development machine. CubeWars is of course a working title; I’d prefer a final title which evokes more than just cubes and fighting.

We uncovered a few bugs, most of which I hadn’t noticed before, and which I’ve now fixed:

  • Players who quit left a ghost behind visible on other clients but not on the host
  • Backspace doesn’t work on chat input when you max out the input buffer
  • You can build cubes inside yourself and other players
  • The game crashes if you minimize and restore it

Overall progress has been good, and I’ve implemented a number of features since my previous post:

  • Networking system for object creation & synchronization
  • Multiplayer chat, including broadcast join & quit notices, and primitive command handling like /name <Dude>
  • Fast ray-casting through the cube grid using 3D DDA
  • Custom Win32 input handling to avoid dropped mouse clicks and text input
  • Console window for logging & debugging
  • Shooting and knifing other players (to death, complete with broadcast kill notices)
  • Visual feedback quad for cube editing
  • FXAA anti-aliasing
  • 3D sound playback, including distance loudness adjustment, left/right panning, and doppler shift
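The "3D DDA" ray-cast in the list above is usually implemented as a grid traversal in the style of Amanatides and Woo: step from cell to cell along whichever axis boundary the ray reaches next. The following Python sketch shows that traversal over a grid of unit cubes. It is an illustration under those assumptions, not the game's code, and the function name is invented.

```python
import math

def dda_cells(origin, direction, max_distance):
    """Yield the integer (x, y, z) cells of a unit grid that a ray visits,
    in order. direction should be unit length so that max_distance is
    measured in world units."""
    cell = [math.floor(c) for c in origin]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray parameter at the next boundary per axis
    t_delta = [math.inf] * 3  # ray parameter needed to cross one whole cell
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            step[axis] = 1
            t_delta[axis] = 1.0 / d
            t_max[axis] = (cell[axis] + 1 - origin[axis]) / d
        elif d < 0:
            step[axis] = -1
            t_delta[axis] = -1.0 / d
            t_max[axis] = (cell[axis] - origin[axis]) / d
    t = 0.0
    while t <= max_distance:
        yield tuple(cell)
        axis = t_max.index(min(t_max))  # nearest boundary decides the step
        t = t_max[axis]
        t_max[axis] += t_delta[axis]
        cell[axis] += step[axis]
```

A caller would typically stop at the first yielded cell that contains a solid cube, for example `next((c for c in dda_cells(o, d, 50.0) if is_solid(c)), None)` with a hypothetical `is_solid` lookup.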

Some work still remains before I’m happy to call my “milestone 2” complete:

  • Reliable UDP integration via Lidgren library
  • Weapon shooting FX (other than sound) and player hit FX + sound
  • Edge enhancement effect on discontinuous depth images
  • Improved main menu with text entry for player name, world seed, etc

Obligatory screenshot


Networking Cubes from the Ground Up

Posted by redbeard on June 27, 2011

Due to my recent design changes towards networked multi-player, it is obviously necessary to write a little networking code. I wanted to learn as much as possible and exert total control over my game design, so I dove in head first from the ground up, using only what’s provided in the .NET API. I decided to use a standard TCP connection for prototyping, so as to avoid dealing with packet loss and connection instability at first. Although I aced my college intro-to-networking class and have written some odd network code here and there since then, I still encountered a number of issues which needed a bit of thought and experimentation to resolve.

I also studied a number of other proven FPS game networking architectures: Quake, Tribes, Halo, Unreal. Quake3 appears to be unique in that it tries to transmit the entire world state to clients in an atomic unit, while the others focus on updating various objects in the world asynchronously, which allows them to prioritize important data for high-frequency updates and de-prioritize unimportant data to save bandwidth. They all vary on data reliability promises, but all of them have strong concessions for unreliable data delivery and a few require some reliable data. They all share a common client/server architecture, where an authoritative server sends out the “official” world state and processes requests from clients but is free to ignore them if deemed invalid. This design allows for a central focus of bandwidth for high player counts, versus decentralized peer-to-peer where all peers require reasonably high bandwidth, and also allows for an authoritative game state to keep things in sync for fast-paced gameplay, without needing to consult all other peers to see if the current state is OK.

My preliminary design has ended up somewhere between the architectures of Halo and Unreal:
  • Authoritative host, to allow fast processing, numerous connected clients, and dedicated servers
  • Classification of network message types into reliability guarantees
    • Unreliable:
      • Object state replication
        • Full update sends all data fields in the object, including type info which allows the client to spawn a local copy
        • Delta update taken on a per-variable basis from previous send, not last client acknowledgement
        • Client will request a full update if it sees a delta update for an unknown object
      • Effects, one-time events which are non-vital to a client (ie explosions)
      • Client Requests, one-time requests from the client which are time-sensitive but not vital (ie fire weapon, pick up item)
    • Reliable:
      • Host Events, one-time events which are vital to a client (ie player spawn/death, object deletion, etc)
      • Client Commands, one-time requests from the client which are not time-sensitive but shouldn’t be repeated (ie respawn player)
  • Object relevance:
    • Binary decision evaluated on a per-object, per-player basis.
    • Accelerated using spatial partitioning to find “nearby” objects for each player
    • Freshly relevant objects must be replicated in full, previously-relevant objects can replicate only deltas in state
    • Object prioritization:
      • Avoid sending updates for all relevant objects in every packet (they wouldn’t fit anyway)
      • Important objects get updated frequently, unimportant ones are updated infrequently but eventually
      • Importance evaluated
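To make the full-versus-delta replication idea in the outline above concrete, here is a small Python sketch that compares each field against the values from the previous send and transmits only what changed, with a bitmask saying which fields are present. The field names, the dictionary representation, and both function names are assumptions for illustration; the actual wire format is not described in the post.

```python
def encode_update(current, last_sent, field_order, full=False):
    """Return (mask, values) for one object.

    Bit i of mask is set when field_order[i] is included. A full update
    includes every field; a delta includes only fields that differ from
    the previous send (not from the last client acknowledgement).
    """
    mask = 0
    values = []
    for bit, name in enumerate(field_order):
        changed = name not in last_sent or current[name] != last_sent[name]
        if full or changed:
            mask |= 1 << bit
            values.append(current[name])
    return mask, values

def apply_update(state, mask, values, field_order):
    """Client side: write the received fields into the local copy."""
    remaining = iter(values)
    for bit, name in enumerate(field_order):
        if mask & (1 << bit):
            state[name] = next(remaining)
    return state
```

Because deltas are relative to the previous send, a client that missed an earlier packet can end up stale until the field changes again, which is one reason the outline has clients request a full update when they see a delta for an unknown object.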

Some of the issues I encountered and spent time on:

  • Connection handshake: connecting to a socket makes no guarantee that the application on the other side is the correct one. My handshake sends an assembly build number to ensure that the client and server are running the same code. Clients are disconnected if they fail the handshake or cause trouble at any point.
  • Connection ID: to keep track of who owns what and where to send prioritized information, a connection ID is generated for each new connection. This can be used to tag objects with the owning connection, so when that connection is dropped the appropriate items can be destroyed or otherwise handled.
  • Object creation: To avoid centralizing all my code into giant switch statements, I make use of .NET Reflection to label types with IDs and register factory methods for each of them. This requires that the assemblies on the client and server match so that the reflection results correspond. It also allows for potential modularity for future expansion via add-on assemblies. This same mechanism is used for arbitrary client requests and host events (a player move request is implemented as a class with static methods EncodeEvent and DecodeEvent).
  • Initial synchronization: A singular GameInfo object is created to centralize game logic, and a client is not “ready” until it has synched the GameInfo for the first time. Clients can then spawn PlayerController objects, which in turn can spawn Player objects. A PlayerController is used for local control and client input and its state is replicated only to the owning connection, while Player state is representative of the player in the world and is replicated to all clients. Objects are always spawned on the host and the client waits for them to synchronize before creating a local copy. The procedural level is not generated until the random seed is obtained via the GameInfo, and subsequent world modifications need to be synched before the client can present the level to the user; this could take some time on a map with heavy amounts of modification.
  • Object lifetime: When a connection on the server is dropped due to error or a player quitting, the relevant player objects need to be cleaned up. The connection has a list of associated PlayerController objects which have been created by that client, which provides a starting point for destroying objects on the host. Object destruction is reliably communicated to clients, so they don’t end up with ghosts floating all around.
  • Programmer error in serialization due to copy/paste or incomplete refactoring. I refactored my serialization utility functions to operate in both read and write mode, so the same code will write and read the same data in the same order.
  • Host-client internal transfer: My design has the host also acting as a client, so the two aspects can share objects rather than duplicating all objects on the host machine. It was originally connecting back to itself via a socket, but that was causing issues with initialization. I added functionality to transfer packets internally via memory instead of sockets and things got much better. This also means single-player mode can use the same code paths as multi-player without needing to actually connect with any sockets.
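The serialization fix mentioned in the list above (one routine that both reads and writes, so field order cannot drift) can be sketched as follows. This is a Python illustration of the pattern only; the Stream and PlayerState classes and their fields are hypothetical, and the game itself is written against .NET.

```python
import struct

class Stream:
    """Byte stream that either writes values or reads them back,
    depending on whether it was constructed with existing data."""

    def __init__(self, data=None):
        self.reading = data is not None
        self.buffer = bytearray(data or b"")
        self.offset = 0

    def _sync(self, fmt, value):
        if self.reading:
            (value,) = struct.unpack_from(fmt, self.buffer, self.offset)
            self.offset += struct.calcsize(fmt)
        else:
            self.buffer += struct.pack(fmt, value)
        return value

    def int32(self, value=0):
        return self._sync("<i", value)

    def float32(self, value=0.0):
        return self._sync("<f", value)

class PlayerState:
    def __init__(self):
        self.health = 0
        self.x = 0.0
        self.y = 0.0

    def serialize(self, stream):
        # One code path for both directions: the order is defined once.
        self.health = stream.int32(self.health)
        self.x = stream.float32(self.x)
        self.y = stream.float32(self.y)
```

Writing is `state.serialize(Stream())` and reading is `state.serialize(Stream(received_bytes))`; adding a field means touching exactly one line instead of keeping separate read and write functions in step.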

Upcoming problems which I anticipate consuming significant time:

  • Moving from TCP to reliable UDP, which is an essential change to provide a high quality experience, and also because this is supported by the XNA networking system on Xbox. I plan to use the Lidgren library on Windows.
  • Adding client prediction to avoid jerky motion of players and other objects.
  • Regulating the bandwidth used by each connection, and the rate at which network data is generated (currently a full update 1 or 2 times per frame at 60 fps, way too much).

Resources:


CubeWorld is now CubeWars

Posted by redbeard on June 22, 2011

After contemplating a number of factors, I’m taking my cuboid-world game design in a different direction.

Old: single-player, exploration-based gameplay, infinite 3D world.

New: online team-based multi-player, destruction & construction-based gameplay, finite world.

  • Performance wasn’t good enough on Xbox for generating the procedural world geometry dynamically, but a finite world can be generated at load time, even if it’s still procedural. Rendering performance should be fine.
  • Fortresscraft already has a large number of sales and is currently focused on building stuff and sharing with friends. Minecraft has been announced for release later this year on Xbox, which makes that space even more crowded. By focusing on team-based PvP gameplay, I’m taking things in a different direction which should garner more attention (and hopefully sales).
  • The overall design for the free-roaming exploration and infinite world wasn’t really coming together for me, and I was facing some concerns about data persistence and other technical issues.
  • Network programming is something new & exciting which I’m interested in exploring as a personal growth exercise.

Feel free to read my new game design document.  Let me know if you have reasonable suggestions!


JIT Optimizations on Xbox

Posted by redbeard on May 27, 2011

I came to a sickening realization last night: the JIT optimization performed by the .NET compact framework on Xbox is pitifully weak. Some of the most basic optimizations I’ve come to expect from a decade of C++ programming and “trust the compiler” advice are not present. This realization has the potential to kill my CubeWorld project, if it prevents me from reaching my perf goal on Xbox.


Static Ambient Occlusion in CubeWorld

Posted by redbeard on May 26, 2011

I was pleased with the visual impact of screen-space ambient occlusion (SSAO) in my deferred shading system, but I felt that there were two primary artifacts that were too big to ignore and which mean that SSAO is inappropriate for this project: 1) screen-space noise was quite noticeable despite the pseudo-Gaussian blurring step, and 2) the ambient occlusion disappears at the edges of the screen when the occluding surface moves off-screen. I wanted ambient occlusion which was more stable and perhaps a bit less expensive to render; after all, I’m just rendering a bunch of cubes!

For CubeWorld, static ambient occlusion appears to address the flaws of SSAO, without too many drawbacks. For non-cubular geometry the ambient occlusion calculation can become exceedingly expensive, which is why SSAO was invented, but the mostly-static cubes allow for a relatively discrete approximation. My implementation effectively samples the ambient occlusion term at each vertex and allows interpolation on the GPU to smooth things out. For a visible face vertex, there’s a possibility of 0-3 adjacent cube volumes, so I give an AO term which ranges from 0-1 in 0.33 increments. Results look good (although this particular screenshot is a bit dark because I toned down the ambient light and the camera-position point-light).
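As a rough sketch of the per-vertex sampling described above: for a vertex of a visible face, look at the three cells in front of the face plane that touch that vertex (two side neighbours and one diagonal) and count how many are solid. The Python below assumes axis-aligned unit cubes, a hypothetical `is_solid` lookup, and that 1.0 means fully occluded; the post does not state which end of the 0-1 range is which.

```python
def vertex_occlusion(is_solid, cube, normal, corner):
    """Occlusion in {0, 1/3, 2/3, 1} for one vertex of a visible cube face.

    cube   -- integer (x, y, z) of the cube that owns the face
    normal -- axis-aligned face normal, e.g. (0, 1, 0) for the top face
    corner -- -1 or +1 on the two tangent axes selecting the vertex
              (the value on the normal axis is ignored)
    """
    # The cell directly in front of the face is empty, or the face
    # would not be visible, so only its three neighbours can occlude.
    front = [c + n for c, n in zip(cube, normal)]
    u, v = [axis for axis in range(3) if normal[axis] == 0]

    side_u = list(front)
    side_u[u] += corner[u]
    side_v = list(front)
    side_v[v] += corner[v]
    diagonal = list(front)
    diagonal[u] += corner[u]
    diagonal[v] += corner[v]

    count = sum(1 for cell in (side_u, side_v, diagonal)
                if is_solid(tuple(cell)))
    return count / 3.0
```

The value would be stored per vertex when the chunk's vertex buffer is built, leaving the GPU to interpolate it across the face.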

Problems

Sticking point 1: I compute all my world geometry procedurally on worker threads in 32^3 chunks of unit cubes. At the boundary between chunks, there was no guarantee that any data was present in order to compute the ambient occlusion neighborhood properly, so visual seams were visible on continuous surfaces. To solve this issue, I separated my world generation into two phases (cube data and vertex buffers) and reduced the visible area without adjusting the data area so that a margin of 1 generated chunk existed around the boundaries of the visible chunk grid. This has the detrimental impact of either reducing my visual range or increasing my computational cost to keep the same visual range, because I now need an extra margin around everything. It would perhaps have been cleaner if I could generate a 1-cube margin around each chunk, but most of my chunk generation code is discontinuous and relies on a random number generator seeded based on the chunk ID, rather than the ID of the unit cubes. As a side effect of improving the cube neighborhood to look across chunk boundaries, my hidden face removal is now more aggressive and perf is slightly better since the game has less geometry to render.

Sticking point 2: When modifying the cubescape (adding or removing individual cubes), I was only updating the chunk in which the cube resides, but it now has potential to impact adjacent chunks also, both for ambient occlusion and hidden face removal. Easily fixed by updating adjacent chunks whenever the modified cube is on the surface of the chunk anywhere. I thought about adding an optimization to inspect whether the neighboring chunk would actually see any change, but haven’t bothered with that yet.
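A minimal sketch of the "update adjacent chunks" rule, in Python: given a cube's local coordinates inside a 32^3 chunk, list the chunk offsets whose meshes should be rebuilt. Including edge and corner neighbours is an assumption on my part here (per-vertex AO looks at diagonal cubes); the text above only says adjacent chunks, and the function name is invented.

```python
from itertools import product

CHUNK_SIZE = 32  # chunks are 32^3 unit cubes

def chunks_to_rebuild(local_x, local_y, local_z):
    """Chunk offsets (dx, dy, dz) affected by editing one cube.

    (0, 0, 0) is the chunk containing the cube. A neighbour is included
    on each axis where the cube touches that side of the chunk, plus the
    diagonal combinations when it sits on an edge or corner.
    """
    def axis_offsets(coordinate):
        offsets = [0]
        if coordinate == 0:
            offsets.append(-1)
        if coordinate == CHUNK_SIZE - 1:
            offsets.append(1)
        return offsets

    return sorted(product(axis_offsets(local_x),
                          axis_offsets(local_y),
                          axis_offsets(local_z)))
```

An interior edit touches one chunk, a face edit two, an edge edit four, and a corner edit eight, so the worst case stays small.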

Vaguely related problems

  • I use the VPOS pixel-shader semantic with my deferred shaders to generate the texture coordinate used for looking up the corresponding texel for the currently rendering pixel. On the Xbox, VPOS behaves strangely if predicated tiling gets enabled, which is typically because you want a fat rendertarget due to MSAA, GBuffer, high resolution, etc. I imagine that the viewport transform or whatever feeds the VPOS semantic isn’t set quite right. I worked around this by disabling MSAA on my final rendertarget (since the underlying GBuffer textures are lower-resolution with no MSAA).
  • I was experimenting with occlusion culling to optimize my rendering in dense environments, but it appears to be essentially incompatible with deferred shading in XNA 4.0, for one flawed reason: you cannot disable color writes when a floating-point rendertarget is bound.
    • Attempting to do so produces an exception with the text “XNA Framework HiDef profile does not support alpha blending or ColorWriteChannels when using rendertarget format Single”. I assume that this is an oversight in the XNA API, because I’m unaware of a reason why floating-point rendertargets cannot support disabled color writes, even in MRT situations. The relevant device caps are COLORWRITEENABLE and INDEPENDENTWRITEMASKS; XNA 4.0 requires a D3D10-capable video card, and those caps are enabled on 100% of the D3D10 hardware sampled by that site.
    • My workaround for this flaw involves un-binding the offending rendertarget before issuing my occlusion queries and re-binding it afterwards, when I want to render real geometry again; this causes rendertarget toggling several times per frame, which is not ideal.
    • However, my workaround doesn’t seem to work on Xbox; the rendertarget contents preservation flag appears to be broken, so my GBuffer gets filled with bad data.


Deferred Shading in CubeWorld

Posted by redbeard on May 16, 2011

For my CubeWorld prototype, I wanted to try some screen-space effects like SSAO (screen-space ambient occlusion) and also compare the performance of deferred lighting versus standard forward-rendering lights; I’m also interested in just implementing a deferred renderer as I haven’t experimented with the concept before.

I found some good foundation code & explanation in the articles at http://www.catalinzima.com/tutorials/deferred-rendering-in-xna/, which got me started with some directional and point light functionality. I made a few modifications here and there such as combining multiple directional lights into a single pass and taking some liberties with the C# and shader code; I also used a procedural cube instead of a sphere mesh for my point light. I also found some good intro material in the NVidia deferred rendering presentation from “6800 Leagues Under the Sea”: http://developer.nvidia.com/presentations-6800-leagues-under-sea, which includes a few optimizations which can help (if you’re not using XNA, I’ll get to that). The performance of the deferred lighting is quite good on my PC, although I haven’t tried it extensively on the Xbox.

After seeing the deferred shading in action, I wanted to make even more use of the G-Buffer for effects that can make use of it, and one of the primary effects I’m interested in is SSAO, because the cube world looks rather artificial with all the faces shaded relatively flatly. I implemented the SSAO shader described in a gamedev.net article, which provides dense and somewhat unintuitive code, but it works and the rest of the article explains the concepts used. The article offered little guidance for tweaking the 4 input parameters such as “bias” and “scale”, but I found some numbers which appeared to work, and named them more intuitively for my internal API. I’m currently using only a 9×9 separated blur rather than the 15×15 suggested in the article. The effect works, but the screen-space random field is plain to see, and it seems to be more pronounced on distant geometry; I can probably do some more work to try and resolve those artifacts. A much more distracting artifact is the total loss of ambient occlusion at the edges of the screen in certain conditions; I’m not sure if there’s a reasonable solution for that. I may try some static AO calculations for each cube face to see if I can get stable results that way.

The overall flow of my deferred renderer, currently (1 or more “passes” per step below):

  1. Render all scene geometry into G-Buffer
  2. Generate noisy SSAO buffer at half-resolution
  3. Blur SSAO buffer horizontally and vertically at full-resolution
  4. Accumulate directional and point lights, one per pass
  5. Combine lighting, albedo, and SSAO into final image

Some issues I ran into when implementing my deferred shading in XNA:

  • XNA does not allow you to separate the depth-buffer from a render-target, which means you cannot use the stencil optimization for light volumes as discussed in the NVidia “6800 Leagues” presentation. The optimization allows you to only light-shade the pixels which are within the light volume, rather than all the ones that may be covered by it but are too distant to be affected. This requires that you retain the depth buffer from the geometry pass and use it to depth-test and store stencil values for light geometry, and then use those stencil values for a different render-target, specifically the light accumulation buffer.
  • Xbox 360 has 10MB of framebuffer memory linked to the GPU, which works great if you’re rendering a single 1280×720 render-target and depth-buffer at 4 bytes each (about 7MB). When you want 3 rendertargets and a depth-buffer, you can either “tile” the framebuffer and re-draw the geometry multiple times, or you can drop the resolution until all the buffers fit; I opted for the latter option, using 1024×576 (for 16:9). XNA doesn’t expose the ability to resolve the depth-buffer to a texture, which means you must include your own depth render-target in your G-Buffer, or else that target resolution could be increased. On PC, the memory limitation is lifted, but you still can’t read back depth via D3D9 so the extra buffer still applies.
  • I can see visible banding on my point lights, I’m not sure if this is due to banding in the light buffer itself or the final compositing. XNA 4.0 exposes the HdrBlendable format, which on Xbox uses a 10-bit floating-point value per component, but with only 7 bits of mantissa I’m not convinced it offers any reduced banding from 8-bit fixed-point components, just a different pattern.
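The framebuffer-memory arithmetic from the list above is easy to check. The Python sketch below assumes every surface (the three render-targets and the depth-buffer) uses 4 bytes per pixel with no MSAA, and treats the 10MB as 10 × 1024 × 1024 bytes; the function names are only for illustration.

```python
EDRAM_BYTES = 10 * 1024 * 1024  # Xbox 360 framebuffer memory (10MB)

def framebuffer_bytes(width, height, surface_count, bytes_per_pixel=4):
    """Total bytes for surface_count same-sized surfaces."""
    return width * height * surface_count * bytes_per_pixel

def fits_without_tiling(width, height, surface_count, bytes_per_pixel=4):
    return framebuffer_bytes(width, height, surface_count,
                             bytes_per_pixel) <= EDRAM_BYTES

# One colour target + depth at 1280x720: 7,372,800 bytes (about 7MB), fits.
# Three targets + depth at 1280x720: 14,745,600 bytes, needs tiling.
# Three targets + depth at 1024x576:  9,437,184 bytes (9MB), fits.
```

The same helper shows how little headroom remains at 1024×576: a fifth 4-byte surface at that resolution would no longer fit.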

Screenshots of my results:

  • Directional and point lights: screenshot (debug display shows albedo, depth, normals, and lighting)
  • SSAO random samples before blurring: screenshot (slightly more noisy than it should be, due to non-normalized random vectors)
  • SSAO after blurring: screenshot
  • Comparison images from before deferred shading was implemented: shot 1, shot 2

Other resources I came across while implementing these things:
