20140814

Abdul Bezrati: Real-time lighting via Light Linked List


The basic idea seems to be as follows:

Render G-buffer.
Get min/max depth for each 8x8 tile.
At 1/64 of the screen area (a 1600/8 by 900/8 target), rasterize the lights with a software depth test.
For each tile, build a linked list of the lights intersecting that tile.
Each linked list node is {half depthMin, half depthMax, uint8 lightIndex, uint24 nextStructureIndex}, which is 64 bits per node.
Keeping light bounds helps avoid shading when tile has large depth range.
Full screen pass shading for all lights intersecting a tile.
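
The steps above can be sketched on the CPU roughly as follows. This is only an illustration of the node layout and the per-pixel depth rejection, not code from the talk. The names `LightNode`, `LightLists`, `quantizeDepth`, and `kEndOfList` are made up for the example, depths are stored as 16-bit unorm values as a stand-in for half floats, and the serial `insert` replaces what on a GPU would typically be an atomic counter allocation plus an atomic exchange on the tile's head index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit node mirroring the fields listed above.
// Assumption: 16-bit unorm depths stand in for half floats on the CPU.
struct LightNode {
    uint16_t depthMin;
    uint16_t depthMax;
    uint32_t lightAndNext;  // low 8 bits: light index, high 24 bits: next node index
};
static_assert(sizeof(LightNode) == 8, "node should be 64 bits");

constexpr uint32_t kEndOfList = 0xFFFFFF;  // 24-bit "no next node" sentinel

// Quantize a depth in [0,1] to 16 bits (monotonic, so ordering is preserved).
inline uint16_t quantizeDepth(float d) {
    if (d < 0.0f) d = 0.0f;
    if (d > 1.0f) d = 1.0f;
    return static_cast<uint16_t>(d * 65535.0f + 0.5f);
}

struct LightLists {
    std::vector<uint32_t> head;    // per tile: index of the first node
    std::vector<LightNode> nodes;  // shared node pool

    explicit LightLists(std::size_t tileCount) : head(tileCount, kEndOfList) {}

    // Prepend a light, with its depth bounds inside this tile, to the tile's list.
    void insert(std::size_t tile, uint8_t light, float zMin, float zMax) {
        uint32_t index = static_cast<uint32_t>(nodes.size());
        assert(index < kEndOfList);  // node index must fit in 24 bits
        nodes.push_back({quantizeDepth(zMin), quantizeDepth(zMax),
                         uint32_t(light) | (head[tile] << 8)});
        head[tile] = index;
    }

    // Walk the tile's list and keep only lights whose depth bounds
    // contain the pixel's depth (the rejection that helps when a tile
    // has a large depth range).
    std::vector<uint8_t> lightsAt(std::size_t tile, float pixelDepth) const {
        std::vector<uint8_t> out;
        const uint16_t z = quantizeDepth(pixelDepth);
        for (uint32_t i = head[tile]; i != kEndOfList; i = nodes[i].lightAndNext >> 8) {
            const LightNode& n = nodes[i];
            if (z >= n.depthMin && z <= n.depthMax)
                out.push_back(static_cast<uint8_t>(n.lightAndNext & 0xFF));
        }
        return out;
    }
};
```

In the shading pass, each pixel would call something like `lightsAt(tileOfPixel, depth)` and accumulate only the returned lights.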

Would This be Faster?
With a maximum of 256 lights, give each tile a fixed 32 bytes holding a bitmask of the lights that intersect it. Rasterize the lights with AtomicOr() with no return value (so there is no latency to hide), setting each light's bit in the bitmask. At deferred shading time, with one workgroup shading one tile, first load the bitmask into LDS. Then process the lights in groups sized to fit in LDS: do a ballot-based scan of the remaining set bits in the bitmask, load the data for those lights into LDS, shade the pixels with that light data, and repeat until the bitmask is exhausted.
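
A minimal CPU sketch of that bitmask idea, assuming nothing beyond what is described above. `TileMask`, `setLight`, and `lightBatches` are hypothetical names, `std::atomic::fetch_or` with the result ignored stands in for the GPU AtomicOr(), `batchSize` stands in for however many lights fit in LDS, and the serial bit walk replaces the ballot-based scan a workgroup would do.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kMaxLights = 256;
constexpr int kWordsPerTile = kMaxLights / 64;  // 4 words of 64 bits
static_assert(kWordsPerTile * sizeof(uint64_t) == 32, "32 bytes per tile");

struct TileMask {
    std::array<std::atomic<uint64_t>, kWordsPerTile> words{};  // zero-initialized

    // Mark a light as intersecting this tile. The return value of the
    // atomic OR is ignored, as in the no-return AtomicOr() above.
    void setLight(int light) {
        words[light / 64].fetch_or(uint64_t(1) << (light % 64),
                                   std::memory_order_relaxed);
    }
};

// Walk the set bits in ascending order and split them into batches of at
// most batchSize lights (assumed >= 1). Each batch is what one
// "load into LDS, shade, repeat" iteration would process.
std::vector<std::vector<uint8_t>> lightBatches(const TileMask& mask,
                                               std::size_t batchSize) {
    std::vector<std::vector<uint8_t>> batches;
    for (int w = 0; w < kWordsPerTile; ++w) {
        uint64_t bits = mask.words[w].load(std::memory_order_relaxed);
        for (int b = 0; bits != 0; ++b, bits >>= 1) {
            if ((bits & 1) == 0) continue;
            if (batches.empty() || batches.back().size() == batchSize)
                batches.emplace_back();
            batches.back().push_back(static_cast<uint8_t>(w * 64 + b));
        }
    }
    return batches;
}
```

Setting the same bit twice is harmless, so overlapping fragments of one light need no deduplication, and the per-tile storage stays fixed no matter how many lights land in the tile.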

5 comments:

  1. I was confused by this talk. It's a lot like tiled deferred shading but requires you to store the light list data out to memory and reload it in a later pass, rather than keeping it in LDS at all times.

    I don't see why or under what circumstances the light linked list technique would be better than tiled deferred. Bezrati didn't show any performance comparisons to other rendering methods in his talk, either (maybe just due to time limits). What do you think are the advantages of LLL?

  2. If anything, the ability to easily mix forward and deferred shading is a win in flexibility for LLL compared to deferred only. LLL binning is probably faster than classic tiled deferred, which checks all lights per tile.

  3. Hello,
    LLL allows a finer granularity and better control than tiled deferred lighting. Storing the accurate minimum and maximum depth per fragment allows for fast rejection of lights that overlap a pixel in 2D but not in 3D space.
    We use LLL extensively to light blended geometry as well as visual effects and also to implement as many custom BRDFs as we need.
    When our engine was running both deferred and LLL, LLL was faster, and the gap grew as the resolution increased.
    I am putting together a demo for a chapter in GPU Pro 7 to be published at GDC next year and I hope to publish the binaries sooner than that.
    Regards,
    Abdul

  4. Another performance point that came up when I was developing LLL: I used to compute the light bounds on the CPU and upload them to the GPU as part of the full light environment, but then I switched to storing the rasterized depths per fragment and saw a big performance boost.

  5. Thanks for the extra info!
