This does not make much sense to me. The on-chip memory is kind of cool, but the benchmarks do not support the exposition at the beginning of the paper. The only scene where on-chip deferred is significantly faster than off-chip is the last one, but this is due to an extra z-pass (I am not sure why it's necessary either). Can you comment on that?
The iOS devices from the iPhone 4S onward expose framebuffer fetch with FP16x4 render targets (64 bits/pixel). It is already easy to do interesting things with on-chip storage. Moving to 128 bits/pixel is not much of a difference, especially with linear HDR, as only FP16 has enough precision and range (the 32bpp float formats lack the mantissa bits required for linear, and the 32bpp integer formats lack the precision). Not all mobile devices in the Android space, now or in the future, will be able to support this functionality, so there is a portability issue here. However, as long as accesses to on-chip storage are split into logical passes, devices without framebuffer fetch just have an extra pass to/from off-chip storage.
If the G-buffer creation pass runs enough artist-defined shader logic, then the bandwidth to store the G-buffer can be hidden in ALU work. If pushing a lot of geometry, the bandwidth to store the G-buffer can be partly hidden in geometry load.
During lighting, especially with tiled, the bandwidth required to fetch the G-buffer can be hidden in shading ALU work. The more lights per pixel with traditional tiled deferred (G-buffer read once per pixel), the less the bandwidth of fetching the G-buffer matters (ALU dominates). Also, a traditional deferred z-pre-pass could be a performance win or not depending on chip and workload. There is no reason all the geometry needs to be drawn in the z-pre-pass. The z-pre-pass can also take simplified geometry in many cases.
Resolution is a killer of performance. If mobile gets the ALU performance of the Xbox 360 without the bandwidth, this implies texture fetch rates will be lower too.
Xbox 360 games had a hard time hitting 720p. The way mobile games will hit 1600p tablets is by not doing any work per pixel. Anyone doing anything interesting in the pixel shader won't be hitting native resolution on mobile.
A traditional tiler is going to be spilling vertex output off-chip. Things like lightmap UVs and atlased texture UVs need to be FP32. There is a point at which vertex output going off-chip takes more bandwidth than the G-buffer. Another way to think about this: if the chip is a tiler, then it needs framebuffer fetch to offset an off-chip vertex output problem (an engine doesn't want both the G-buffer and the vertex output going off-chip).
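To put rough numbers on the break-even point between off-chip vertex output and off-chip G-buffer traffic, here is a minimal back-of-the-envelope sketch. It is not taken from the paper. The function names and the example inputs (720p, 12 FP32 outputs per vertex) are hypothetical, and the model assumes each buffer is written once and read once per frame, ignoring compression, caches, overdraw, and any binning overhead.

```python
# Rough per-frame off-chip traffic model (an assumption, not measured data):
# each buffer is written once and read back once, so transfers defaults to 2.

def gbuffer_traffic_bytes(width, height, bits_per_pixel, transfers=2):
    """Bytes moved off-chip to store and then fetch the G-buffer."""
    bytes_per_pixel = bits_per_pixel // 8
    return width * height * bytes_per_pixel * transfers


def vertex_traffic_bytes(vertex_count, fp32_components, transfers=2):
    """Bytes moved off-chip for post-transform vertex output on a tiler."""
    bytes_per_vertex = fp32_components * 4  # FP32 is 4 bytes per component
    return vertex_count * bytes_per_vertex * transfers


def break_even_vertex_count(width, height, bits_per_pixel, fp32_components):
    """Vertex count at which vertex output traffic equals G-buffer traffic."""
    gbuffer = gbuffer_traffic_bytes(width, height, bits_per_pixel)
    per_vertex = vertex_traffic_bytes(1, fp32_components)
    return gbuffer // per_vertex


if __name__ == "__main__":
    # Hypothetical scene: 720p, 128 bits/pixel G-buffer, 12 FP32 vertex outputs.
    print(gbuffer_traffic_bytes(1280, 720, 128))
    print(break_even_vertex_count(1280, 720, 128, 12))
```

With these made-up inputs the G-buffer costs 29,491,200 bytes per frame and the break-even lands at 307,200 post-transform vertices per frame. Past that count, under this simplified model, the vertex output is the larger off-chip cost.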