20140125

VR Presence and Tflops/s

Mega Pixels Per Second
Working roughly from Abrash's talk on VR presence, start with some possible single-panel configurations: either 1920x1080 or 2560x1440, at either 95 Hz (the suggested minimum for low persistence) or 120 Hz,

197 Mpix/s : 960x1080/eye @ 95 Hz
249 Mpix/s : 960x1080/eye @ 120 Hz
350 Mpix/s : 1280x1440/eye @ 95 Hz
442 Mpix/s : 1280x1440/eye @ 120 Hz
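
These pixel rates are just panel resolution times refresh rate (both eyes share one panel); a quick sketch to reproduce them:

```python
# Pixel throughput = panel width * height * refresh rate.
# The panel is split in half horizontally, one half per eye.
def mpix_per_s(width, height, hz):
    return width * height * hz / 1e6

configs = [
    (1920, 1080, 95),   # 960x1080/eye
    (1920, 1080, 120),
    (2560, 1440, 95),   # 1280x1440/eye
    (2560, 1440, 120),
]
for w, h, hz in configs:
    print(f"{mpix_per_s(w, h, hz):4.0f} Mpix/s : {w // 2}x{h}/eye @ {hz} Hz")
```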

Compared to standard HDTV rendering,

_62 Mpix/s : 1080p @ 30 Hz
124 Mpix/s : 1080p @ 60 Hz

The Ultra High End
A quality/pixel on par with something like a 1.8 Tflop PS4 at 1080p @ 30 Hz requires roughly 29 Gflops per Mpix/s (about 29 kflops per pixel). Expanded out to the table above,

_5.7 Tflops @ 197 Mpix/s : 960x1080/eye @ 95 Hz
_7.2 Tflops @ 249 Mpix/s : 960x1080/eye @ 120 Hz
10.2 Tflops @ 350 Mpix/s : 1280x1440/eye @ 95 Hz
12.8 Tflops @ 442 Mpix/s : 1280x1440/eye @ 120 Hz
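
The per-pixel budget and the table above can be reproduced directly (PS4 Tflops and panel rates as given in the post):

```python
# PS4 ballpark: 1.8 Tflops spread over 1080p @ 30 Hz.
ps4_tflops = 1.8
ps4_mpix_s = 1920 * 1080 * 30 / 1e6                        # ~62 Mpix/s
flops_per_pixel = ps4_tflops * 1e12 / (ps4_mpix_s * 1e6)   # ~29,000 flops/pixel

for mpix_s in (197, 249, 350, 442):
    tflops = flops_per_pixel * mpix_s * 1e6 / 1e12
    print(f"{tflops:4.1f} Tflops @ {mpix_s} Mpix/s")
```

Halving `flops_per_pixel` (the 60 Hz PS4 case) gives the One Step Down table further below.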

Now looking at possible GPU configurations,

_5.0 Tflops : GTX 780ti
_5.6 Tflops : R9 290X
10.0 Tflops : GTX 780ti x2
11.3 Tflops : R9 290X x2

Given a target visual quality/pixel a little under what is expected of a PS4 at 1080p @ 30 Hz: if the Oculus consumer version ships with a single 1080p panel, the current fastest single-GPU boards from both NVIDIA and AMD would just barely be fast enough for 95 Hz. If Oculus ships with a 1440p panel, two of those GPUs would be needed.

Note: there are some factors I'm not taking into consideration. First, by the time Oculus ships, faster GPUs will likely be shipping. Second, native-resolution rendering of the 110-degree field of view followed by an image warp results in under-sampling in the center of view, so the usual triangle rendering to a rectangular view would actually need some amount of super-sampling to maintain the target pixel quality. Likewise, one could mask out (via depth = near plane) pixels which would be off-screen post-warp. Ray-tracing methods, on the other hand, could adaptively shoot rays to maintain a constant post-warp pixel density. Ray-tracing also has the advantage of being able to use stratified sampling, which would provide a large boost to pixel quality given a proper resolve filter (resolving into the final warped space).
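
One way to get a feel for the center under-sampling: the sketch below assumes an illustrative Rift-style radial distortion polynomial with made-up-but-plausible coefficients (real HMD values differ per device), and asks how much per-axis super-sampling is needed to keep one rendered pixel per display pixel at the center of view when the render target is kept at panel resolution:

```python
# Illustrative radial warp: r_src = r * (k0 + k1*r^2 + k2*r^4).
# Coefficients are assumptions for illustration only, not real HMD values.
k0, k1, k2 = 1.0, 0.22, 0.24

def scale(r):
    return k0 + k1 * r**2 + k2 * r**4

# If the render target is panel-sized, the mapping is normalized so the
# edge (r = 1) fits, which shrinks the step size at the center:
center_samples_per_pixel = scale(0.0) / scale(1.0)   # < 1 => under-sampled
supersample_factor = 1.0 / center_samples_per_pixel  # per axis, to restore 1:1
print(f"{center_samples_per_pixel:.2f} samples/pixel at center, "
      f"need ~{supersample_factor:.2f}x per axis")
```

With these assumed coefficients the center gets ~0.68 rendered samples per display pixel, so roughly 1.46x super-sampling per axis (about 2.1x the pixel count) would be needed to hold the target quality in the center.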

One Step Down
Now looking at the estimated Tflop requirements for the pixel quality possible at 60 Hz on the PS4 (half the per-pixel budget above),

2.9 Tflops @ 197 Mpix/s : 960x1080/eye @ 95 Hz
3.6 Tflops @ 249 Mpix/s : 960x1080/eye @ 120 Hz
5.1 Tflops @ 350 Mpix/s : 1280x1440/eye @ 95 Hz
6.4 Tflops @ 442 Mpix/s : 1280x1440/eye @ 120 Hz

Now looking at possible single GPU configurations (split by product cycle and vendor, listing non-boost perf),

2.8 Tflops : HD 7870 XT
2.9 Tflops : HD 7950
3.1 Tflops : HD 7950 Boost
3.8 Tflops : HD 7970
4.1 Tflops : HD 7970 GHz

3.1 Tflops : HD 8950
4.1 Tflops : HD 8970

3.1 Tflops : R9 280
4.0 Tflops : R9 280X
4.8 Tflops : R9 290
5.6 Tflops : R9 290X

3.1 Tflops : GTX 680

3.2 Tflops : GTX 770
3.9 Tflops : GTX 780
4.5 Tflops : GTX Titan
5.0 Tflops : GTX 780ti

I look at this data as a huge opportunity for the independent developer. Large-budget developers are likely forced to target the masses; building a one-off VR experience at a quality/pixel that requires a high-end PC GPU is probably not palatable. My guess is that the PS3/360 visual level, or the ultra-high-end mobile visual level, is the prime target of the big guns for VR. This leaves a market hole for the experience only possible on the high-end PC, a market which might only sustain a small team at launch. Also, judging by the GPU list, it is probably fine to just target OpenGL 4.4 as the API base.

SLI
I've never owned an SLI setup. From a developer perspective, I've found that exploiting temporal coherence is good for better than a 2x quality improvement. SLI doesn't really scale well in that realm.

However, for stereo VR it should be possible to nearly hit linear scaling, with the exceptions of PCIe bandwidth limits and the need to re-combine the right and left eyes if Oculus ships with a single display input. This is a really good reason for a dual-GPU setup.
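
A rough sketch of the PCIe cost of that eye re-combine, assuming a 32-bit color buffer and the highest panel configuration above:

```python
# Copying the second eye's color buffer to the display GPU once per frame.
width, height, hz = 1280, 1440, 95     # per-eye, highest rate considered above
bytes_per_pixel = 4                    # assuming 8-bit RGBA
eye_copy_gb_s = width * height * hz * bytes_per_pixel / 1e9

pcie3_x16_gb_s = 15.75                 # theoretical PCIe 3.0 x16, one direction
print(f"{eye_copy_gb_s:.2f} GB/s of {pcie3_x16_gb_s} GB/s "
      f"({100 * eye_copy_gb_s / pcie3_x16_gb_s:.1f}%)")
```

Even at the highest rate considered, the copy is a few percent of theoretical PCIe 3.0 x16 bandwidth, so raw bandwidth itself shouldn't block near-linear stereo scaling; latency and scheduling are the harder problems.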

If anything, this should be a red flag for the GPU vendors, as DX and GL are IMO lacking what is required at the API level: driver-managed SLI is high latency without good quality-of-service guarantees.

For VR, at a minimum developers need explicit control over both GPUs, and probably either an exposed peer-to-peer GPU blit (to copy the right eye to the GPU driving the display) or, if faster, an exposed ability to render into the other GPU's framebuffer (for better latency, assuming the final rendering pass isn't ROP bound when writing across PCIe). Developers probably also want the ability to do direct front-buffer rendering, plus enough control over scheduling to race the scan-out for better latency reduction.
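
For a feel of what racing the scan-out buys, a back-of-the-envelope sketch (the 4-strip split is an arbitrary illustration, not anything Oculus specifies):

```python
# At 95 Hz, a 1440-line panel scans out in roughly one frame period.
refresh_hz, lines = 95, 1440
frame_ms = 1000 / refresh_hz            # ~10.5 ms per frame
line_us = frame_ms * 1000 / lines       # ~7.3 us per line (ignoring blanking)

# Rendering into the front buffer in, say, 4 horizontal strips just ahead of
# the beam bounds each strip's render-to-photon latency by ~1/4 frame
# instead of a full frame (plus v-sync wait) with conventional back-buffering.
strips = 4
per_strip_ms = frame_ms / strips
print(f"{frame_ms:.1f} ms frame, {line_us:.1f} us/line, "
      f"~{per_strip_ms:.1f} ms latency bound per strip")
```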

5 comments:

  1. Very interesting. The increase in requirements versus console could be even worse, if you're leaving more headroom to eliminate frame-drops while v-synced, compressing your render pipeline as in Carmack's 'Latency Mitigation Strategies' post, supplying the much nicer AA that everyone will want at such low resolutions, coping with OS/driver overhead, adding stereoscopy and barrel distortion... I did some rough sums and came up with 13x as the requirement to match the perceived quality of a mostly-30Hz PS4 game on a 1080p PC headset. Hopefully that's pessimistic.

  2. In the OpenGL case, it seems there is support for explicit control of multiple GPUs and fast blits, but it's not standardized. In AMD's case we have AMD_gpu_association, which already provides fast blits between GPUs:
    "To provide an accelerated path for blitting data from one context to another, the new blit function BlitContextFramebufferAMD has been added."
    In NVIDIA's case we have WGL_NV_gpu_affinity for explicit control of multiple GPUs, and for fast blits we also have the NV_copy_image extension:
    "The WGL and GLX versions allow copying between images in different contexts, even if those contexts are in different sharelists or even on different physical devices."
    The shame is that NVIDIA's support is Quadro-only, so it seems someone should request that they enable WGL_NV_gpu_affinity on GeForce.
    I don't know how these extensions interact with enabled SLI or Crossfire setups, i.e. whether the functionality they provide is still usable; the ideal case would be that using these extensions disables SLI and Crossfire.
    I also don't know if it's possible to access one GPU's buffer objects from another GPU in P2P fashion, either using the OpenCL cl_amd_bus_addressable_memory extension with OpenGL interop in the AMD case, or CUDA P2P functions together with CUDA-GL interop on NVIDIA systems.
    Note also that these extensions should be readily available on Linux, so also on SteamOS.

    1. Thanks. I didn't know about the AMD extension.

      Has anyone confirmed that both vendors are doing direct peer-to-peer copies (and not peer-to-host-to-peer)?

      Sounds like the major next steps are: NVIDIA enabling the extensions on GeForce, and both vendors enabling a "render to another GPU's framebuffer". Even if the GPU is ROP bound on writes through PCIe when rendering to another GPU's framebuffer, it might make sense from the latency-reduction angle. On any GPU with async compute, it should in theory always make sense to write out to the peer framebuffer, as some other work could fill the open ALU cycles.

  3. It would be nice if NVIDIA enabled this extension on GeForce, but seeing how NV enabled OpenGL quad-buffered stereo (i.e. more or less 3D Vision for OpenGL, otherwise also a Quadro feature) for Doom 3 BFG only, perhaps NV would enable it on a case-by-case basis if some project used it.
    Please take this as a joke, but it would be nice: can't you "recommend" enabling it to NV? Since you work at Epic, escalate it up: Tim Sweeney recommends it to Jen-Hsun Huang, who recommends it to the NV OpenGL engineers.
    Also, we forgot to say that Mantle allows explicit control of GPUs as well.
    On the P2P question: I remember an OpenGL session at GTC 2009 or 2010 saying NV_copy_image used direct DMA transfers between GPUs, so it seems yes (though, as said, Quadro only).
    It would be nice if this year we could get a multivendor ARB extension based on AMD_gpu_association (an ARB_gpu_association).

    1. I'd bet that people at Oculus are already escalating all the issues up to the top people at all the IHVs.
