Foveation in Game Engines
In this section we outline what implementing foveated rendering for your game engine implies, and how you can approach the task.
Table of Contents
Maximizing Foveation Utility
Generally speaking, foveated rendering will have maximum utility with the least implementation overhead in these scenarios:
- Forward rendering.
- GPU bound, in particular when limited by fragment / pixel shading due to resolution.
- Access to hardware extensions such as VRS or Qualcomm foveation.
The vast majority of XR titles use forward rendering which makes them ideal candidates for foveation.
For titles using deferred rendering, options are generally limited to remapping and shrinking the G-Buffer to reduce the prominence of areas in the eye periphery. This can introduce some overhead in terms of warping the G-Buffer, filtering the output and may require some shaders to be modified to be aware of the warping. Despite this overhead, significant net processing cost savings to the shading passes are possible.
As mentioned in Benefits and Costs, the greatest opportunities for savings apply to the main rasterization and lighting stages, whereas savings in post-processing tend to be limited and require different approaches, which can mean having to support multiple techniques.
Savings are heavily dependent on the specific use case and technique, and apart from some niche cases, CPU bottlenecks will not be alleviated by foveated rendering. VRS-based foveation has very low over-heads and should produce savings in almost all cases. All the savings from VRS foveation apply to shader processing during rasterization.
For foveation of deferred renderers, G-Buffer remapping techniques add additional warp, unwarp and filtering passes which come with fixed processing costs that must be recovered before savings are made. Despite these processing overheads, net savings from the shading of complex scenes can be significant.
Multi-Resolution Shading (MRS) increases vertex processing load which can shift GPU bottlenecks from pixel processing to geometry processing. Savings are possible but in pathological cases there can be a reduction in overall performance. Without costly filtering, MRS exhibits significant artifacts.
Multi-Resolution buffer (composited buffers / “floating window”) approaches can increase both CPU load and vertex load and require re-composition and filtering passes. Fixed overheads are high and new bottlenecks can occur.
The utility of power savings is, to a large degree, dominated by GPU utilization.
For XR titles, we can assume that V-Sync will be enabled. Assuming that an XR title is rendering at its expected target frame rate, the overall GPU utilization should decrease, which in turn may lead to power savings by decreasing GPU frequency.
For non-XR titles with V-Sync disabled, GPU utilization is limited only by the rate that the CPU is feeding commands, and the impact of foveated rendering on power consumption will be dependent on the specific operating context.
There might be some surprising non-linear effects when measuring the effects of power consumption.
For example, when a title was previously rendering at below the expected target frame rate and is using V-Sync, then GPU utilization may be as low as 50%, as frames get reprojected to the target frequency.
If the implementation of foveation then allows the title to then operate at or above the display frame rate, then GPU utilization can jump back to close to 100%, as the title is now rendering twice as many frames, resulting in a higher measured power consumption, but also of course, a much smoother experience.
Utilizing Foveation Hardware
At the time of writing there are two hardware features which will provide tangible benefits at relatively low implementation cost: NVIDIA’s VRS, and Qualcomm’s Adreno Foveation.
These techniques are easy to justify implementing for, as engineering costs and processing overheads are low, and processing savings are almost guaranteed.
NVIDIA Variable Rate Shading (VRS)
To understand some of the terms used in this section, it is recommended that you read the brief overview of VRS and rendering techniques which can be found on the Essential Concepts page.
For forward rendering, VRS can reduce the processing cost of expensive lighting shaders by reducing the number of times that the shader is run, but still maintaining the overall pixel resolution and edge sharpness. Implementation costs and processing overheads for using VRS with forward renderers are very low.
As VRS works on many of the same principles and hardware as MSAA and most deferred renderers are incompatible with MSAA (due to the filtered or duplicated shader output), these renderers are also incompatible with VRS. The incompatibilities result from a combination of the filtered and duplicated shader output during rasterization and the absence of geometry information during the main shading passes. The relatively lightweight shaders used during deferred rendering G-Buffer construction allow for very limited savings, while at the same time potentially introducing serious artifacts to the remainder of the deferred rendering pipeline. As a result, VRS is generally only applicable to forward and forward+ rendering.
VRS is also not suitable for most post-processing due to the absence of the scene rasterization information at this stage. Usage during post-processing can create large block artifacts and an apparent reduction in resolution.
NVIDIA Variable Rate Supersampling (VRSS)
VRSS is NVIDIA’s driver level technology for automatically applying foveated rendering with VRS to applications, without the need for any work from the application developers. Any forward rendered application using DirectX 11 and MSAA can leverage VRSS.
As the name suggests, the NVIDIA driver automatically supersamples the region where the user is looking, thus improving the visual quality. Read more about it in the NVIDIA VRSS blog or submit your application for driver support at the VRSS support application form.
Qualcomm Adreno Foveation
Qualcomm Adreno Foveation has many of the same benefits and limitations as VRS. Configuration of Adreno Foveation is slightly more limited and artifact mitigation require different approaches, such as multi-pass rendering to separate out problematic content.
Qualcomm’s foveated rendering extensions apply to mobile platforms utilizing OpenGL ES, and are suitable for forward rendering and to some extent deferred rendering. It primarily applies to the main rasterization stage but can also be applied to some post-processing at the expense of increased artifacts.
The implementation costs and processing overheads are very low, and expected savings apply to both rasterization and shading.
From Static to Dynamic Foveation
If a game engine already supports static foveated rendering, the barrier to transition to dynamic foveation is generally lower, however a number of caveats apply, and it is helpful to be aware of their impact, and how that impact can be mitigated.
Dynamic repositioning of the foveated region can introduce new artifacts and increase user awareness of existing artifacts. Eye movements, which have the tendency to be rapid and unpredictable, can introduce unwelcome flickering artifacts. The high-quality rendering regions used by dynamic foveation are generally smaller than those used by static foveation. While from a performance perspective this is good, the combination of this smaller size, increased movement of the foveated region and possible a dynamically adjusted size can amplify the visibility of artifacts. Mitigation of these additional artifacts can require additional image filtering and logic around handling of the eye tracking signal to limit unnecessary movement.
VRS and Qualcomm hardware based foveation exhibit the fewest additional artifacts and may require little to no mitigation work.
Multi-Res static foveation is a particularly difficult case to modify for dynamic foveation. Changes to the central positioning tend to introduce significant ‘swimming’ and ‘screen-door’ artifacts that can be extremely difficult to mitigate.
Regardless of performance, dynamic foveated rendering will increase the quality of the peripheral regions of display.
With the above considerations in mind, you might ask:
Should I support foveated rendering in my game engine?
If you are targeting a mixture of platforms, then adding foveated rendering can help with achieving visual or performance parity between the platforms.
If you are only targeting platforms with native eye tracking support, then it makes sense to see if you can get benefits which will improve the quality of your title.
To support applications targeting platforms with hardware foveation (such as Qualcomm 845 or PC with VRS) and using forward (or forward+) rendering, then the answer is probably “yes”:
- Implementation and maintenance costs are very low, and you will see some performance improvement. The amount of the performance improvement will depend on how expensive your main scene rendering shaders are and the resolution being targeted.
- Some, but not all, applications may show visual artifacts with naïve implementations. If the artifacts are severe enough modification of some content or pipeline changes can help to eliminate them. Many artifacts can be handled by simple tweaks to content or exclusion of some graphical elements from the foveation; see Artifacts and Mitigation.
To support applications using deferred rendering the answer is “maybe”:
- For application that exhibit significant rendering performance issues, G-Buffer warping foveation can produce significant processing savings. However, this approach has large fixed processing overheads and its implementation can be complex; some existing processing and shaders may need to be modified to be aware of the warp. Lastly a solid understanding of mathematics is needed to create the warp and matching filtering.
- Variable rate shading approaches are generally unsuitable for foveation of deferred renderers. Processing savings are limited, and artifacts can be severe.