Resource uniformity & bindless access in Vulkan

This is going to be a short post about bindless descriptor access in Vulkan. Other posts have touched on this topic in the past, but in this one I'll focus more on the Vulkan spec and its terminology. The concepts described here also apply to DX12, so I'll try to cover both terminologies when possible.

In SPIR-V/GLSL/HLSL you can have arrays of values or arrays of descriptors or whatever. These arrays can be sized, unsized, runtime sized etc, doesn’t matter. There are a few ways to index into those arrays:

  • Constant integral index
  • Dynamically uniform index
  • Non-dynamically uniform index
  • Subgroup uniform index

Constant integral index is a very typical access pattern. Nothing special:

const int idx = 10;
... = myArray[idx];

Dynamically uniform access is uniform across an invocation group. The Vulkan spec is a bit vague (on purpose) about what the invocation group is, but to be on the safe side we can treat it as a whole drawcall or a whole compute dispatch. So a dynamically uniform access doesn't diverge inside a drawcall at all: all invocations (threads in DX12) of a drawcall (or dispatch) access the same thing.

layout(...) uniform MyConstantBuffer
{
    bool dynamicallyUniformBool;
    int dynamicallyUniformIndex;
};

...

if(dynamicallyUniformBool)
{
    ... = myArray[dynamicallyUniformIndex];
}

In the above example myArray is accessed using a dynamically uniform index and it’s inside a dynamically uniform control flow. All invocations of that drawcall will access the same array element.

Non-dynamically uniform access is when there is divergence between invocations of an invocation group (aka drawcall or dispatch).

// GLSL has no rand(); any per-invocation value works, e.g. in a fragment shader:
int idx = int(gl_FragCoord.x) % 100;
... = myArray[idx];

Subgroup uniform access is when something doesn’t diverge between the invocations that form a subgroup (wave in DX12). This is not explicitly exposed by the shading languages so we’ll leave that out for now.
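
To make the categories a bit more concrete, here is a small C++ model (a sketch, not something taken from the spec or from a driver) that looks at the index every invocation of one drawcall used and classifies it. The names IndexUniformity and classify are made up for illustration, and the model assumes subgroups are formed from consecutive invocations, which real hardware is not required to do. Constant indices are not a separate case because that is a compile-time property; at runtime they simply look dynamically uniform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conceptual model only: how uniform were the indices used by one
// invocation group (conservatively: a whole drawcall or dispatch)?
enum class IndexUniformity
{
    DynamicallyUniform, // same index across the whole invocation group
    SubgroupUniform,    // same index inside every subgroup, differs between subgroups
    NonUniform          // differs inside at least one subgroup
};

// indices[i] is the index used by invocation i. Assumption: subgroups are
// made of subgroupSize consecutive invocations.
IndexUniformity classify(const std::vector<int>& indices, std::size_t subgroupSize)
{
    assert(subgroupSize > 0);

    // Dynamically uniform: every invocation of the group uses the same index.
    bool groupUniform = true;
    for(std::size_t i = 1; i < indices.size(); ++i)
    {
        if(indices[i] != indices[0])
        {
            groupUniform = false;
            break;
        }
    }
    if(groupUniform)
    {
        return IndexUniformity::DynamicallyUniform;
    }

    // Subgroup uniform: every subgroup agrees internally.
    for(std::size_t base = 0; base < indices.size(); base += subgroupSize)
    {
        for(std::size_t i = base + 1; i < base + subgroupSize && i < indices.size(); ++i)
        {
            if(indices[i] != indices[base])
            {
                return IndexUniformity::NonUniform;
            }
        }
    }
    return IndexUniformity::SubgroupUniform;
}
```

For example, with a subgroup size of 2 the indices {1, 1, 2, 2} are subgroup uniform but not dynamically uniform, while {1, 2, 2, 2} are non-uniform.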

We've discussed the various access patterns as a general concept, but what we are really interested in is indexing into arrays of descriptors. This is what bindless really is. Vulkan added core support for bindless in version 1.2 (by promoting VK_EXT_descriptor_indexing) and, as usual, it also exposed a bunch of caps that define what's allowed and what's not.

shaderUniformBufferArrayDynamicIndexing, shaderSampledImageArrayDynamicIndexing, shaderStorageBufferArrayDynamicIndexing and shaderStorageImageArrayDynamicIndexing have been caps since Vulkan 1.0. If they are false, arrays of the relevant resources can only be indexed using constant integral expressions (not just literal constants). Pretty much everyone has them set to true, so let's move on. Vulkan 1.2 added shaderInputAttachmentArrayDynamicIndexing, shaderUniformTexelBufferArrayDynamicIndexing and shaderStorageTexelBufferArrayDynamicIndexing, and for most IHVs these are true as well. Note that there is no dedicated cap for arrays of sampler descriptors; they are covered by shaderSampledImageArrayDynamicIndexing.

Then there is the XXXArrayNonUniformIndexing family of caps. If one of these is false, the implementation doesn't allow non-dynamically uniform access to that type of descriptor. If it is true, you can do bindless with that type of descriptor. Most vendors have them all set to true, except Intel, which doesn't enable all of them.

An additional family of caps is XXXArrayNonUniformIndexingNative (strictly speaking these are properties, not features). They feel more like a performance warning than anything else: if XXXArrayNonUniformIndexingNative is false, the shader compiler has to add extra instructions to handle non-dynamically uniform access. This varies quite a bit between IHVs.
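
Here is a minimal C++ sketch of how the three sampled-image caps combine. SampledImageIndexingCaps, IndexKind, isIndexingAllowed and mayCostExtraInstructions are hypothetical names; only the field names are real Vulkan ones. In real code you would read shaderSampledImageArrayDynamicIndexing from VkPhysicalDeviceFeatures, shaderSampledImageArrayNonUniformIndexing from VkPhysicalDeviceVulkan12Features and shaderSampledImageArrayNonUniformIndexingNative from VkPhysicalDeviceVulkan12Properties (or from VkPhysicalDeviceDescriptorIndexingFeatures/Properties when using the extension).

```cpp
#include <cassert>

// Hypothetical mirror of the sampled-image caps. Field names match Vulkan's,
// but in a real app they come from three different query structs.
struct SampledImageIndexingCaps
{
    bool shaderSampledImageArrayDynamicIndexing;          // VkPhysicalDeviceFeatures
    bool shaderSampledImageArrayNonUniformIndexing;       // VkPhysicalDeviceVulkan12Features
    bool shaderSampledImageArrayNonUniformIndexingNative; // VkPhysicalDeviceVulkan12Properties
};

enum class IndexKind
{
    Constant,
    DynamicallyUniform,
    NonDynamicallyUniform
};

// Is indexing an array of sampled images with this kind of index allowed at all?
bool isIndexingAllowed(const SampledImageIndexingCaps& caps, IndexKind kind)
{
    switch(kind)
    {
    case IndexKind::Constant:
        return true; // constant integral expressions are always fine
    case IndexKind::DynamicallyUniform:
        return caps.shaderSampledImageArrayDynamicIndexing;
    case IndexKind::NonDynamicallyUniform:
        return caps.shaderSampledImageArrayNonUniformIndexing;
    }
    return false;
}

// The "performance warning": allowed, but the compiler may need extra instructions.
bool mayCostExtraInstructions(const SampledImageIndexingCaps& caps, IndexKind kind)
{
    return kind == IndexKind::NonDynamicallyUniform && isIndexingAllowed(caps, kind)
        && !caps.shaderSampledImageArrayNonUniformIndexingNative;
}
```

With caps of {true, true, false}, which matches the AMD example discussed further down, non-dynamically uniform access is allowed but flagged as potentially costly, while dynamically uniform access is not.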

One more piece of the puzzle is the NonUniform SPIR-V decoration, which is exposed via nonuniformEXT in GLSL and NonUniformResourceIndex() in HLSL. The default SPIR-V behavior mandates that descriptor accesses are dynamically uniform; when they are not, things might break. So when doing non-dynamically uniform accesses (when the implementation allows it, of course) you are required to use the NonUniform decoration. The NonUniform decoration is somewhat orthogonal to the caps discussed above: XXXArrayNonUniformIndexingNative being true doesn't mean you can omit it. The spec doesn't really say when, or if, you can omit NonUniform, so the best thing to do is to always use it to decorate non-dynamically uniform accesses. If an implementation doesn't care, it will simply ignore it.

Example of bindless in GLSL:

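#extension GL_EXT_nonuniform_qualifier : require // nonuniformEXT and runtime sized descriptor arrays

layout(...) uniform texture2D myBindlessHandles[]; // Runtime sized array
layout(...) uniform sampler mySampler;

...
// sampler2D(texture, sampler) combines the separate texture and sampler
vec4 color = texture(sampler2D(myBindlessHandles[nonuniformEXT(nonUniformIndex)], mySampler), uvs);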
...

So, putting all this together: AMD, for example, allows non-dynamically uniform access on sampled images (shaderSampledImageArrayNonUniformIndexing=true) but these accesses are not native (shaderSampledImageArrayNonUniformIndexingNative=false). By default AMD's compiler treats all descriptor accesses as dynamically uniform and uses SGPRs (scalar registers) to store the descriptors. If an access is non-dynamically uniform, things might break. This is where NonUniform comes into play: since AMD's HW doesn't natively support non-dynamically uniform descriptor access, NonUniform instructs the compiler to add extra instructions that ensure subgroup invariance.
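
What do those extra instructions look like? A common technique is a so-called waterfall loop. The C++ below is a CPU-side model of the idea, not AMD's actual code generation: waterfallFetch and its parameters are hypothetical, laneIndex holds the index each lane of one subgroup wants, and descriptors stands in for the descriptor array. Each iteration takes the index of the first still-active lane (similar in spirit to a readfirstlane instruction), which is uniform for that iteration, serves every lane that wants the same descriptor and retires those lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of a waterfall loop over one subgroup. Returns the value each
// lane fetched and reports how many loop iterations were needed.
std::vector<float> waterfallFetch(const std::vector<int>& laneIndex,
                                  const std::vector<float>& descriptors,
                                  int& iterations)
{
    std::vector<float> result(laneIndex.size(), 0.0f);
    std::vector<bool> active(laneIndex.size(), true);
    iterations = 0;

    for(;;)
    {
        // Find the first lane that still needs its descriptor.
        std::size_t first = 0;
        while(first < laneIndex.size() && !active[first])
        {
            ++first;
        }
        if(first == laneIndex.size())
        {
            break; // every lane has been served
        }

        // This index is uniform for the whole iteration, so the descriptor
        // could live in scalar registers.
        const int uniformIndex = laneIndex[first];
        const float descriptor = descriptors[static_cast<std::size_t>(uniformIndex)];
        ++iterations;

        // Serve and retire every lane that wants the same descriptor.
        for(std::size_t lane = 0; lane < laneIndex.size(); ++lane)
        {
            if(active[lane] && laneIndex[lane] == uniformIndex)
            {
                result[lane] = descriptor;
                active[lane] = false;
            }
        }
    }
    return result;
}
```

The loop runs once per distinct index in the subgroup, so a subgroup-uniform index costs a single iteration, while the worst case costs one iteration per lane. That is why non-native non-uniform access is a performance concern rather than a correctness one once NonUniform is present.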

Similar story for Arm's Mali, for a different reason though. On Mali, some instructions require certain arguments to be subgroup invariant, and this is where non-dynamically uniform patterns become a problem.

One more thing worth mentioning: loading data through buffer device addresses (exposed by VK_KHR_buffer_device_address and part of Vulkan 1.2) doesn't require any NonUniform decoration. NonUniform is irrelevant if your shader code doesn't index arrays of descriptors. Addresses don't point to descriptors; they point to raw memory.

The final piece of the puzzle is understanding which builtins are dynamically uniform and which are not. The answer is hidden inside the spec: only gl_DrawID is explicitly mentioned as dynamically uniform, and everything else is not. If, for example, you use gl_InstanceIndex/SV_InstanceID (directly or indirectly) to index resources, then technically you need the NonUniform decoration.
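
The rule of thumb from the last few paragraphs can be written down as a tiny C++ helper, for example for a material or shader generator that emits nonuniformEXT/NonUniformResourceIndex(). IndexSource and needsNonUniformDecoration are hypothetical names, and the classification is deliberately conservative: only constant expressions, values read from a constant buffer in dynamically uniform control flow, and gl_DrawID are treated as safe. Everything else, including anything derived from gl_InstanceIndex/SV_InstanceID, gets NonUniform.

```cpp
#include <cassert>

// Hypothetical description of where a descriptor index comes from.
enum class IndexSource
{
    ConstantExpression,
    ConstantBufferValueInUniformControlFlow, // e.g. an index read from a uniform block
    DrawID,          // gl_DrawID: the only builtin explicitly called dynamically uniform
    InstanceIndex,   // gl_InstanceIndex / SV_InstanceID
    VertexIndex,     // gl_VertexIndex / SV_VertexID
    PerVertexOrPixel, // vertex attributes, interpolants, gl_FragCoord, ...
    Other
};

// Conservative rule: decorate unless the index is known to be dynamically uniform.
// Values derived from an unsafe source inherit that source's answer.
bool needsNonUniformDecoration(IndexSource source)
{
    switch(source)
    {
    case IndexSource::ConstantExpression:
    case IndexSource::ConstantBufferValueInUniformControlFlow:
    case IndexSource::DrawID:
        return false;
    default:
        return true; // may diverge between invocations of the same drawcall
    }
}
```

Erring on the side of decorating costs little: as noted above, an implementation that doesn't care about NonUniform will simply ignore it.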

Big thanks to Christian Forfang for providing some early feedback!

Anatomy of a frame in AnKi

This is going to be long, so let's start with the purpose of this article, which is to analyze a single frame from the renderer's point of view. It briefly describes all the passes and how the data gets transformed in order to produce some pretty pixels on the screen.

Some disclaimers before we start:

  • It's not about a perfect renderer. There isn't such a thing. Renderers should adapt to the context (type of game, platforms etc).
  • It's not about a perfect renderer even for my context and standards. I always have ideas for further improvements and I kept postponing this article until they materialized. But then new ideas came up and I felt I shouldn't wait any longer.
  • It's not about a mobile (GPU) friendly renderer. It's a desktop oriented one.
  • It doesn't reflect what the renderer will look like a month from now. I tweak it almost daily.

These are some terms used throughout this article:

  • Graphics pass: A series of drawcalls that affect the same output. In Vulkan terminology it's a VkRenderPass with a single subpass.
  • Compute pass: A compute job or a series of compute jobs that affect the same output.
  • Render target: Part of the output of a graphics pass.
  • Texture: A sampled image that is used as input in compute or graphics passes. Some render targets may be used as textures later on.
  • GI: Global illumination.

Developer console

Something that was long overdue: the new developer console of AnKi. It can execute Lua scripts and view the log. More importantly, it's bound to the tilde key (~), like all consoles should be. Built using Dear ImGui.

Designing good C++ game middleware

For many years I've been evaluating and using various game-specific open source libraries while at the same time designing and implementing my own. Although many libraries are quite competent at what they do, their overall design leaves a few things to be desired. Some of the concepts described here may sound naive, but you can't imagine how many libraries get them wrong. This article focuses on a few good practices that designers/implementers of performance critical libraries should be aware of.

This article is built around five pillars:

  • What public interfaces should look like.
  • Data oriented design.
  • The importance of thread-awareness.
  • Memory management.
  • And some general concepts.

Who is the target audience:

  • People who want to create performance critical libraries/middleware.
  • People who want to attract serious users and not only hobbyists.
  • Mainly directed at open source.

Who is not the target audience:

  • People who want to create middleware solely for their own amusement.
  • C++ purists.

Optimizing Vulkan for AMD and the tale of two Vulkan drivers

The first GPU AnKi ran on, almost a decade ago, was in fact an ATI Radeon 9800 Pro. The first version of the deferred shading renderer ran on that GPU, and not only that: AnKi was running on Linux with the fglrx driver. I don't remember experiencing many game breaking bugs back then, but then again, AnKi was quite simplistic at the time. One thing I remember was some depth buffer corruption that I had to work around using a copy. Many years later I understood that this was a driver bug.

The love affair with ATI didn't last long and AnKi ended up being developed exclusively on NVIDIA GPUs. For many years AMD's OpenGL driver didn't have the quality or the features I wanted. Fast forward to today and things look far better. Firstly, Mesa has a quite decent OpenGL implementation; secondly, there is a very competitive Mesa Vulkan driver (RADV); and on top of that there is a second open source Vulkan driver directly from AMD (AMDVLK). The cherry on top is a very good profiler for Vulkan and AMDVLK called Radeon GPU Profiler. AMD also regularly releases lots of documentation and optimization tips as part of their GPUOpen initiative. This is a great period to own AMD hardware for graphics development, which is why I had to get my hands on an AMD GPU.

In this post I'll focus on some AMD specific optimizations and I'll be comparing the two open source Vulkan drivers.

Porting AnKi to Vulkan part 2

In my Porting AnKi to Vulkan post I described in detail how AnKi's interfaces changed to accommodate the Vulkan backend and what this backend looked like. Eight months have passed since then and a few things have changed, mainly towards greater flexibility. This post describes how the interfaces differ from the older ones, how the performance currently looks and which new extensions AnKi is using now.