How to use debugPrintf in Vulkan

Some days ago I was trying to use the debugPrintf functionality that was introduced to Vulkan and adjacent projects more than a year ago. Since I haven’t found (or maybe I missed it) a good online document that describes all the steps to enable such functionality programmatically, I thought it might be a good idea to document it myself. This is going to be a short post.

First of all, what is debugPrintf? debugPrintf is a way to write text messages from shaders that execute on the GPU to stdout or to an output of your choosing. In other words, the GPU can print text messages that the CPU will display. debugPrintf’s primary use-case is to help debug shaders.

How does it work? You can add expressions like these to your GLSL shaders:

#extension GL_EXT_debug_printf : enable
...
debugPrintfEXT("This is a message from the GPU. Some float=%f, some int=%d", 1.0, 123);

As you can see, debugPrintfEXT looks quite similar to printf, which makes it quite powerful. Using glslang (aka glslangValidator) you can convert shaders that contain debugPrintfEXT to SPIR-V and pass that SPIR-V to a VkShaderModule.

The majority of the implementation of debugPrintf lives in the Vulkan validation layer. The validation layer will rewrite the SPIR-V generated by glslang and add code that processes the given text and sends it back to the CPU. The validation layer will make use of a hidden descriptor set, atomics and hidden buffers to pass data from the GPU to the CPU. All this work and setup is transparent to the user and it can be quite slow.
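
To make the mechanism a bit more concrete, here is a minimal CPU-only C++ sketch of the general idea: every shader invocation reserves space in a shared buffer with an atomic add and stores its arguments there, and the CPU later walks the buffer and builds the text. The PrintfBuffer type, the two-word record layout and the function names are all hypothetical and only illustrate the concept. The real validation layer uses its own internal layout.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the hidden buffer that gets bound to the shader.
struct PrintfBuffer
{
    std::atomic<uint32_t> wordsWritten{0}; // Bumped atomically by every "invocation"
    std::vector<uint32_t> words;           // Raw payload the CPU reads back

    explicit PrintfBuffer(size_t capacity) : words(capacity, 0) {}
};

// Conceptually what the injected shader code does: reserve space with an
// atomic add, then store a record of the form [formatStringId, intArgument].
bool gpuSidePrint(PrintfBuffer& buf, uint32_t formatStringId, int32_t arg)
{
    const uint32_t recordSize = 2;
    const uint32_t offset = buf.wordsWritten.fetch_add(recordSize);
    if(offset + recordSize > buf.words.size())
    {
        return false; // Buffer is full, this message is dropped
    }
    buf.words[offset] = formatStringId;
    buf.words[offset + 1] = static_cast<uint32_t>(arg);
    return true;
}

// Conceptually what the CPU does once the GPU work has finished: walk the
// records and substitute the argument into the matching format string.
std::vector<std::string> cpuSideDrain(const PrintfBuffer& buf,
                                      const std::vector<std::string>& formatStrings)
{
    std::vector<std::string> messages;
    const size_t end = std::min<size_t>(buf.wordsWritten.load(), buf.words.size());
    for(size_t i = 0; i + 1 < end; i += 2)
    {
        const uint32_t id = buf.words[i];
        assert(id < formatStrings.size());
        const int32_t arg = static_cast<int32_t>(buf.words[i + 1]);

        std::string text = formatStrings[id];
        const size_t pos = text.find("%d"); // This sketch only understands one %d
        if(pos != std::string::npos)
        {
            text.replace(pos, 2, std::to_string(arg));
        }
        messages.push_back(text);
    }
    return messages;
}
```

Note how the format strings never travel through the buffer in this sketch, only a small ID and the arguments do. The extra atomics and memory traffic also hint at why the feature can be slow.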

So what are the steps to start using debugPrintf?

Step 1: Enable the extension in your shaders by adding: #extension GL_EXT_debug_printf : enable

Step 2: Enable the validation layer while creating the VkInstance:

VkInstanceCreateInfo instanceCreateInfo;
...

const char* layerNames[1] = {"VK_LAYER_KHRONOS_validation"};
instanceCreateInfo.ppEnabledLayerNames = &layerNames[0];

instanceCreateInfo.enabledLayerCount = 1;

Step 3: Enable the debugPrintf validation layer feature while creating the VkInstance:

// Populate the VkValidationFeaturesEXT
VkValidationFeaturesEXT validationFeatures = {};
validationFeatures.sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT;
validationFeatures.enabledValidationFeatureCount = 1;

VkValidationFeatureEnableEXT enabledValidationFeatures[1] = {
	VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT};
validationFeatures.pEnabledValidationFeatures = enabledValidationFeatures;

// Then add the VkValidationFeaturesEXT to the VkInstanceCreateInfo
validationFeatures.pNext = instanceCreateInfo.pNext;
instanceCreateInfo.pNext = &validationFeatures;

Step 4: Set up the callback that will print the messages:

VkDebugReportCallbackEXT debugCallbackHandle;

// Populate the VkDebugReportCallbackCreateInfoEXT
VkDebugReportCallbackCreateInfoEXT ci = {};
ci.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT;
ci.pfnCallback = myDebugCallback;
ci.flags = VK_DEBUG_REPORT_INFORMATION_BIT_EXT;
ci.pUserData = myUserData;

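// Create the callback handle. vkCreateDebugReportCallbackEXT is an extension function, so
// fetch it with vkGetInstanceProcAddr and enable VK_EXT_debug_report on the instance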
vkCreateDebugReportCallbackEXT(vulkanInstance, &ci, nullptr, &debugCallbackHandle);

...

// And this is the callback that the validator will call
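VKAPI_ATTR VkBool32 VKAPI_CALL myDebugCallback(VkDebugReportFlagsEXT flags,
	VkDebugReportObjectTypeEXT objectType,
	uint64_t object,
	size_t location,
	int32_t messageCode,
	const char* pLayerPrefix,
	const char* pMessage,
	void* pUserData)
{
	// debugPrintf output is reported as an information message, not as an error
	if(flags & VK_DEBUG_REPORT_INFORMATION_BIT_EXT)
	{
		printf("debugPrintfEXT: %s\n", pMessage);
	}

	// Returning VK_FALSE tells the layer not to abort the Vulkan call
	return VK_FALSE;
}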

Step 5: Make sure you enable the VK_KHR_shader_non_semantic_info device extension while building your VkDevice. This is pretty trivial so I won’t show any code.
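
For completeness, here is a small sketch of how the extension list could be assembled. The requireExtension helper is hypothetical and not part of Vulkan; it just avoids adding the same name twice. The comments describe where the resulting list would go, assuming the usual VkDeviceCreateInfo setup.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: appends `name` to the list unless it is already there.
void requireExtension(std::vector<const char*>& extensions, const char* name)
{
    for(const char* existing : extensions)
    {
        if(std::strcmp(existing, name) == 0)
        {
            return; // Already requested
        }
    }
    extensions.push_back(name);
}

// In real code the string below is available from the Vulkan headers as
// VK_KHR_SHADER_NON_SEMANTIC_INFO_EXTENSION_NAME, and the list ends up in
// VkDeviceCreateInfo::ppEnabledExtensionNames (with enabledExtensionCount set
// to extensions.size()) before calling vkCreateDevice.
std::vector<const char*> buildDeviceExtensionList()
{
    std::vector<const char*> extensions;
    requireExtension(extensions, "VK_KHR_shader_non_semantic_info");
    return extensions;
}
```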

Some additional notes:

  • It is possible to use debugPrintf with the DirectX compiler. In DX land debugPrintfEXT is just named printf
  • It is also possible to avoid all this annoying setup and use the Vulkan configurator (aka vkconfig). vkconfig is part of Vulkan SDK. More info on vkconfig here https://vulkan.lunarg.com/doc/view/1.2.135.0/windows/vkconfig.html
  • Latest RenderDoc also supports debugPrintf but I’m unsure about the details

And that’s pretty much it. If I missed something feel free to drop a comment below.

Resource uniformity & bindless access in Vulkan

This is going to be a short post around bindless descriptor access in Vulkan. There were other posts that touched this topic in the past but in this one I’ll focus more on the Vulkan spec and the terminology in general. The concepts described here also apply to DX12 so I’ll try to cover both terminologies when possible.

In SPIR-V/GLSL/HLSL you can have arrays of values or arrays of descriptors or whatever. These arrays can be sized, unsized, runtime sized etc, doesn’t matter. There are a few ways to index into those arrays:

  • Constant integral index
  • Dynamically uniform index
  • Non-dynamically uniform index
  • Subgroup uniform index

Constant integral index is a very typical access pattern. Nothing special:

const int idx = 10;
... = myArray[idx];

Dynamically uniform access is uniform across an invocation group. The Vulkan spec is a bit vague (on purpose) on defining what the invocation group is but to be 100% covered we can view it as a whole drawcall or a whole compute dispatch. So dynamically uniform access doesn’t diverge inside a drawcall at all. All invocations (threads in DX12) of a drawcall (or dispatch) access the same thing.

layout(...) uniform MyConstantBuffer
{
    bool dynamicallyUniformBool;
    int dynamicallyUniformIndex;
};

...

if(dynamicallyUniformBool)
{
    ... = myArray[dynamicallyUniformIndex];
}

In the above example myArray is accessed using a dynamically uniform index and it’s inside a dynamically uniform control flow. All invocations of that drawcall will access the same array element.

Non-dynamically uniform access is when there is divergence between invocations of an invocation group (aka drawcall or dispatch).

int idx = rand() % 100;
... = myArray[idx];

Subgroup uniform access is when something doesn’t diverge between the invocations that form a subgroup (wave in DX12). This is not explicitly exposed by the shading languages so we’ll leave that out for now.
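
To make the difference between these categories more tangible, here is a small CPU-side C++ sketch that classifies an index given the value every invocation of a single drawcall (or dispatch) ended up using. The classifyIndex function and the Uniformity enum are hypothetical names, and the sketch assumes invocations are packed into subgroups in order, which real hardware doesn’t guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Uniformity
{
    kDynamicallyUniform, // Same value for every invocation of the drawcall
    kSubgroupUniform,    // Same value inside each subgroup, differs between subgroups
    kNonUniform          // Diverges even inside a subgroup
};

// indices[i] is the array index that invocation i of one drawcall uses.
Uniformity classifyIndex(const std::vector<int>& indices, size_t subgroupSize)
{
    assert(subgroupSize > 0);

    bool sameAcrossDrawcall = true;
    bool sameInsideEachSubgroup = true;
    for(size_t i = 0; i < indices.size(); ++i)
    {
        if(indices[i] != indices[0])
        {
            sameAcrossDrawcall = false;
        }

        // Compare against the first invocation of the subgroup this one belongs to
        const size_t firstInSubgroup = (i / subgroupSize) * subgroupSize;
        if(indices[i] != indices[firstInSubgroup])
        {
            sameInsideEachSubgroup = false;
        }
    }

    if(sameAcrossDrawcall)
    {
        return Uniformity::kDynamicallyUniform;
    }
    return sameInsideEachSubgroup ? Uniformity::kSubgroupUniform : Uniformity::kNonUniform;
}
```

With a subgroup size of 2, the indices {5, 5, 5, 5} are dynamically uniform, {1, 1, 2, 2} are only subgroup uniform and {1, 2, 3, 4} are non-uniform.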

We spoke about various access methods as a general concept but what we are really interested in is access of arrays of descriptors. This is what bindless really is. The Vulkan spec added support for bindless in version 1.2 and as usual it also exposed a bunch of caps that define what’s allowed and what’s not.

shaderUniformBufferArrayDynamicIndexing, shaderSampledImageArrayDynamicIndexing, shaderStorageBufferArrayDynamicIndexing and shaderStorageImageArrayDynamicIndexing are caps since Vulkan 1.0. Having those false means that the arrays of the relevant resources can only be accessed using a constant index (or even better: any constant expression). Pretty much everyone has those set to true so let’s move on. Vulkan 1.2 added shaderInputAttachmentArrayDynamicIndexing, shaderUniformTexelBufferArrayDynamicIndexing and shaderStorageTexelBufferArrayDynamicIndexing and for most IHVs these are true as well. Note that sampler descriptors are absent from these caps.

Then there is the XXXArrayNonUniformIndexing family of caps. If one of these is false then the implementation doesn’t allow non-dynamically uniform access of that type of descriptor. If it is true then you can do bindless on that specific type of descriptor. Most vendors have these set to true except Intel which doesn’t enable all of them.

An additional family of caps is the XXXArrayNonUniformIndexingNative. This feels like a performance warning more than anything else. If the XXXArrayNonUniformIndexingNative is false then the shader compiler will have to add additional instructions to work with non-dynamically uniform access. This varies between IHVs quite a bit.
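
The three families of caps can be read as a small decision ladder. The C++ sketch below shows that reading for sampled images only. The SampledImageIndexingCaps struct is a hypothetical local mirror of the Vulkan fields with the same names (in real code the values are queried from the physical device) and the IndexingSupport enum is made up for illustration.

```cpp
#include <cassert>

// Hypothetical mirror of the sampled image caps discussed above.
struct SampledImageIndexingCaps
{
    bool shaderSampledImageArrayDynamicIndexing;
    bool shaderSampledImageArrayNonUniformIndexing;
    bool shaderSampledImageArrayNonUniformIndexingNative;
};

enum class IndexingSupport
{
    kConstantOnly,            // Only constant expressions can index the array
    kDynamicallyUniformOnly,  // The index must not diverge inside a drawcall
    kNonUniformExtraWork,     // Bindless works, the compiler adds instructions
    kNonUniformNative         // Bindless works without extra instructions
};

IndexingSupport classifySupport(const SampledImageIndexingCaps& caps)
{
    if(!caps.shaderSampledImageArrayDynamicIndexing)
    {
        return IndexingSupport::kConstantOnly;
    }
    if(!caps.shaderSampledImageArrayNonUniformIndexing)
    {
        return IndexingSupport::kDynamicallyUniformOnly;
    }
    return caps.shaderSampledImageArrayNonUniformIndexingNative
               ? IndexingSupport::kNonUniformNative
               : IndexingSupport::kNonUniformExtraWork;
}
```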

One additional piece to the puzzle is the NonUniform SPIR-V decoration which is exposed via nonuniformEXT in GLSL and NonUniformResourceIndex() in HLSL. The default SPIR-V behavior mandates that descriptor accesses are dynamically uniform. When they are not, things might break. So when doing non-dynamically uniform accesses (when the implementation allows it of course) you are required to use the NonUniform decoration. The NonUniform decoration is somewhat orthogonal to the caps discussed above. It doesn’t mean that if XXXArrayNonUniformIndexingNative is true you can omit the NonUniform decoration. The spec doesn’t really say when and if you can omit the NonUniform so the best thing to do is to always use it to decorate non-dynamically uniform accesses. If an implementation doesn’t care then it will simply ignore it.

Example of bindless in GLSL:

layout(...) uniform texture2D myBindlessHandles[]; // Runtime sized array
layout(...) uniform sampler mySampler;

...
vec4 color = texture(sampler2D(myBindlessHandles[nonuniformEXT(nonUniformIndex)], mySampler), uvs);
...

So, putting all these together. AMD for example allows non-dynamically uniform access on sampled images (shaderSampledImageArrayNonUniformIndexing=true) but these accesses are not native (shaderSampledImageArrayNonUniformIndexingNative=false). By default AMD’s compiler will treat all descriptor accesses as dynamically uniform and use SGPRs (scalar registers) to store the descriptors. If the access is non-dynamically uniform then things might break. This is where NonUniform comes into play. Since AMD’s HW doesn’t natively support non-dynamically uniform access, the NonUniform decoration instructs the compiler to add extra instructions that make the index uniform across the subgroup for every access.
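
One common way to picture those extra instructions is a loop (often called a waterfall loop) that repeatedly picks the index of the first lane that still has work, serves every lane that shares that index and carries on until no lanes are left. The CPU-side C++ sketch below simulates that for a single subgroup. I am not claiming this is exactly what any particular compiler emits, and waterfallLoad together with its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates one subgroup. laneIndices[lane] is the (possibly divergent) index
// that a lane wants to use and descriptors stands in for the descriptor array.
// Returns how many iterations, each with a subgroup uniform index, were needed.
size_t waterfallLoad(const std::vector<int>& laneIndices,
                     const std::vector<int>& descriptors,
                     std::vector<int>& results)
{
    const size_t laneCount = laneIndices.size();
    results.assign(laneCount, 0);
    std::vector<bool> done(laneCount, false);
    size_t iterations = 0;

    for(;;)
    {
        // Find the first lane that hasn't been served yet
        size_t first = 0;
        while(first < laneCount && done[first])
        {
            ++first;
        }
        if(first == laneCount)
        {
            break; // Every lane has its result
        }

        // This value is the same for the whole subgroup during this iteration,
        // so it could live in a scalar register
        const int uniformIndex = laneIndices[first];
        assert(uniformIndex >= 0 && static_cast<size_t>(uniformIndex) < descriptors.size());

        for(size_t lane = first; lane < laneCount; ++lane)
        {
            if(!done[lane] && laneIndices[lane] == uniformIndex)
            {
                results[lane] = descriptors[static_cast<size_t>(uniformIndex)];
                done[lane] = true;
            }
        }
        ++iterations;
    }

    return iterations;
}
```

When all lanes use the same index the loop runs once, and in the worst case it runs once per lane. That cost is why the Native cap reads like a performance warning.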

Similar story for Arm’s Mali, different reason though. On Mali some instructions require some arguments to be subgroup invariant and this is where non-dynamically uniform patterns become a problem.

One additional thing worth mentioning is that using buffer addresses to load data from buffers (exposed by VK_KHR_buffer_device_address and part of Vulkan 1.2) doesn’t require any NonUniform decoration. NonUniform is irrelevant if your shader code doesn’t index arrays of descriptors. Addresses don’t point to descriptors, they point to some raw memory.

The final bit to the puzzle is to understand which builtins are dynamically uniform and which are not. The answer is hidden inside the spec and only gl_DrawID is explicitly mentioned as dynamically uniform and everything else is not. If for example you are using gl_InstanceIndex/SV_InstanceID (directly or indirectly) to index resources then you technically need to use the NonUniform decoration.

Big thanks to Christian Forfang for providing some early feedback!

Anatomy of a frame in AnKi

This is going to be long so let’s start with the purpose of this article which essentially is to analyze a single frame from the renderer’s point of view. It will briefly describe all the passes and how the data get transformed in order to produce some pretty pixels on the screen.

Some disclaimers before we start:

  • It’s not about a perfect renderer. There is no such thing. Renderers should adapt to the context (type of game, platforms etc).
  • It’s not about a perfect renderer even for my context and standards. I always have ideas for further improvements and I kept postponing this article until they materialized. But then new ideas kept coming up and I felt I shouldn’t wait any longer.
  • It’s not about a mobile (GPU) friendly renderer. It’s a desktop oriented one.
  • It doesn’t reflect what the renderer will look like a month from now. I tweak it almost daily.

These are some terms used throughout this article:

  • Graphics pass: A series of drawcalls that affect the same output. In Vulkan terminology it’s a VkRenderPass with a single subpass.
  • Compute pass: A compute job or a series of compute jobs that affect the same output.
  • Render target: Part of the output of a graphics render pass.
  • Texture: A sampled image that is used as input in compute or graphics passes. Some render targets may be used as textures later on.
  • GI: Global illumination.

Developer console

Something that was long overdue… the new developer console of AnKi. It can execute Lua scripts and view the log. More importantly, it’s bound to the tilde key (~) like all consoles should. Built using Dear ImGui.

Designing good C++ game middleware

For many years I’ve been evaluating and using various game specific open source libraries and at the same time I was designing and implementing my own. Despite the fact that many libraries are quite competent at what they do, their overall design leaves a few things to be desired. Some of the concepts described here sound naive but you can’t imagine how many libraries get them wrong. This article focuses on a few good practices that designers/implementers of performance critical libraries should be aware of.

This article is built around five pillars:

  • What public interfaces should look like.
  • Data oriented design.
  • The importance of thread-awareness.
  • Memory management.
  • And some general concepts.

Who is the target audience:

  • People who want to create performance critical libraries/middleware.
  • People who want to attract serious users and not only hobbyists.
  • Mainly people working on opensource projects.

Who is not the target audience:

  • People who want to create middleware solely for their own amusement.
  • C++ purists.

Optimizing Vulkan for AMD and the tale of two Vulkan drivers

The first GPU AnKi ran on, almost a decade ago, was in fact an ATI Radeon 9800 Pro. The first version of the deferred shading renderer ran on that GPU, and not only that: AnKi was running on Linux with the fglrx driver. I don’t remember experiencing many game breaking bugs back then, but then again, AnKi was quite simplistic at the time. One thing I remember was some depth buffer corruption that I had to work around using a copy. Many years later I understood that this was a driver bug.

The love with ATI didn’t last long and AnKi ended up being developed exclusively on NVIDIA GPUs. For many years AMD’s OpenGL driver didn’t have the quality or the features I wanted. Fast forward to today, things are looking far better. Firstly, Mesa has a quite decent OpenGL implementation, secondly, there is a very competitive Mesa Vulkan driver (RADV) and on top of that there is a second opensource Vulkan driver directly from AMD (AMDVLK). The cherry on top is a very good profiler for Vulkan and AMDVLK called Radeon GPU Profiler. AMD regularly releases lots of documentation and optimization tips as part of their GPUOpen initiative. This is a great period to own AMD hardware for graphics development, which is why I had to get my hands on an AMD GPU.

In this post I’ll focus on some AMD specific optimizations and I’ll be comparing the two opensource Vulkan drivers.
