Partial difference derivative functions (ddx and ddy in HLSL[a], dFdx and dFdy in GLSL[b]) are fragment shader instructions that can be used to compute the rate of variation of any value with respect to the screen-space coordinates. In the rest of this article I will use either name, depending on the language of the code example at hand.
During triangle rasterization, GPUs run many instances of a fragment shader at a time, organizing them in blocks of 2×2 pixels. Derivatives are calculated by taking differences between the pixel values in a block:
dFdx subtracts the values of the pixels on the left side of the block from the values on the right side, and dFdy subtracts the values of the bottom pixels from the top ones. See the image below, where the grid represents the rendered screen pixels and the dFdx and dFdy expressions are given for a generic value p, evaluated by the fragment shader instance at screen coordinates (x, y) and belonging to the 2×2 block highlighted in red.
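To make the per-block differences concrete, here is a minimal CPU-side Python model of them. It is a sketch under stated assumptions, not how any driver implements derivatives: image[y][x] holds the value p computed by each fragment, y grows upward (as gl_FragCoord.y does by default in GLSL), and each row or column of a 2×2 block gets its own difference, which corresponds to GLSL's dFdxFine/dFdyFine. Hardware may instead use one difference per block (dFdxCoarse/dFdyCoarse). The helper names are hypothetical.

```python
# CPU-side model of 2x2 block ("quad") derivatives.
# Assumptions: image[y][x] is the value p of each fragment, y grows upward,
# and every row/column of a quad gets its own difference (fine derivatives).

def quad_origin(i):
    """Index of the left (or bottom) pixel of the 2x2 block containing i."""
    return i & ~1

def dfdx(image, x, y):
    """Right pixel minus left pixel, in the quad row containing (x, y)."""
    x0 = quad_origin(x)
    return image[y][x0 + 1] - image[y][x0]

def dfdy(image, x, y):
    """Top pixel minus bottom pixel, in the quad column containing (x, y)."""
    y0 = quad_origin(y)
    return image[y0 + 1][x] - image[y0][x]

# p = x*x + 3*y evaluated at every pixel of a 4x4 render target.
image = [[x * x + 3 * y for x in range(4)] for y in range(4)]
print([dfdx(image, x, 0) for x in range(4)])  # [1, 1, 5, 5]
print([dfdy(image, 0, y) for y in range(4)])  # [3, 3, 3, 3]
```

Both pixels of a block row share the same dFdx value, which is why derivatives have only half resolution along the differentiated axis.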
Derivatives can be evaluated for every variable in a fragment shader. For vector and matrix types, derivatives are computed element-wise.
Derivative functions are fundamental to the implementation of texture mipmapping and are very useful in a number of algorithms and effects, in particular when there is some kind of dependence on screen-space coordinates (for example, when rendering wireframe edges with a uniform thickness in screen pixels).
Derivatives and mipmaps
Mipmaps[d] are pre-computed sequences of images obtained by filtering a texture down to smaller sizes (each mipmap level is half the size of the previous one in each dimension). They are used to avoid aliasing artifacts when minifying a texture.
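The halving rule can be sketched in a few lines of Python. This computes the size of every level down to 1×1 and builds one level from the previous with a 2×2 box filter; the box filter is an assumption chosen for simplicity (texture tools often use better filters), and the function names are hypothetical.

```python
# Sketch: mipmap chain sizes and a 2x2 box-filter downsample.
# The box filter is an illustrative assumption, not a required filter.

def mip_chain_sizes(width, height):
    """Sizes of every mipmap level, halving each dimension down to 1x1."""
    sizes = [(width, height)]
    while (width, height) != (1, 1):
        width, height = max(1, width // 2), max(1, height // 2)
        sizes.append((width, height))
    return sizes

def downsample_box(image):
    """Average each 2x2 block of texels (assumes even width and height)."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

print(mip_chain_sizes(256, 128))
# [(256, 128), (128, 64), (64, 32), (32, 16), (16, 8), (8, 4), (4, 2), (2, 1), (1, 1)]
print(downsample_box([[0, 1, 2, 3], [4, 5, 6, 7]]))  # [[2.5, 4.5]]
```

Once one dimension reaches 1 it stays at 1 while the other keeps halving, so a non-square texture still ends at a single texel.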
Mipmapping is also important for texture cache coherence, since it enforces a texel-to-pixel ratio close to one: when traversing a triangle, each new pixel represents a step in texture space of one texel at most. Mipmapping is one of the few techniques in rendering that improve both visuals and performance.
Derivatives are used during texture sampling to select the best mipmap level. The rate of variation of the texture coordinates with respect to the screen coordinates determines the mipmap: the larger the derivatives, the higher the mipmap level (and the smaller the mipmap).
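A simplified version of this selection, in the spirit of the isotropic level-of-detail formula in the OpenGL specification, is sketched below: the UV derivatives are converted to texels, the larger of the two per-pixel footprints is taken as the scale factor rho, and the level is log2(rho). LOD bias, clamping to the texture's level range and anisotropic filtering are deliberately left out, and mip_level is a hypothetical name.

```python
import math

# Simplified isotropic mip level selection (no LOD bias, no max-level clamp,
# no anisotropic filtering). UV derivatives are in normalized [0, 1] units.

def mip_level(dudx, dvdx, dudy, dvdy, tex_width, tex_height):
    """Return the fractional mipmap level for the given UV derivatives."""
    # Footprint of one pixel step along screen x and screen y, in texels.
    len_x = math.hypot(dudx * tex_width, dvdx * tex_height)
    len_y = math.hypot(dudy * tex_width, dvdy * tex_height)
    rho = max(len_x, len_y)
    # rho <= 1 means magnification: stay on the base level 0.
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

# 256x256 texture, UV advancing 1/128 per pixel: 2 texels per pixel.
print(mip_level(1 / 128, 0, 0, 1 / 128, 256, 256))  # 1.0
# 8 texels per pixel selects level 3.
print(mip_level(1 / 32, 0, 0, 1 / 32, 256, 256))    # 3.0
```

Each doubling of the derivatives moves one level up the chain, which matches the "larger derivatives, higher level" rule above.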
Face normal computation (flat shader)
Derivatives can be used to compute the current triangle's face normal in a fragment shader. The horizontal and vertical derivatives of the current fragment's world position are two vectors lying on the triangle's surface. Their cross product is a vector orthogonal to the surface which, once normalized, is the triangle's normal vector (see the 3D model below). Particular attention must be paid to the order of the cross product operands: since the OpenGL coordinate system is left-handed (at least in window space, which is the context the fragment shader works in[e]), and since the horizontal derivative vector is always oriented right and the vertical one down, the order that yields a normal oriented toward the camera is horizontal × vertical (more about cross products and basis orientations in this article). The interactive model below shows the link between screen pixels and fragments over a triangle surface being rasterized, the derivative vectors on the surface (in red and green), and the normal vector (in blue) obtained by their cross product.
Face normal as the cross product of position derivatives
Here is a line of GLSL code that computes a flat normal given the fragment position pos in camera space:
vec3 normal = normalize(cross(dFdx(pos), dFdy(pos))); // pos: vec3 fragment position in camera space
Flat shader example
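The same computation can be checked on the CPU. The Python sketch below mimics normalize(cross(dFdx(pos), dFdy(pos))) by intersecting the rays of neighbouring pixels with a plane and differencing the resulting camera-space positions. Its assumptions are not from the article: a pinhole camera at the origin looking down -z in a right-handed camera space, pixel y growing upward, pixel coordinates measured from the image center, and a hypothetical focal length FOCAL.

```python
import math

# Sketch: face normal from finite differences of camera-space positions.
# Assumptions: pinhole camera at the origin looking down -z (right-handed),
# pixel y grows upward, pixel coords relative to the image center.

FOCAL = 500.0  # hypothetical focal length, in pixels

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def surface_point(px, py, plane_n, plane_p):
    """Camera-space point where the ray through pixel (px, py) hits the plane."""
    ray = (px / FOCAL, py / FOCAL, -1.0)
    t = dot(plane_n, plane_p) / dot(plane_n, ray)
    return tuple(t * c for c in ray)

def flat_normal(px, py, plane_n, plane_p):
    pos = surface_point(px, py, plane_n, plane_p)
    ddx = sub(surface_point(px + 1, py, plane_n, plane_p), pos)  # like dFdx(pos)
    ddy = sub(surface_point(px, py + 1, plane_n, plane_p), pos)  # like dFdy(pos)
    return normalize(cross(ddx, ddy))

# A tilted plane through (0, 0, -5) whose normal faces the camera.
n_true = (0.0, 0.6, 0.8)
p_on_plane = (0.0, 0.0, -5.0)
print(flat_normal(10, -20, n_true, p_on_plane))  # approximately (0.0, 0.6, 0.8)
```

Because both neighbouring positions lie on the plane, the two difference vectors lie on it too, so their cross product recovers the plane normal regardless of perspective distortion.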
Derivatives and branches
Derivative computation relies on the GPU hardware executing multiple instances of a shader in parallel. Scalar operations are executed on a SIMD (Single Instruction, Multiple Data) architecture, using registers that hold a vector of 4 values for a block of 2×2 pixels. This means that at every execution step the shader instances belonging to each 2×2 block are synchronized, which makes derivative computation fast and easy to implement in hardware: it is a simple subtraction of values contained in the same register.
But what happens with a conditional branch? If not all of the threads in a core take the same branch, code execution diverges. The image below shows an example of divergence: a conditional branch executed on a GPU core running 8 shader instances. Three instances take the first branch (yellow). While the yellow branch executes, the other 5 instances are inactive (an execution bitmask is used to activate and deactivate instances). After the yellow branch, the execution mask is inverted and the blue branch is executed by the remaining 5 instances.
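The masking scheme can be illustrated with a toy Python model of the 8-instance example: one list entry per lane, sequential loops standing in for lockstep execution. Real GPUs differ in width and detail; this only shows that lanes masked off during one branch keep their old register values until their own branch runs. run_branch is a hypothetical helper.

```python
# Toy model of branch divergence with an execution mask (8 lanes).
# Each list element plays the role of one lane's register.

def run_branch(cond, values, then_fn, else_fn):
    mask = [cond(v) for v in values]  # lanes taking the first (yellow) branch
    regs = list(values)
    # Yellow branch: only lanes whose mask bit is set write their register.
    for lane, active in enumerate(mask):
        if active:
            regs[lane] = then_fn(values[lane])
    after_first = list(regs)          # inactive lanes still hold old values
    # Invert the mask and run the blue branch on the remaining lanes.
    for lane, active in enumerate(mask):
        if not active:
            regs[lane] = else_fn(values[lane])
    return mask, after_first, regs

mask, after_first, final = run_branch(lambda v: v < 3, list(range(8)),
                                      lambda v: v * 10, lambda v: -v)
print(mask)         # [True, True, True, False, False, False, False, False]
print(after_first)  # [0, 10, 20, 3, 4, 5, 6, 7]
print(final)        # [0, 10, 20, -3, -4, -5, -6, -7]
```

The intermediate state after_first is the crux of the derivative problem: any cross-lane difference taken at that point mixes updated and stale values.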
Besides the efficiency and performance loss caused by the branch, divergence breaks the synchronization between the pixels in a block, making derivative operations undefined. This is a problem for texture sampling, which needs derivatives for mipmap level selection, anisotropic filtering, etc. When facing this problem, a shader compiler may flatten the branch (thus avoiding it) or try to rearrange the code, moving texture reads outside the branch's control flow. The problem can also be avoided by supplying explicit derivatives or an explicit mipmap level when sampling a texture (for example with textureGrad or textureLod in GLSL, and SampleGrad or SampleLevel in HLSL).
Below you can see an HLSL branching experiment written in UE4 using a Custom expression node.
Here is the shader code I used in the example above:
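// xpos and side are assumed to be inputs of the Custom node
float3 color;
float tmp = 10000;
[branch] if (xpos > side)
{
    // First branch: derivative written to the red channel
    tmp = xpos * xpos;
    float dx = ddx(tmp);
    color = float3(dx, 0, 0);
}
else
{
    // Second branch: derivative written to the green channel
    tmp = xpos * xpos;
    float dx = ddx(tmp);
    color = float3(0, dx, 0);
}
// Scale up the small derivative values to make them visible
return color * 100;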
The purpose of this experiment is to see what happens when derivatives are used inside a divergent block. Suppose the code above is executed on a GPU core. When a subset of the pixels in a block enters the first branch, the value of tmp for the inactive pixels, which are waiting for the second branch to execute, should still be 10000. So the ddx function should produce a spike for some pixels in divergent blocks. Note the [branch] attribute before the if, which forces branching through control-flow instructions.
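The expectation above can be written down as a small hypothetical Python model of one row of a 2×2 block: if ddx were computed inside the first branch from whatever the inactive neighbour holds, the stale tmp = 10000 would leak into the difference. This models only the naive reasoning, not any real GPU (as shown next, the HLSL compiler rejects the [branch] version instead), and naive_divergent_ddx is a hypothetical name.

```python
# Hypothetical model of the naive expectation: ddx inside the first branch,
# computed from one quad row where the inactive lane still holds tmp = 10000.

def ddx_row(left, right):
    """Right minus left, as for one row of a 2x2 block."""
    return right - left

def naive_divergent_ddx(xpos_left, xpos_right, side):
    xs = [xpos_left, xpos_right]
    tmp = [10000.0, 10000.0]          # tmp for the left and right lanes
    first = [x > side for x in xs]    # lanes executing the first branch
    # While the first branch runs, only its lanes update tmp.
    for lane in range(2):
        if first[lane]:
            tmp[lane] = xs[lane] * xs[lane]
    return ddx_row(tmp[0], tmp[1])

# Uniform row: both lanes take the first branch, a small sensible derivative.
print(naive_divergent_ddx(0.50, 0.51, side=0.2))  # about 0.0101
# Divergent row: the left lane is inactive and still holds 10000.
print(naive_divergent_ddx(0.19, 0.21, side=0.2))  # about -9999.96, the "spike"
```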
As you can see in the picture above, the compiler rejects that piece of code with the error "cannot have divergent gradient operations inside flow control". When the [branch] attribute is removed, the code compiles fine, but no spikes are visible during rendering, meaning that the branch has been flattened.
Revealing the block alignment of derivatives
Here is a simple experiment that reveals the inner block alignment of shader derivatives. Look at the following pocket.gl sandbox.
Computing the shader derivative of a step function.
The shader above implements a step function along the x axis. We want to compute its derivative. In the continuous domain, the derivative of a step function would be a Dirac delta function, but in the shader's discrete domain the delta function is equal to 1 where the step jumps from 0 to 1, and 0 elsewhere. Select the Show Derivative checkbox and toggle the Step on odd pix checkbox to snap the step position to an even (unchecked) or an odd (checked) pixel at the center of the viewport; you'll see how dFdx(step) changes when the transition point moves from an even to an odd pixel.
Because the derivative computation is performed over blocks of 2×2 pixels, we should expect two different results depending on where the step transition occurs:
- Case 1. If the step transition falls in the middle of a 2×2 block of pixels, we'll see a vertical line 2 pixels thick (the derivative is equal to 1 for each pixel in the 2×2 block, hence the 2-pixel thickness). This happens when the step falls on an odd pixel.
- Case 2. If the step transition falls between two neighbouring 2×2 blocks of pixels, we won't see any vertical line, because both blocks compute a derivative equal to 0. This happens when the step falls on an even pixel.
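Both cases can be reproduced with a short Python sketch on a single row of pixels. It assumes that each pixel's horizontal derivative is the right-minus-left difference within its 2×2 block, and that "the step falls on pixel edge" means edge is the index of the first pixel whose value is 1; step_row and quad_dfdx are hypothetical names.

```python
# Sketch of the step-function experiment on one row of pixels.
# Assumes per-block differences: right pixel minus left pixel of each 2x2 block.

def step_row(width, edge):
    """1 for pixels at or to the right of `edge`, 0 elsewhere."""
    return [1 if x >= edge else 0 for x in range(width)]

def quad_dfdx(row):
    """Horizontal derivative of every pixel (row length must be even)."""
    out = []
    for x in range(len(row)):
        x0 = x & ~1                   # left pixel of this pixel's block
        out.append(row[x0 + 1] - row[x0])
    return out

print(quad_dfdx(step_row(8, 5)))  # odd edge:  [0, 0, 0, 0, 1, 1, 0, 0]
print(quad_dfdx(step_row(8, 4)))  # even edge: [0, 0, 0, 0, 0, 0, 0, 0]
```

With an odd edge, the jump lies inside one block and both of its pixels report 1 (Case 1); with an even edge, the jump lies on a block boundary and every block sees a constant value (Case 2).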
As an exercise, try to modify the shader code of the sandbox above to show a horizontal step function and a horizontal derivative line.
These aliasing artifacts are caused by the subsampling inherent in the hardware's per-block computation of derivatives: horizontal derivatives have full vertical but half horizontal resolution, and vertical derivatives have full horizontal but half vertical resolution.
- How a GPU works – Kayvon Fatahalian
- Computer Graphics: Principles and Practice – Chapter 38. Modern Graphics Hardware
- Real-Time Rendering – Chapter 18. Graphics Hardware