An introduction to shader derivative functions

Partial difference derivative functions (ddx and ddy in HLSL^[a], dFdx and dFdy in GLSL^[b]) are fragment shader instructions which can be used to compute the rate of variation of any value with respect to the screen-space coordinates. In the rest of this article I will use either naming, depending on the language of each code example.

Derivatives computation

During triangle rasterization, GPUs run many instances of a fragment shader at a time, organizing them in blocks of 2×2 pixels. Derivatives are calculated by taking differences between the pixel values in a block: dFdx subtracts the values of the pixels on the left side of the block from the values on the right side, and dFdy subtracts the values of the bottom pixels from the top ones. See the image below, where the grid represents the rendered screen pixels and the dFdx and dFdy expressions are given for the generic value p evaluated by the fragment shader instance at screen coordinates (x, y), belonging to the 2×2 block highlighted in red.

Derivatives can be evaluated for every variable in a fragment shader. For vector and matrix types, derivatives are computed element-wise.
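The block differences described above can be emulated on the CPU. The following is only a sketch in Python, not how any specific GPU implements it: the function name quad_derivatives and the grid layout p[y][x] are made up for illustration, and it assumes that y grows upward and that differences are taken on the pixel's own row and column inside its aligned 2×2 block.

```python
# Assumption: derivatives are plain differences inside the aligned 2x2 block
# that contains the pixel; real GPUs may pick coarser or finer variants.

def quad_derivatives(p, x, y):
    """Return (dFdx, dFdy) at pixel (x, y) for a grid indexed as p[y][x]."""
    # Origin of the aligned 2x2 block containing (x, y).
    bx, by = x - x % 2, y - y % 2
    # Right column minus left column, taken on this pixel's row.
    dfdx = p[y][bx + 1] - p[y][bx]
    # Upper row minus lower row (y grows upward here), on this pixel's column.
    dfdy = p[by + 1][x] - p[by][x]
    return dfdx, dfdy


# A value that varies linearly on screen: p = 3x + 10y.
linear = [[3 * x + 10 * y for x in range(4)] for y in range(4)]
# A value that varies quadratically along x: p = x^2.
quadratic = [[x * x for x in range(4)] for y in range(4)]

print(quad_derivatives(linear, 1, 2))     # (3, 10)
print(quad_derivatives(quadratic, 2, 0))  # (5, 0)
print(quad_derivatives(quadratic, 3, 0))  # (5, 0): same block, same dFdx
```

For the quadratic value, pixels 2 and 3 share the same horizontal derivative because they belong to the same block. For a vector value the same subtraction would simply be repeated for each component.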

Derivative functions are fundamental to the implementation of texture mipmaps and are very useful in a range of algorithms and effects, in particular when there is some kind of dependence on screen-space coordinates (for example when rendering wireframe edges with a uniform thickness in screen pixels).

Derivatives and mipmaps

Mipmaps^[d] are pre-computed sequences of images obtained by filtering a texture down into smaller sizes (each mipmap level has half the width and half the height of the previous one). They are used to avoid aliasing artifacts when minifying a texture.

Mipmapping is also important for texture cache coherence, since it enforces a near-one texel to pixel ratio: when traversing a triangle, each new pixel represents a step in texture space of one texel at most. Mipmapping is one of the few cases in rendering where a technique improves both visuals and performance.

Derivatives are used during texture sampling to select the best mipmap level. The rate of variation of the texture coordinates with respect to the screen coordinates is used to choose a mipmap: the larger the derivatives, the higher the mipmap level (and the smaller the mipmap image).
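To make the relation between derivatives and mipmap level concrete, here is a rough Python sketch of an isotropic level selection. It follows the general idea of the scale-factor formula in the OpenGL specification, but hardware is free to approximate it, and LOD bias and anisotropic filtering are ignored. The function name mip_level and its parameters are hypothetical.

```python
import math

def mip_level(duv_dx, duv_dy, tex_width, tex_height, num_levels):
    """Pick a mipmap level from the screen-space derivatives of normalized UVs."""
    # Convert the UV derivatives into texel units.
    dx = (duv_dx[0] * tex_width, duv_dx[1] * tex_height)
    dy = (duv_dy[0] * tex_width, duv_dy[1] * tex_height)
    # Footprint size: the longer of the two one-pixel steps, measured in texels.
    rho = max(math.hypot(*dx), math.hypot(*dy))
    # One texel per pixel maps to level 0; each doubling moves one level up.
    lod = math.log2(rho) if rho > 0.0 else 0.0
    # Clamp to the levels that actually exist (magnification stays at level 0).
    return min(max(lod, 0.0), num_levels - 1)


# 256x256 texture with 9 levels (256, 128, ..., 1).
print(mip_level((1 / 256, 0.0), (0.0, 1 / 256), 256, 256, 9))  # 0.0
print(mip_level((4 / 256, 0.0), (0.0, 4 / 256), 256, 256, 9))  # 2.0
```

In the second call each screen pixel steps over four texels, so the sketch selects level 2, where the image is four times smaller along each axis and the texel to pixel ratio is back to about one.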

Face normal computation (flat shader)

Derivatives can be used to compute the current triangle’s face normal in a fragment shader. The horizontal and vertical derivatives of the current fragment’s world position are two vectors lying on the triangle’s surface. Their cross product is a vector orthogonal to the surface and, once normalized, it is the triangle’s normal vector (see the 3D model below). Particular attention must be paid to the ordering of the cross product: since the OpenGL coordinate system is left-handed (at least when working in window space, which is the context where the fragment shader works^[e]), and since the horizontal derivative vector is always oriented right and the vertical one down, the ordering of the cross product that yields a normal vector oriented toward the camera is horizontal × vertical (more about cross products and basis orientations in this article). The interactive model below shows the link between screen pixels and fragments over a triangle surface being rasterized, the derivative vectors on the surface (in red and green), and the normal vector (in blue) obtained as the cross product of the two.

Face normal as the cross product of position derivatives

Here is a GLSL code line to compute a flat normal given the fragment position pos in camera space:

normalize( cross(dFdx(pos), dFdy(pos)) );
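The same computation can be checked numerically outside a shader. The Python sketch below stands in for dFdx and dFdy with differences between the position at the current pixel and at its right and upper neighbours. The helper names and the sample positions are assumptions made only for this illustration.

```python
import math

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def flat_normal(pos_here, pos_right, pos_up):
    """Emulate normalize(cross(dFdx(pos), dFdy(pos))) with finite differences."""
    ddx = sub(pos_right, pos_here)  # stands in for dFdx(pos)
    ddy = sub(pos_up, pos_here)     # stands in for dFdy(pos)
    return normalize(cross(ddx, ddy))


# A surface facing the camera (camera space, camera looking down -z).
print(flat_normal((0.0, 0.0, -5.0), (0.1, 0.0, -5.0), (0.0, 0.1, -5.0)))
# A surface tilted around the y axis: z changes with x.
print(flat_normal((0.0, 0.0, -5.0), (0.1, 0.0, -4.9), (0.0, 0.1, -5.0)))
```

The first call gives a normal along +z, pointing back toward the camera. The second gives a normal proportional to (-1, 0, 1), which is orthogonal to both difference vectors.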

Below is a complete pocket.gl demo with a vertex and a fragment shader at work on a Utah Teapot^[f]. You can toggle the flat shader using the Flat shaded checkbox.

Flat shader example

Derivatives and branches

Derivative computation is based on the parallel execution of multiple instances of a shader on the GPU’s hardware. Scalar operations are executed with a SIMD (Single Instruction Multiple Data) architecture on registers containing a vector of 4 values for a block of 2×2 pixels. This means that at every step of execution the shader instances belonging to each 2×2 block are synchronized, which makes derivative computation fast and easy to implement in hardware: it is a simple subtraction of values contained in the same register.

But what happens in the case of a conditional branch? In this case, if not all of the threads in a core take the same branch, there is a divergence in the code execution. In the image below an example of divergence is shown: a conditional branch execution in a GPU core with 8 shader instances. Three instances take the first branch (yellow). During the yellow branch execution the other 5 instances are inactive (an execution bitmask is used to activate/deactivate execution). After the yellow branch, the execution mask is inverted and the blue branch is executed by the remaining 5 instances.
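The masked execution described above can be mimicked in a few lines of Python. This is a toy model, not a description of any real GPU scheduler: the function name, the lane values and the two branch bodies are invented for the example.

```python
def run_divergent_branch(values, condition, then_op, else_op):
    """Run an if/else over all lanes the way a masked SIMD core would."""
    # One mask bit per lane: True means the lane takes the first branch.
    mask = [condition(v) for v in values]
    out = list(values)
    # Pass 1: the first branch runs, lanes with a cleared bit stay idle.
    for lane, active in enumerate(mask):
        if active:
            out[lane] = then_op(values[lane])
    # Pass 2: the mask is inverted and the second branch runs.
    for lane, active in enumerate(mask):
        if not active:
            out[lane] = else_op(values[lane])
    # A branch body costs a pass only if at least one lane needs it.
    passes = int(any(mask)) + int(not all(mask))
    return out, mask, passes


lanes = list(range(8))  # 8 shader instances
out, mask, passes = run_divergent_branch(
    lanes, lambda v: v < 3, lambda v: v * 10, lambda v: v + 100)
print(mask.count(True), mask.count(False))  # 3 5
print(out)     # [0, 10, 20, 103, 104, 105, 106, 107]
print(passes)  # 2
```

With 3 lanes in the first branch and 5 in the second, both bodies have to be executed in sequence. If all lanes agreed on the condition, only one pass would be needed.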

In addition to the efficiency and performance loss caused by the branch, the divergence breaks the synchronization between the pixels in a block, making derivative operations undefined. This is a problem for texture sampling, which needs derivatives for mipmap level selection, anisotropic filtering, etc. When facing such a problem, a shader compiler could flatten the branch (thus avoiding it) or try to rearrange the code, moving texture reads outside of the branch control flow. The problem can also be avoided by supplying explicit derivatives or an explicit mipmap level when sampling a texture (for example with textureGrad or textureLod in GLSL, SampleGrad or SampleLevel in HLSL).

Below you can see an HLSL branching experiment written in UE4 using a custom expression node.

Here is the shader code I’m using in the previous example:

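 // xpos and side are not declared here: presumably inputs of the custom expression node.
 float tmp = 10000;          // sentinel: a lane that skips a branch would keep this value
 float3 color;

 [branch]                    // ask for real control flow instead of a flattened branch
 if(xpos > side)
 {
   tmp = xpos * xpos;
   float dx = ddx(tmp);      // derivative taken inside flow control
   color = float3(dx, 0, 0); // first branch writes the red channel
 }
 else
 {
   tmp = xpos * xpos;
   float dx = ddx(tmp);      // same derivative in the other branch
   color = float3(0, dx, 0); // second branch writes the green channel
 }

 return color * 100;         // amplify the derivative for display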

The purpose of this experiment is to see what happens when derivatives are used inside a divergent block. Suppose that the code above is executed on a GPU core. When a subset of the pixels in a block enters the first branch, the value of tmp for the inactive pixels waiting for the second branch execution should still be 10000. So the ddx function should give a spike for some pixels in divergent blocks. Note the [branch] attribute before the if, which forces branching with control flow instructions.
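The expected spike can be worked out by hand with a small Python model of one row of a 2×2 block. It only models the reasoning above (inactive lanes keep the initial value of tmp), and it is not a claim about what real hardware returns, since the result is undefined there. The function name and the sample xpos values are hypothetical.

```python
def ddx_first_branch(xpos_left, xpos_right, side, initial=10000.0):
    """ddx(tmp) as seen while the first branch (xpos > side) is executing."""
    tmp = []
    for xpos in (xpos_left, xpos_right):
        active = xpos > side
        # Active lanes have written xpos * xpos, idle lanes still hold 10000.
        tmp.append(xpos * xpos if active else initial)
    # Right lane minus left lane, read from the same register.
    return tmp[1] - tmp[0]


# Both pixels on the same side of the threshold: a small, sensible derivative.
print(ddx_first_branch(0.50, 0.51, side=0.2))
# The threshold falls between the two pixels: the stale 10000 leaks in.
print(ddx_first_branch(0.195, 0.205, side=0.2))
```

The first call returns about 0.0101, while the second returns a value close to -10000, which is the kind of spike the experiment was designed to reveal.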

As you can see in the picture above, the compiler gives the following error for that piece of code: “cannot have divergent gradient operations inside flow control”. When the [branch] attribute is removed, the code compiles fine, but no spikes are visible during rendering, meaning that the branch has been flattened.

Revealing the block alignment of derivatives

Here is a simple experiment that reveals the inner block alignment of shader derivatives. Look at the following pocket.gl sandbox.

Computing the shader derivative of a step function.

The above shader implements a step function over the x axis. We want to compute its derivative. The derivative of a step function would be a Dirac delta function in the continuous domain, but in the shader’s discrete domain the delta function will be equal to 1 when the step jumps from 0 to 1, and 0 elsewhere. Select the Show Derivative checkbox and toggle the Step on odd pix checkbox to snap the Step position to an even (unchecked) or an odd (checked) pixel at the center of the viewport; you’ll see how dFdx(step) changes when moving the transition point from an even to an odd pixel.

Because the derivative computation is performed over blocks of 2×2 pixels, we should expect two different results depending on where the step transition occurs:

Case 1. The step transition falls in the middle of a 2×2 block of pixels. In this case we’ll see a vertical line 2 pixels thick (the derivative is equal to 1 for each pixel in the 2×2 block, hence the 2-pixel thickness). This happens when the step falls on an odd pixel.
Case 2. The step transition falls between two neighbouring 2×2 blocks of pixels. In this case we won’t see any vertical line, because both blocks will compute a derivative equal to 0. This happens when the step falls on an even pixel.
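Both cases can be reproduced with a one-dimensional Python sketch of a single pixel row, assuming that each aligned pair of pixels shares one horizontal difference. The function name step_derivative_row is made up for this example and the row width is assumed to be even.

```python
def step_derivative_row(width, step_at):
    """Per-pixel dFdx of a step function along one row of `width` pixels."""
    # The step is 0 to the left of step_at and 1 from step_at onward.
    step = [1 if x >= step_at else 0 for x in range(width)]
    derivative = []
    # Walk the row one aligned block (two pixels wide) at a time.
    for bx in range(0, width, 2):
        d = step[bx + 1] - step[bx]  # right pixel minus left pixel
        derivative += [d, d]         # both pixels of the block share it
    return derivative


print(step_derivative_row(8, 5))  # [0, 0, 0, 0, 1, 1, 0, 0]
print(step_derivative_row(8, 4))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

With the step on pixel 5 (odd) the transition sits inside the block made of pixels 4 and 5, so both pixels report a derivative of 1. With the step on pixel 4 (even) the transition sits on a block boundary and no block sees it.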

As an exercise, try to modify the shader code of the above sandbox to show a horizontal step function and a horizontal derivative line.

These aliasing artifacts are caused by the subsampling due to the hardware per-block computation of derivatives; horizontal derivatives have full vertical and half horizontal resolution, vertical derivatives have full horizontal and half vertical resolution.

References

How a GPU works – Kayvon Fatahalian
Computer Graphics: Principles and Practice – Chapter 38. Modern Graphics Hardware
Real Time Rendering – Chapter 18. Graphics Hardware

Notes

[a] High-Level Shading Language
[b] OpenGL Shading Language
[c] Lena Söderberg
[d] “Mip” stands for multum in parvo, Latin for “many things in a small place”
[e] Is OpenGL coordinate system left-handed or right-handed?
[f] Utah Teapot
[g] How a GPU Works – Kayvon Fatahalian, 2011

22 thoughts on “An introduction to shader derivative functions”

yang January 18, 2017 at 5:52 pm

Thanks! great explanation!

Büke Beyond April 5, 2017 at 7:45 am

Excellent. Thank you. A little typo: “horizontal derivatives have full vertical and half >horizontal< resolution,"

1. Giuseppe (post author) April 5, 2017 at 1:24 pm
  
  Thanks! Just fixed the typo.
  
Arda May 16, 2017 at 1:18 am

Awesome explanation and experiment! Thanks!

terry August 28, 2018 at 10:07 am

where is the demo ： Computing the shader derivative of a step function.

1. Giuseppe (post author) August 30, 2018 at 12:49 pm
  
  Hi, the demo is a webgl embedded in the page. Are you browsing on a PC?
  
huttarl August 30, 2018 at 6:34 pm

Thanks, this was very helpful.

PhilT May 3, 2019 at 6:28 pm

Thanks for this really useful post. I’m just struggling to understand where normalMatrix comes from in the vertex shader for the teapot.

1. PhilT May 3, 2019 at 6:41 pm
  
  Ah, I see, you’re not actually showing the inputs
  
  1. Giuseppe Portelli May 4, 2019 at 6:14 pm
    
    normalMatrix is the inverse transpose of modelViewMatrix, available as a built-in uniform in pocket.gl
    http://www.pocketgl.com/documentation/#Built-in_attributes_and_uniforms
    
chris April 15, 2021 at 7:57 am

fwidth() compiles at https://www.shadertoy.com/view/WtScDt
but I cant get it to work on my server … What are they doing that I am not, same browser, what gives? Why don’t models cast shadows, as do predefined objects? questions, questions.
http://innerbeing.epizy.com/cwebgl/three/examples/webxr_vr_sandbox.html

chris April 16, 2021 at 6:50 am

obj.traverse( function( node ) { if ( node instanceof THREE.Mesh ) { node.castShadow = true; } } );
add that to the parser/loader call back – does what it suggests.
http://innerbeing.epizy.com/cwebgl/three/examples/webxr_vr_sandbox.html
now back to fwidth() — why will it compile at ShaderToy but not for me?
my handle is a link to shaders in development. I want to quit without fwidth() working.
drat!
http://trueinnerbeing.x10host.com/cwebgl/shaderh.html

Tom August 14, 2022 at 12:37 pm

One Question
“the ordering of the cross product to obtain a normal vector oriented toward the camera is horizontal x vertical”
if the red one in the picture is horizontal and the green one is vertical,
horizontal x vertical means red x green ,
accroding Right-hand rule, the cross result is down in the picture（not up） , right ?

Two Question,
OpengGL Screen space is “Y asix up and X asix right, axis origin is left-bottom”
so dFdy is up – down, that is , p(x,y+1) – p(x,y) , the green arrow is up(not down?) right ?

1. Giuseppe (post author) August 14, 2022 at 6:08 pm
  
  As I wrote a couple of lines above your quotation, OpenGL is left handed when working in window space:
  
  “Particular attention must be paid to the ordering of the cross product: being the OpenGL coordinate system left-handed (at least when working in window space which is the context where the fragment shader works[e]) and being the horizontal derivative vector always oriented right and the vertical down, the ordering of the cross product to obtain a normal vector oriented toward the camera is horizontal x vertical ”
  
  I also added a link to a page explaining why the window space (aka screen space) is left handed: here.
  
Vadim April 29, 2023 at 2:57 am

What happens if only one pixel of the 2×2 block belongs to the triangle that is being rendered, and other three do not? How the derivative for that single pixel be calculated?

1. Giuseppe (post author) May 2, 2023 at 3:13 pm
  
  Interpolated values are always computed for all pixels in a 2×2 block (by extrapolation), even if only one pixel belongs to the triangle. This means that derivatives are available even in this case. The other consequence is a lot of overdraw when you have many small triangles covering a bunch of pixels (micro meshes). This can be avoided with techniques such as LOD or more recent ones like Unreal Engine 5 Nanite.