Optimised Tilt Shift blur effect (non-Craft shaders)

A user was having some performance issues with their project, specifically related to their tilt shift blur effect.

This was for vanilla shaders (not Craft), so I decided to try making a more optimised version. The original effect used a single pass with a kernel size of 12, which required at least 144 (12^2) texture samples per pixel.
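To put numbers on that cost difference (a quick back-of-the-envelope calculation, not code from the shader itself):

```python
def single_pass_samples(kernel_size):
    # a full 2D kernel samples every (x, y) offset pair
    return kernel_size * kernel_size

def separable_samples(kernel_size):
    # a separable blur: one horizontal pass plus one vertical pass
    return 2 * kernel_size

print(single_pass_samples(12))  # 144 samples per pixel
print(separable_samples(12))    # 24 samples per pixel
```

The gap widens quadratically, which is why single-pass blurs fall over as the kernel grows.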

The new effect uses two separate passes (kernel size x 2 samples in total), with a pre-computed kernel passed in as a uniform array. You can see the difference by changing the kernel size and sigma parameters. Another optimisation is the use of branching (not always a bad thing) so that blurring only occurs in the top and bottom regions. The blur regions are also smoothly blended by adjusting the effective texel size when applying the blur.
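As a rough sketch of the precomputation side (written in Python rather than shader code; the function names, the focus-band values, and the feather width are illustrative assumptions, not the shader's actual uniforms):

```python
import math

def gaussian_kernel(size, sigma):
    """Normalised 1D Gaussian weights, computed once on the CPU and
    passed to the shader as a uniform array. One pass samples along x,
    the second along y, so the cost is 2*size instead of size^2."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)  # normalise so the weights sum to 1
    return [w / total for w in weights]

def smoothstep(edge0, edge1, x):
    # same shape as the GLSL smoothstep() builtin
    t = max(0.0, min(1.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

def blur_strength(v, band_top=0.35, band_bottom=0.65, feather=0.1):
    """Hypothetical blend factor: 1.0 in the top and bottom regions,
    0.0 in the in-focus band, ramping smoothly across a feather zone.
    Scaling the effective texel size by this value fades the blur out
    instead of cutting it off at a hard edge."""
    top = 1.0 - smoothstep(band_top - feather, band_top, v)
    bottom = smoothstep(band_bottom, band_bottom + feather, v)
    return max(top, bottom)

k = gaussian_kernel(13, 4.0)
```

The same weights are reused by both the horizontal and vertical passes, so the kernel only needs to be rebuilt when the size or sigma parameter changes.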

Because of the way modern GPUs work, a wavefront/warp (a group of threads working on neighbouring pixels) will have coherent branching when all of its threads take the same branch (https://www.peterstefek.me/shader-branch.html). So most pixels outside the top and bottom regions will only do one sample per pass instead (a few pixels on the boundary will do a little more work).

Anyway, enjoy!

Hey, this is awesome. I’ve been having trouble with my own blur in that Uber shader you saw: once the blur radius is large enough you start to see the divergent passes.

This is also really useful for a “glow” effect, as glow is often one (or more) color(s) blurred around the glowing thing.

One thing I noticed is that the article specifically mentions this is true for nVidia, but is it true for Apple GPUs?

@skar Any kind of single-pass shader is always going to be inefficient for larger kernels, making it pretty useless for larger blur effects.

For your glow effect you might be better off using some kind of bloom. Technically you could hijack the Craft bloom post effect by drawing your game to a texture and then drawing that texture as a quad in a Craft scene.

The article is about nVidia, but I think it’s relevant to most modern GPU designs, and it should be possible to profile. Unfortunately I can’t get the Metal profiling tools to work with OpenGL, as Xcode won’t capture frame data (Apple’s support for OpenGL has been getting worse for a while now).