Why is Codea’s shader slower than WebGL’s shader?

Today I found an interesting shader on Shadertoy (https://www.shadertoy.com/view/llsGW7#). On my iPad Pro, running it in Safari gives 30~40 FPS. I ported it to Codea: in the Shader Lab the FPS looks fine, but when run from Codea code it is very slow, at 2~3 FPS. I expected Codea to be faster than WebGL, but it is not, and I do not know why. Can anyone help me analyze the reason? Thanks!

My code is here:

tex1 is a 512*512 image; you can use any image of the same size.

function setup()
    parameter.watch("1/DeltaTime")
    
    local tex1 = readImage("Dropbox:3D-grass2")

    m1 = mesh()
    m1.shader=shader(f.vs,f.fs)
    m1.shader.iResolution = vec3(1000,1000,100)
    m1.shader.iChannel0 = tex1
    m1:addRect(WIDTH/2,HEIGHT/2,WIDTH/1,HEIGHT/1)
end

function draw()
    m1.shader.iGlobalTime = ElapsedTime
    m1:draw()
end

f = {
vs = [[

uniform mat4 modelViewProjection;

attribute vec4 position;
attribute vec4 color;
attribute vec2 texCoord;

//This is an output variable that will be passed to the fragment shader
varying lowp vec4 vColor;
varying highp vec2 vTexCoord;

void main()
{
    //Pass the mesh color to the fragment shader
    vColor = color;
    vTexCoord = texCoord;
    
    //Multiply the vertex position by our combined transform
    gl_Position = modelViewProjection * position;
}
]],

fs = [[

// from https://www.shadertoy.com/view/llsGW7#

precision highp float;

uniform lowp sampler2D texture;

uniform vec3      iResolution;           // viewport resolution (in pixels)
uniform float     iGlobalTime;           // shader playback time (in seconds)
uniform lowp sampler2D iChannel0;          // input channel. XX = 2D/Cube

#define F +texture2D(iChannel0,.3+p.xz*(s+=s)/6e3,-99.)/s

void main()
{
   vec4 p=vec4(gl_FragCoord.xy/iResolution.xy,1,1)-.5, d=p*.5, t;
    p.z += iGlobalTime*20.; d.y-=.2;

    for(float i=1.7;i>=0.;i-=.002)
    {
        float s=.5;
        //t=F+F+F+F+F+F;
        t = F F F F F F;
/*
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
*/
        gl_FragColor = vec4(1,.9,.8,9)+d.x-t*i;
        if(t.x>p.y*.01+1.3)break;
        p += d;
    }
}

]]
}

Here is the screenshot

Shadertoy’s texture is here:

I believe for loops in OpenGL ES shaders get unrolled, even if there’s a break in them. So if the loop CAN run 850 times, the compiler actually expands the code to do that linearly… It may be that WebGL is doing some different trickery here.

Another thing worth trying is upgrading the shader to OpenGL ES 3.0, assuming you have a more modern iPad. This may be the difference between WebGL and Codea, as the loop handling in ES 3.0 may have been improved and WebGL is quite possibly running in 3.0 mode.

That for loop in the fragment shader runs 850 times. Remember that it gets executed per pixel. I’m surprised that it runs so well in WebGL.
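As a rough check of the 850 figure, the loop bounds can be worked out in plain Lua. This is only a worst-case sketch: it assumes the `break` never fires, and the function name `worstCaseSamples` and its parameter names are made up here for illustration.

```lua
-- Worst-case work per pixel for the shader loop
--   for(float i=1.7;i>=0.;i-=.002) { t = F F F F F F; ... }
-- assuming the break is never taken.
function worstCaseSamples(loopStart, loopStep, samplesPerIteration)
    -- Round to the nearest integer to sidestep floating-point error in the division.
    local steps = math.floor(loopStart / loopStep + 0.5)
    return steps, steps * samplesPerIteration
end

-- 1.7 and .002 come from the loop header; 6 is the number of times F is expanded.
local steps, samples = worstCaseSamples(1.7, 0.002, 6)
print(steps, samples)
```

With the values from the shader this gives 850 steps and 5100 texture reads per pixel as an upper bound; the early `break` will usually cut that down.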

@yojimbo2000 I was looking at that too, but there’s an if statement that breaks out of the for loop, so it’s not really looping 850 times per pixel. This runs at .9 FPS on my iPad Air. I put a counter in the draw function and incremented it by 1 on each call. When I printed the counter, I noticed that instead of advancing 1 by 1 each time the image changes, it prints in groups of 3 each time the image changes. So the draw function gets executed 3 times before the shader draws the image, which seems strange. I’m using the Small World:Icon as the tex1 image.

EDIT: I changed the .002 to .001 in the for loop and the draw counter was printing out in groups of 5 each image change. Going with .0005 caused it to print in groups of 9. I don’t know the workings of shaders, so I’m not sure what’s really happening with the draw count.

@spacemonkey So OpenGL ES 3.0 is not the default? How do we tell the shader to use 3.0?

@piinthesky See this link.

https://codea.io/talk/discussion/6991/gles-3-0-instancing-and-other-changes-to-shaders-in-codea-2-3-2

Thanks for your response.

@yojimbo2000 @dave1707 : I noticed that Shadertoy’s display window is about WIDTH/2 by HEIGHT/2.5 on the iPad screen, so I changed my code like this:

m1:addRect(WIDTH/4,HEIGHT/2+150,WIDTH/2,HEIGHT/2.5)

But the result (7~8 FPS) is still slower than WebGL.

@spacemonkey @piinthesky : I added an OpenGL ES 3.0 version, but the FPS improvement is small.

Here is the OpenGL ES 3.0 version

function setup()
    parameter.watch("FPS")
    parameter.watch("1/DeltaTime")
    displayMode(FULLSCREEN)
    FPS = 60
    local tex1 = readImage("Dropbox:tex03")
    local tex2 = readImage("Dropbox:3D-grass2")

    -- OpenGL ES 2.0
    m1 = mesh()
    m1.shader=shader(f.vs,f.fs)
    -- m1.shader=shader(f.vs3,f.fs3)
    m1.shader.iResolution = vec3(2048,1536,2000)
    m1.shader.iChannel0 = tex1
    m1:addRect(WIDTH/4,HEIGHT/2+150,WIDTH/2,HEIGHT/2.5)
    -- m1:addRect(WIDTH/2,HEIGHT/2+150,640,360)
    
    -- OpenGL ES 3.0
    m2 = mesh()
    -- m1.shader=shader(f.vs,f.fs)
    m2.shader=shader(f.vs3,f.fs3)
    m2.shader.iResolution = vec3(2048,1536,2000)
    m2.shader.iChannel0 = tex1
    m2:addRect(WIDTH*3/4,HEIGHT/2+150,WIDTH/2,HEIGHT/2.5)
end

function draw()
    background(15, 14, 14, 255)
    FPS=FPS*0.9+0.1/DeltaTime
    m1.shader.iGlobalTime = ElapsedTime
    m1:draw()
    
    m2.shader.iGlobalTime = ElapsedTime
    m2:draw()
end

f = {
vs = [[
uniform mat4 modelViewProjection;
attribute vec4 position;
attribute vec4 color;
attribute vec2 texCoord;
varying lowp vec4 vColor;
varying highp vec2 vTexCoord;

void main()
{
    vColor = color;
    vTexCoord = texCoord;
    gl_Position = modelViewProjection * position;
}
]],

fs = [[
// from https://www.shadertoy.com/view/llsGW7#
precision highp float;
uniform lowp sampler2D texture;

uniform vec3      iResolution;           // viewport resolution (in pixels)
uniform float     iGlobalTime;           // shader playback time (in seconds)
uniform lowp sampler2D iChannel0;          // input channel. XX = 2D/Cube

#define F +texture2D(iChannel0,.3+p.xz*(s+=s)/6e3,-99.)/s

void main()
{
   vec4 p=vec4(gl_FragCoord.xy/iResolution.xy,1,1)-.5, d=p*.5, t;
    p.z += iGlobalTime*20.; d.y-=.2;

    for(float i=1.7;i>=0.;i-=.002)
    {
        float s=.5;
        //t=F+F+F+F+F+F;
        //t = F F F F F F;
        t = F F F F F F;
/*
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
    t+= texture2D(iChannel0,.3+p.xw*s/6e3,-99.)/s;s+=s;
*/
        gl_FragColor = vec4(1,.9,.8,9)+d.x-t*i;
        if(t.x>p.y*.01+1.3) break;
        p += d;
    }
}
]],

vs3 = [[
#version 300 es
uniform mat4 modelViewProjection;
in vec4 position;
in vec4 color;
in vec2 texCoord;
out lowp vec4 vColor;
out highp vec2 vTexCoord;

void main(void)
{
    vColor = color;
    vTexCoord = texCoord;
    gl_Position = modelViewProjection * position;
}
]],

fs3 = [[
#version 300 es
// from https://www.shadertoy.com/view/llsGW7#
precision highp float;

uniform vec3      iResolution;           // viewport resolution (in pixels)
uniform float     iGlobalTime;           // shader playback time (in seconds)
uniform lowp sampler2D iChannel0;          // input channel. XX = 2D/Cube

layout(location = 0) out vec4 fragColor;

#define F +texture(iChannel0,.3+p.xz*(s+=s)/6e3,-99.)/s

void main(void)
{
   vec4 p=vec4(gl_FragCoord.xy/iResolution.xy,1,1)-.5, d=p*.5, t;
    p.z += iGlobalTime*20.; d.y-=.2;

    for(float i=1.7;i>=0.;i-=.002)
    {
        float s=.5;

        t = F F F F F F;

        fragColor = vec4(1,.9,.8,9)+d.x-t*i;
        if(t.x>p.y*.01+1.3) break;
        p += d;
    }
}
]]
}

I tried the code on my iPad Air. The version 2 and version 3 shaders ran at about the same speed. 1/DeltaTime gave values of about .988 for both.

On mine (iPad Pro) I get ~8 FPS on ES2 and ~17 FPS on ES3.

Interestingly, after I left it running a while it came up to 20-30 FPS, weird…

@dave1707 : You could try a smaller screen size; you may get a different result.

@spacemonkey : The FPS depends on the amount of calculation. When the screen shows several layers of mountains (many colors), more calculation is needed and the FPS is low; when the screen shows a single mountain (almost black), little calculation is needed and the FPS rises, but only for a very short period.

Shadertoy renders to a very small window in the browser on my iPad Pro, and if I were to copy the same thing and run it full screen in Codea, it would be rendering at retina resolution. If you use the Shadertoy app for iOS, try hitting the HD button on a shader; that will give you a result comparable to full screen in Codea.
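To put a number on the window-size point, here is a small plain-Lua sketch comparing how many fragments each case has to shade. It assumes the 2048 x 1536 pixel screen used for iResolution elsewhere in this thread and the WIDTH/2 by HEIGHT/2.5 window estimate given earlier, and it assumes both are rendered at the same pixel density, which may not hold for Safari. `fragmentCount` is a name invented for this example.

```lua
-- Number of fragments (pixels) the shader runs for, given a size in pixels.
function fragmentCount(widthPx, heightPx)
    return widthPx * heightPx
end

-- Assumed sizes: full retina screen versus the estimated Shadertoy window.
local fullScreen = fragmentCount(2048, 1536)
local smallWindow = fragmentCount(2048 / 2, 1536 / 2.5)

-- Ratio of per-frame shader work between the two cases.
print(fullScreen / smallWindow)
```

Under those assumptions the full-screen quad shades about five times as many pixels per frame as the small window, before any difference in shader compilation is considered.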

I’ve been lowering the render resolution by dividing WIDTH and HEIGHT by a size variable on my main geometry quad, rendering that to an image using setContext(renderImage), and then using setContext() to go back to the main draw window and sprite(renderImage,HEIGHT/2,WIDTH/2,HEIGHTsize,WIDTHsize)
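A minimal Codea sketch of that idea might look like the following. It is not the poster’s actual code: `scale` and `lowResSize` are hypothetical names, the table `f` with `f.vs` and `f.fs` is assumed to be the one posted earlier in this thread, and the iResolution value assumes one image pixel per unit, which may need adjusting on a retina device.

```lua
-- Sketch only: render the shader into a smaller off-screen image, then upscale it.
-- Assumes the table f (with f.vs and f.fs) from the code earlier in this thread.

scale = 4  -- hypothetical: render at 1/4 of the screen size in each dimension

-- Pure helper: size of the off-screen image for a given screen size and scale.
function lowResSize(screenW, screenH, s)
    return math.floor(screenW / s), math.floor(screenH / s)
end

function setup()
    local w, h = lowResSize(WIDTH, HEIGHT, scale)
    renderImage = image(w, h)

    m1 = mesh()
    m1.shader = shader(f.vs, f.fs)
    -- The shader divides gl_FragCoord by iResolution, so match it to the image size.
    m1.shader.iResolution = vec3(w, h, 1)
    m1.shader.iChannel0 = readImage("Dropbox:3D-grass2")
    -- Quad covering the whole off-screen image (centre x, centre y, width, height).
    m1:addRect(w / 2, h / 2, w, h)
end

function draw()
    background(0)
    m1.shader.iGlobalTime = ElapsedTime

    setContext(renderImage)   -- draw into the small image
    m1:draw()
    setContext()              -- back to the screen

    -- Stretch the small image over the full screen.
    sprite(renderImage, WIDTH / 2, HEIGHT / 2, WIDTH, HEIGHT)
end
```

The fragment shader then runs once per pixel of the small image rather than once per screen pixel, at the cost of a blurrier picture.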

@AxiomCrux : In my second version, I set the window’s size to match the WebGL one:

m1:addRect(WIDTH/4,HEIGHT/2+150,WIDTH/2,HEIGHT/2.5)

and I tried setting the resolution to a half and a quarter:

m1.shader.iResolution = vec3(2048/2,1536/2,2000)

m1.shader.iResolution = vec3(2048/4,1536/4,2000)

But the highest FPS is about 10; it cannot reach WebGL’s 30~35, and I am trying to find the reason. iOS only allows Lua to run in “interpreted” mode, not “compiled” mode, so I guess this may be what makes it slow.

Can you tell me what speed you get when you use setContext()? Thanks.

@spacemonkey : Today I searched for WebGL and found that it is based on OpenGL ES 2.0, so we only need to compare it with OpenGL ES 2.0

@binaryblues I don’t think GLES shaders are compiled (Metal shaders are), so in theory that shouldn’t make a difference. I think @AxiomCrux has the right idea: upscaling the shader.

@yojimbo2000 : What I mean is that iOS runs Codea’s Lua code in “interpreted” mode; I did not mean the shader. Maybe this is the difference between WebGL and Codea.
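One way to test the interpreter theory is to time the Lua side of a frame on its own. The sketch below is an assumption-laden illustration: `timeCall` is a made-up helper, it relies on the standard Lua `os.clock` being available in Codea, and it only measures CPU time spent in Lua, not the GPU time spent running the fragment shader.

```lua
-- Returns the CPU time, in seconds, spent running fn.
function timeCall(fn)
    local t0 = os.clock()
    fn()
    return os.clock() - t0
end

-- Hypothetical use inside Codea's draw(), with m1 set up as in the code above:
--   local luaSeconds = timeCall(function()
--       m1.shader.iGlobalTime = ElapsedTime
--       m1:draw()
--   end)
--   print(luaSeconds, DeltaTime)
```

If the time reported for the Lua code is a small fraction of DeltaTime, the interpreter is unlikely to be the bottleneck and the cost is more likely in the shader itself.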