Codea app performance on iPad Pro 3rd Generation worse than on older model

I’ve just configured a new 3rd generation iPad Pro 11″ (1 TB storage, 16 GB RAM) with Codea and am running an app I previously developed that calculates and renders a full screen’s worth of small Voronoi polygons (heavy on the floating-point math), and I notice that it runs SO MUCH SLOWER than the exact same code did on a 1st generation iPad Pro 11″ (256 GB storage, 4 GB RAM). Does anyone have any insights into what’s going on here?

Wow, @rvmott, you’ve got a super new iPad! I’ve got the lower-end version of your pair (original 9.7″ iPad Pro and, more recently, the 8 GB 11″ iPad Pro), and 3D models consistently seem to render much faster on my new iPad Pro. If you post the code you’re testing, I’d be happy to compare rendering speeds on my pair if that helps.

@rvmott - not sure which generation moved to Metal, but if you have a Metal-based system, perhaps the need to convert from the old graphics API to the new one is slowing the process.

The Voronoi code is posted here. Curious whether anyone else experiences the same problem.

How much slower is slower? It renders for me in about 1-2 minutes.

~~One thing to keep in mind is that the pixel count is much higher in newer models, so the values of HEIGHT and WIDTH will be different, probably making your loops much longer.~~ Actually, never mind, the 1st gen iPad Pro has the same resolution.

Thanks everyone for your feedback on this so far. The code runs to completion in about 17 seconds on my 1st generation device, but takes around 47 seconds to complete on the newest, 3rd generation device.

I tried it on my iPads. Is there any way to set the size so it’s the same on different devices? That way it would be a better speed comparison.

PS. It also cancelled on the air 4 when it should have completed.

iPad Pro 1: 54 sec
iPad Air 3: 23 sec
iPad Air 4: 69 sec

@rvmott, similar outcome for me:
iPad Pro (new 3rd gen 11″, 8 GB, iOS 15.4): 48 sec
iPad Pro (original 9.7″, iOS 15): 24 sec
(The original iPad has a slightly smaller resolution, but I wouldn’t think that should make much difference.)
That is strange. I’m not sure if everyone is comparing like iOS versions (in case that is a confounding variable).

Thanks @SugarRay for running the same tests on your devices!

@John - are there any settings that I am overlooking for optimizing performance with my app?

@rvmott I’m not exactly sure why the performance drops so much on a newer device. But I did try commenting out line 362 (an unnecessary call to setContext()) in the code you posted, on my M1 Max (a very new device), and got significant performance gains.

With setContext(): 105.4s
Without setContext(): 8s

That’s a >13x speed-up! So what’s happening here? I suspect you thought that image:set(x, y, c) needs setContext() to work (it doesn’t). This actually causes significant performance issues due to how image data is stored and transferred between the CPU and GPU.

In order to enable direct access to pixel data from Lua, we have to keep a local copy of the image in CPU memory. We do this by caching the data locally and using flags to track the last time the image was manipulated on either the CPU (via setting pixels) or the GPU (via setContext). If Lua has modified the image, we send it to the GPU; if the GPU has modified it, we read it back.
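A rough mental model of that bookkeeping, sketched in plain Lua. This is not Codea’s actual implementation; the names and structure are invented purely to illustrate why alternating CPU and GPU writes forces a transfer on nearly every iteration:

```lua
-- Hypothetical sketch of the dirty-flag scheme described above.
local TrackedImage = {}
TrackedImage.__index = TrackedImage

function TrackedImage.new()
    return setmetatable({cpuDirty = false, gpuDirty = false, syncs = 0}, TrackedImage)
end

function TrackedImage:setPixel()        -- stands in for image:set(x, y, c)
    if self.gpuDirty then               -- GPU copy is newer: stall and read it back
        self.syncs = self.syncs + 1     -- expensive GPU -> CPU transfer
        self.gpuDirty = false
    end
    self.cpuDirty = true
end

function TrackedImage:drawInto()        -- stands in for setContext(img) ... setContext()
    if self.cpuDirty then               -- CPU copy is newer: upload it first
        self.syncs = self.syncs + 1     -- CPU -> GPU transfer
        self.cpuDirty = false
    end
    self.gpuDirty = true
end

local img = TrackedImage.new()
for i = 1, 100 do
    img:setPixel()    -- alternating the two APIs forces a transfer
    img:drawInto()    -- on almost every call
end
print(img.syncs)      --> 199 (versus 1 if all the setPixel calls came first)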

Reading from the GPU is generally slower, and because we want the data right now we also have to stall the GPU (tell it to drop what it’s doing and retrieve the image data right away). By setting a single pixel and calling setContext one after the other, you get image data being copied back and forth multiple times without actually needing to. If you only ever want to directly set pixels, you should use image:set() OR draw a pixel-sized rectangle via setContext, but never do both.
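In Codea terms, the two safe patterns look roughly like this (a sketch, assuming `img` is an ordinary Codea `image`; the sizes and colors are arbitrary):

```lua
local img = image(200, 200)

-- Pattern 1: pure CPU-side pixel writes. No setContext() needed;
-- image:set() works on its own, and the image uploads to the GPU once.
for x = 1, img.width do
    for y = 1, img.height do
        img:set(x, y, color(255, 0, 0))
    end
end

-- Pattern 2: pure GPU-side drawing. Set pixels by drawing 1x1 rects
-- inside a single setContext() block instead of calling image:set().
setContext(img)
noSmooth()
for x = 1, img.width do
    rect(x, x, 1, 1)
end
setContext()

-- What to avoid: interleaving img:set()/img:get() with setContext(img)
-- on the same image in a loop, which forces a CPU<->GPU copy each time.
```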

As for the large difference in performance on newer devices, it could be a change in how things like glReadPixels and glFlush are handled on newer chip designs, giving worse performance for atypical usage (i.e. stalling and reading/writing the same image constantly). Apple doesn’t officially support OpenGL anymore, so I wouldn’t be surprised if they broke something without realising it.

Here’s a Voronoi diagram I wrote that uses setContext. See how this compares on the different devices. On my iPad Air 3 it takes about 67 seconds. It shows a count at the top that stops at 250 then shows the seconds.

-- Voronoi diagram

viewer.mode=FULLSCREEN

function setup()
    s=require("socket")
    img=image(WIDTH,HEIGHT)
    limit=30
    ms,mc,mr=math.sin,math.cos,math.rad
    tab,col={},{}
    fill(255)
    for z=1,limit do
        r,g,b=math.random(255),math.random(255),math.random(255)
        table.insert(col,color(r,g,b,255))
        x=math.random(50,WIDTH-50)
        y=math.random(50,HEIGHT-50)
        table.insert(tab,vec2(x,y))
        setContext(img)
        ellipse(x,y,16)
        setContext()
    end
    rad=8
    st=s:gettime()
end

function draw()
    background(0)
    sprite(img,WIDTH/2,HEIGHT/2)
    if rad<250 then
        incRad()
        en=s:gettime()
        fill(255)
        text(rad,WIDTH/2,HEIGHT-40)
    else
        fill(255)
        text(en-st,WIDTH/2,HEIGHT-40)
    end
end
 
function touched(t)
    if t.state==BEGAN and #tab<limit then
        table.insert(tab,vec2(t.x,t.y))
        setContext(img)
        ellipse(t.x,t.y,16)
        setContext()
    end
end

function incRad()
    rad=rad+1
    for a,b in pairs(tab) do
        drawCirc(b.x,b.y,col[a])
    end
end

function drawCirc(xx,yy,c)
    -- Walk the circle of radius rad around seed (xx,yy); wherever the
    -- image is still black, stamp a small dot in this cell's color.
    setContext(img)
    for z=1,360 do
        local x=(mc(mr(z))*rad)//1
        local y=(ms(mr(z))*rad)//1
        if x+xx>0 and x+xx<WIDTH and y+yy>0 and y+yy<HEIGHT then
            local r,g,b,a=img:get((x+xx+1)//1,(y+yy+1)//1)
            if r+g+b==0 then
                fill(c)
                ellipse(x+xx,y+yy,6)
            end
        end
    end
    setContext()
end

Thanks @John for taking the time to diagnose the issue and for the insights into the unnecessary and costly setContext() calls, including the background on CPU/GPU interaction. Codea has been a great platform for me to explore my hobby of coding fractal-based landscapes and simulations, which sometimes require manipulating pixels in multiple image maps simultaneously, and the background you’ve provided will guide me towards more efficient development going forward.

@dave1707 cool, I’ll have to check that one out

@rvmott No worries, I think it’s pretty interesting when we run into some quirky hardware differences like this.

I’ve done a bunch of tests with Codea 4 using compute shaders, which would let you run something like this in real time. I’ve got an implementation of the jump flooding algorithm (JFA) for calculating signed distance fields, which I use for drawing outlines in the scene editor.

Some info: https://blog.demofox.org/2016/02/29/fast-voronoi-diagrams-and-distance-dield-textures-on-the-gpu-with-the-jump-flooding-algorithm/

It can also be done with regular shaders in Codea 3.x; it just uses the rasterisation pipeline instead.
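For anyone curious what JFA actually does, the core loop is simple enough to sketch on the CPU. Below is a toy plain-Lua version (not John’s implementation; the grid size and seed positions are made up). The real thing runs each pass as a shader or compute dispatch over the whole texture, which is what makes it realtime:

```lua
-- Toy CPU version of jump flooding on an N x N grid.
-- grid[y][x] holds the nearest seed found so far for that cell (or nil).
local N = 64
local seeds = { {x = 10, y = 10}, {x = 50, y = 30}, {x = 30, y = 55} }

local grid = {}
for y = 1, N do grid[y] = {} end
for _, s in ipairs(seeds) do grid[s.y][s.x] = s end

local function dist2(x, y, s)
    local dx, dy = x - s.x, y - s.y
    return dx * dx + dy * dy
end

-- Each pass samples the 8 neighbours at a halving step size:
-- N/2, N/4, ..., 1. After log2(N) passes every cell knows its
-- (approximately) nearest seed, i.e. its Voronoi cell.
local step = N // 2
while step >= 1 do
    local out = {}
    for y = 1, N do
        out[y] = {}
        for x = 1, N do
            local best = grid[y][x]
            for dy = -1, 1 do
                for dx = -1, 1 do
                    local nx, ny = x + dx * step, y + dy * step
                    if nx >= 1 and nx <= N and ny >= 1 and ny <= N then
                        local s = grid[ny][nx]
                        if s and (best == nil or dist2(x, y, s) < dist2(x, y, best)) then
                            best = s
                        end
                    end
                end
            end
            out[y][x] = best
        end
    end
    grid = out
    step = step // 2
end
```

Mapping cells to the color of their seed at the end reproduces the same diagram the incremental circle-growing code above computes, but in a fixed number of passes instead of one pixel at a time.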

@John M1 Max? How are you doing that?