I’ve just tried some more examples with jmv38’s code.

First, the good news: a single for-loop *is* faster than a double for-loop.

Next, the bad news: the difference is so insignificant that it is wiped out if you have to do *anything* to compensate for the use of a single loop instead of a double loop.

With identical code inside the loop I get the following figures:

- Double loop: 1.91815s
- Single loop: 1.91681s

(total iterations the same as jmv38: 16e6; I’m on an iPad2)

So that’s a gain of less than 0.01%

Now suppose I really want to get at the variables in the double loop. So something like the original `dummy = i + j`

. (Note: there’s no need to make `i`

and `j`

local as that’s automatic inside the double loop, and `dummy`

should be made local.) The double loop simply has `dummy = i + j`

. The single loop has to compute the `i`

and `j`

. There are a variety of ways to do this. The code in jmv38’s example is very expensive. A cheaper way to do it is with local variables `x`

and `y`

which are initially set to `1`

and `1`

and an auxiliary (local) variable `w`

set to `width + 1`

; then in the loop (at the end) we do:

```
x = x + 1
if x == w then
y = y + 1
x = 1
end
```

The times are now:

- Double loop: 1.9252s
- Single loop: 4.13666s

The above might seem a little wasteful as we always do `x = x + 1`

and sometimes throw away the result. Surely an `if ... then ... else ... end`

would be better!

```
if x == w then
y = y + 1
x = 1
else
x = x + 1
end
```

(here `w`

is set to `width`

) gives 4.38278s! So we lose .2s by introducing an `else`

branch into our conditional. In fact, removing the conditional altogether (whence just having the `x = x + 1`

statement - whence the loops are no longer functionally equivalent) drops the time to 2.4726s. So the conditional is adding over a second to the time whilst the addition operation is adding only half a second. We can narrow this down even further by simply putting in a conditional and nothing else:

```
if x == w then
end
```

(so no increment on `x`

) gives 3.57404s.

I wondered about optimising in other ways. How about caching the values of `i+j`

and simply using a look-up? The point of this would be that if you are going to do the same loop lots of times it might be quicker to save the results on the first loop and reuse them. Sadly, no. Storing `i + j`

in a table and then putting `dummy = t[i]`

in the loop does not help: I got 3.40796s for that (just for the look-up loop, that is).

Now there are lots of ways to compute `x + y`

from `i`

(where `x = i%width + 1`

and `y = math.floor(i/width) + 1`

) using a variety of tests and operations. I’ve yet to find one that brings the total computation down to anywhere near comparable to the double loop.

However, this experimenting does say that it is worth considering alternative ways to do *necessary* computations inside the loops. For example, % is more expensive than / by a long way, but using `math.floor`

even more so (even when localised).