Solution for identifying which object was touched in a 3D scene

If you have a mess of objects in a 3D scene, some in front of others, how do you know which object a user touched?

I understand the math can only get you so far, because touching the screen is like touching a window - it gives you a direction (from your eye to the touch point) but doesn’t tell you how far away the object is, beyond the window. So you have to test which object(s) intersect with that direction vector, and pick the nearest.

I’ve found a way that I think is simpler and gives pixel-perfect accuracy. This is how it works.

Your touchable objects need to be in their own, separate meshes (this is a good idea anyway if they have transparent edges, because you need to sort them by distance from the camera), each with a stored id number from 1 to 255.

You draw them normally until you get a touch on the screen, and then you store the x,y position of the touch. On the next draw, you don’t draw to the screen but to a hidden image, and (using a shader) any mesh with an id number is drawn in a single colour, (i,0,0), where i is the id number, and transparent pixels are discarded. Objects without id numbers are drawn normally. When this is done, the draw function tests the pixel colour of the hidden image at the touch point, and if it is of the form (i,0,0), it extracts i, the id of the object that was touched. Then it goes back to drawing normally again.
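
To make that concrete, here is a minimal sketch of the pick pass. This is not the actual code from the gist below; pickImg, idShader and drawScene are made-up names. The fragment shader writes each mesh’s id into the red channel and discards transparent pixels:

    -- Hypothetical id shader: paints every non-transparent pixel of a
    -- mesh with (id,0,0), so the touched pixel identifies the object
    idShader = shader(
    [[
    uniform mat4 modelViewProjection;
    attribute vec4 position;
    attribute vec2 texCoord;
    varying highp vec2 vTexCoord;
    void main() {
        vTexCoord = texCoord;
        gl_Position = modelViewProjection * position;
    }
    ]],
    [[
    uniform lowp sampler2D texture;
    uniform lowp float id;       // the mesh's id number, set from Lua
    varying highp vec2 vTexCoord;
    void main() {
        lowp vec4 col = texture2D(texture, vTexCoord);
        if (col.a < 0.2) discard;    // ignore transparent pixels
        gl_FragColor = vec4(id/255.0, 0.0, 0.0, 1.0);
    }
    ]])

    function testHit(x, y)
        local pickImg = image(WIDTH, HEIGHT)   -- hidden image, never shown
        setContext(pickImg)
        background(0, 0, 0, 255)               -- black means "you missed"
        drawScene(idShader)                    -- draw id-tagged meshes with the id shader
        setContext()
        local r, g, b = pickImg:get(math.floor(x), math.floor(y))
        if r > 0 and g == 0 and b == 0 then
            return r                           -- r is the id of the touched object
        end
    end

For each id-tagged mesh you would set m.shader = idShader and m.shader.id to its id before drawing it; meshes without an id keep their normal shader.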

Here is a demo with a number of numbered circles, which, when touched, pop up a message. There is also a library image which doesn’t have an id and doesn’t react to touches (except to say “you missed”).
http://www.youtube.com/watch?v=hKFPTd3eQLo

What I like about this is that it should always work, and there is only a minor impact on FPS, because all the extra work happens just once, after a touch. Also, the shader discards the transparent pixels around each object, so you can touch objects which are behind others, very accurately (if you were using math instead, you would somehow have to check whether an object that intersects with the touch direction was touched in its transparent area, or not).

The code is here: https://gist.github.com/dermotbalson/5898835

It’s a bit messy at the moment, and a lot of it is just to create and draw all the objects. But if it’s useful, I’ll tidy it into something that’s easy to use.

Nice code @Ignatz.
I had the same idea but gave up because I thought the lag would be too long.
Now I can check it with your code.
I have added

    ortho()
    viewMatrix(matrix())
    -- bouncing sprite, used as a frame-rate indicator
    if img0 == nil then img0 = readImage("Planet Cute:Character Princess Girl") end
    translate(WIDTH/2, HEIGHT*(1 + math.cos(ElapsedTime*2*math.pi))/2)
    sprite(img0, 0, 0)

at the end of your draw loop. Unfortunately, I was right about my iPad 1: the lag makes it unsuitable for real-time action (you can see the girl freeze). But maybe the code can still be optimised (I didn’t study the code in detail, but maybe the big image is created on every frame?)

No, the big image is only created once, after the touch, but feel free to improve it.

The problem is that if you went the other way, using math, you would have the problem of transparent pixels (which you want to ignore, so the user can touch an image through the transparent pixels of a closer image). No math solution is going to easily tell you whether you touched a non-transparent pixel of an object, especially if it has been translated and rotated!

This solution deals with all that, because it gives you an accurate WYSIWYG map of the objects as they appear on the screen. So I’m not sure there is any better solution.

As to optimising, I think it needs to be a fragment shader (rather than a vertex shader), because the transparent pixels need to be removed exactly, and the id needs to be applied exactly to the non-transparent pixels.

Basically, it is a hard problem, so it may not be feasible on an iPad 1. Even my iPad 3 seems to skip a little every now and then after a touch. So I think true 3D modelling will have to wait until we have more power. The same goes for things like ray tracing - we can do pretty demos, but I haven’t seen a useful implementation yet.

Basically I agree the principle of your solution is the right one.
The trick for avoiding the lag could be to use a ten-times-smaller image, with a scale(.1). Then, with 1/100 of the pixels to draw, it could be OK. The price to pay would be a small loss of accuracy, but probably acceptable.

Btw, I modified your code to create the image only once, in setup, and there is a visible gain.
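
The pattern is simply this (pickImg is a made-up name for the hidden image, and T is the same touch variable as in the snippet quoted below):

    -- Allocate the hidden image once, in setup(), and only redraw
    -- its contents when a touch arrives
    function setup()
        pickImg = image(WIDTH, HEIGHT)   -- one-off allocation
    end

    function touched(touch)
        if touch.state == ENDED then
            T = touch    -- draw() will run the pick pass once for this touch
        end
    end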

Nice solution @Ignatz! Is it possibly the hit-test in the draw loop causing performance issues, because it’s getting a pixel from an image on every draw when there is a touch captured?

    --if we've had a touch, stop drawing on the image, and test for a hit on the image
    if T ~= nil then
        setContext()          -- go back to drawing on the screen
        message = testHit()   -- message showing what we hit
        counter = 120
        d = vec2(T.x, T.y)    -- remember where the touch was
        T = nil               -- clear the stored touch
    end

It only draws the hidden image on a touch ENDED, i.e. once. The performance hit will be the fragment shader, which does the important job of not only id-tagging every pixel but also discarding transparent pixels, so you can touch objects behind them.

Reducing the size is a neat idea, you wouldn’t lose too much that way.

I’ll play with it a bit more to see if I can speed it up, and also clean it up so the code is easier to follow.

@Ignatz I have to say that the phrase that springs to mind here is:

Use a sledgehammer to crack a red herring.

Here’s a few random thoughts to start with:

  1. Why do you want to be pixel-perfect? Can you really touch the screen with that accuracy? I’ve worked with some vector-drawing software where I have to be pixel perfect with a trackpad to select an object and I can’t manage that. I’m even less accurate with a touch screen. So pixel-perfect is not the right goal of any selection tool. Robustness is a much worthier goal. Make it so that there’s a way for the user to cycle through the objects that are in the neighbourhood of the touch and you’ll save your users so much grief.

  2. The discard method in the fragment shader is actually quite slow so should only be used sparingly. If your textures have large regions of transparent pixels then the mesh should be tightened. Then if the only areas where you have transparent pixels are around the edges (as in anti-aliasing), actually you might want to draw these instead of discarding them to make the target bigger.

Now to more serious comments.

You aren’t getting round the maths with this. If you read the relevant section in my matrices tutorial then you’ll see that you are only tackling the first part of the problem: figuring out which object has been touched. Moreover, you’ll see that I say that this can be done without transforming the coordinates back to 3D but by testing the “shadow” of each object on the screen. So what you’ve done is implement step one of the process that I outline there. But that wasn’t the “heavy maths” part. The “heavy maths” part comes next: what are you going to do with the touch once you’ve figured out which object has claimed it? Here’s a challenge for you: move the touched object in such a way that its position on the screen tracks the touch. That’s where you need the “heavy” maths.

This also feels like a very inefficient method because you are drawing the entire screen rather than just the touch point. If you really want to do this, why not simply have a 1x1 image and translate it so that it corresponds to the touch point? Then any pixel outside the touch will be discarded much earlier in the OpenGL pipeline and thus much more efficiently.
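
One way to approximate that in Codea, without touching the projection, would be the clip() scissor, which cuts drawing down to a rectangle so that in practice almost nothing outside it gets shaded. A sketch, reusing the made-up names from the earlier one:

    -- Only a 3x3 pixel window around the touch is drawn, so the
    -- fragment shader runs for almost nothing
    function testHit(x, y)
        setContext(pickImg)
        clip(x - 1, y - 1, 3, 3)   -- pixels outside this rectangle are clipped away
        background(0, 0, 0, 255)
        drawScene(idShader)
        clip()                     -- restore the full drawing area
        setContext()
        local r, g, b = pickImg:get(math.floor(x), math.floor(y))
        if r > 0 and g == 0 and b == 0 then return r end
    end

The image is still allocated at full size, but the per-pixel shading work collapses to a handful of fragments.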

(Also, you can allow for more meshes by using more channels.)

But if you take my point about pixel-perfection being the wrong solution, a better solution would be for each mesh to define a “bounding box” and say “I’ll accept any touch inside that bounding box”. That’s a very quick test to do and I’ll wager that maths is faster than drawing! This will then leave you with a short list of objects that were in the “touch zone”, or near enough that the person might conceivably have been intending to touch that object. Then you highlight one of them, giving the user the chance to see which object has been touched, and if it is the wrong one then they re-touch in the same zone and you present the next object instead, cycling round.
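
A sketch of that cycling idea, assuming each object already carries a screen-space bounding rectangle (objects, obj.bounds and lastPick are made-up names):

    -- Collect every object whose bounding rectangle lies within `radius`
    -- pixels of the touch, then cycle through the candidates on
    -- successive touches instead of demanding pixel accuracy
    function pickNear(touch, objects, radius)
        local hits = {}
        for _, obj in ipairs(objects) do
            local b = obj.bounds   -- {x1, y1, x2, y2} in screen pixels
            if touch.x >= b.x1 - radius and touch.x <= b.x2 + radius and
               touch.y >= b.y1 - radius and touch.y <= b.y2 + radius then
                table.insert(hits, obj)
            end
        end
        if #hits == 0 then return nil end
        lastPick = ((lastPick or 0) % #hits) + 1   -- repeated touches cycle through
        return hits[lastPick]
    end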

@Jmv38 - you can’t always create the background image just once; I was envisaging a game where you have things swarming about and you’re trying to touch them.

@Andrew_Stacey - where is the red herring? I have a 3D scene. I want to touch objects. That is a problem that needs to be solved, and I’m not the only one who’s asked about this.

I can’t get rid of most of the transparent pixels without creating lots more vertices to hug images more closely, which is not only tricky, but slows processing.

How do I use a bounding box for a translated and rotated object?

I agree my method won’t help with moving objects around the screen, but it works fine if all I want to do is touch to kill them. 8-X

My bottom line is this. I have not yet seen any practical code for identifying which object has been touched in a 3D scene. I have seen pages of totally incomprehensible math, but no evidence that it works, because there is no working code. It’s all very well to say I should use math, but I don’t understand it, and I’ll bet hardly anyone else does, either. So a solution I can’t use is no solution at all.

If your way is better, I would love to see it working, and if it’s better, I’d love to use it. But until then, I have a method that works.

@Jmv38 - try changing this line in the shader

    if (col.a<0.2) discard;

to this

    if (col.a<0.2) gl_FragColor=col;

and see if it helps.

I have an iPad 1 and it runs OK on there, either way.

@Ignatz Maybe I should have said “straw man” instead of “red herring”. Or maybe I shouldn’t have used such language at all - sorry about that. The point is that you aren’t avoiding the maths nor have you found a way around it. All you’ve done is use the GPU instead of coding it yourself.

The way to figure out which object has been touched is quite simple: project each object onto the screen, see which objects overlap the touch, then stack them from back to front and pick the foremost one. That’s the mathematical way. Anything that you’ve read that doesn’t say that has been the wrong thing to read. So I’d quite like to know where you’ve seen “pages of totally incomprehensible math” about this problem because I haven’t seen them (and I certainly haven’t written them). In my experience, mathematics is the shortcut and avoiding understanding the mathematics is always the long way around.

Back to the task in hand. What I describe above is precisely what the GPU is doing in your algorithm. So you are doing the mathematical thing, just doing it with the GPU. The problem is that you aren’t taking advantage of what the GPU does best - parallel processing - and so I doubt that you are actually gaining much by farming this off to the GPU, particularly as you are rendering the entire screen when all you are actually interested in is one pixel. I reiterate my point: do you really need pixel perfect matching? Are you that accurate when touching the screen?

To know how best to advise you about bounding boxes, I’d need to know more about what kind of objects you are dealing with. The principle is that if you define a convex region in 3-space containing your object then the image of that region on the screen will contain the image of your object. Translations and rotations are no problem.

Similarly with transparent regions. I’d be amazed if once they are defined then the more complicated meshes take longer than the simpler ones. In my experience you need a lot of vertices before things slow down.

Finally, to offset your complaint about the lack of working code, I’ll offer up this video showing some code that I wrote a long time ago, before we had matrices in Codea and I had to handle all of the projections myself. As you can see, in that project there is a 3D shape and you can touch certain parts of it to select and manipulate them.

https://www.youtube.com/watch?v=Qkfo-ynYS7Y

@Andrew_Stacey - thanks for that. Wrt incomprehensible math, I was referring to stuff like this, which I don’t understand enough to turn it into code:
http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/MousePicking

Suppose I just have simple rectangle meshes, whether they contain images or not. They may be rotated. I have them already sorted so I can work from nearest to furthest.

Now - how do I project them onto the screen using math, i.e. how do I figure out where the corner vertices of my rectangles have mapped to? That is the code I don’t have, and can’t write. Maybe it’s simple for you, but it doesn’t feel that way to me #-o

PS btw, no, I don’t need pixel perfect matching - I’m an actuary, and we are famous for rough estimates! But seeing as my method did it, I thought it was worth bragging about :wink:

@Ignatz Sigh. When I read stuff like that I wonder why I bother. I shall gloss over the last line and home in on:

That’s a lot of steps, and it’s easy to screw up, and if you screw up just a little that’s enough to blow everything apart.

If you read something like that, run a mile. It means that the algorithm is sensitive to initial conditions and therefore a Bad One. I like to sum up two of the major developments in mathematics in the last century as:

Chaos Theory
: You need to be precise with initial conditions

Quantum Theory
: You can’t be precise with initial conditions

Conclusion: any time you have to know something precisely is an indication that you’ve done it the wrong way.

Knowing where the points end up does involve a little maths, but not a lot. You get the modelMatrix() * viewMatrix() * projectionMatrix() and then apply it to the vector. As Codea doesn’t have a matrix * vec3/vec4 primitive, we have to write one but it’s a two-liner:

    function applyTransformation(v,m)
        m = m:translate(v.x, v.y, v.z)   -- the transformed point lands in the translation entries
        return vec3(m[13], m[14], m[15])/m[16]
    end

Then you discard the z coordinate and transform the x and y from [-1,1] to [0,WIDTH] and [0,HEIGHT]. For example, x = WIDTH*(v.x + 1)/2.

(This is all in my tutorial on matrices, by the way. And I did put some working code at the end of that.)
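
Putting those steps together, a sketch, assuming applyTransformation as above and that corners holds an object’s corner vertices as vec3s (screenBounds and touchHits are made-up names). It must be called with the same model matrix in effect as when the object is drawn:

    -- Project an object's corners to screen coordinates and test a touch
    -- against the resulting 2D bounding rectangle; the projected z lets
    -- the caller pick the foremost of several overlapping objects
    function screenBounds(corners)
        local m = modelMatrix() * viewMatrix() * projectionMatrix()
        local minX, minY, minZ = math.huge, math.huge, math.huge
        local maxX, maxY = -math.huge, -math.huge
        for _, v in ipairs(corners) do
            local p = applyTransformation(v, m)
            local x = WIDTH  * (p.x + 1)/2   -- [-1,1] to screen pixels
            local y = HEIGHT * (p.y + 1)/2
            minX, maxX = math.min(minX, x), math.max(maxX, x)
            minY, maxY = math.min(minY, y), math.max(maxY, y)
            minZ = math.min(minZ, p.z)
        end
        return minX, minY, maxX, maxY, minZ
    end

    function touchHits(touch, corners)
        local x1, y1, x2, y2, z = screenBounds(corners)
        if touch.x >= x1 and touch.x <= x2 and
           touch.y >= y1 and touch.y <= y2 then
            return z   -- hit: keep the object with the smallest z
        end
    end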

Yep, I tried to read that. And the code. But I think I need a few semesters of matrix math before I’ll understand it.

Wow! Some discussion.
I’ll try what you propose @Ignatz.
When I say ‘create only once’ I mean the image(width,height) command, not the drawing into it. I assume the memory allocation takes some time. But of course you must update the image content when the display is touched.
Another question, maybe you know: many times when I use a shader made by somebody on the forum, the displayed image seems to be 2x my iPad screen. Is there somewhere in the shader where you define the actual size, and is it for retina displays? Do you see the problem I mean?

@Jmv38 - I’m not aware of any special retina settings; those happen in the background as far as I know.

Your image suggestion is a good one, thanks

I am going to try Andrew’s approach but it is very difficult. My objective is simply to get some code that will help people without specialised math knowledge figure out what has been touched.

@Ignatz - thanks for your quite interesting posts and code!
I have tried your suggestion of changing the discard to something else, but it doesn’t help much.
I have tried the 1/10-size image: it really helps, but my bouncing girl still freezes visibly. On your iPad it might not be noticeable.
I understand your approach: even if it is overkill in terms of computing power, because it could be done with the calculations suggested by Andrew, it is not overkill in terms of ‘saving basic user effort and motivation’. Actually, that is what all those 3D functions provided by Codea are doing: trying to ease our effort when drawing 3D, so we don’t have to think too much about it. We could easily (…) recompute all that by hand, but this would divert some of our own brain-computing power from our main objective: getting the program finished! Having little brain-computing power available myself for this leisure activity, I appreciate your initiative and I hope (for my own sake) that it will succeed! :wink:

@Jmv38 - thanks. Now I must ask you a stupid question. How did you shrink the image by a factor of 10 (not the image itself, but all the meshes you drew on it - did you redefine their sizes before drawing them)?

This is not stupid at all!
This must be done in several places (a sketch putting the first three together follows the list):

  • image creation.
  • setContext: scale(0.1).
  • coordinate checks: divide by 10.
  • camera settings: divide by 10.

To be frank, I haven’t changed the camera settings yet, so it can’t work (I always get the “missed” message). But it can surely be done, and I just wanted to check the speed first (I don’t need it to work to check that).
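
A sketch of the first three changes (SCALE is a made-up name, and pickImg, drawScene and idShader are the made-up names from earlier; this assumes a scene drawn in screen coordinates, so a real 3D camera would still need its settings adjusted, per the last point):

    -- Downscaled pick buffer: the hidden image is 1/10 size in each
    -- dimension, so only 1/100 of the pixels get shaded
    SCALE = 10

    function setup()
        -- 1. create the small image once
        pickImg = image(math.floor(WIDTH/SCALE), math.floor(HEIGHT/SCALE))
    end

    function testHit(x, y)
        setContext(pickImg)
        scale(1/SCALE)           -- 2. shrink everything drawn into it
        drawScene(idShader)
        setContext()
        -- 3. check the pixel at the scaled-down touch position
        local px = math.max(1, math.floor(x/SCALE))
        local py = math.max(1, math.floor(y/SCALE))
        local r, g, b = pickImg:get(px, py)
        if r > 0 and g == 0 and b == 0 then return r end
    end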