Gesture Recognition Library

Dear community,

As a fairly new Codea user I have been learning Lua and generally getting familiar with the platform.

I wanted to investigate gesture recognition for a project I am working on, initially for basic ‘swipe’ detection, but also for some more sophisticated gestures…

I came across the following algorithm: http://pokristensson.com/increc.html which was exactly what I was looking for and had the bonus of a good Java implementation.

I have ported the algorithm to Lua/Codea and it seems to work really well. I contacted the author and with their permission have made it available here: https://github.com/brookesi/Codea/tree/master/src

Usage of this algorithm requires attribution to the original paper, so I have added the full copyright notice from the original Java source to the top of CGR.lua. If you use this please do not modify the header.

I do not fully understand how the algorithm works, but in my testing it performs very well. I have provided a basic Main.lua to demonstrate the algorithm.

Basically you define a set of gesture templates where each template is a series of coordinates in a virtual space. You then register the templates with the library and call the CGR_recognize(…) function with the gesture coordinates generated by the user using the Codea ‘touch’ API.
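To make that concrete, here is a rough sketch of the shape of it; note that the template-registration call and the result field names below are illustrative only, please check Main.lua and CGR.lua for the actual function names and signatures:

```lua
-- Illustrative sketch only: registration/result names are assumptions,
-- see Main.lua / CGR.lua for the real API.

-- A template is just an ordered series of points in a virtual space,
-- here using vec2() as in the port. (0, 0) is the top-left of that space.
local swipeRight = {
    vec2(0, 50), vec2(25, 50), vec2(50, 50), vec2(75, 50), vec2(100, 50)
}

-- Hypothetical registration call; the real one is in CGR.lua
CGR_addTemplate("swipeRight", swipeRight)

-- userPoints: a table of vec2 points gathered from the user's touch/drag
local results = CGR_recognize(userPoints)

-- results are ranked by probability, highest first, e.g. results[1].prob
```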

Hopefully it is fairly clear from the code and example how this works.

The algorithm is actually designed for continuous recognition, e.g. you could call the recognise function each time a swipe coordinate is added to the array, and it returns an array of probabilities, in descending order, for each templated gesture… In my example I only call CGR_recognize(…) when the touch event has finished, as I suspect performance may be an issue…
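As a sketch (using the touched() callback discussed further down rather than CurrentTouch, and the results[1].prob shape I mention later), the end-of-drag approach looks roughly like this, with the continuous variant indicated in the comments:

```lua
local points = {}

function touched(touch)
    if touch.state == BEGAN then
        points = {}
    elseif touch.state == MOVING then
        -- flip y because the algorithm's virtual space has (0, 0) top-left (see note 2 below)
        table.insert(points, vec2(touch.x, HEIGHT - touch.y))
        -- For continuous recognition you could call CGR_recognize(points) here
        -- on every new point and watch the ranking converge.
    elseif touch.state == ENDED and #points > 0 then
        local results = CGR_recognize(points)   -- one call per completed drag
        print(results[1].prob)                  -- top-ranked template's probability
    end
end
```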

I made some minor optimisations to the algorithm during the Lua port to reduce table creation for points, using vec2() instead of a table with x and y keys for each Point; otherwise it is basically as originally written by the authors…

I am new to Lua, but a long-time Java programmer, so there may well be other optimisations. The copyright allows for general usage (including commercial) and modification so long as the paper is cited…

Notes, warnings and caveats:

  1. My Main.lua implementation uses CurrentTouch rather than the touch() callback, so I occasionally find the gesture touch-finish event does not process correctly; experts on this forum will probably know why. I am doing touch processing in the draw() callback, so I think that may provide a clue…

  2. The algorithm uses a coordinate space where (0, 0) is in the top left corner of the virtual grid. Codea/iPad has (0, 0) in the bottom left corner so when storing the user coordinates you need to use HEIGHT-y to ‘flip’ the y-axis, if that makes sense…

  3. As I mention in the porting notes in CGR.lua, registering similar templates causes the probabilities to decrease, as the algorithm reasonably isn’t so sure which gesture you mean. This is a potential issue when signalling whether the user has performed a ‘correct’ gesture, because CGR will always return probabilities for all templates ranked in descending order, so some thought would be required for ‘real world’ usage…

  4. I have left the implementation as a Lua script with the required functions as globals, e.g. CGR_recognize(…), because I ported this on Windows where I did not have the Codea class mechanism; the CGR_ prefix is a lazy namespace, as it were…

  5. The function parameters are a bit opaque in places (e.g. p1, p2, … pn) because I wrote myself a Java2Lua converter to create the script ‘skeleton’ by parsing the Java static classes and methods using Java Reflection, then implemented each function in Lua by hand (1-based indexing, arghhhh!!!)

I hope this may be of interest to the Codea community at large. I am also new to GitHub so I hope you can see the files. Please chat on this discussion if you have issues,

Best regards,
Simon Brooke

@Brookesi after looking through this, it looks like a great way of learning the syntax of Lua if you’re porting and changing code.

  1. CurrentTouch isn’t very good; its only use (for me) is if I need it in the draw loop, but even then I would rather use touched(t) and store the touches in a table to use in draw().

  2. This isn’t much of an issue, but a good exercise would be to translate the coordinates to the bottom left by going through the code and changing all of the positions to the original Codea positions, if that makes sense…?

  3. Skip three.

  4. The underscore (_) character isn’t a problem in Lua and I use it mainly for variable names when I run out of names; the other use is with metatables, but you probably won’t need those for a while…

Anyway, this seems very useful as a new type of action to trigger certain callbacks or change values. I’m thinking of a few ideas myself for this, thanks!

A quick question though, how good is this algorithm at recognising different patterns that are quite similar in shape? And does it allow for custom gestures?

Hiya, comments for the above:

  1. I used CurrentTouch just as a copy of one of the tutorials; the proper touch API is the way to go, you just need to store the x and y points from the drag in a table, then pass that to the recognise function when the touch/drag is complete…

  2. Agreed, the port was pretty bruising though :wink: but yes, I could easily change the create point function to test that HEIGHT ~= nil and do the math there (a small sketch of the idea follows these comments)

  3. Yeah, lazy ‘namespacing’ on my part rather than using Codea classes
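Something like this is what I have in mind for point 2 (a sketch only; the actual point-creation code in CGR.lua may be organised differently):

```lua
-- Sketch of the idea from point 2: do the y-flip inside the point-creation
-- helper, but only when running in Codea (HEIGHT is a Codea global).
local function makePoint(x, y)
    if HEIGHT ~= nil then
        y = HEIGHT - y   -- Codea: convert bottom-left origin to the algorithm's top-left space
    end
    return vec2(x, y)
end
```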

Regarding your final point, the algorithm is very accurate even for similar patterns, but the probabilities drop to e.g. 0.25 and similar templates score similarly; that was my point #3 in my original post: how do you tell if a user has actually performed a gesture, as even a random ‘squiggle’ on the screen can generate reasonable probabilities for one or more templates… My gut feeling is to keep registered gestures relatively different so you can reasonably say, if results[1].prob > 0.5 then it’s good…

All gestures are custom, you just define the array of virtual points making up your gesture, graph paper is useful here!!! Note that the points of a gesture are evaluated in the order they are declared, so e.g. a clockwise square is a different gesture from an anti-clockwise square, based on the order of the coords you provide…
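For example, these two templates trace the same square in opposite directions, so they count as two distinct gestures (remember (0, 0) is the top-left of the virtual space, with y increasing downwards):

```lua
-- Same shape, opposite point order = two distinct gestures.
local squareClockwise = {
    vec2(0, 0), vec2(100, 0), vec2(100, 100), vec2(0, 100), vec2(0, 0)
}
local squareAnticlockwise = {
    vec2(0, 0), vec2(0, 100), vec2(100, 100), vec2(100, 0), vec2(0, 0)
}
```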

Si

So the algorithm could be isolated to a certain space on the screen to stop it triggering gestures if the user touches anywhere else on the screen?

Y-e-s, but not in the library itself; you would have to write your touch and drag detection so you know when a touch/drag gesture has been made in a specific screen area, store those points in a table, then call recognize() on that point array when the touch event completes…
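Something along these lines is what I mean (a sketch; the pad rectangle and helper names are just for illustration):

```lua
-- Sketch: only gather drag points for drags that start inside a 'gesture pad'
-- region, then hand the collected points to the recogniser when the drag ends.
local pad = { x = 100, y = 100, w = 400, h = 400 }   -- illustrative screen rectangle
local points, inPad = {}, false

local function insidePad(x, y)
    return x >= pad.x and x <= pad.x + pad.w and
           y >= pad.y and y <= pad.y + pad.h
end

function touched(touch)
    if touch.state == BEGAN then
        points = {}
        inPad = insidePad(touch.x, touch.y)      -- ignore drags that start elsewhere
    elseif touch.state == MOVING and inPad then
        table.insert(points, vec2(touch.x, HEIGHT - touch.y))
    elseif touch.state == ENDED and inPad and #points > 0 then
        local results = CGR_recognize(points)
        -- act on results[1] here if its probability is high enough
    end
end
```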

The bottom line is that CGR knows nothing except what you give it: initialisation with a set of possible gestures to recognise, then a table of user coords to run the algorithm against… It doesn’t know anything about screen geometry etc., it just works in a virtual coordinate space…

Hope that makes some sense…

That makes sense, and it sounds easy to use as well. I’ll give it a shot in my project, thanks!

Pretty interesting. Opens up a lot of possibilities for gaming!

Thanks, that’s what I had hoped. The driver for doing this was a discussion on a podcast about how on-screen HUDs, where you have to keep your fingers on specific screen elements for controls, can lose immersion (when your fingers move and you have to ‘recalibrate’ yourself), so using gestures anywhere on the screen is more intuitive…

Also the ability to do spell casting type games where you build combos up from a dictionary of gestures for example…

@brookesi - =D>

@brookesi - here’s my take on gesture recognition from a while back
http://twolivesleft.com/Codea/Talk/discussion/1350/simple-character-recognition-input-demo#Item_7
Initially I was using it to recognise letters but have been messing about with a spellcasting type game.
http://youtu.be/MiViEFU1_2A

Unfortunately you can’t see the gestures but these are traced out with one or more fingers in the bottom left.

Looks interesting, I’ll take a look. The downside (I suspect) of my port is that it may be a bit processor-heavy with all the normalisation and stuff it does, but I think it’s OK for drag-finish events. The original algorithm was written with continuous recognition in mind, e.g. during a drag event you would call the recognise function and ‘home in’ on the likely gestures…

@West, in your demo, have the screen reflect the traces on the spell casting “page” - a simple line() call could do wonders… Also, you mention it’s a bit processor intensive - what areas of the code do you feel could use trimming?

@aciolino - it already ‘trails’ stars but it doesn’t show up too well on the YouTube video :frowning: In future I intend to leave a scorched effect on the paper :slight_smile: Also, it was @brookesi’s code rather than mine that was mentioned as processor intensive.

@West nice demo, the landscape looked a bit magical :slight_smile:

I’ve put together a simple point gatherer and tested this code - it is very slow on gesture recognition, so real-time code this ain’t! But in general it does OK for recognition and is better than what I had (nothing).

I am noticing that if I have a pattern with a lot of points, and a lot of possible gestures, the certainty goes way down, to like 20%, on the higher-resolution patterns.

Hiya, yes, that is true, comment from one of my posts above:

…Regarding your final point, the algorithm is very accurate even for similar patterns, but the probabilities drop to e.g. 0.25 and similar templates score similarly; that was my point #3 in my original post: how do you tell if a user has actually performed a gesture, as even a random ‘squiggle’ on the screen can generate reasonable probabilities for one or more templates… My gut feeling is to keep registered gestures relatively different so you can reasonably say, if results[1].prob > 0.5 then it’s good…

I agree that this may make it of limited use, as for simple gestures you can likely just do the endpoint math for e.g. directional stuff, but even for simple gestures with, say, up to 5 points with crossovers, e.g. spell-casting runes, it may have some mileage…

Thank you for taking the time to have a look :wink: