Anagram program with foreign font - error question

akiva · January 21, 2013, 5:08pm

I’m trying to modify the Anagram program to add Hebrew support, and I get the following error when I add Hebrew letters to the word list:

error: error: [string “-- Bitmap font class…”]:299: bad argument #1 to ‘char’ (invalid value)

Any idea why?

Thanks

Codeslinger · January 21, 2013, 6:49pm

Lua is not exactly UTF-8 ready. You can use the editor to type any kind of character, but inside Lua a string is just a sequence of bytes, so multi-byte characters have to be decoded by hand.

What the UTF-8 decoder in the Anagrams example does is translate UTF-8 characters into characters of the local codepage. For example, I can use German umlauts because they have an equivalent in my local codepage. Characters that don’t have an equivalent cannot be used in Anagrams.

I’m not quite sure how the local codepage on the iPad works. I changed the language of my iPad to Russian and then entered Russian letters as a word in Anagrams but it didn’t work.

akiva · January 22, 2013, 5:28am

Thanks – I suspected as much.

Andrew_Stacey · January 22, 2013, 11:13am

That’s not quite what the UTF8 decoder is doing. It works at the byte level, so it should be independent of the local codepage. The underlying Lua functions work byte by byte, and the point of the UTF8 code was to gather those bytes together into UTF8 characters. So provided your words are UTF8-encoded, the decoder ought to work.

That’s not to say that it will work. I think there has been at least one reported bug in the UTF8 code since the Anagrams code was added as an example project.
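To make the byte-gathering step concrete, here is a minimal sketch of such a decoder. It is not the code from the UTF8 library: the function name `decode` is made up for illustration, it assumes the result should be a table of Unicode code points, and it leaves out checks for overlong forms and surrogates.

```lua
-- Hypothetical helper (not part of the UTF8 library): gather the bytes of a
-- UTF-8 string into a table of Unicode code points.
local function decode(s)
  local cps = {}
  local i = 1
  while i <= #s do
    local b = s:byte(i)
    local n, cp               -- n = number of continuation bytes to read
    if b < 0x80 then
      n, cp = 0, b            -- single-byte (ASCII) character
    elseif b >= 0xC0 and b < 0xE0 then
      n, cp = 1, b - 0xC0     -- lead byte of a 2-byte sequence
    elseif b >= 0xE0 and b < 0xF0 then
      n, cp = 2, b - 0xE0     -- lead byte of a 3-byte sequence
    elseif b >= 0xF0 and b < 0xF8 then
      n, cp = 3, b - 0xF0     -- lead byte of a 4-byte sequence
    else
      error("invalid UTF-8 lead byte at position " .. i)
    end
    for j = 1, n do
      local c = s:byte(i + j)
      if not c or c < 0x80 or c >= 0xC0 then
        error("invalid UTF-8 continuation byte at position " .. (i + j))
      end
      cp = cp * 0x40 + (c - 0x80)  -- append the low six bits
    end
    cps[#cps + 1] = cp
    i = i + n + 1
  end
  return cps
end

-- "\215\144\215\145" holds the UTF-8 bytes of the Hebrew letters alef and bet.
print(table.concat(decode("\215\144\215\145"), " "))  -- 1488 1489
```

Only arithmetic and `string.byte` are used, so nothing here depends on the local codepage.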

I just did a simple check with the latest version of my UTF8 library with Hebrew and it worked fine. However, when I added a Hebrew word to the (latest version of the) Anagrams program then it did still complain, albeit in a different place to the above. I’ll have a play and see if I can get it to work.

Codeslinger · January 22, 2013, 12:10pm

Andrew, better start with another language so you don’t accidentally have to cope with invisible right-to-left marks. I found this very funny when I pasted the word Hebrew (in Hebrew, of course) into the editor and then moved the cursor around. Or are RTL marks supported by the UTF-8 lib?

Andrew_Stacey · January 22, 2013, 2:19pm

I’ve narrowed down the problem to the string.char function.

In short, the UTF8 library internally stores a UTF8-string as a table of numbers, each number being the UTF8 encoding of a character. To convert back to a string, we take those numbers and replace them by the actual characters. The eventual conversion is done by the string.char function, which takes a number and converts it to a byte. If the number is in the right range (0 to 255), string.char can convert it directly. If the number is too big, we first need to split it into a sequence of integers, convert each to a byte, and then concatenate the results.

The UTF8 library does this correctly internally (the UTF8:tostring() method works as it should), but the Anagrams program does a bit of work itself which by rights should perhaps be done by the UTF8 library, namely the reordering of the letters. At various points the reordered letters need to be converted back into a string, and the Anagrams program does this directly, not via the UTF8 library. As I was only testing with single-byte characters when I wrote it, I didn’t do it the Proper Way but just used string.char directly.
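As a rough illustration of the split-then-concatenate step, here is a sketch. It is not the library’s actual code: the function name `encode` is hypothetical, and it assumes the stored number is the Unicode code point of the character.

```lua
-- Hypothetical helper: turn one Unicode code point into its UTF-8 bytes.
-- Uses only math.floor and %, so it does not need a bit-operations library.
local function encode(cp)
  if cp < 0x80 then
    -- Fits in one byte: string.char can take it directly.
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(
      0xC0 + math.floor(cp / 0x40),
      0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(
      0xE0 + math.floor(cp / 0x1000),
      0x80 + math.floor(cp / 0x40) % 0x40,
      0x80 + cp % 0x40)
  else
    return string.char(
      0xF0 + math.floor(cp / 0x40000),
      0x80 + math.floor(cp / 0x1000) % 0x40,
      0x80 + math.floor(cp / 0x40) % 0x40,
      0x80 + cp % 0x40)
  end
end

-- Passing the code point of alef (U+05D0 = 1488) straight to string.char
-- fails, because 1488 does not fit in a single byte.
assert(pcall(string.char, 0x5D0) == false)

-- Splitting it first gives the two bytes 0xD7 0x90.
assert(encode(0x5D0) == "\215\144")
```

With a helper like this, a table of reordered letters can be mapped through `encode` and joined with `table.concat` rather than being handed to string.char one number at a time.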

I tracked down a couple of places where this occurred and fixed them. After that the application (or rather, its latest incarnation) worked fine with Hebrew words. The one thing I hadn’t taken into account was that the writing direction might be RTL, so the “words” that I tried needed to be in the opposite order (needless to say, since I don’t speak any Hebrew, my “words” were gibberish).
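If the letter order of a word needs flipping, the reversal has to go character by character, not byte by byte. The following is only a sketch under that assumption: `reverse_utf8` is a made-up name, it expects valid UTF-8 input, and it makes no attempt to handle combining marks or invisible direction marks.

```lua
-- Hypothetical helper: reverse a UTF-8 string one character at a time.
-- The pattern matches one non-continuation byte followed by any number of
-- continuation bytes (0x80-0xBF), i.e. one whole UTF-8 character.
local function reverse_utf8(s)
  local chars = {}
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    table.insert(chars, 1, ch)  -- prepend, so the order ends up reversed
  end
  return table.concat(chars)
end

local word = "\215\144\215\145"  -- alef, bet

-- Character-wise reversal keeps each two-byte letter intact.
assert(reverse_utf8(word) == "\215\145\215\144")

-- string.reverse flips individual bytes and so produces invalid UTF-8.
assert(string.reverse(word) == "\145\215\144\215")
```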

I’ll have a look to see if the packaged Anagrams application is patchable. If not, I’ll post my updated code.

Andrew_Stacey · February 2, 2013, 4:49pm

I’ve had a go at patching the code that comes with Codea (my own version has changed a bit since this was published). You can download it from http://www.math.ntnu.no/~stacey/documents/Codea/Anagrams.

If it works, we could suggest to @Simeon that this replace the version distributed with Codea.

Edit: Single file version (for pasting into a new project) at http://www.math.ntnu.no/~stacey/documents/Codea/Anagrams.lua. (I don’t remember if the paste-to-new-project is a 1.5 beta feature or is already in Codea.)