A quick emoji (and other non-standard character) portabilizer, using the Lua 5.3 UTF-8 library

I had some issues yesterday sharing the code for my game jam entry, because non-standard characters (such as emoji) can’t be copied properly from raw code bins such as GitHub gist raw pages (and maybe also from Codea Talk?). Fortunately, the latest version of Codea now has the UTF-8 library and the \u{} escape sequence that come with Lua 5.3, so this problem is now quite easy to solve. I wrote a quick portabilizer that turns the contents of your clipboard, or text that you input, into Lua 5.3 compliant UTF-8 escape sequences, for easier code sharing. Update: optional text input.

Emoji portabilizer

--# Main
-- Emoji portabilizer v1.1. by Yojimbo2000. Requires Lua 5.3
-- How to use: copy to the clipboard a bunch of emoji or other non-standard characters that you want to make portable. Alternatively, you can enter text into the InputString box when you run the app.
-- Run this program. It will return to the clipboard the Lua 5.3 compliant \u{xxx} UTF-8 escape sequences for the characters that you copied or entered.
-- There is also an optional verbose output. It places "character = \u{xxx}, " in the clipboard.
-- Paste the clipboard back into your code.
-- Emoji and other non-standard characters will now be portable (i.e. not get corrupted on gist raw pages, pastebin, Codea Talk etc.)

function setup()
    -- text box pre-filled with the clipboard contents; the callback converts InputString
    parameter.text("InputString", pasteboard.text, function()
        verbose, concise = "", ""
        for _, code in utf8.codes(InputString) do -- iterator yields byte position and code point (an integer)
            local hex = string.format("%x", code) -- convert code point to hex
            local escape = "\\u{"..hex.."}" -- build the \u{xxx} escape (backslash itself escaped)
            local char = utf8.char(code) -- grab the character (for verbose output)
            verbose = verbose..char.." = "..escape..", " -- concatenate
            concise = concise..escape
        end
        print("concise output: "..concise)
        print("verbose output: "..verbose)
        pasteboard.copy(concise) -- return to clipboard
        print("Concise output copied to clipboard")
    end)
    parameter.action("Copy verbose output", function() pasteboard.copy(verbose) end) -- optional output
end
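
If you want to try the conversion outside Codea, here is a minimal sketch of the same loop as a plain function for stock Lua 5.3. The name `portabilize` is just something I made up for illustration, and it skips the clipboard and parameter calls entirely. It also round-trips the result through `load` to check that the escapes decode back to the original string.

```lua
-- Hypothetical helper (not part of the app above): turn a UTF-8 string
-- into a run of \u{xxx} escapes, one per code point.
local function portabilize(s)
    local parts = {}
    for _, code in utf8.codes(s) do
        -- "\\u{%x}" is a literal backslash, then u{, the code point in hex, then }
        parts[#parts + 1] = string.format("\\u{%x}", code)
    end
    return table.concat(parts)
end

local monkey = utf8.char(0x1F435)          -- MONKEY FACE
local escaped = portabilize("a" .. monkey)
print(escaped)                             -- \u{61}\u{1f435}

-- Round trip: compile the escapes as a Lua string literal and compare
local decoded = load('return "' .. escaped .. '"')()
assert(decoded == "a" .. monkey)
```

Note that every character gets escaped here, plain ASCII included, which is the same thing the app does.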

I’m trying to use the new UTF-8 libraries that ship with Lua 5.3. The Lua reference manual says this about UTF-8:

The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.

I’m trying to use these with emoji, e.g. take “MONKEY FACE”:
UTF-8 is F0 9F 90 B5, Unicode is U+1F435 (U+D83D U+DC35)

But when I try "\u{F09F90B5}", Codea gives me a “UTF-8 value too large” error.

I don’t really know what I’m doing with UTF-8, can anyone help?

It looks like you have the wrong code?

This works for me in Lua:

s = "\u{1F435}"

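I was using the “Character Viewer” on the Mac (the one you access from the languages menu bar item) to get the code. Oh wait, I think I see. 1F435 is the code point, the first code in the “Unicode” section, following the “U+” bit. F0 9F 90 B5 is just how that code point is encoded as UTF-8 bytes, and \u{} wants the code point.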
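
For anyone else who trips over this, here is a small sketch in plain Lua 5.3 showing how the three numbers in the Character Viewer relate. The variable names are only for illustration. It decodes the UTF-8 bytes back to the code point, checks that the \u{} literal produces those same bytes, and works out the UTF-16 surrogate pair shown in brackets.

```lua
local bytes = "\xF0\x9F\x90\xB5"           -- the UTF-8 bytes F0 9F 90 B5
local code = utf8.codepoint(bytes)         -- decode the first character to its code point
print(string.format("U+%X", code))         -- U+1F435

-- \u{...} takes the code point, so these all describe the same string
assert("\u{1F435}" == bytes)
assert(utf8.char(0x1F435) == bytes)

-- The UTF-16 surrogate pair, i.e. the "(U+D83D U+DC35)" part
local v = code - 0x10000
local high = 0xD800 + (v >> 10)            -- top 10 bits
local low = 0xDC00 + (v & 0x3FF)           -- bottom 10 bits
print(string.format("U+%X U+%X", high, low))  -- U+D83D U+DC35
```

The original attempt fails because F09F90B5 read as a single hex number is far bigger than any value \u{} accepts, hence the “UTF-8 value too large” error.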

Thanks @Simeon, I’ve got it working now.

I’ve added a quick portabilizer app to the post at the top of the page, using the new UTF-8 library.