Markdown Codea: rich text formatting

UPDATE: latest code at the gist: https://gist.github.com/Utsira/9ba147647374a6bd2661

Have you ever wished you could easily format text in Codea, adding different typefaces, bold, italics etc?

I thought I would adapt part of the Markdown syntax, as it’s very easy to write and read (even in the Codea editor), and it’s well known (it’s used on many forums, including this one).

There is a full Lua port of Markdown (ie for converting Markdown to HTML) here: https://github.com/stevedonovan/LDoc/blob/master/ldoc/markdown.lua

My code is for outputting to the screen, using Codea’s text API, and supports only a subset of Markdown: different levels of headers (including different typefaces), indented block quotes, nested italic, bold, and bold-italic, typographer’s quotes, and em- and en-dashes.

Here is a screenshot of what it looks like:

[Image: md]

For comparison, I’ve put the same text below, to see how it is rendered on this board.

First though, a question about Lua pattern matching (which I’m relatively new to). I’m using a gmatch iterator to parse every word (for custom text wrapping, as Codea’s built-in wrapping doesn’t tell you where the last line ends, or give you control of the first-line indent), and to parse the Markdown control characters:

for element, control in string.gmatch(paragraph, "(.-)([%s#*>])") do --separate at white space, #, *, >

The problem is, this only picks up a single instance of # or * at a time, so that ### (Heading 3) registers as 3 separate #s. I’ve tried adding the + magic character to the char-set [%s#*>]+, ie to try to catch one or more #, but it produces very strange results. Can anyone work out how to extend the char-set to capture 1 or more instances of the control character?
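To make the issue concrete, here is a minimal sketch in plain Lua (no Codea API needed) that collects just the control captures for both versions of the pattern. The helper name `controls` and the sample strings are only for illustration. It suggests that the `+` version does catch `###` as a single run, but that the run can also swallow neighbouring white space, which may be where the strange results come from.

```lua
-- Collect every "control" capture produced by a given pattern.
-- `controls` is a hypothetical helper, not part of the Markdown code above.
local function controls(s, pattern)
    local found = {}
    for _, control in string.gmatch(s, pattern) do
        found[#found + 1] = control
    end
    return found
end

-- One control character at a time: "###" arrives as three separate "#".
local single = controls("###Heading 3\n", "(.-)([%s#*>])")
-- single is { "#", "#", "#", " ", "\n" }

-- One or more control characters: "###" arrives as a single run.
local runs = controls("###Heading 3\n", "(.-)([%s#*>]+)")
-- runs is { "###", " ", "\n" }

-- But a run can mix white space and asterisks.
local mixed = controls("some **bold** text\n", "(.-)([%s#*>]+)")
-- mixed is { " **", "** ", "\n" }
```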

#Top level heading — which has an emphasised word

Some body text

##Second level heading, with a strongly emphasised bit.

Here’s an inline “quotation that nests ‘another quotation’” in the middle of a sentence.

###Third level heading, with some very, very strong emphasis

Let’s add some emphasis, and some strong emphasis, and some really, really strong emphasis here

Here’s a block quote–a longer quotation that is indented in its own block–with some emphasis, strong emphasis, en-dashes — not to mention em-dashes — said by someone famous.

Or you can nest italics inside bold. Cool, no? Unfortunately jumping straight in with bold italic text doesn’t quite work yet. Note the typographer’s ‘quotes’ for the apostrophe and for single quotation marks. Also works for “double” quote marks

--# Main
-- Markdown

function setup()
    fontSize(20) --use the regular fontsize command to set the size of the body level text
    textImage = Markdown{
    width = WIDTH *0.9, --wrap width
    text = { --nb each markdown paragraph is a separate item. This is because the new line character \n is not particularly easy to write or to read in the Codea editor. nb2 double quotes must be escaped: \" this messes up Codea's syntax highlighting, but it does run
    "#Top level heading --- which has an *emphasised* word",
    "Some body text",
    "##Second level heading, with a **strongly emphasised** bit.", 
    "Here's an inline \"quotation that nests '*another* quotation'\" in the middle of a sentence.",
    "###Third level heading, with some **very, *very* strong emphasis**",
    "Let's add some *emphasis*, and some **strong emphasis**, and some *really, **really** strong emphasis* here",
    "> Here's a block quote--a longer quotation that is indented *in its own block*--with *some emphasis, **strong** emphasis*, en-dashes --- not to mention **em-dashes** --- said by someone famous.",
    "**Or you can nest *italics* inside bold**. Cool, no? Unfortunately jumping straight in with ***bold italic*** text doesn't quite work yet. Note the typographer's 'quotes' for the apostrophe and for single quotation marks. Also works for \"double\" quote marks"}
    }
    
end

function draw()
    background(40, 40, 50)
    sprite(textImage, WIDTH*0.5, HEIGHT*0.5)  
end

--# Markdown
--markdown-esque string formatting

local style = { --simple cascading styles
    body = {   --body format
        font = "Georgia", --will crash if font doesnt have bold, italic, boldItalic variants
        col = color(0, 129, 255, 255)
    }, 
    heading = {     --the array part of style.heading contains styles for heading1, heading2, etc (overrides global)
        {size=2}, --Heading1. Size is proportion of body size
        {size=1.5}, --Heading2
        {size=1.2}, --Heading 3
        all = { --global headings settings
            font = "HelveticaNeue", 
            col = color(0, 177, 255, 255)
        }
    },
    block = {  --block quotes
        size = 0.9,
        font = "HelveticaNeue",
        col = color(120, 132, 144, 255)} 
}

local face --name of base font currently being used
local size --size of base font

function Markdown(t)
 
    textMode(CORNER)
    local _, parSep = textSize("dummy") --paragraph separation defined by body style
    size = fontSize() --base size of body text
    
    local img = image(t.width, HEIGHT) --height
    setContext(img)
    textWrapWidth(0) --we need to turn off text wrapping and implement our own because the built-in wrapping does not give us control over the point at which the text starts (first line indentation), nor tell us where the last line ends.

    local cursor = vec2(0,HEIGHT)

    local italic = false
    local bold = false
    local headLevel = 0 --level of heading
    local indent = 0 --for block quotations
    
    for _, paragraph in ipairs(t.text) do
        --PRE-PROCESS TYPOGRAPHY
        paragraph = string.gsub(paragraph, "(%S+)'", "%1\u{2019}") --right single quote. Do this first in order to catch apostrophes
        paragraph = string.gsub(paragraph, "'(%S+)", "\u{2018}%1") --left single quote
        paragraph = string.gsub(paragraph, "%-%-%-", "\u{2014}") --em-dash (must run before the en-dash rule, which would otherwise eat the first two hyphens)
        paragraph = string.gsub(paragraph, "%-%-", "\u{2013}") --en-dash
        paragraph = string.gsub(paragraph, "\"(%S+)", "\u{201C}%1") --left double quote
        paragraph = string.gsub(paragraph, "(%S+)\"", "%1\u{201D}") --right double quote
        --RESET TO DEFAULT BODY FONT FOR NEW PARAGRAPH
        style.set(style.body)
        cursor.x = 0
        headLevel = 0
        indent = 0
        fontSize(size)
        paragraph = paragraph.."\n" --add return (this also allows final part of line to be captured)
        local cursorSet = false --set to true once initial cursor position for paragraph is set according to font size, paragraph separation
        local prevControl, prevPrevControl --remember the previous control characters, in order to count number of * etc
        for element, control in string.gmatch(paragraph, "(.-)([%s#*>])") do --separate at white space, #, *, >
            if control==" " and prevControl~=">" then --if whitespace
               element = element.." " --put spaces back in
            end
            --HEADINGS
            if control=="#" then
                style.set(style.heading.all) --global heading settings
                headLevel = headLevel + 1
                style.set(style.heading[headLevel]) --level specific settings
            end
            --BLOCK
            if control==">" then
                indent = size * 3 --indent paragraph
                cursor.x = indent
                style.set(style.block)
            end
            local w,h = textSize(element)
            if t.debug then print(element,control) end --debug print
            if not cursorSet then --place first line of paragraph (paragraph separation etc)
                cursor.y = cursor.y - (h+parSep)
                cursorSet = true
            end
            --WRAPPING
            if cursor.x + w > t.width then --if word will take us over edge
                cursor.x = indent  --carriage return
                cursor.y = cursor.y - h
            end
            text(element, cursor.x, cursor.y) --print word
            cursor.x = cursor.x + w
            --BOLD AND ITALICS
            if control=="*" then 
                --[[
                if prevControl=="*" and prevPrevControl=="*" and element=="" then --three asterisks with nothing separating them      
                    italic = not italic --reinstate previously cancelled-out italics false flag
                    print ("BO-IT")
                else
                  ]]
                if prevControl=="*" and element=="" then --two asterisks with nothing separating them
                    bold = not bold
                    italic = not italic --cancel out previous italics false flag
                else
                    italic = not italic
                end
                if bold and italic then
                    font(face.."-BoldItalic")
                elseif bold then
                    font(face.."-Bold")
                elseif italic then
                    font(face.."-Italic")
                else
                    font(face)
                end
            end
            prevControl = control --remember control code, to count no of asterisks
          --  prevPrevControl = prevControl
        end

    end
    setContext()
    return img
end

function style.set(sty)
    for func, val in pairs(sty) do --set font features for whatever keys are in the style table
        style[func](val) 
    end
end
--the function names below correspond to the bottom level keys in the style table, eg font, col, size

function style.font(f)
    face = f
    font(face)
end

function style.col(col)
    fill(col)
end

function style.size(s)
    fontSize(size * s)
end

have you read this?
http://codea.io/talk/discussion/5605/patterns-tutorial
maybe it can help you.

Thanks for the links, @Jmv38. I’d read a tutorial very similar to the one in your comments, but yours went further, explaining that magic characters can’t be used within a char-set. That’s very helpful.

I think I understand what’s happening now.

When I make the char-set greedy, eg [%s#*>]+, instead of just grabbing, say, ** (two asterisks), it’s capturing the preceding white space as well (white space, then two asterisks). I can work with that.

But is there a way of making sure that the captured characters are all the same (ie a run of asterisks OR white-space, but not a combined white space + run of asterisks)? Like is there a way of incorporating a boolean or into the middle of a char-set in the iterator? eg (pseudo-code) %s OR > OR #+ OR *+?

Post some examples of strings and the result you want on them, with good and bad cases, so I can try to understand what you want to do. Patterns are tricky, and cannot do everything you want…

Ok, I think I’ve worked it out. I have a compound control string, captured with [%s#*>]+, which may contain a combination of white space, hashes, asterisks etc. I then use string.find to see what’s there, and the second returned value of string.gsub to count the number of hashes and asterisks. It now works a lot better (ie the three asterisk bold-italic works), it’s probably quicker too as the iterator loops a lot less.
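For anyone following along, the approach described above can be sketched roughly like this in plain Lua. The helper names `count` and `describe` are made up for the example, and this is a simplified illustration rather than the exact code in the gist.

```lua
-- Count occurrences of a single punctuation character in a control string,
-- using the second value returned by string.gsub (the number of substitutions).
-- `count` and `describe` are hypothetical names used only in this sketch.
local function count(control, char)
    local _, n = string.gsub(control, "%" .. char, "")
    return n
end

-- Summarise what a compound control string (captured with [%s#*>]+) contains.
local function describe(control)
    return {
        space = string.find(control, "%s") ~= nil, -- any white space in the run?
        hashes = count(control, "#"),              -- heading level
        stars = count(control, "*"),               -- 1 = italic, 2 = bold, 3 = bold-italic
    }
end

local d = describe(" ***")
-- d.space is true, d.hashes is 0, d.stars is 3
```

One caveat with this sketch: it counts every asterisk in the run, so a control string such as `* *` would report two stars even though they are separated by a space.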

Anyone have any feature requests for markdown syntax they want to use in Codea? Bullet points maybe?

V3: bullet points, stricter (faster) parsing


--# Main
-- Markdown

function setup()
    setText()
    y,vel = 0,0
    scrollY={}
end

function setText()
    fontSize(20) --use the regular fontsize command to set the size of the body level text
    textImage = Markdown{
    -- debug = true, --debug print
    width = WIDTH *0.9, --wrap width
    text = { --nb each markdown paragraph is a separate item. This is because the new line character \n is not particularly easy to write or to read in the Codea editor. nb2 double quotes must be escaped: \" this messes up Codea's syntax highlighting, but it does run
    "#*Markdown*-like text formatting --- in **Codea!**",
    "Have you ever wanted an easy way to format text --- adding *italic*, **bold**, ***bold-italic***, different type faces, font sizes and colours, indented block quotes, plus typography features such as \"smart quotes\" and em-dashes, all of them nestable within one-another --- on the fly?",
    "##Well now you can, with **Markdown Codea**.", 
    "> *Try switching the orientation of your device to test the hand-made text wrapping feature! Touch the screen to scroll the text*",
    "###\"But --- *what **is** this **Markdown**?!?*\" I hear you yell",
    "Markdown is a way of adding rich formatting, such as:",
    "- *Emphasis*",
    "- **Strong emphasis**",
    "- *Really, **really** strong emphasis*",
    "- Or **really, *really* strong emphasis** if you prefer",
    "- Block quotes, different headings...",
    "- Oh, and ***bullet points!***",
    "using plain text. So it's great for using in plain-text environments such as code editors. As *Markdown*'s creator John Gruber said: ",
    "> The overriding design goal for *Markdown*'s formatting syntax is to make it as **readable** as possible. The idea is that a *Markdown*-formatted document should be publishable **as-is, as plain text, *without* looking like it's been marked up with tags** or formatting instructions",
    "**But the best thing about Markdown is --- *you already know how to use it***. It's used on lots of forums, including *Codea Talk*. I've thrown in some nice, *Pandoc*-inspired extras such as typographer's quotes for the apostrophe and for 'single quotation marks' and \"double\" quote marks, plus en--dash and em---dash"}
    }
    
end

function draw()
    background(40, 40, 50)
    if y<0 then vel = math.abs(vel) * 0.5
    elseif y>HEIGHT then vel = -math.abs(vel) * 0.5
    end
    if not touching then vel = vel * 0.95 end
    y = y + vel
    sprite(textImage, WIDTH*0.5, y)  
end

function touched(t)
    if t.state==ENDED then
        local av = 0
        for i=1, #scrollY do
            av = av + scrollY[i]
        end
        vel = av / #scrollY
        scrollY = {}
        touching = false
    else
        if #scrollY>10 then
            table.remove(scrollY, 1) --drop the oldest sample
        end
        scrollY[#scrollY+1]=t.deltaY --always record the latest delta, so the average stays current
        vel = t.deltaY 
        touching = true
    end
end

function orientationChanged()
    setText()
end

--# Markdown
--markdown-esque string formatting

local style = { --simple cascading styles
    body = {   --body format
        font = "Georgia", --will crash if font doesnt have bold, italic, boldItalic variants
        col = color(0, 129, 255, 255)
    },
    heading = {     --the array part of style.heading contains styles for heading1, heading2, etc (overrides global)
        {size=2}, --Heading1. Size is proportion of body size
        {size=1.5}, --Heading2
        {size=1.2}, --Heading 3
        all = { --global headings settings
            font = "HelveticaNeue",
            col = color(0, 177, 255, 255)
        }
    },
    block = {  --block quotes
        size = 0.9,
        font = "HelveticaNeue",
        col = color(120, 132, 144, 255)}
}

local face --name of base font currently being used
local size --size of base font

function Markdown(t)
    
    textMode(CORNER)
    local _, parSep = textSize("dummy") --paragraph separation defined by body style
    size = fontSize() --base size of body text
    
    local img = image(t.width, HEIGHT * 2) --height
    setContext(img)
    textWrapWidth(0) --we need to turn off text wrapping and implement our own because the built-in wrapping does not give us control over the point at which the text starts (first line indentation), nor tell us where the last line ends.
    
    local cursor = vec2(0,HEIGHT * 2)
    
    local italic = false
    local bold = false
    local tightList = false --remove paragraph separation for bullets
    local indent = 0 --for block quotations
    
    for _, paragraph in ipairs(t.text) do
        --PRE-PROCESS TYPOGRAPHY
        paragraph = string.gsub(paragraph, "(%S+)'", "%1\u{2019}") --right single quote. Do this first in order to catch apostrophes
        paragraph = string.gsub(paragraph, "'(%S+)", "\u{2018}%1") --left single quote
        paragraph = string.gsub(paragraph, "%-%-%-", "\u{2014}") --em-dash (must run before the en-dash rule, which would otherwise eat the first two hyphens)
        paragraph = string.gsub(paragraph, "%-%-", "\u{2013}") --en-dash
        paragraph = string.gsub(paragraph, "\"(%S+)", "\u{201C}%1") --left double quote
        paragraph = string.gsub(paragraph, "(%S+)\"", "%1\u{201D}") --right double quote
        --RESET TO DEFAULT BODY FONT FOR NEW PARAGRAPH
        style.set(style.body)
        cursor.x = 0
        indent = 0
        fontSize(size)
        paragraph = paragraph.."\n" --add return (this also allows final part of line to be captured, as return is a whitespace character)
        local cursorSet = false --set to true once initial cursor position for paragraph is set according to font size, paragraph separation
        
        --BLOCK
        local bl
        paragraph, bl = string.gsub(paragraph, "^> ", "", 1)
        if bl>0 then       
            indent = size * 3 --indent paragraph
            cursor.x = indent
            style.set(style.block)
        end 
        
        --BULLETS
        local bu
        paragraph, bu = string.gsub(paragraph, "^%- ", "", 1)
        if bu>0 then        
            cursor.y = cursor.y - parSep
            if not tightList then
                cursor.y = cursor.y - parSep
            end
            tightList = true
            text("\u{2022}", size * 1.75, cursor.y)
            cursorSet = true
            indent = size * 3
            cursor.x = indent
            --   paragraph = "\u{2022}   "..paragraph
        else
            tightList = false
        end
        
        --HEADINGS
        local hBegin, hEnd = string.find(paragraph, "^%#+") --look for number of hashes at start of para
        if hBegin then           
            local headLevel = hEnd + 1 - hBegin
            paragraph = string.gsub(paragraph, "^%#+", "")
            style.set(style.heading.all) --global heading settings
            style.set(style.heading[headLevel]) --level specific settings
        end
        
        --PARSE WORDS
        for element, control in string.gmatch(paragraph, "(.-)([%s*]+)") do --separate at runs of white space and/or asterisks (#, > and - are already handled above)
            
            if string.find(control, "%s")  then --if whitespace
                element = element.." " --put spaces back in
            end
                  
            local w,h = textSize(element)
            if t.debug then print(element,control) end --debug print
            if not cursorSet then --place first line of paragraph (paragraph separation etc)
                cursor.y = cursor.y - (h+parSep)
                cursorSet = true
            end
            
            --WRAPPING
            if cursor.x + w > t.width then --if word will take us over edge
                cursor.x = indent  --carriage return
                cursor.y = cursor.y - h
            end
            text(element, cursor.x, cursor.y) --print word
            cursor.x = cursor.x + w
            
            --BOLD AND ITALICS
            local eBegin, eEnd = string.find(control, "%*+") --count number of asterisks
            if eBegin then
                local emph = eEnd + 1 - eBegin
            --if emph>0 then
                if emph==3 then
                    bold = not bold
                    italic = not italic
                elseif emph==2 then
                    bold = not bold
                else
                    italic = not italic
                end
                if bold and italic then
                    font(face.."-BoldItalic")
                elseif bold then
                    font(face.."-Bold")
                elseif italic then
                    font(face.."-Italic")
                else
                    font(face)
                end
            end
        end --of element
    end --of paragraph
    setContext()
    return img
end

function style.set(sty)
    for func, val in pairs(sty) do --set font features for whatever keys are in the style table
        style[func](val)
    end
end
--the function names below correspond to the bottom level keys in the style table, eg font, col, size

function style.font(f)
    face = f
    font(face)
end

function style.col(col)
    fill(col)
end

function style.size(s)
    fontSize(size * s)
end

Updated picture:

[Image: md]

For comparison:

#Markdown-like text formatting — in Codea!

Have you ever wanted an easy way to format text — adding italic, bold, bold-italic, different type faces, font sizes and colours, indented block quotes, plus typography features such as "smart quotes" and em-dashes, all of them nestable within one-another — on the fly?

##Well now you can, with Markdown Codea.

Try switching the orientation of your device to test the hand-made text wrapping feature!

###"But — what is this Markdown?!?" I hear you yell

Markdown is a way of adding rich formatting, such as emphasis, strong emphasis, and really, really strong emphasis, using plain text. So it’s great for using in plain-text environments such as code editors. As Markdown’s creator John Gruber said:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions

But the best thing about Markdown is — you already know how to use it. It’s used on lots of forums, including Codea Talk. I’ve thrown in some nice, Pandoc-inspired extras such as typographer’s quotes for the apostrophe and for ‘single quotation marks’ and "double" quote marks, plus en–dash and em—dash

this is very good!
Thanks for sharing.

@Jmv38 no problem, thanks for your help. Could be cool if someone’s planning a text-based game, like 80 Days Around the World or whatever

Very nice.

You are standing at the end of a road before a small brick building. Around you is a forest. A small stream flows out of the building and down a gully.

Text games are making a comeback on iOS. The Sorcery series is particularly impressive.

I updated the code above to add bullet points

Another pic:

[Image: md]

The new version supports many more fonts. It’s too long for a comment now, source is here:

https://gist.github.com/Utsira/9ba147647374a6bd2661

^:)^

nice, what is Codea editor’s syntax?

@firewolf Codea editor’s syntax is Lua + Codea API. It doesn’t recognise markdown of course, and I’m not suggesting that it should (although… given that most code repositories have a readme that is in markdown… it would still be pretty low down my list of feature requests). So I’m not suggesting anyone use the Codea editor for long-form markdown writing!

It’s a shame that Lua doesn’t allow a line break in the middle of a string though (I’m not talking about \n, I mean hitting return and splitting the string across lines). That would make it more pleasant to deal with longer text passages. That’s the reason I decided to put each paragraph in its own string.

Although, thinking about this again, perhaps there is a way to have a string with proper editor line breaks, by using the shader string [[ ... ]] braces… Hmmmmm.

Will investigate and report back

@Ignatz is that a Zork quote?

no, the start of the very first text game, Adventure

Wow, using the shader string braces [[ ]] works! I’m calling them the shader braces, but are these braces a Lua thing, or a Codea thing? Whatever they are, they’re really awesome for long-form text. It means you can just copy and paste your markdown text from your editor into Codea, without having to change anything. ie you don’t need to escape the double quotes anymore with \", and you can just use regular returns to separate paragraphs. I’ve added support for tight lists/ loose lists based on the number of returns separating list items. Updated code at the gist: https://gist.github.com/Utsira/9ba147647374a6bd2661

Here’s what the markdown looks like in Codea:

testString = [[
# *Markdown*-like text formatting --- in **Codea!**

Have you ever wanted an easy way to format text --- adding *italic*, **bold**, ***bold-italic,*** different type faces, font sizes and colours, indented block quotes, plus typography features such as "smart quotes" and em-dashes, all of them nestable within one-another --- on the fly?

## Well now you can, with **Markdown Codea.**

> *Try switching the orientation of your device to test the hand-made text wrapping feature! Touch the screen to scroll the text*

### "But --- *what **is** this **Markdown**?!?*" I hear you yell

Markdown is a way of adding rich formatting, such as:

- *Emphasis*
- **Strong emphasis**
- *Really, **really** strong emphasis*
- Or **really, *really* strong emphasis** if you prefer
- Block quotes, different headings...
- Oh, and ***bullet points!*** Bullet points can be displayed in a tight list like this, by only separating each item with one return

Or, if you prefer, you can have:

- A loose list

- Of bullet points

- Just separate each bullet with two returns

And it's all done using plain text. So it's great for using in plain-text environments such as code editors. As *Markdown's* creator John Gruber said:

> The overriding design goal for *Markdown's* formatting syntax is to make it as **readable** as possible. The idea is that a *Markdown*-formatted document should be publishable **as-is, as plain text, *without* looking like it's been marked up with tags** or formatting instructions

**But the best thing about Markdown is --- *you already know how to use it***. It's used on lots of forums, including *Codea Talk.* I've thrown in some nice, *Pandoc*-inspired extras such as typographer's quotes for the apostrophe and for 'single quotation marks' and "double" quote marks, plus en--dash and em---dash
    ]]
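As a rough illustration of how a string like this could be broken back into paragraphs, here is a sketch in plain Lua. It is not the code from the gist: the function name `splitParagraphs` and the `tight` flag are assumptions made for this example. The idea is that a blank line ends a paragraph, and a bullet that directly follows another line in the same block (only one return between them) is marked as part of a tight list.

```lua
-- Split a long-bracket string into paragraph entries.
-- Each entry is { text = ..., tight = boolean }; `tight` is true for a bullet
-- that follows the previous line after a single return (no blank line).
-- Hypothetical helper, written only to illustrate the idea.
local function splitParagraphs(source)
    local paragraphs = {}
    -- Append a blank line so the last block is terminated like the others.
    for block in string.gmatch(source .. "\n\n", "(.-)\n%s*\n") do
        local first = true
        for line in string.gmatch(block, "[^\n]+") do
            local isBullet = string.find(line, "^%- ") ~= nil
            if first or isBullet then
                paragraphs[#paragraphs + 1] = { text = line, tight = isBullet and not first }
            else
                -- A plain line after a single return just continues the paragraph.
                local last = paragraphs[#paragraphs]
                last.text = last.text .. " " .. line
            end
            first = false
        end
    end
    return paragraphs
end

local sample = [[
# Title

- one
- two

- three
]]
local parsed = splitParagraphs(sample)
-- parsed has 4 entries; only "- two" is flagged as tight
```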

@yojimbo2000 - That’s an amazing bit of work - thanks for sharing :slight_smile:

Thanks for the feedback everyone.

I was thinking about how to approach the problem of presenting long-form text on screen. For shorter passages, you could have scrolling (like I have in my code above), like a web page. But for longer passages, first of all you’d need some kind of system to deal with what happens when you get past the 2048 pixel texture limit. Plus, an endless scroll might not be that good a way to deal with a long passage. It’s too frictionless, too easy to flick the screen and completely lose your place.

So then you have the Kindle/ eBook reader approach, which displays discrete, non-scrolling “pages”, and one touch or swipe on either the left or right side to “turn” the page. I tried implementing this, but it’s actually a little tricky, because the page break could well fall in the middle of a paragraph, or a spanned element, so you have to parse everything from the beginning of the paragraph up to the first word at the top of the page.

It struck me that the simplest and most sensible way to approach this, would be to parse the text by full markdown paragraphs (ie text followed by two carriage-returns). So I’ve created a variation of the above code that adds a new paragraph to the page each time you touch the screen, stitching them together into a lengthening page. This probably also makes most sense in a text-based game environment. It’s my take on the engine that Inkle use in the Sorcery games and 80 Days Around the World.
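The paragraph-at-a-time idea can be sketched in plain Lua along these lines. This is only an outline under my own naming (`newReader`, `advance` and `page` are hypothetical, not taken from the gist), and it leaves out all the drawing.

```lua
-- A tiny "reader" that reveals one more paragraph per call.
-- `paragraphs` is an array of strings, one per markdown paragraph.
local function newReader(paragraphs)
    local shown = 0 -- how many paragraphs have been revealed so far
    return {
        -- Reveal the next paragraph; returns it, or nil when there are none left.
        advance = function()
            if shown < #paragraphs then
                shown = shown + 1
                return paragraphs[shown]
            end
            return nil
        end,
        -- Everything revealed so far, separated by blank lines.
        page = function()
            return table.concat(paragraphs, "\n\n", 1, shown)
        end,
    }
end

local reader = newReader({ "First paragraph.", "Second paragraph." })
reader.advance()
local soFar = reader.page()
-- soFar is "First paragraph."
```

In Codea, something like `reader.advance()` could be called from `touched` when the touch state is ENDED, with the result of `reader.page()` handed to the renderer to rebuild the image.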

As it involved rejigging the markdown parser slightly, to make full paragraphs the basic unit, I’ve forked the repository.

Paragraph-stitching code is here:

https://gist.github.com/Utsira/5db6e0eccb68c70d3670