GitHub is a diva

I’ve got a problem with GitHub. No, not a personal one.

If I fetch a raw gist (url1) with Codea, the returned data contains HTML entity encodings, e.g. < is represented as &lt;. If I load it on the PC with curl, everything’s fine. Bonus problem: if I fetch it on the PC with Lua’s LuaSocket, I get a status 301 (Moved Permanently).

If I fetch a raw file from a GitHub repo (url2), everything’s fine with Codea.

I tested this with Codea 1.5, but I think the problem should also exist in 1.4.6. Can anybody explain what’s going on here?


-- Examining GitHub's responses.
-- Select url1 or url2 below.

-- Raw gist: aciolino's take on Cider Controls
-- HTML special entities encoding happens here
url1 = "https://gist.github.com/raw/4127177/"
url1 = url1.."1f5a7859e5a315ff692fd7af8278eb66aeda1312/CiderControls%201.5.2"

-- Raw file in regular git repo: part of ruilov's GitClient
-- No HTML special entities encoding for this file
url2 = "https://raw.github.com/ruilov/GitClient-Release/master/Main.lua"

url = url1

function setup()
    http.request(url, s, f)
end

function s(data, status)
    if status == 200 then
        print(data)
    else print("Error: "..status) end
end

function f(err) print(err) end
function draw() end

I’ve just looked into this. It appears that the Gist server looks at the user agent string to decide how to serve the response.

If you set the user agent to the empty string, the result is returned without HTML escapes, for example:

-- Examining GitHub's responses.
-- Select url1 or url2 below.

-- Raw gist: aciolino's take on Cider Controls
-- HTML special entities encoding happens here
url1 = "https://gist.github.com/raw/4127177/"
url1 = url1.."1f5a7859e5a315ff692fd7af8278eb66aeda1312/CiderControls%201.5.2"

-- Raw file in regular git repo: part of ruilov's GitClient
-- No HTML special entities encoding for this file
url2 = "https://raw.github.com/ruilov/GitClient-Release/master/Main.lua"

url = url1

function setup()
    -- Set the user agent to nothing
    http.request(url, s, f, {useragent=""})
end

function s(data, status)
    if status == 200 then
        print(data)
    else print("Error: "..status) end
end

function f(err) print(err) end
function draw() end

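For cases where the user agent can’t be changed, a fallback would be to undo the escaping on the client. Here is a minimal sketch, assuming the response only uses the five basic named entities plus numeric references in the ASCII range. `unescapeHtml` and `decodeEntity` are hypothetical helpers, not part of Codea, and I haven’t checked exactly which characters the Gist server escapes.

```lua
-- Hypothetical helpers, not part of Codea or any GitHub API.
-- Only the five basic named entities are handled in this sketch.
local NAMED = { lt = "<", gt = ">", quot = '"', apos = "'", amp = "&" }

-- Decode the text between "&" and ";". Returning nil makes gsub
-- keep the original match, so unknown entities pass through unchanged.
local function decodeEntity(body)
    local hex = body:match("^#[xX](%x+)$")
    local dec = body:match("^#(%d+)$")
    local code = (hex and tonumber(hex, 16)) or (dec and tonumber(dec))
    if code then
        -- Assumption: only ASCII code points are decoded here
        if code < 128 then return string.char(code) end
        return nil
    end
    return NAMED[body]
end

function unescapeHtml(text)
    -- A single gsub pass never rescans its own replacements,
    -- so "&amp;lt;" becomes "&lt;" and not "<".
    return (text:gsub("&(#?%w+);", decodeEntity))
end

print(unescapeHtml("if a &lt; b then"))
```

This should only be applied when the response is known to be escaped. Running it on an unescaped file would corrupt any literal entity text in the source.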

That’s exactly it, thanks!

So at GitHub they fight monoculture with heterogeneous server implementations.

Wow - it’s true. I read the above, and it seemed wrong enough that I had to run my own tests (I was thinking it had to be a difference in how the different clients asked for encoding).

Not only is github/gist sending different data based on the UA - it’s calling it text/plain (utf-8) in all cases, and saying binary encoding, when it blatantly isn’t true. This is just broken - different responses with identical (in all important respects) headers. I’m frankly stunned.
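To repeat this kind of comparison, the response headers from two requests can be diffed. This is a rough sketch, assuming the headers are available as plain Lua tables mapping names to string values. `diffHeaders` is a hypothetical helper and the sample values are made up for illustration, not captured from GitHub.

```lua
-- Hypothetical helper: returns the sorted names of headers whose
-- values differ between two responses (or that exist in only one).
function diffHeaders(a, b)
    local names, seen = {}, {}
    local function check(t, other)
        for name, value in pairs(t) do
            if not seen[name] and other[name] ~= value then
                seen[name] = true
                names[#names + 1] = name
            end
        end
    end
    check(a, b)
    check(b, a)
    table.sort(names)
    return names
end

-- Made-up sample headers, for illustration only
local withDefaultAgent = {
    ["Content-Type"] = "text/plain; charset=utf-8",
    ["Content-Length"] = "120",
}
local withEmptyAgent = {
    ["Content-Type"] = "text/plain; charset=utf-8",
    ["Content-Length"] = "100",
}

for _, name in ipairs(diffHeaders(withDefaultAgent, withEmptyAgent)) do
    print("differs: " .. name)
end
```

In Codea the two tables would come from the success callback of each request, assuming it passes the response headers as a third argument after the data and status.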

My guess here is there’s some popular client or usage that requires the escapes, and is asking wrong, and so fails - so they special cased it, breaking the (admittedly rare) case of an iOS device requesting and requiring non-encoded responses.