cmodule 0.1.1: sneak peek at a dynamic module loading system designed for Codea

toadkick · May 16, 2013, 3:15pm

Hello,

If you’ve been around here the last couple of months, you might have heard me mention a few times that I have been working on a module loading system designed for Codea. I’ve been using it for quite a while now in its current state, and I feel that while it’s not yet ready for prime time, it is stable (and hopefully useful) enough to share a “pre-release” version. I will consider it 1.0 when a) Codea supports optional tab execution (which is in the feature tracker), and b) I figure out how to make it work with exported projects (currently it can only be used within the Codea app itself).

cmodule itself can be found here: https://gist.github.com/apendley/5411561

and a small test suite with basic examples of usage can be found here: https://gist.github.com/apendley/5594141

Why use cmodule? Codea’s basic module structure is great for prototypes and projects of smaller scope. However, as projects grow larger and become “real” projects, it can be quite cumbersome to manage the global namespace and dependencies, and to ensure that your program does not fill up the iPad’s memory with code/data that is used only infrequently (e.g. game data such as levels, spritesheet metadata, etc). These problems can be amplified when collaborating with others.

cmodule attempts to address all of these problems. With cmodule, you can load a tab (file) from any other Codea project on demand, without having to include that project as a Codea dependency (and thus, without cluttering the global namespace). By default, the source code in a module file is not loaded/executed until it is explicitly loaded with cimport/cload, so only modules that are actually used get loaded into memory (and the garbage collector will collect them when they are no longer used). Additionally, since dependencies are explicit and are imported locally to each module, debugging can become less of a chore: there are only so many things a module can break, especially when taking care not to access the global environment unnecessarily from inside the module. cmodule also provides a stack trace for module compiler errors, making it much easier to track down typos or malformed code. Finally, cmodule provides a way to load application data/code in an optionally sandboxed environment (using the cload function), which allows you to safely expose only the data and functions necessary for the data file to do its job (since, after all, the data files are Lua files).
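To make the sandboxing idea concrete, here is a minimal sketch in plain Lua of running a data file so that it can see only the names you hand it. The helper `load_sandboxed`, the `TILE_SIZE` value, and the sample sources are hypothetical and are not cmodule's API; cload's real signature may differ.

```lua
-- Hypothetical helper, not part of cmodule: run a Lua data file so that it
-- can only see the names placed in `env`.
local function load_sandboxed(source, chunkname, env)
    local chunk, err
    if setfenv then
        -- Lua 5.1: compile first, then swap the chunk's environment
        chunk, err = loadstring(source, chunkname)
        if chunk then setfenv(chunk, env) end
    else
        -- Lua 5.2 and later take the environment as the fourth argument
        chunk, err = load(source, chunkname, "t", env)
    end
    if not chunk then
        return nil, err
    end
    -- Returns true plus the file's return value, or false plus an error
    return pcall(chunk)
end

local level_source = [[
return { name = "Level 1", width = TILE_SIZE * 10 }
]]

local ok, level = load_sandboxed(level_source, "Level1", { TILE_SIZE = 32 })
print(ok, level.name, level.width)

-- The sandbox has no `os`, so this data file fails instead of reaching it.
local bad_ok = load_sandboxed("return os.time()", "Bad", {})
print(bad_ok)
```

The environment table acts as a whitelist: anything not placed in it is simply nil inside the data file.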

My main motivation for cmodule is that I feel it is worth taking a stab at coordinating efforts to share code in a more standardized manner, and a module loading system can be excellent for that if adoption becomes common (for example, it makes it much easier to create shared code libraries, application/template installers, and many other things). This is my attempt at that. I do not expect everyone to adopt cmodule, especially in its early state; however, I am cautiously optimistic that at least a few others will find value in both cmodule’s methodology and the actual implementation itself. It is usually my goal when hacking in Codea to provide features that TLL will hopefully find useful enough to one day include in Codea itself (either in its Lua form or in a native form), so I want to solicit feedback and get opinions on the idea and the execution itself. At the very least, I hope this can be educational/informative to others; there are a couple of neat tricks in there that are afforded by Lua’s seemingly infinite expressiveness, and you might just learn something you didn’t know from examining the code.

This weekend I will move cmodule to a proper github repo (though I’ll still keep an updated Codea-packaged copy in gist at the above links) and take a stab at some proper documentation. Some of the cooler features of cmodule are not obvious, and I want to make sure to get them documented so that they may be taken advantage of.

A note about cmodule file format:
Because Codea automatically loads and executes all tabs within a project and its dependency projects, module files must currently be wrapped in a long comment to prevent Codea from executing the code prematurely. Eventually Codea will support optional tab execution, and this will no longer be necessary, but for now cmodule must live with this slightly annoying wart. As a concession, cmodule allows you to use whichever form of long comment you want, to alleviate issues with nested block comments. For example, you can use:

--[[

-- module code goes here

--]]

or:

--[=[

-- module code goes here

--[[
a nested long comment
--]]

--]=]

Essentially, you may use as many = characters as you like in your wrapper comment, and cmodule will respond accordingly.
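As a rough illustration of how a loader could cope with any number of = characters, the sketch below strips the wrapper with a Lua pattern and compiles what is left. The `unwrap` helper and the sample tab are hypothetical; cmodule's own implementation may work quite differently.

```lua
-- Hypothetical helper: strip a --[=*[ ... --]=*] wrapper from a tab's source.
-- %1 is a back-reference, so the closing bracket must use the same number
-- of = characters as the opening one.
local function unwrap(source)
    local level, body = source:match("^%s*%-%-%[(=*)%[(.-)%-?%-?%]%1%]%s*$")
    if not body then
        return nil, "source is not wrapped in a single long comment"
    end
    return body, #level
end

-- A sample tab, wrapped with one = so it can contain a plain long comment.
local tab = [==[
--[=[
local answer = 6 * 7
--[[ a nested long comment --]]
return answer
--]=]
]==]

local body, level = unwrap(tab)
local compile = loadstring or load   -- loadstring in Lua 5.1, load in 5.2+
local chunk = assert(compile(body))
print(level, chunk())
```

Because the nested `--[[ ... --]]` uses a different bracket level from the wrapper, it stays inside the extracted body and is treated as an ordinary comment when the body is compiled.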

toadkick · May 16, 2013, 6:36pm

@Andrew_Stacey: Excellent questions.

The answers to both of those questions are actually somewhat related, so I’ll answer them at the same time. There are 2 ways to export from a module: explicitly returning a value from the module (after all, a loaded Lua file is just a function, and functions can return values), or returning nothing and allowing the module’s environment table to be exported.

Method #1: returning something from a module


--[=[

local ExampleClass = class()

function ExampleClass:doSomething()
    print("Did something!")
end

return ExampleClass

--]=]

In this case, cimport will forward the return value of the module back to the caller. For example, to load ExampleClass:


-- from somewhere in the same project that the ExampleClass tab lives:
local ExampleClass = cimport("ExampleClass")
local obj = ExampleClass()
obj:doSomething()

It’s worth noting that a module may return any valid Lua value, not just a table/class (it could be a function, a string, a number, a coroutine, etc). If you want to return multiple items from a module, you may simply return a table containing those items.

The second method is to declare everything you want exported as if it were a global, and to return nothing from your module:


--[=[

-- assume this code lives in a tab called ExampleModule

-- inside of a module, unqualified variable assignments go to the
-- module's environment table, or _M, instead of _G
TestClass = class()

function TestClass:init()
end

function TestClass:doSomething()
    -- note that we can access global variables without qualifying them
    -- with _G, as we are here with the print global:
    print("Did Something")
end

-- If we want to write to the global environment, we need to prefix
-- our assignment with _G:
_G.someGlobal = 10


-- since we aren't explicitly returning a value, cimport will return
-- our module's _M table. As such, we can put whatever we
-- want in it.
TestClassSub = class(TestClass)

function TestClassSub:doSomething()
    TestClass.doSomething(self)
    print("Did something else!")
end

-- don't return anything; the module's _M table will be returned
-- from cimport

--]=]

Using modules imported using method #2:

-- from somewhere in the same project that the ExampleModule tab lives:
local exmodule = cimport("ExampleModule")
local o1 = exmodule.TestClass()
local o2 = exmodule.TestClassSub()
o1:doSomething()
o2:doSomething()

-- also, let's print out the value of the global we set
print(someGlobal)

Hopefully that makes sense, and you can get a clear picture of how to access globals and return multiple items from a module. A nice side effect of returning a table from a module (or just letting cimport return the module’s environment table) is that it makes it easy to group related APIs…for example, maybe I have a module called “Utilities”, which exports a table of general utility functions.
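For readers curious how the two export styles can coexist, here is a minimal sketch in plain Lua. The `run_module` function and the sample sources are hypothetical and only imitate the behaviour described in this post; they are not cmodule's code.

```lua
-- Hypothetical sketch of the two export styles; this is not cmodule's code.
local function run_module(source, chunkname)
    -- Reads fall through to _G, writes land in the module's own table.
    local env = setmetatable({}, { __index = _G })
    env._M = env

    local chunk
    if setfenv then
        chunk = assert(loadstring(source, chunkname))       -- Lua 5.1
        setfenv(chunk, env)
    else
        chunk = assert(load(source, chunkname, "t", env))   -- Lua 5.2+
    end

    local result = chunk()
    if result == nil then
        return env      -- method #2: nothing returned, export the environment
    end
    return result       -- method #1: forward the explicit return value
end

local explicit = run_module("return { greet = function() return 'hi' end }", "Explicit")
local implicit = run_module("Answer = 42\n_G.someGlobal = 10", "Implicit")

print(explicit.greet(), implicit.Answer, rawget(_G, "Answer"), someGlobal)
```

The unqualified assignment to `Answer` ends up in the module's table, while the `_G.`-prefixed assignment reaches the real global environment, which matches the behaviour shown in the ExampleModule listing.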

EDIT: I just noticed the second part of your second question, the “how do I export a whole slew of globals”. It should be pretty trivial to write a function (you could even put it in a module!) that can export multiple items to the global namespace with a single call. For example:


--[=[

return function(exports)
    for k, v in pairs(exports) do
        _G[k] = v
    end
end

--]=]

Say we put that code in a tab named “gexport”, in the project “util”. We could then access it and use it to batch export many globals at once:


--[=[
local gexport = cimport("util:gexport")

local SomeClass = class()

-- a nice taste of Lua's syntactic sugar: when calling a function, you may
-- omit the parentheses if there is a single argument and that argument is a
-- string literal or table constructor, and Lua will still recognize the
-- expression as a valid function call.
gexport {
    SomeClass = SomeClass,
    someValue = 10
}

--]=]

from some other module/tab:


print(SomeClass)
print(someValue)

On further thought, I’ve added this function to the global cmodule table, as it seems like a useful utility (I’ll leave the above code here as a quick example of how to make and use a simple module). Now there’s no need to import it:


cmodule.gexport {
    -- things you want to be globals
}

Andrew_Stacey · May 16, 2013, 6:12pm

Two questions:

I have some tabs which define more than one class (grouping for convenience), how do you recommend I handle that?
I have some tabs that define some global stuff, what’s the best way to say that something should be global, or that a whole slew of stuff should be global?

Andrew_Stacey · May 17, 2013, 5:12am

@toadkick Thanks for the answers. For backward compatibility, I’ll probably use the “return multiple elements” method for the time being but it’s good to know about the second.

With regard to the export, how about passing an optional table to use instead of _G. Something like:

return function(exports,metatable)
    metatable = metatable or _G
    for k, v in pairs(exports) do
        metatable[k] = v
    end
end

that way, if I import a module from within a module then I don’t pollute the global namespace if I just want the functions for within that main module.

Andrew_Stacey · May 17, 2013, 5:41am

Hmm, no that wouldn’t work because gexport is called from within the imported module. What I’d need to do is pass the context table in via the cimport function.

toadkick · May 17, 2013, 6:20am

@Andrew_Stacey: I’m not sure what you are trying to do exactly. You can simply call cimport from within a module to import another module, without using globals at all, nothing tricky necessary.

Assume this code lives in a tab in your project called ModuleB, which imports another tab in the same project called ModuleA:


--[=[

local ModuleA = cimport("ModuleA")

-- do stuff with whatever was returned from cimport by ModuleA

return {
    -- whatever you want ModuleB to return from cimport
}

--]=]

You could get away with not having to declare a single global in your program if you wanted to.

I may have forgotten to mention this detail, but each module loaded with cimport is cached by cmodule, so that successive calls to import the same module will return the cached results from the first call, instead of reloading and re-executing the module’s source.
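The caching behaviour can be pictured with a small sketch. Everything here (`import_cached`, `fake_loader`, the path string) is a hypothetical stand-in; only the idea of returning the first call's result on later calls comes from the description above.

```lua
-- Hypothetical names throughout; only the caching idea comes from the post.
local cache = {}

local function import_cached(path, loader)
    local cached = cache[path]
    if cached ~= nil then
        return cached               -- later imports reuse the first result
    end
    local result = loader(path)     -- first import: load and execute
    cache[path] = result
    return result
end

-- Stand-in loader that counts how many times a module body is executed.
local executions = 0
local function fake_loader(path)
    executions = executions + 1
    return { path = path }
end

local first = import_cached("MyProject:ModuleA", fake_loader)
local second = import_cached("MyProject:ModuleA", fake_loader)
print(first == second, executions)
```

Both imports yield the same table and the loader runs once, so state stored on a module's return value is shared by every importer.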

Andrew_Stacey · May 17, 2013, 7:20am

@toadkick Here’s an example. I have a function RoundedRect which draws a rounded rectangle. I use this in all sorts of places. It’s in a module called Utilities. Now I could have Utilities place this code directly into the global namespace whereupon it is usable by any module that wants to use it. But if only one module wants to use it, this seems to go against the principle of the thing: I could have Utilities place it into the metatable of the importing module to keep the separation. Because, as you say, there is no replication of code then there’s no harm in doing this no matter how many modules want to use it.

Thinking about it some more, I could have Utilities return a function which then imports all of its exported stuff into the calling metatable. That would be one way to do it. But then the calling module has to know that that’s how Utilities works.

Here’s another irritation! When exporting multiple stuff I originally thought it worked as:

return X,Y,Z

so I wrote

local A,B,C = cimport "Module"

I learnt that that didn’t work! I have to return a table:

return {X,Y,Z}

so I put:

local A,B,C = unpack(cimport "Module")

but that doesn’t work either because cimport returns more than just what the module returns. Is there a logic to that? I’m tempted to write a wrapper function cimport_unpack to get round this:

function cimport_unpack(m)
    local r = cimport(m)
    return unpack(r)
end

(Incidentally, I’m only complaining because I’m trying to use it! And these aren’t major complaints, just niggles.)

Andrew_Stacey · May 17, 2013, 7:36am

A-ha. I can get unpack(cimport "Module") to work as I want:

local A,B,C = unpack(cimport "Module",nil)

works! This is because a function call that is not at the end of a list is truncated to its first return value (and nil is safe to pass to unpack as the second argument).

toadkick · May 17, 2013, 7:37am

@Andrew_Stacey: Ah, I think I am understanding your issues.

Because the value returned from cimport is cached (with the key being the full path, i.e. “Project:tab”), modules can only return 1 value. It’s possible that I could adapt cmodule to wrap multiple returned values in a table implicitly, but I don’t like that because of the mismatch it creates between what is returned from the module and what is returned from cimport().

Also, yeah, right now cimport/cload return 3 values when a module is loaded: the module’s return value, the name of the project from which the module was loaded, and the name of the module’s tab. I’ve found this information to be very useful in certain cases where library code wants to import a file that you specify, but doesn’t necessarily know which project that file belongs to. This is experimental, and may go away in the future, but I do see how it could create an annoyance if you aren’t expecting it.

Incidentally, the unpack() function only works on tables with consecutive numeric indices (i.e. “arrays”); it returns nothing if the table passed into it has no array part, so unless your module is returning an array, unpack() will not work. EDIT: oh yeah, and what you said is true also: a function with multiple returns will only end up returning the first value if that function is not called at the end of an expression list.
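Both points can be checked in plain Lua. In the sketch below, `three_values` is a hypothetical stand-in for a function that returns a module value plus two extra values, as cimport did at this point in the thread.

```lua
local unpack = table.unpack or unpack   -- table.unpack in Lua 5.2+

-- Stand-in for a function that returns three values, as cimport did here.
local function three_values()
    return { "X", "Y", "Z" }, "MyProject", "MyTab"
end

-- Last in the list: all three values are passed along.
print(select("#", three_values()))          -- 3

-- Not last in the list: truncated to its first value, then nil is appended.
print(select("#", three_values(), nil))     -- 2

-- So only the table reaches unpack, and its three elements come back out.
local a, b, c = unpack(three_values(), nil)
print(a, b, c)

-- A table with only named keys has no array part, so unpack returns nothing.
print(select("#", unpack({ x = 1, y = 2 })))    -- 0
```

Wrapping the call in parentheses, as in `unpack((three_values()))`, truncates to one value in the same way without needing the trailing nil.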

Honestly, the simplest solution is to put your rounded rect function in its own module (and have your module return it), and then explicitly import it where you need it:


--[=[

local RoundedRect = cimport("RoundedRect")

-- use your rounded rect function
RoundedRect()

--]=]

You can also still include your rounded rect in your utilities module:


--[=[
-- My Utilities Module

local util = {
    RoundedRect = cimport("RoundedRect"),
    -- anotherUtilFunc = ...,
    -- etc.
}

return util

--]=]

Then, you have a choice: if you are in a file that has already imported your utilities module, you can just access RoundedRect via that:


--[=[

local util = cimport("Utilities")

util.RoundedRect()

-- or, if you don't want to prefix all calls:
local RoundedRect = cimport("Utilities").RoundedRect

RoundedRect()

--]=]

Otherwise, if you just need the rounded rect function, you can import only the RoundedRect module wherever you need it:


--[=[

local RoundedRect = cimport("RoundedRect")

RoundedRect()

--]=]

If you wanted a function that could take a table returned from a module and import it into your module’s environment table, you could write one as such (pretty much the same function you defined above actually):


function mexport(exports, env)
    for k, v in pairs(exports) do
        env[k] = v
    end
end

Usage:


-- every module gets its own _M (like _G, but for the module)

mexport(cimport("Utilities"), _M)
RoundedRect()

Note that if you import the functions into _M, those functions will be exposed via the return from cimport if you don’t explicitly return anything (honestly, the default export of _M from cimport is a feature I rarely use…90% of the time I explicitly return my exports from my modules). If you are returning your exports explicitly, then there’s no harm in polluting _M, because it will never be exposed.

FWIW, I welcome complaints, suggestions, niggles, compliments, anecdotes, or whatever. I want people to use cmodule, and want to try to accommodate everyone’s needs as much as is reasonable.

Andrew_Stacey · May 17, 2013, 8:00am

@toadkick There’s certainly lots of things that I could do which would make more sense than what I am doing! However, I’m trying to convert my library to use cmodule and so initially I want to get it working with the minimum of changes. The unpack(cimport "Module",nil) is working for me so I’ll do that for the moment and then see if there’s a better way later.

Putting my RoundedRect in its own module is possible, but would end up with me having loads of tabs and I’ll need to think about that before I do it. At the moment Utilities is a bit of a scratch pad of random functions that I’ve written that don’t belong to a particular class. It needs organising, but that’s never been my strong point!

(Right now, I’m struggling with cmodule.loaded. But I’ll struggle a bit more before I ask about it.)

toadkick · May 17, 2013, 8:05am

@Andrew_Stacey: heh, organization is a tricky thing. One nice thing about module systems is that using them usually kind of forces you to get better about organization. Incidentally, the ideas in cmodule are not all of my own conception: I’ve borrowed heavily from other module systems, most notably Lua’s own module() and JavaScript’s CommonJS module spec.

FWIW, what I would probably end up doing in your case is moving all of my utilities to their own separate project, so that it’s not a big deal if you end up having a lot of tabs. Since cimport() works across projects, you can still use it from any other project in Codea (by prefixing the argument to cimport with the project name: cimport("Utilities:RoundedRect")). So, if I make a project called Utilities and put each utility in its own module in that project, users of the library can either import only the modules they need, or use a Utilities tab that you provide, which imports all of your utility functions and exports them in a single table (basically, an API).

cmodule.loaded is mainly just a utility. If you call it with no arguments, it returns a list of the keys of the loaded modules. If you pass an argument that is a full path to a module (i.e., project and tab, “MyProject:MyTab”), loaded will return true if that specific module is currently loaded.
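A function with that shape might look roughly like the sketch below. The `modules` table and its contents are hypothetical stand-ins for cmodule's internal cache, and the real cmodule.loaded may differ in details such as how the list of keys is ordered.

```lua
-- Hypothetical stand-in for the table of loaded modules.
local modules = {
    ["MyProject:Main"] = {},
    ["Utilities:RoundedRect"] = {},
}

local function loaded(path)
    if path ~= nil then
        -- With a full path: report whether that specific module is loaded
        return modules[path] ~= nil
    end
    -- With no argument: return a list of the keys of the loaded modules
    local keys = {}
    for key in pairs(modules) do
        keys[#keys + 1] = key
    end
    table.sort(keys)    -- pairs() order is unspecified; sort for stable output
    return keys
end

print(loaded("Utilities:RoundedRect"), loaded("Utilities:Missing"))
print(table.concat(loaded(), ", "))
```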

Honestly, it’s mostly just there for informational and debug purposes. I’ve never actually had to use it except when I was curious to see which modules were actually loaded at any given time.

Andrew_Stacey · May 17, 2013, 8:09am

@toadkick I have a use for cmodule.loaded. I have a UI class that can load in lots of UI elements. But if I have no need for them, I don’t want to load them. So the main program loads in which elements it wants (menus, number spinners, colour pickers) and the UI then installs the code for only those that are loaded. Thus it uses cmodule.loaded to see if something else has already requested the relevant module and if so loads it.

However, I’ve found either a bug or a discrepancy in the documentation. The argument to cmodule.loaded needs the .lua extension. Maybe you should use the _appendExt function on it first (and how about appending the current project if not specified?).

toadkick · May 17, 2013, 8:18am

@Andrew_Stacey: nice catch, that is definitely a bug, I’ll get that fixed momentarily (EDIT: fixed. Also, if you don’t specify the project in the path, the name of the currently running project will be used).

That’s an interesting and unanticipated way to use cmodule.loaded, but that should work, as long as you are aware that a module’s path has 2 components, the project name and tab name.

Incidentally, one of the big challenges of writing cmodule was providing a way so that you don’t have to prefix your module name with the project in cimport() if you are loading a module from the same project (after all, if you duplicated the project, it would be a huge pain to go update all of your imports with the new project name). Under the hood, each module’s environment gets its own version of cimport() that automatically prepends the name of the project that owns the module if you don’t specify a project name in the path.

This is an important detail: the default cimport() available outside of module files (i.e., the one available in main via _G.cimport) will load from the currently running project if the project name is not specified in the path. The version of cimport() provided to modules will load from the project that owns the module file that is calling cimport if the project name is not specified in the path. If for some reason you need the default cimport from within a module file (to load a module that lives in the currently running project from a module imported from another project) you can use cmodule.import() or _G.cimport. The view template loader in my own UI library actually uses this to load the custom view templates for my views, since the template files live in the running project, not the UI library itself. Incidentally, only the view components specified in the view templates are imported, so it kind of naturally works out that only the code that is needed gets loaded in my case (that is, I don’t have to check cmodule.loaded to determine if a component used by the view/view controller is loaded).
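The difference between the two importers can be sketched with a closure that remembers which project an unqualified path should resolve against. The names `resolve`, `make_import`, `echo`, and the project names are hypothetical; this only illustrates the resolution rule described above, not cmodule's actual code.

```lua
-- Hypothetical sketch: split "Project:Tab" and fall back to a default project.
local function resolve(path, default_project)
    local project, tab = path:match("^([^:]+):(.+)$")
    if project then
        return project, tab
    end
    return default_project, path
end

-- Each module would get an importer bound to the project that owns it.
local function make_import(owner_project, import_full_path)
    return function(path)
        local project, tab = resolve(path, owner_project)
        return import_full_path(project .. ":" .. tab)
    end
end

-- Stand-in loader that just reports which full path was requested.
local function echo(full_path)
    return full_path
end

local running_import = make_import("MyGame", echo)   -- like _G.cimport
local library_import = make_import("UILib", echo)    -- as seen inside UILib's modules

print(running_import("Button"))             -- MyGame:Button
print(library_import("Button"))             -- UILib:Button
print(library_import("MyGame:Templates"))   -- MyGame:Templates
```

A library module that needs a tab from the running project can either qualify the path, as in the last line, or call the running project's importer directly.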

toadkick · May 17, 2013, 9:21am

@Andrew_Stacey: I’ve modified cimport/cload so that they now only return 1 value (as of v0.0.7). Hopefully that might make your workaround a little easier to pull off. I was able to still suit my needs by providing another API, cmodule.resolve, that resolves a valid Codea path into the project and tab component names (unqualified paths resolve to the currently running project). Turns out this is a much better interface anyway, so your irritation at cimport’s multiple return values turned out to net a positive gain. Thanks!

Andrew_Stacey · May 17, 2013, 9:52am

@toadkick Great - I’ll download the latest version and try it out.

I think I’m running into a problem with your weak table _modules. I don’t fully understand weak references, but it’s currently my best theory for what I’m seeing. In summary, when I return multiple objects (via a table) then sometimes cmodule forgets that I’ve loaded that module and reloads it next time it is requested.

Here’s a mock-up of what I’m doing. I load in my Colour module which returns three things: Colour (a table), ColourWheel (a class), and ColourPicker. So at the end of this module I have:

return {Colour, ColourPicker, ColourWheel}

Now, often I just want one of those so I’ll do something like:

local Colour = unpack(cimport "Colour",nil)

which gets me the Colour one of the above. However, when I run it again then I often get a new instance of Colour instead of the old one. And, indeed, examining the output of cmodule.loaded() shows that the module Colour gets forgotten. I noticed this because another module extends the Colour table a little and its extensions weren’t getting applied (because each time the Colour module was imported it was creating a new Colour table).

What I think is happening is that the _modules table is storing the reference to the container table {Colour, ColourPicker, ColourWheel}. As I pass this straight into unpack, I never create a reference to that table and thus the only reference to it is the one in the _modules table and this, being weak, leads to it getting garbage collected.

I just tested this by doing an explicit garbage collection after loading in the Colour module (and another module) and they were removed.
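The effect can be reproduced outside Codea with a weak-valued table. In this sketch, `modules`, `fake_import`, and `import_colour` are hypothetical stand-ins that mimic the situation described above; they are not taken from cmodule.

```lua
local unpack = table.unpack or unpack   -- table.unpack in Lua 5.2+

-- Weak-valued table, standing in for the _modules cache described above.
local modules = setmetatable({}, { __mode = "v" })

local Colour = {}   -- the object the importer actually wants

local function fake_import(path)
    if modules[path] == nil then
        -- The container table is what gets cached, not its elements.
        modules[path] = { Colour, "ColourPicker", "ColourWheel" }
    end
    return modules[path]
end

-- Only the first element is kept; the container itself is never stored.
local function import_colour()
    local colour = unpack(fake_import("MyProject:Colour"), nil)
    return colour
end

local kept = import_colour()

collectgarbage()
collectgarbage()

-- The element survives, but the cache entry for the container is gone.
print(kept == Colour, modules["MyProject:Colour"])
```

Once the entry has been cleared, the next import would run the module again and build fresh objects, which is consistent with the "new instance of Colour" symptom.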

I understand the strength of the weak references in letting you get rid of code no longer used, but I do want a way around this. There are two situations where I want to be able to designate code as “do not reimport”. One is the case outlined above, where it is not the returned table but the things in it that I’m really interested in. The other is when a module installs some code in the global namespace but doesn’t actually return anything, such as my Utilities module outlined above. In that case then I want to know that the module has been loaded and not load it again.

With the Utilities then I can use the cmodule.null return value. But with my Colour class then I want an opposite of the cmodule.nocache()! I want a cmodule.definitelydocache() (except that I don’t want this as a return value).

I guess that what I’d like to do is to set a variable in the module, something along the lines of:

cmodule.cache = true

or

cache = true

or

local cache = true

(not sure which would work best). Then when cimport looks at the metatable of the returned module it checks for that value and if it is set then it creates a normal reference, otherwise just a weak reference. So you’d need another table for the normal references.

Once all my modules are loaded, then I can clear the cache and allow things to die a natural death. So I’d also want cmodule.clearcache() to call at the end of setup().

toadkick · May 17, 2013, 10:22am

@Andrew_Stacey: forgive me for saying so, but I think you are overcomplicating some of this. The culprit in this case is not cmodule’s implementation itself, but your workaround, because no references are ever kept to your returned values. Why not simply do this?:

local Colour = cimport("Colour").Colour

EDIT: aaaahhhh, I just realized now that this will suffer from the same issue that you mentioned above. Dangit. I’ll leave the old text up there as a mea culpa.

“The other is when a module installs some code in the global namespace but doesn’t actually return anything, such as my Utilities module outlined above. In that case then I want to know that the module has been loaded and not load it again”

Well, part of me wants to say “then don’t load that module again”

In all seriousness though, that argument is perhaps the best argument against using a weak table to store the module return values. I’ll have to ponder on what we actually gain from using a weak table: after all, everything that’s loaded has been used (that’s why it was loaded in the first place), so I’m thinking that ultimately not much will end up getting cleaned up, and the savings might in fact be so trivial that they don’t really matter.

Your idea of specifying within the module itself whether it gets cached is intriguing, though it might just be better to get rid of the weak table altogether.

I’ll have to ponder on this for a bit.

toadkick · May 17, 2013, 10:34am

@Andrew_Stacey: Pondering on this a little more, I think the weak references are actually a bad idea now, for exactly the reasons you stated. I don’t think it’s a good idea to have the module system unloading modules out from underneath you. I’ll modify cmodule to nix the weak references, and add an API, cmodule.unload() that will forget the reference to a loaded module, as I think this sort of cleanup is best left to the user, not the system itself.
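As a sketch of that design, assuming an ordinary (strong) table for the cache plus an explicit unload function: the names `remember`, `unload`, and `store` are hypothetical and are not cmodule's API.

```lua
-- Hypothetical sketch of a strongly referenced cache with explicit unloading.
local modules = {}      -- ordinary table: entries survive garbage collection

local function remember(path, value)
    modules[path] = value
end

local function unload(path)
    modules[path] = nil     -- drop the cache's reference; the GC does the rest
end

-- Cache a value that nothing else refers to.
local function store()
    remember("MyProject:Colour", { "Colour", "ColourPicker", "ColourWheel" })
end

store()
collectgarbage()
local still_loaded = modules["MyProject:Colour"] ~= nil

unload("MyProject:Colour")
local loaded_after_unload = modules["MyProject:Colour"] ~= nil

print(still_loaded, loaded_after_unload)
```

With a strong table, a module stays loaded until the user asks for it to be forgotten, so the decision about when to free memory moves from the collector to the caller.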

Andrew_Stacey · May 17, 2013, 10:40am

@toadkick I just tried adding some caching code and it seemed to work. I added this bit just after local _modules = setmetatable ...:

local _cache = {}
local _caching

local function _usecache(t)
    _caching = t
    if not _caching then
        _cache = {}
    end
end

Then in the exports, I put:

_G.ccache = _usecache

and in the _import function I added the following inside the if mod ~= nocache then conditional:

if _caching or env.cache then
    _cache[cpath] = mod
end

Then I could use ccache(true) to start caching stuff “from the outside” and inside a module I could use cache = true to add that to the list. Then calling ccache(false) makes everything available to the garbage collector.

Andrew_Stacey · May 17, 2013, 10:40am

(Cross-posted, I think your method looks best.)

toadkick · May 17, 2013, 10:55am

@Andrew_Stacey: Hey, thanks for giving that a shot! I’ve updated to 0.0.8, which includes the changes I mentioned in the post above. Lemme know how that works out for you.