Saturday, December 8, 2012

Let's do µL again.

Back in 2003 or so, I had a project Lisp-variant that I called µL which eventually got buried behind other things, and I kinda lost the sources and all.

A week or so again (yes, this all happened in a week), I happened to consider the idea of writing a simple Lisp implementation again, and managed to find a moderately usable version of my old µL code. Specifically, the "init" library was what I wanted, as it had most of the "really annoying stuff" done: writing quasiquote without quasiquote is quite painful.. especially without fancy features like let-forms (most other macros depend on quasiquote, because it's just easier that way).

The story short: I decided to get rid of the old "compiler" and wrote a minimal interpreter in Gambit-C; about 250 lines of verbose Scheme-code, not counting primitive definitions (those are under 100 totally trivial lines). This got me running, so I could start writing µL code.

Naturally running µL inside Scheme would be pointless, as I could just use Scheme instead (and macros/redefinitions to make it a bit less call-with-verbose-name). Instead I wanted the whole thing to run inside C++ possibly as multiple copies in a DLL loaded into a third-party application that doesn't care (yeah, plugins, why not).

So.. after some cleanup, I then wrote a basic meta-circular compiler, that targets what is essentially CPS with explicit environments. What comes out of the compiler is still fairly high-level and it "generates code" by calling a bunch of procedures. Currently there exists 3 such "back-ends": primitive back-end in the Scheme-interpreter to generate Scheme closures, primitive back-end in the C++ VM to generate closures for the VM, and then a back-end written in µL that generates C++ source to build it all at run-time (sort of).

This was easy enough, and getting the C++ VM to work wasn't too bad either. Implementing primitives is obviously tedious, but simple enough. Writing a Cheney-style garbage collector took a day or so. Harder than writing the collector was making sure all temporaries are properly protected; in my setup every allocation can trigger a collection and invalidate unprotected pointers.. setting every allocation to actually collect turned out to be nice way to find the bugs. I had to redo some of that later when I added generation collection though, because I needed to retro-fit building of the "old to new" back-reference lists.

A bit more work was writing a reader and pretty printer. Debugging the reader proved especially nasty; it was confusing to try to figure out what input was going to the Gambit-C reader, and what was going to my in-development reader, that would crash randomly leaving the input-stream in weird states. Anyway, the reader is partially table-driven and the pretty-printer uses the same tables, so simple additions to the reader will automatically start printing right the moment they are defined.

Which brings us to the fun point: the C++ VM has just enough primitives to work with all the datatypes (cons, char, fixnum, string, symbol, fn) including the backend to generate functions on the fly, but practically no "higher level" functionality. The REPL, reader, compiler front-end, printer all run in the VM (and can removed by garbage collection if no-longer referenced). The actual I/O primitives required for an interactive prompt are getchar/putchar (that's all the I/O currently).

The VM reports runtime errors (trying to call non-procedures, or primitive argument type errors) back to the µL code in the VM, and there is no way for µL code to crash the VM (this is important for my purposes; the VM must be a safe sandbox by default). There are still two ways to cause "graceful" panic though: generate and run code that violates "linkage rules" or run out of memory. Out of memory is tricky, but turning the former into run-time errors would be trivial. However, I'd rather throw type-errors at link-time (ie when the compiler assembles the code, rather than when it's executed) and writing code for that hardly makes sense until the API has stabilized; being high-level, it needs changes (mostly additions) when adding optimizations to the compiler.

Did I mention that not only full tail-calls but also arbitrary first-class continuations work out of the box? In fact calls to call/cc are also properly tail-recursive (ie. you can loop through let/cc and never run out of space), though I'm not sure if it's possible to implement dynamic-wind without losing that.

Now, the VM needs a lot of cleanup (it can't currently provide the continuations of runtime errors for restarting purposes, for example) and features to integrate with a host-app, but it's stable, doesn't crash (except for the panics mentioned), and seems to run fast enough to be usable. Since inside the sand-box everything is meta-circular and you can redefine pretty much everything on the fly, you can totally break the environment, but it'll still keep running.

Examples of how to break the environment were quite painful when I was extending the macro-system to support extension of special forms (there are 6 special forms: if, do, fn, def, set, quote), but now I have a macro for (def (fname . args) ..) style defines (which you can redefine/extend further) while the compiler only support (def sym value) style.. and fn-forms (lambdas) support some addition features by macro-expansion too... and so on. Naturally replacing the compiler on-the-fly would also work (as far as the VM cares), but it's nice to keep the core language simple (namely 6 special forms and around 50 primitives, not counting the 20 or so that implement the compiler back-end).

Unfortunately, for "full rebuilds" I'm still dependent on the Scheme interpreter, because I don't have file-I/O (well, in the abstract sense of "file" since it's the host-application should decide what a "file" actually is; remember I want to embed this into applications) in µL yet, which prevents the µL in the C++ VM from generating new bootstrap code (in Scheme it's easier to cheat)... but considering this is still "days" rather than "weeks", "months" or "years" old project...

...I'm not sure why I've not done it before.

No comments: