Thursday, December 19, 2013

Silly pattern-matching macro

One of things that I somehow always end up doing with Lisp code is transforming one type of list-structure into some other type of list-structure, where the rules of transformation are simple pattern matches..

That's probably not very surprising, considering some "other" languages are built completely on pattern matching, but one of the cool things of Lisp is that you can add all the missing syntax quite easily. So today I figured I'd hack together a simple pattern matcher in (my own "toy" dialect) µL.

The syntax of the language should be quite obvious to anyone who's ever written any Scheme (it's mostly just shorter), so I won't bother translating it, but the idea is to add a "case-match" form (that should probably get a shorter name) so you can do things like this:

.uL.>
(def (let-format form)
  (case-match form
    (((fn ,args . ,body) . ,binds)
     `(let ,(map list args binds) . ,body))
    (,other other)))

 .r = #!<fn let-format>

.uL.> (let-format '((fn (a b) (+ a b)) 1 2))

 .r = (let
        ((a 1)
         (b 2))
        (+ a b))

See, we match some arbitrary list-structure to each of the case-structures, and pick the first one that applies. In the pattern ,symbol will match anything and capture that symbol as a binding to the matched form in the code for that match, essentially like a destructuring bind, except without unquote the symbols match literally.

Nothing too fancy, not worth writing a blog-post about.. but the cool thing was how simple the implementation turned out to be. I actually wonder why I've not done this earlier (veteran lisp-programmers, please stop laughing). The resulting code only walks once per pattern and aborts early, so it's reasonably efficient too. It could be extended further, but the point is: it's not hard at all to add this functionality into Lisp-like languages. It's here:

;;; main dispatch, with the actual macro
;;; simply binds the "what" part to a symbol
;;; so it can be any arbitrary expression
(macro (case-match what . cases)
  (let ((wsym (gensym)))
    `(let ((,wsym ,what))
       ,(case-match-build (gensym) wsym cases))))

;;; This builds the main (if .. (if .. (if ..))) structure.
;;; The code built by case-match-case returns either closure
;;; or nil (no match), so we check and call closure if any.
(def (case-match-build tmp wsym cases)
  `(let ((,tmp ,(case-match-case wsym (car cases))))
     (if ,tmp 
       (,tmp) 
       ,(if (cdr cases) 
          (case-match-build tmp wsym (cdr cases))))))

;;; Now what case-match-case does, it start's a CPS-style
;;; macro-expansion, because we want the final code to only
;;; recurse the match once (for efficiency). So we pass around
;;; the code that we want to insert in the "tail-position" in
;;; pretty much the same way we'd pass around continuations.
(def (case-match-case wsym 1case)
  (let ((pat (car 1case))
        (body (cdr 1case)))
    (case-match-worker wsym pat
      `(fn () . ,body))))

;;; So here we go: build match for a sub-pattern.
;;; The first cond-branch will do the unquote-match,
;;; and introduce bindings. The second one expands the
;;; CPS-style recursion and the last-one will match
;;; a literal, and either keep going, or abort.
(def (case-match-worker wsym pat k)
  (cond
    ((and (cons? pat) 
          (eq (car pat) 'unquote)
          (and (cons? (cdr pat))
               (symbol? (cadr pat))
               (not (cddr pat))))
     `(let ((,(cadr pat) ,wsym))
        ,k))
    ((cons? pat)
     (let ((pat1 (car pat))
           (patr (cdr pat))
           (ws1 (gensym))
           (wsr (gensym)))
      `(if (cons? ,wsym)
        (let ((,ws1 (car ,wsym))
              (,wsr (cdr ,wsym)))
          ,(case-match-worker ws1 pat1
             (case-match-worker wsr patr k)))
        ())))
    (t `(if (eq ',pat ,wsym) ,k ()))))


That's not too bad; no more manual destructuring for pattern matches. The main thing I don't like is the gensym-spam with a lot of temporary bindings.. which wouldn't be that bad if it wasn't for the lack of "safe-for-space" closures in my current implementation of the language itself. But this thing should help with building a better compiler to fix that.