Monday, April 8, 2013

µL hacks

Today I randomly got the idea "wouldn't it be cool to be able to quasiquote strings" and tried implementing such a feature in my poor little Lisp-variant. Here's how it works:

When you write {string} it gets parsed exactly as if you wrote "string", except the normal string escapes are disabled, so {foo"bar\} results in a string that you would normally have to type as "foo\"bar\\", which on its own is quite handy. However, you can use ~ (maybe should change that to ^ though), and if you do, the reader will go back to reading a normal Lisp expression. Once it finishes, it'll keep reading the string again (and read another expression the next time it sees ~, and so on), and when it finally sees } it'll turn the whole thing into a strcat call (which is string concatenation):

{1 + 1 = ~(int->string (+ 1 1))!}
will read as
(strcat "1 + 1 = " (int->string (+ 1 1)) "!")
and evaluate as
"1 + 1 = 2!"
Naturally, since ~ just calls the reader recursively, you can nest such constructs just like any other code... so with a grand total of 11 short lines of code, I just added a rather natural "template" feature to my little language. :)
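In case the mechanism isn't obvious, here's a rough sketch of the idea in C++. This is not the actual µL reader: it just builds the resulting form as text, only handles the ~(...) shape, and leaves escapes alone (the real thing simply calls the Lisp reader recursively at that point, so any expression works).

#include <cstdio>
#include <string>

// Read the body of a {..} template (everything after the opening brace) and
// build the corresponding (strcat ...) form as text. Plain characters become
// string literals; ~(...) splices the parenthesized expression in verbatim.
std::string readTemplate(const std::string &src, size_t &pos)
{
    std::string out = "(strcat";
    std::string literal;
    auto flush = [&] {
        if (!literal.empty()) { out += " \"" + literal + "\""; literal.clear(); }
    };
    while (pos < src.size() && src[pos] != '}') {
        if (src[pos] == '~' && pos + 1 < src.size() && src[pos + 1] == '(') {
            flush();
            out += " ";
            ++pos;                              // skip the ~
            int depth = 0;
            do {                                // copy one balanced (...) expression
                if (src[pos] == '(') ++depth;
                if (src[pos] == ')') --depth;
                out += src[pos++];
            } while (depth > 0 && pos < src.size());
        } else {
            literal += src[pos++];
        }
    }
    ++pos;                                      // skip the closing }
    flush();
    return out + ")";
}

int main()
{
    std::string src = "1 + 1 = ~(int->string (+ 1 1))!}";
    size_t pos = 0;
    // prints: (strcat "1 + 1 = " (int->string (+ 1 1)) "!")
    std::printf("%s\n", readTemplate(src, pos).c_str());
}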

PS. Yeah, it'd be nice if such hacks could be done "on the fly" without changing the reader (like you could in some "other Lisps"), but currently the dispatch loop from which { needs to be intercepted isn't table-driven yet (several other things like string escapes, #\newline style character constants and special values like #!eof actually are table-driven, mainly because that way the pretty printer will always match what the reader expects.. should fix the dispatch too, I guess).

Another small (though somewhat larger) µL hack was to add support for "alien" types: these are basically opaque typed pointers from Lisp heap to somewhere else. Now it's simple to add new types for external resources (like files, or C-heap types, whatever), while still handling garbage collection sensibly (both collecting such external objects if desired, and allowing the external objects to keep references to Lisp heap objects that might move arbitrarily). Fortunately it turned out to be ridiculously simple to add finalization and a bunch of visitor callbacks to the garbage collector (which isn't that fancy.. just a simple Cheney with a rudimentary nursery to deal with the worst of "garbage on arrival" which the interpreter creates fast enough that I probably should add proper generational logic).
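For illustration, something along these lines is roughly what such an "alien" type descriptor can look like; the names below are hypothetical, not the actual µL API:

// An external resource wrapped as an opaque typed pointer. The type descriptor
// tells the collector how to finalize the object and how to visit any Lisp heap
// references it holds, so those can be updated when the copying GC moves them.
struct AlienType {
    const char *name;                          // type tag, used for printing and type checks
    void (*finalize)(void *ptr);               // called when the object becomes garbage
    void (*visit)(void *ptr,
                  void (*fix)(void **slot));   // report each slot that points into the Lisp heap
};

struct Alien {
    const AlienType *type;   // which kind of external resource this is
    void *ptr;               // opaque pointer to data outside the Lisp heap
};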

PS. Should probably upload the whole thing somewhere... but I'd like to fix a few more broken/missing features first, such that it might be even theoretically possible to find some practical utility, other than messing around in the REPL.

Sunday, December 16, 2012

More on µL..

I've got µL up to a point, where I can start the REPL, load the compiler with a C-code backend, and type (make-system) to rebuild the whole thing as C-code. The way it works, I have a basic runtime library (ulisp.lib) with just the primitives, garbage collector and other essentials. Then you build an application with µL code "compiled" into C-code (chunks; you could load those dynamically too) that runs the code, without requiring the reader or the compiler. It's still somewhat more like "loading" though, in the sense that it essentially calls what the compiler would have called had it compiled things "on-the-fly."

So last time I was talking about C++, but I actually gave up on C++ for this project. Somehow I couldn't get the library API clean without making it all "C-like", and I figured I might as well rewrite it in C. So I did that; it's all now plain ANSI C (for practical purposes at least). Rewriting it all took a day and the result ended up many times cleaner. Adding new C procedures (whether simple primitive functions or C-closures) is now very straightforward.

I guess it's starting to get to the point where it might make sense to put it online somewhere and attach an open-source license... if I can figure out what that license should look like.

Saturday, December 8, 2012

Let's do µL again.

Back in 2003 or so, I had a pet-project Lisp-variant that I called µL, which eventually got buried behind other things, and I kinda lost the sources and all.

A week or so ago (yes, this all happened in a week), I happened to consider the idea of writing a simple Lisp implementation again, and managed to find a moderately usable version of my old µL code. Specifically, the "init" library was what I wanted, as it had most of the "really annoying stuff" done: writing quasiquote without quasiquote is quite painful, especially without fancy features like let-forms (most other macros depend on quasiquote, because it's just easier that way).

Long story short: I decided to get rid of the old "compiler" and wrote a minimal interpreter in Gambit-C; about 250 lines of verbose Scheme-code, not counting primitive definitions (those are under 100 totally trivial lines). This got me running, so I could start writing µL code.

Naturally, running µL inside Scheme would be pointless, as I could just use Scheme instead (with macros/redefinitions to make it a bit less call-with-verbose-name). Instead, I wanted the whole thing to run inside C++, possibly as multiple copies in a DLL loaded into a third-party application that doesn't care (yeah, plugins, why not).

So.. after some cleanup, I then wrote a basic meta-circular compiler that targets what is essentially CPS with explicit environments. What comes out of the compiler is still fairly high-level, and it "generates code" by calling a bunch of procedures. Currently there are three such "back-ends": a primitive back-end in the Scheme interpreter to generate Scheme closures, a primitive back-end in the C++ VM to generate closures for the VM, and a back-end written in µL that generates C++ source to build it all at run-time (sort of).

This was easy enough, and getting the C++ VM to work wasn't too bad either. Implementing primitives is obviously tedious, but simple enough. Writing a Cheney-style garbage collector took a day or so. Harder than writing the collector was making sure all temporaries are properly protected; in my setup every allocation can trigger a collection and invalidate unprotected pointers. Setting every allocation to actually collect turned out to be a nice way to find the bugs. I had to redo some of that later when I added generational collection though, because I needed to retro-fit building of the "old to new" back-reference lists.
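To illustrate the protection problem, here's a deliberately silly toy model in C++. It is nothing like the real Cheney collector internally; it only shows the rooting discipline and the "collect on every allocation" stress trick, with made-up names throughout.

#include <cstdio>
#include <vector>

struct Obj { int value; };

static std::vector<Obj **> roots;   // addresses of protected locals

Obj *allocate(int value)
{
    // stress mode: "collect" on every allocation by pretending every live
    // object moves; protected locals are fixed up through their root slots,
    // while any unprotected pointer into the heap becomes stale immediately.
    for (Obj **slot : roots) {
        Obj *moved = new Obj(**slot);   // copy the object to a new address
        delete *slot;
        *slot = moved;                  // rewrite the protected local
    }
    return new Obj{value};
}

int main()
{
    Obj *a = allocate(1);
    roots.push_back(&a);    // protect 'a' before the next allocation
    Obj *b = allocate(2);   // this "moves" a; the root slot keeps it valid
    std::printf("%d %d\n", a->value, b->value);   // prints "1 2"
}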

A bit more work was writing a reader and pretty printer. Debugging the reader proved especially nasty: it was confusing to try to figure out what input was going to the Gambit-C reader and what was going to my in-development reader, which would crash randomly, leaving the input stream in weird states. Anyway, the reader is partially table-driven and the pretty printer uses the same tables, so simple additions to the reader will automatically start printing right the moment they are defined.

Which brings us to the fun point: the C++ VM has just enough primitives to work with all the datatypes (cons, char, fixnum, string, symbol, fn), including the back-end to generate functions on the fly, but practically no "higher level" functionality. The REPL, reader, compiler front-end and printer all run in the VM (and can be removed by garbage collection if no longer referenced). The actual I/O primitives required for an interactive prompt are getchar/putchar (that's all the I/O currently).

The VM reports runtime errors (trying to call non-procedures, or primitive argument type errors) back to the µL code in the VM, and there is no way for µL code to crash the VM (this is important for my purposes; the VM must be a safe sandbox by default). There are still two ways to cause "graceful" panic though: generate and run code that violates "linkage rules" or run out of memory. Out of memory is tricky, but turning the former into run-time errors would be trivial. However, I'd rather throw type-errors at link-time (ie when the compiler assembles the code, rather than when it's executed) and writing code for that hardly makes sense until the API has stabilized; being high-level, it needs changes (mostly additions) when adding optimizations to the compiler.

Did I mention that not only full tail-calls but also arbitrary first-class continuations work out of the box? In fact calls to call/cc are also properly tail-recursive (ie. you can loop through let/cc and never run out of space), though I'm not sure if it's possible to implement dynamic-wind without losing that.

Now, the VM needs a lot of cleanup (it can't currently provide the continuations of runtime errors for restarting purposes, for example) and features to integrate with a host-app, but it's stable, doesn't crash (except for the panics mentioned), and seems to run fast enough to be usable. Since inside the sand-box everything is meta-circular and you can redefine pretty much everything on the fly, you can totally break the environment, but it'll still keep running.

Examples of how to break the environment were quite painful when I was extending the macro-system to support extension of special forms (there are 6 special forms: if, do, fn, def, set, quote), but now I have a macro for (def (fname . args) ..) style defines (which you can redefine/extend further) while the compiler only supports the (def sym value) style.. and fn-forms (lambdas) support some additional features by macro-expansion too... and so on. Naturally, replacing the compiler on-the-fly would also work (as far as the VM cares), but it's nice to keep the core language simple (namely 6 special forms and around 50 primitives, not counting the 20 or so that implement the compiler back-end).

Unfortunately, for "full rebuilds" I'm still dependent on the Scheme interpreter, because I don't have file I/O in µL yet (well, in the abstract sense of "file", since it's the host application that should decide what a "file" actually is; remember I want to embed this into applications), which prevents the µL in the C++ VM from generating new bootstrap code (in Scheme it's easier to cheat)... but considering this is still a "days" old project rather than a "weeks", "months" or "years" old one...

...I'm not sure why I've not done it before.

Friday, November 2, 2012

Complex numbers vs. 2D rotations

This is rather obvious, but if you're not too comfortable with complex numbers or you're struggling with 2D rotations, it might help you.

Let's start with a 2D vector [x,y]. To rotate this, we'd multiply by a matrix [[cos(w), -sin(w)],[sin(w),cos(w)]] leading to a vector [x*cos(w)-y*sin(w),x*sin(w)+y*cos(w)]. Not too complex so far.

Now, let's pick a complex number such as a+i*b. This could also be written as r*e^(i*w), where r=|a+i*b|. The complex exponent then expands to r*(cos(w)+i*sin(w)). So we have a rotation and uniform scaling as r*[[cos(w),-sin(w)],[sin(w),cos(w)]] or simply [[a,-b],[b,a]], where a = r*cos(w) and b=r*sin(w).

So complex multiplication comes down to a 2D rotation plus a uniform scaling: multiplying x+i*y by a+i*b gives (a*x-b*y)+i*(b*x+a*y), which is exactly the matrix [[a,-b],[b,a]] applied to the vector [x,y]. This not only means you can represent 2D rotations as complex numbers, but because any complex number is essentially just a 2D vector on the "complex plane", you can represent 2D rotations as 2D vectors (a special case that doesn't work for higher dimensions; something similar can be done in 3D with quaternions though).

Let's look at an application. Suppose we have a quadrilateral sprite described by 4 points, and we want to point that sprite in a particular direction. We might have an image of an arrow with some velocity vector (which we add to its position at regular intervals), and we want to point the arrow in the direction it's moving and draw the thing on screen. One solution would be to use atan2() to solve for the angle of the velocity, then use the cos-sin formula to build the rotation matrix.. but because we can treat the velocity as a complex number, we can simply (1) normalize it to get rid of the scaling and (2) do complex multiplies with all the points (treating them as complex numbers too). You don't even need a single cos() or sin() for this (just a sqrt() for the normalization).
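In code the whole thing is just a handful of lines; a minimal sketch (the sprite corners and the velocity here are made-up values):

#include <cmath>
#include <cstdio>

// Rotate the corners of an "arrow" sprite to point along its velocity,
// using a single normalization and one complex multiply per corner.
struct Vec2 { float x, y; };

// complex multiply: real = u.x*v.x - u.y*v.y, imag = u.x*v.y + u.y*v.x
Vec2 cmul(Vec2 u, Vec2 v) { return { u.x * v.x - u.y * v.y, u.x * v.y + u.y * v.x }; }

int main()
{
    Vec2 velocity = { 3.f, 4.f };
    float len = std::sqrt(velocity.x * velocity.x + velocity.y * velocity.y);
    Vec2 rot = { velocity.x / len, velocity.y / len };   // unit complex number = pure rotation

    // sprite corners in the local frame, arrow pointing along +x
    Vec2 corners[4] = { { 1.f, .5f }, { -1.f, .5f }, { -1.f, -.5f }, { 1.f, -.5f } };
    for (Vec2 &p : corners) {
        p = cmul(rot, p);                                // rotate into the velocity's direction
        std::printf("(%.2f, %.2f)\n", p.x, p.y);
    }
}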

Nothing very magical about any of the above, but maybe it'll help someone bridge their mental models of the two concepts (2D vectors and complex numbers).

PS. Another potentially useful application is to treat stereo signals as complex numbers (with an "imaginary side channel" or something). You can then "rotation pan" (essentially a simple extension of classic equal-power panning) with simple complex multiplies (and get the uniform gain as a by-product too). You can even FFT such a representation, multiply by the FFT of a real-only FIR, and when you do the IFFT, you've essentially filtered the two channels independently (even if in the spectral domain they aren't "separate" as such). Why this works should be obvious when you look at the matrix representation.
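For the panning part, a tiny sketch (one stereo sample with made-up values; the FFT filtering is left out):

#include <cmath>
#include <cstdio>

// "Rotation pan" one stereo sample: left/right as real/imaginary parts,
// multiplied by a unit complex number built from the pan angle.
int main()
{
    float left = 1.0f, right = 0.25f;            // one stereo sample
    float angle = 0.3f;                          // pan rotation in radians
    float a = std::cos(angle), b = std::sin(angle);

    // complex multiply (a + ib)(left + i*right); |a + ib| = 1, so the overall gain is preserved
    float outLeft  = a * left  - b * right;
    float outRight = a * right + b * left;
    std::printf("%f %f\n", outLeft, outRight);
}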

Wednesday, August 1, 2012

Premultiplied in GIMP

This is really just a note to self (because I keep forgetting how to do it): when you want to do pre-multiplied stuff in GIMP, first take the image with regular alpha into a layer, then create a layer mask (specify "copy of layer's alpha channel"). Then set the background color to black and remove the alpha channel from the layer (but keep the mask). This way the layer gets blended onto a black background (effectively pre-multiplying it) and the layer mask keeps a copy of the alpha channel intact. This can then be saved as PNG or whatever, and it'll look right when blended as pre-multiplied.

Wednesday, February 29, 2012

How to split quads of a height-map terrain?

Recently I needed to import some basic terrain height maps for a prototype project. It quickly became obvious that in order to get "smooth" geometry (whether you want it smooth for rendering, collision detection, or both) it's important to pay attention to how the terrain quads are triangulated. The problem is that even on a smooth heightmap most quads will not be planar, so two different ways to split the quad into two triangles will result in different geometry. Most of the time one of these is much worse than the other (sometimes neither is ideal, but at least you pick the "less bad" one). I'd also like to note that adding a vertex in the middle of the quad to get four triangles instead of two actually tends to make things worse on top of doubling the triangle count.


After searching the web for an algorithm and finding out that some big-name engines (eg the UE3 docs on the issue, just to name one example) seem to just let you set the split order manually, I ended up rolling my own algorithm. I'm not going to claim this is the best possible algorithm, nor am I going to claim it's novel or anything. But since it's very simple, works almost perfectly as long as the source data isn't awfully noisy (in which case any order tends to be bad), and I haven't found any other algorithms published, I thought I'd share it here:

So, first do any non-uniform scaling of the heightmap data so that you have a 2D array of 3D coordinates for the terrain vertices. Now, for each vertex, calculate a two-sided finite-difference (eg a [-1,0,1] kernel) "tangent" along each diagonal direction (on-axis doesn't work nearly as well, because the problems occur in the diagonal directions), take the cross-product of the two "tangents" to get a "normal", and normalize the result. These "normals" are not that great for shading (you want something more sophisticated for the final normals), but now take the dot-products between these "normals" at diagonally opposite corners of each quad, compare the results from the two diagonal pairs, and split the quad by adding an edge between the pair that gives the larger dot-product (ie has the more "similar" normals). The result is an algorithm that will choose the better split order with "surprisingly large probability", even when neither split is ideal.
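Here is a compact sketch of the heuristic in C++ (the grid layout, helper names and the "main diagonal" convention are mine, just to make the idea concrete):

#include <algorithm>
#include <cmath>
#include <vector>

// The heightmap is a W x H grid of 3D vertices in row-major order: pos[y * W + x].
struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  normalize(Vec3 a)     { float l = std::sqrt(dot(a, a)); return { a.x/l, a.y/l, a.z/l }; }

// diagonal finite-difference "normal" at grid point (x, y), clamped at the borders
static Vec3 diagNormal(const std::vector<Vec3> &pos, int W, int H, int x, int y)
{
    auto P = [&](int px, int py) {
        px = std::max(0, std::min(W - 1, px));
        py = std::max(0, std::min(H - 1, py));
        return pos[py * W + px];
    };
    Vec3 t1 = sub(P(x + 1, y + 1), P(x - 1, y - 1));   // tangent along one diagonal
    Vec3 t2 = sub(P(x - 1, y + 1), P(x + 1, y - 1));   // tangent along the other diagonal
    return normalize(cross(t1, t2));
}

// true  -> split the quad (x,y)..(x+1,y+1) along the (x,y)-(x+1,y+1) diagonal
// false -> split along the (x+1,y)-(x,y+1) diagonal
static bool splitAlongMainDiagonal(const std::vector<Vec3> &pos, int W, int H, int x, int y)
{
    float d0 = dot(diagNormal(pos, W, H, x,     y), diagNormal(pos, W, H, x + 1, y + 1));
    float d1 = dot(diagNormal(pos, W, H, x + 1, y), diagNormal(pos, W, H, x,     y + 1));
    return d0 >= d1;   // put the new edge between the more "similar" pair of normals
}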

That's not very complicated, nor did it take very long for me to come up with the algorithm. I'm sure more intelligent people could come up with much better algorithms, but if you just need "something" (so you don't need to rely on manual ordering) the above should provide you with a decent starting point.

Saturday, January 14, 2012

Z-buffer precision, camera pivot and optimal zNear

Everyone who has done any non-trivial 3D programming knows that sooner or later you run into problems with z-buffer precision. Especially in a first-person situation you can have objects very near to the camera which should not clip against the near-plane, but you also want to draw objects far away (say, hills at the horizon). So you need a small zNear and a large zFar, which runs into precision problems because of the non-linear screen-space z.

There are well-known solutions from depth partitioning to distorting the z values, but sometimes you just need a "bit more precision" and the above solutions just cause more trouble than they solve. If only you could move the zNear a bit further away.

Let's look at a first-person situation. The necessary (and sufficient) condition for avoiding any near-plane clipping (with any colliding geometry at least) is to place the near-plane completely inside the collision sphere (for rotational symmetry) of the camera. This can be a collision sphere of a player, or it can be the largest sphere that will fit inside the AABB or whatever else we collide with.

The question is, what is the largest possible value of the zNear (for a given field-of-view) that will keep the near-plane completely inside the collision sphere? Well, it depends on where you put the "eye" point.


As should be obvious from the image, the near-plane (and hence the zNear distance) is larger in the right image. The largest possible near-plane that will fit inside the sphere will always go through the sphere origin. It is important to realize that it doesn't matter (at all) whether the "eye" is inside the collision sphere: anything behind the near-plane will get clipped anyway and the eye is purely virtual. Both placements will avoid any near-plane clipping as long as no geometry gets inside the sphere.

In practice first-person cameras (and other cameras that you turn directly) usually use the example on the left (it's quite obvious when you zoom: if you don't see any "parallax" movement when turning, then it's using the "left" version). This is what any text-book view/projection matrices will give you (eg D3DX helpers and whatever) unless you explicitly move the camera somewhere else.

Now, I would like to argue that the placement on the right is better even if you don't have precision issues, because it ends up feeling much more natural (on top of giving you a larger optimal zNear, which is also easier to calculate). To understand why, we need to think of the "near-plane" as a lens. Then for a narrow field-of-view (eg high zoom) the "virtual" eye should end up behind the "real" eye. The pivot point should then be much closer to the lens than the virtual eye position.


When I actually tried this in practice, I immediately realized why zooming in most games has always felt "wrong" as far as turning goes: it's because the pivot is in a totally wrong place with regard to the view. By placing the pivot at the near-plane I ended up reducing the feeling of "tunnel vision" significantly, and high zoom no longer felt "shaky" at all. Turning actually gives you a sense of depth too (because of the slight parallax movement). You also see slightly more to your sides, which reduces the pressure to use a high field-of-view for observing your surroundings (and you no longer need a high field-of-view to combat the weird pivot either). All that on top of getting a 4-8 times larger zNear (well, it's FoV-dependent; with more "zoom" you get a larger zNear and hence more precision out in the distance) without any near-plane clipping. I can't name a single negative, really (go ahead and post a comment if you disagree).

So given a collision sphere radius and field of view, how do we calculate the optimal zNear when the near-plane goes through the camera origin? It turns out to be quite simple. Something like the following:

#include <cmath>

// fovY is the vertical field of view in radians
// aspect is screenWidth / screenHeight
float optimalZNear(float fovY, float aspect, float collisionRadius)
{
    // half near-plane diagonal for a nominal zNear = 1
    float diag = std::tan(fovY / 2.f) * std::sqrt(1.f + aspect * aspect);

    // solve zNear so that the half diagonal equals the collision radius
    return collisionRadius / diag;
}

Now you can calculate the view matrix as usual, except move the camera backwards by zNear, because the "camera" here is really the "virtual eye" and we want the near-plane to go through the origin of the "real" camera.
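As a final sketch (the names are mine, assuming a typical look-at style camera), computing the position of that virtual eye looks like this:

struct Vec3 { float x, y, z; };

// Move back from the pivot (the "real" camera at the center of the collision
// sphere) along the view direction by zNear, so the near-plane ends up passing
// exactly through the pivot. Build the view matrix from this eye position as
// usual and use the same zNear as the projection's near-plane distance.
Vec3 virtualEye(Vec3 pivot, Vec3 forward /* unit length */, float zNear)
{
    return { pivot.x - forward.x * zNear,
             pivot.y - forward.y * zNear,
             pivot.z - forward.z * zNear };
}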