Notes

Custom Binding Constructs Without Macros, Part 2: Desugaring Bindings

(This is a continuation of last week's experiment.)

Let's assume we want to implement our own pattern matching functionality in a language that does not provide such a functionality and define a custom “match” function:

// print x only if v is a pair with 1 as its second field
let v = Pair(2, 1);
match(v, Pair(x, 1), { print(x) })

In most languages we're out of luck, because matching a pattern such as Pair(x, 1) against a value involves binding the first element of the value to the variable x in the scope of the match clause, but as soon as the argument Pair(x, 1) is passed to the match function, the variable x will be evaluated in the current scope instead of being used as a name that we want to bind to.

Lisp macros provide a solution, but they come at a cost: Macros operate on a syntactic level before variables are resolved and can interpret variables in any way they wish, even behaving differently based on the name of a symbol. This makes macros powerful, but goes against our intuition of how variables are used: Renaming a variable should never change program behavior. Can we do better?

Explicit Bindings

Here's a proposal for how custom binding constructs could be defined without bringing in full macros, in other words, without exposing the names of variables to a macro:

// 'x is an explicit binding, used in the nearest block { ... }
match(v, Pair('x, 'x), { x })

Bindings are made explicit by quoting a variable x as 'x. Any unquoted variable is resolved as normal, but a quoted variable is a binding, a “variable to be”, which is not resolved in the current context, but will be bound in the nearest block, which is delimited by { ... }.

Blocks are just syntactic sugar for anonymous functions, whose parameters are defined by the explicit bindings that appear to their left in the code. In the above example, the block { x } would be desugared to (x) => x, the anonymous function of one argument x that just returns x.

More explicitly, whenever a function f is called with a block as its argument, this function acts as a “binder”: The block will consume all the bindings that occur as arguments of f (or as arguments to arguments of f and so forth) before the block. A block never receives bindings that come from “above” the binder f in the syntax tree, which makes binders act as scopes:

// this binds `x => x + x` to f, then calls f(2)
let('f, fn('x, { x + x }), { f(2) })
//   |      \__________/          |
//   |       scope of x           |
//   \____________________________/
//           scope of f
//
// { x + x } is desugared to x => x + x
// { f(2) } is desugared to f => f(2)

Implementing a Binding Construct

So far, so good. But how can binders, functions that accept blocks, be implemented in such a way that they can bind explicit bindings to the right values? Let's return to the hypothetical binding construct for pattern matching and let's look at its arguments in turn:

match(
    v,            // a value, resolved before passed to `match`
    Pair('x, 'x), // ???
    { x }         // desugared to a normal function, x => x
)

A binder is just a normal function, not a macro, and so the first argument, v, is resolved in the current scope and its value is passed to the match function. The block { x } is transformed into an anonymous function, as discussed, and then passed to the match function as its last argument. The transformation to an anonymous function needs only static information and can thus be done during parsing, the target language only needs to support anonymous functions. But what about Pair('x, 'x)?

An explicit binding is turned into a special value that can be compared for equality, but never unwrapped. The two 'x in Pair('x, 'x) can thus be compared and will be equal, whereas the bindings in Pair('x, 'y) would not be equal. The parser needs to ensure that explicit bindings in different scopes need to become different values, such as in this example:

// in pseudo JS syntax:
let just_the_binding = (binding, _block) => binding;
let y = just_the_binding('x, { x });
let z = just_the_binding('x, { x });
y == z // false

Even though both bindings are called “'x”, they come from different scopes and are thus different. This, together with the requirement that the name of bindings can never be accessed, guarantees that they behave like variables and that renaming a binding never changes the behavior of a program.

Runtime Reflection

There's one more requirement if we want to make the match(v, Pair('x, 'x), { x }) example work: Pair('x, 'x) cannot be a struct that is sealed off and can only be accessed if its name and number of fields are known. Instead, the match function needs to inspect it at runtime and split it into its constituent parts: the name “Pair” and a list of fields that can be iterated over. This requires some form of runtime reflection.

The value Pair('x, 'x) that is passed to the match function is then simply a pair that stores two bindings as its fields. It is the match function's job to figure out that only one binding was used in the pattern, because the block { x } is desugared to x => x and thus expects a single argument. If the pattern had been Pair('x, 'y) the block { x } would have been desugared to (x, y) => x, expecting two arguments. Using runtime reflection, the match function can ensure that it always calls the block with the right number of arguments.

There is a possible alternative that does not require support for runtime reflection: When a binder function call is parsed, for example when the call match(v, Pair('x, 'x), { x }) is parsed, the parser could statically destructure arguments that aren't blocks into a custom representation that exposes its structure without reflection, such as:

// using Rust-like syntax:
Struct {
    name: "Pair",
    fields: [
        Binding('x),
        Binding('x)
    ]
}

Not requiring runtime reflection might seem appealing, but comes at a cost: It is a leaky abstraction, because the parser can only destructure what it sees statically:

// without reflection, `p` appears atomic to the parser
let p = Pair(1, 2);
match(v, p, { ... })

In the above code, pattern matching could only be implemented if there's already a way to compare aggregate data structures for equality and if this built-in form of equality is sufficient. But if we for example want to implement a custom pattern matching construct that can unify Pair(1.0, 1.0) with Pair(1, 1) but the built-in equality does not support coercion between ints and floats, we're out of luck.

Under the Hood

Here's a sketch for a lower level language that is deliberately simple but supports just enough runtime reflection that explicit bindings and custom binding constructs can be built on top of it:

t = var                                       // variable
  | var "=>" t                                // abstraction
  | t "(" t ")"                               // application
  | tag                                       // tag
  | "if" t "is" t "then" t "else" t           // conditional
  | "with" t "as" var "," var "do" t "else" t // destructure

The first three lines are just standard lambda calculus: We can build multi-argument functions using currying, using “(x, y) => x” as sugar for “x => y => x”, and call functions, using “f(x, y)” as sugar for “(f(x))(y)”.

Tags, starting with an uppercase letter to distinguish them from variables, are interned strings (known as keywords, atoms or symbols in other languages) and can be applied to other values to form simple data structures: “Foo(bar, baz)” is the tag “Foo” applied to “bar”, then applied to “baz”. A simple if-is-then-else can compare two atomic tags (but not tags applied to other values) for equality:

if Foo is Foo then True else False           // True
if Foo is Bar then True else False           // False
if Foo(Bar) is Foo(Bar) then True else False // False, not atomic

To destructure aggregate values, the with-as-do-else form has to be used, which will pop off the last field of a data structure (the last value to be applied):

with Foo(Bar, Baz) as xs, x do // xs = Foo(Bar), x = Baz
    Pair(xs, x)                // Pair(Foo(Bar), Baz)
else
    ...

This is just enough to implement higher level constructs (such as equality for aggregate data structures) using custom binding constructs. To do that, the above calculus has to be exposed as special forms: The parser automatically translates them into the underlying language, but special forms can be reimplemented as and shadowed by normal functions. Variables and function application do not need special forms, the other constructs do:

fn('x, { body })                       // abstraction
if(x, y, { then }, { else })           // conditional
with(value, 'xs, 'x, { do }, { else }) // destructure

While the lower level language does not distinguish between tags and bindings, the higher level language needs a way to distinguish between these two types of values without exposing the internals of bindings (which would make their names observable and violate the desired guarantees).

The easiest way to do this is to let the parser never produce atomic tags, but always wrap tags and bindings in special type tags. A tag “Foo” thus becomes TAG(Foo), whereas the binding “'x” becomes BINDING(...), wrapping a tag that is guaranteed to be different from an 'x in a different scope. The two types of values can then be distinguished using two more functions, which can be implemented in terms of the others:

if-tag(x, { then }, { else })
if-binding(x, { then }, { else })

As long as there is no way to construct bindings other than wherever the parser converts an explicit binding (that is to say, there must be no way to dynamically turn a non-binding value into a binding), it is safe to use the available reflection to destructure tags and bindings and compare their inner tags with others for equality (assuming that tags and bindings use different prefixes / namespaces).

And that's it! This should be enough to implement custom binding constructs using only statically resolved scopes and bindings. The syntax is quite bare and leaves a lot to be desired, but that's a topic for another time.