(This is a continuation of last week's experiment.)
How could a language look like that combines the minimalism and generality of Lisp with enough syntactic sugar to steer clear of Lisp's soup of parentheses? A language that is general and minimal enough that every language construct is a function (even control flow operators and binding constructs) which can be redefined and shadowed by user-defined functions, but also flexible enough that pattern matching syntax can look like this:
match(x) [
Pair('x, 'x) -> { f(x) }
'x -> { g() }
]
Here's a proposal for enriching a minimal functional language step by step until it supports syntax like the above, using explicit bindings.
Our base language will be a minimal functional language with variables, anonymous functions and function application. Basically lambda calculus, but with slightly different syntax:
x // variable x
f(x) // function f called with argument x
Multi-argument functions can be simulated by currying, in other words by repeated function application:
f(a)(b)(c)
Anonymous functions are defined using blocks, written { ... }. A block is like the body of an anonymous function that does not specify its arguments. The arguments of a block are statically resolved by looking for the nearest binding:
// 'x is an explicit binding, used in the nearest block { ... }
fn('x)({ x })
Bindings are made explicit in the syntax by quoting a variable x as 'x. Any unquoted variable is resolved as normal, but a quoted variable is a binding, a “variable to be”, which is not resolved in the current context, but will be bound in the nearest block, delimited by { ... }.
Blocks are thus just syntactic sugar for anonymous functions, whose parameters are defined by the explicit bindings that appear to their left in the code. In the above example, the block { x } would be desugared to the anonymous function of one argument x that just returns x.
The “fn” function is a predefined function that discards its first argument and returns its second argument. If the first argument is a binding and the second argument is a block, this has the effect of returning an anonymous function with one argument.
Now that we have variables, explicit bindings, function calls and anonymous functions (using blocks), we can enrich the language with a bit of syntactic sugar.
Right now there is no way to represent data, which might be enough for lambda calculus, where everything is a function, but insufficient for a real programming language. Let's add tags (called atoms, keywords or symbols in other languages), which are atomic values that just represent themselves, written like variables but starting with an uppercase letter. Data structures can be built by applying tags to other values:
Foo // the tag Foo
Foo(Bar) // Foo applied to Bar
Foo(Bar)(Baz) // Foo applied to Bar, then applied to Baz
Calling a function f with three arguments a, b and c as “f(a)(b)(c)” is quite annoying, so let's define some syntax which just desugars to repeated application, which can be used both for functions and tags:
f(a, b, c) // sugar for `f(a)(b)(c)`
Pair(Foo, Bar) // sugar for `Pair(Foo)(Bar)`
Right now there are no "zero argument" functions, each function call needs an explicit argument. But in a language with side effects, it would be annoying to always have to call functions that only have a side effect with a bogus argument, so let's define a special nil tag “()” and define that “f()” is sugar for calling f with the nil value:
() // special nil tag
f() // sugar for f(())
Prefix calls of the form “f(a, b)” are great, but it would be nice to be able to define infix functions such as “a + b” or “a == b”. So let's add some sugar for that:
a f b // sugar for `f(a, b)`
a f(b) // sugar for `f(a, b)`
a f(b, c) // sugar for `f(a, b, c)`
How should a chain of infix calls such as “a f1 b f2 c” be parsed? We could simply disallow chained infix calls, but let's instead go with left-associativity here:
a f1 b f2 c // sugar for `f2(f1(a, b), c)`
This gives us a sort of uniform function call syntax for free, because instead of “a.f(b, c).g(d, e)” we can just write “a f(b, c) g(d, e)”.
Strictly speaking, we don't need grouping via parentheses to resolve precedence, because infix calls are left-associative and the right hand side of an infix call can already be wrapped in parentheses even for a single argument. But sometimes it can be helpful to be explicit about left-associativity, so let's allow parentheses even on the left hand side:
(a + b) * c // sugar for `*(+(a, b), c)`
(A code formatter / linter could always add explicit parentheses in cases where the infix functions have names such as “+”, “-”, “*”, “/”, “=”, “=>” etc, where something other than left-associativity might be assumed.)
While tags and tags applied to other values make it possible to build arbitrary data structures, sometimes we simply want a collection of values without tagging it with an explicit name, so let's add another special tag to build lists and add some sugar for it:
[Foo, Bar] // sugar for the list tag applied to Foo, then to Bar
Functions calls, lists and blocks all assume that their elements are separated by commas. Let's treat commas and newlines interchangeably as separators, let's allow separators at the beginning and end of calls, lists and blocks, and let's treat multiple successive separators as a single separator:
// sugar for `[Foo(Bar, { f, g }), Baz]`:
[
Foo(
Bar
{
f
g
}
)
Baz
]
Finally, let's allow trailing arguments: Whenever tags, bindings, lists or blocks appear as the last arguments of a function call, they can be written after the closing parenthesis instead. These trailing arguments can themselves have arguments that are wrapped in (...), but no trailing arguments. The parentheses after a function call can be omitted if all arguments are trailing arguments.
f(a) { b } // sugar for `f(a, { b })`
f(a) Foo [b, c] { d } // sugar for `f(a, Foo, [b, c], { d })`
f Foo Bar // sugar for `f(Foo, Bar)`
f Foo { Bar } { Baz } // sugar for `f(Foo, { Bar }, { Baz })`
f Foo(Bar) { Bar }() // sugar for `f(Foo(Bar), { Bar }())`
Using all of this syntactic sugar, we can see how...
match(x) [
Pair('x, 'x) -> { f(x) }
'x -> { g() }
]
...is sugar for...
match(x)(
LIST_TAG(
->(Pair('x)('x))({ f(x) })
)(
->('x)({ g(NIL_TAG) })
)
)
...which is further desugared using explicit bindings into...
// assuming a JS-like target language,
// with `x => y` as syntax for anonymous functions
match(x)(
LIST_TAG(
->(
Pair(BINDING_TAG("x"))(BINDING_TAG("x"))
)(
x => f(x)
)
)(
->(
BINDING_TAG("x")
)(
_ => g(NIL_TAG)
)
)
)
The resulting language is syntactically quite flexible, but I'm not sure whether it might not be too much sugar. In many situations, there are several different ways to write a function call: Using either prefix or infix syntax, using either normal or trailing arguments:
f(Foo, g(Bar, { h() }))
f(Foo, g Bar { h() })
Foo f g(Bar, { h() })
Foo f g Bar { h() }
In my first experiments, this flexibility definitely seemed to lead to a higher mental load of having to think about where infix calls and especially trailing arguments can be used to shorten the code. But maybe that's just a result of unfamiliarity and infix calls / trailing arguments will mostly be used by the “core” control flow structures of the language, whereas most “normal” functions will use prefix calls, at most with trailing blocks?
Perhaps having both infix calls and trailing arguments is too much and one construct is enough. Alternatively, trailing arguments could be made more restrictive, so that they only apply to blocks (or only blocks and lists), but not tags or bindings.
There is another bit of syntactic sugar that falls in the same category of sugar that is generally applicable to function calls but can be used to write control flow constructs with fewer parentheses: Optional keyword lists as used in Elixir (and also explained in the optional syntax sheet). These could be added to the language described here by using special syntax for pairs of tags and values:
// sugar for `[Foo, Bar]`:
foo: Bar
// sugar for `if(x == y, [Do, { f(x) }], [Else, { f(y) }])`:
if (x == y) do: {
f(x)
} else: {
f(y)
}
I'm generally happy with how these bits of sugar interact, even though it might be necessary to dial down the use of trailing arguments a bit. It's still too early to tell which of these bits of sugar really carry their weight, but the direction seems promising. (Sweet expressions are a similar attempt, I'm sure there are others I've missed.)
The next step is to implement a few higher level control structures and binding constructs in this language, then work on the missing features related to explicit bindings.