Minimal Yet Extensible Modern Syntax (Without All the Parentheses)

This is a proposal for a programming language syntax that is both modern and familiar to anyone used to C/JS-style languages with curly braces, yet minimal and extensible enough that all control structures, including conditionals and variable declarations, can be implemented as normal functions and redefined at will.

It is as minimal and flexible as Lisp, but does not rely on macros and uses parentheses much more sparingly. Other approaches that I'm aware of include sweet-expressions in Lisp, keyword lists in Elixir and the minimal but general syntax of Koka.

(This proposal builds on the idea of explicit bindings and last week's note on syntax, but is self-contained and links back where necessary.)

The end result looks like this:

// user-defined variable declaration, `=` is just a function:
'x = "foo"
'y = "bar"
f(x, y)

// user-defined pattern matching, `match` and `->` are just functions:
match (x) [
    Pair('x, 'x) -> { f(x) }
    'x           -> { g() }
]

// user-defined if-else, `if` and `==` are just functions:
if (x == y) {
    f()
} else {
    g(x, y)
}

The Starting Point: Curried Functions

Let's start with a minimal functional language with variables, anonymous functions and function application. Basically lambda calculus, but with slightly different syntax:

x           // variable x
f(x)        // function f called with argument x
{ 'x => x } // anonymous function with argument x, returning x

Variables and function calls work just like in C/JS-like languages, while the syntax for anonymous functions is borrowed from Kotlin's lambdas, except with “=>” instead of “->”, and the variable that is being bound is written as 'x instead of x.

Multi-argument functions can be simulated by currying:

f(a)(b)(c)
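Currying can be illustrated with ordinary closures. The following Python sketch is only an analogy (Python serves as a host language, and `f` is a hypothetical example function): a "three-argument" function is a one-argument function that returns another one-argument function, and so on.

```python
# Hypothetical curried "three-argument" function: every call takes exactly
# one argument and returns a function that waits for the next one.
def f(a):
    def take_b(b):
        def take_c(c):
            return (a, b, c)  # all three arguments are available here
        return take_c
    return take_b


# Corresponds to `f(a)(b)(c)` in the proposed syntax.
result = f("a")("b")("c")
print(result)  # ('a', 'b', 'c')

# Partial application comes for free: `f(1)` is itself a value.
g = f(1)
print(g(2)(3))  # (1, 2, 3)
```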

Tags: Foo, "foo", "foo bar"

Right now there is no way to represent data, which might be enough for lambda calculus, but insufficient for a real programming language. Let's add tags (called atoms, keywords or symbols in other languages), which are interned strings / atomic values that just represent themselves, written like variables but starting with an uppercase letter. Data structures can be built by applying tags to values:

Foo           // the tag Foo,
              // an interned atomic value

Foo(Bar)      // Foo applied to Bar,
              // basically a struct Foo with field Bar

Foo(Bar)(Baz) // Foo applied to Bar, then applied to Baz,
              // basically a struct Foo with fields Bar and Baz
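One way to picture tags is as immutable values that accumulate arguments when applied, without evaluating anything. The Python sketch below assumes a hypothetical `Tag` class; structural equality on the name stands in for interning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A tag plus the arguments applied to it so far, e.g. Foo(Bar)(Baz)."""
    name: str
    args: tuple = ()

    def __call__(self, arg):
        # Applying a tag evaluates nothing: it returns a new tag value
        # with one more argument attached, like filling in a struct field.
        return Tag(self.name, self.args + (arg,))

    def __repr__(self):
        return self.name + "".join(f"({a!r})" for a in self.args)


Foo, Bar, Baz = Tag("Foo"), Tag("Bar"), Tag("Baz")

s = Foo(Bar)(Baz)
print(s)                            # Foo(Bar)(Baz)
print(s == Tag("Foo", (Bar, Baz)))  # True: same name, same arguments
print(Foo == Tag("Foo"))            # True: equality stands in for interning
```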

It is possible to have tags that do not start with an uppercase letter or that contain whitespace by explicitly wrapping a tag in "...":

"foo"
"foo bar"
"Foo" // same as `Foo`

Lists: [...]

While tags make it possible to build arbitrary data structures, sometimes we simply want a collection of values without tagging it with an explicit name, so let's add syntax for lists:

[Foo, Bar, Baz]

Side Effects: (), f(), { ... }

Right now there are no “zero-argument” functions: each call needs an explicit argument. But in a language with side effects, there might be functions that don't care about their argument, so let's define a special nil value “()” and make “f()” sugar for calling f with the nil value:

()  // special nil tag
f() // sugar for f(())

Additionally, an anonymous function { ... } that does not declare a variable using “'x => ...” will be treated like a function that simply ignores its argument:

{ f(y) } // sugar for `{ 'x => f(y) }`
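Both rules are purely syntactic, so they can be sketched as small AST-building helpers. In the Python sketch below, `Var`, `Nil`, `Call`, `Lam`, `call0` and `block` are hypothetical names, and the generated parameter name assumes that user-written variables never start with an underscore.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Nil:
    """The special nil value `()`."""


@dataclass(frozen=True)
class Call:
    fn: object
    arg: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


NIL = Nil()
_fresh = count()


def call0(fn):
    """`f()` desugars to `f(())`: a call with the nil value as argument."""
    return Call(fn, NIL)


def block(body, param=None):
    """`{ body }` without `'x =>` gets a fresh parameter that the body ignores."""
    if param is None:
        # Assumption: user-written variables never start with "_".
        param = f"_ignored{next(_fresh)}"
    return Lam(param, body)


# `{ f(y) }` becomes a one-argument function that never uses its argument.
b = block(Call(Var("f"), Var("y")))
print(b.param, call0(Var("g")))
```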

Prefix Calls: f(a, b, c) [...] {...}

Calling a function f with three arguments a, b and c as “f(a)(b)(c)” is a bit unfamiliar coming from C/JS-like languages, so let's define some sugar that can be used both for functions and tags:

f(a, b, c)     // sugar for `f(a)(b)(c)`
Pair(Foo, Bar) // sugar for `Pair(Foo)(Bar)`
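Since multi-argument calls are only sugar, desugaring them is a left fold over the argument list. A minimal Python sketch, assuming hypothetical `Var` and `Call` AST nodes where a `Call` always has exactly one argument:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    """Application of `fn` to exactly one argument."""
    fn: object
    arg: object


def prefix_call(fn, args):
    """Desugar `f(a, b, c)` into the curried form `f(a)(b)(c)`."""
    if not args:
        # `f()` is covered by the nil sugar instead, i.e. `f(())`.
        raise ValueError("f() is sugar for f(()), not an empty curried call")
    # reduce builds Call(Call(Call(f, a), b), c) from left to right.
    return reduce(Call, args, fn)


f, a, b, c = Var("f"), Var("a"), Var("b"), Var("c")
print(prefix_call(f, [a, b, c]) == Call(Call(Call(f, a), b), c))  # True
```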

Trailing arguments are allowed: Whenever [...] or {...} appear as arguments of a function call, they can be written after the closing parenthesis instead. Trailing arguments cannot be followed by (...), only by further [...] or {...}. The parentheses after a function call can be omitted if all arguments are trailing arguments.

f(a) { 'x => x }        // sugar for `f(a, { 'x => x })`
f(a) [b, c] { 'x => x } // sugar for `f(a, [b, c], { 'x => x })`
f [a, b]                // sugar for `f([a, b])`
f { Bar } { Baz }       // sugar for `f({ Bar }, { Baz })`

f { Bar }(x)  // not allowed, (x) appears after trailing arg
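The trailing-argument rule can be checked while collecting the bracketed groups that follow a function name. The Python sketch below uses a hypothetical `Group` node and, for brevity, only handles a single (...) group; chained calls such as `f(a)(b)` are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    kind: str     # "paren" for (...), "list" for [...], "block" for {...}
    items: tuple


def merge_trailing(fn, groups):
    """Collect the argument list of `fn` from the groups that follow it.

    A leading (...) group contributes its items; every [...] or {...}
    group is one (trailing) argument. Assumes at most one (...) group."""
    args = []
    seen_trailing = False
    for i, g in enumerate(groups):
        if g.kind == "paren":
            if seen_trailing:
                raise SyntaxError(f"(...) after a trailing argument of {fn}")
            if i > 0:
                raise NotImplementedError("chained calls are out of scope")
            args.extend(g.items)
        else:
            seen_trailing = True
            args.append(g)
    return fn, args


# `f(a) [b, c] { x }` is sugar for `f(a, [b, c], { x })`
lst, blk = Group("list", ("b", "c")), Group("block", ("x",))
print(merge_trailing("f", [Group("paren", ("a",)), lst, blk]))
```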

Piped Calls: a.f(b, c) [...] {...}

It is often natural to express computations as a chain of operations applied to a value, transforming the value step by step. Instead of a pipe operator or threading macro, let's just use uniform function call syntax:

a.f1(b, c).f2(d) // sugar for `f2(f1(a, b, c), d)`

Piped calls are just prefix calls whose first argument is written on the left of the function, so piped calls support [...] and {...} as trailing arguments.
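Because a piped call only moves the first argument to the left of the function, desugaring a whole chain is a simple loop. A Python sketch with a hypothetical `App` node that still holds all arguments at once (currying would happen in a later step):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """A (not yet curried) call `fn(arg1, arg2, ...)`."""
    fn: str
    args: tuple


def pipe(receiver, steps):
    """Desugar `a.f1(b, c).f2(d)` into `f2(f1(a, b, c), d)`.

    `steps` lists (function_name, extra_args) pairs in source order."""
    expr = receiver
    for fn, extra in steps:
        # Everything to the left of the dot becomes the first argument.
        expr = App(fn, (expr, *extra))
    return expr


chain = pipe("a", [("f1", ("b", "c")), ("f2", ("d",))])
print(chain == App("f2", (App("f1", ("a", "b", "c")), "d")))  # True
```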

Infix Calls: a f b

It would be nice to be able to use infix functions without any parentheses, such as “a + b” or “a == b”. So let's add some sugar for that:

a f b // sugar for `f(a, b)`

How should we deal with chained infix calls such as “a + b + c”? Are infix calls left-associative, right-associative or can this be defined? Let's keep it simple and always require parentheses for chaining infix calls:

(a - b) - c   // sugar for `-(-(a, b), c)`
a -> (b -> c) // sugar for `->(a, ->(b, c))`

Let's also require that while the arguments a and b can be arbitrary expressions (which must be wrapped in (...) if they are themselves infix calls), the function f must always be a variable, never a composite expression.
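Together, these two restrictions make infix desugaring a purely local check. A hypothetical Python sketch that works on one level of already-parsed items, assuming that prefix, piped and keyword calls have already been grouped into single items:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: Var
    args: tuple


def infix(parts):
    """Desugar one level of `a f b` into `f(a, b)`.

    `parts` is the flat sequence of items between two separators;
    parenthesized sub-expressions arrive as single, already-desugared items."""
    if len(parts) == 1:
        return parts[0]  # a plain expression, not an infix call
    if len(parts) > 3:
        raise SyntaxError("chained infix calls need explicit parentheses")
    if len(parts) != 3:
        raise SyntaxError("incomplete infix call")
    lhs, op, rhs = parts
    if not isinstance(op, Var):
        raise SyntaxError("the infix function must be a variable")
    return App(op, (lhs, rhs))


a, b, c, minus = Var("a"), Var("b"), Var("c"), Var("-")
# `(a - b) - c`: the parenthesized part is desugared first.
print(infix([infix([a, minus, b]), minus, c]))
```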

Keyword Calls: f (a) { ... } else { ... }

Requiring infix functions to be variables makes it possible to have keyword calls, which are similar to Elixir's optional keyword lists: whenever a variable f is followed by a (...), [...] or {...} (ensuring that it's not an infix call), all variables that follow are treated as “keywords” and passed to the function f as a list of tags and their argument(s). Let's look at an example:

if (x) { // the variable `if` is followed by a non-variable `(x)`...
    f()
} elif (y) { //...so `elif` is treated as a keyword...
    g()
} else { // ...and so is `else`...
    h()
}

// ...and the above is desugared to:
if(
    x,
    { f() },
    [
        ["elif", y, { g() }],
        ["else", { h() }]
    ]
)
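This desugaring can be sketched as splitting the sequence after the head at every bare variable. The Python sketch below uses hypothetical `Var` and `Group` nodes and assumes that the keyword list is only appended when at least one keyword is present, which is consistent with the desugared `match` example in "Putting it All Together".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Group:
    kind: str     # "paren" for (...), "list" for [...], "block" for {...}
    items: tuple


def group_args(groups):
    """(...) groups contribute their items; [...] and {...} are one argument each."""
    args = []
    for g in groups:
        if g.kind == "paren":
            args.extend(g.items)
        else:
            args.append(g)
    return args


def keyword_call(head, rest):
    """Desugar `head (...) {...} kw (...) {...} ...` into (head, args).

    Every bare variable in `rest` starts a new keyword segment. The groups
    right after `head` become its arguments; if any keywords follow, one
    extra list argument holds [keyword_tag, *keyword_args] per keyword."""
    if not rest or isinstance(rest[0], Var):
        raise SyntaxError("a keyword call needs a group right after the head")
    segments = [[head.name]]
    for item in rest:
        if isinstance(item, Var):
            segments.append([item.name])  # this variable is a keyword
        else:
            segments[-1].append(item)
    head_args = group_args(segments[0][1:])
    keywords = [[name, *group_args(groups)] for name, *groups in segments[1:]]
    return head, head_args + ([keywords] if keywords else [])


# if (x) { f() } elif (y) { g() } else { h() }
x, y = Var("x"), Var("y")
then_body = Group("block", ("f()",))
elif_body = Group("block", ("g()",))
else_body = Group("block", ("h()",))
head, args = keyword_call(Var("if"), [
    Group("paren", (x,)), then_body,
    Var("elif"), Group("paren", (y,)), elif_body,
    Var("else"), else_body,
])
print(args)
```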

Having 4 different ways to call functions (prefix call, piped call, infix call and keyword call) might seem excessive, but it's usually easy to know which one to pick:

Explicit Bindings and {...}

It might seem strange to make the bound variable of an anonymous function explicit by writing { 'x => x } instead of just { x => x }, but making bindings such as 'x explicit allows us to define => as a normal function instead of a built-in part of the syntax. Here's how:

// the "anonymous function" syntax:
{ 'x => x }
// ...is sugar for:
'x => { x } 
// ...which is sugar for:
=>('x, { x })

Bindings are made explicit by quoting a variable x as 'x. Any unquoted variable is resolved as normal, but a quoted variable is a binding, a “variable to be”, which is not resolved in the current context, but will be bound in the nearest block, which is delimited by { ... }.

Blocks are just syntactic sugar for anonymous functions, whose parameters are defined by the explicit bindings that appear to their left in the code. In the above example, the block { x } would be translated to an anonymous function of one argument x that just returns x.

To distinguish the “surface syntax” (which includes functions like => and { ... } blocks) from the anonymous functions that it is translated to, the following examples use the non-surface notation “x ===> x” for an anonymous function that returns its argument.

More precisely, whenever a function f is called with a block as its argument, this function acts as a “binder”: the block will consume all the bindings that occur as arguments of f (or as arguments to arguments of f and so forth) before the block. A block never receives bindings that come from “above” the binder f in the syntax tree, which makes binders act as scopes:

// this binds `x ===> x + x` to f, then calls f(Y)
let('f, 'x => { x + x }, { f(Y) })
//   |   \____________/          |
//   |     scope of x            |
//   \___________________________/
//           scope of f
//
// { x + x } is translated to `x ===> x + x`
// { f(Y) } is translated to `f ===> f(Y)`
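The scoping rule can be sketched as a single pass over the syntax tree. In this Python sketch, `Quote`, `Block`, `Call` and `resolve` are hypothetical, and the detail that each block only consumes the bindings collected since the previous block of the same call is an assumption that the text does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    """An explicit binding such as 'x."""
    name: str


@dataclass
class Block:
    """A { ... } block; `params` is filled in by resolve()."""
    body: object
    params: list = field(default_factory=list)


@dataclass
class Call:
    """A call such as let('f, ..., { ... }); plain strings are variables."""
    fn: str
    args: list


def resolve(node):
    """Attach explicit bindings to blocks; return bindings left unconsumed.

    Within one call, each block consumes the bindings found in the arguments
    to its left (including arguments of arguments) since the previous block
    (an assumption). Bindings consumed by a nested block never leak upwards,
    and a block only sees bindings from its own call, never from above."""
    if isinstance(node, Quote):
        return [node.name]
    if isinstance(node, Block):
        resolve(node.body)
        return []
    if isinstance(node, Call):
        pending = []
        for arg in node.args:
            if isinstance(arg, Block):
                arg.params = pending  # this block becomes a function of them
                resolve(arg.body)
                pending = []
            else:
                pending += resolve(arg)
        return pending
    return []  # plain variables and tags bind nothing


# let('f, 'x => { x + x }, { f(Y) })
inner = Block(Call("+", ["x", "x"]))
outer = Block(Call("f", ["Y"]))
resolve(Call("let", [Quote("f"), Call("=>", [Quote("x"), inner]), outer]))
print(inner.params, outer.params)  # ['x'] ['f']
```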

Nested Blocks: {..., ...}

Let's add a bit more syntactic sugar: right now, blocks always have to appear after the explicit bindings that they are supposed to consume, but it makes sense to also allow a block to consume bindings that it encloses:

{ 'x = y, f(x) }

Whenever explicit bindings are declared in an element of a block but not consumed within that element, they are consumed by the enclosing block, as if everything that follows the element had been passed to it as a block argument. The above example is desugared to “=('x, y, { f(x) })”. In other words, instead of passing a block as the last argument of a function call, we can just use that function inside the block. This is especially helpful for nested bindings:

// desugars to: =('a, x, { =('b, y, { Pair(a, b) }) })
{ 'a = x, 'b = y, Pair(a, b) }

This assumes that every element in a block will be a binding construct that knows how to handle the block argument that is implicitly passed to it. But especially in a language with side effects it would be nice to allow block elements to not bind anything at all and just “do” something, like print(x).

To allow this, whenever a block element does not contain any explicit bindings that could be consumed by the enclosing block, the element will be evaluated (call-by-value) as an argument to an anonymous function, which when called will return the rest of the block:

// binding construct, uses the binding 'x:
{ 'x = Foo, ... } // translates to: `=('x, Foo, x ===> ...)`

// side effect, does not use the binding:
{ f(), ... } // translates to: `(_ ===> ...)(f())`

This makes it easy to mix binding constructs and side effects:

// translates to: =('a, x, { (_ ===> f(a))(print(a)) })
{ 'a = x, print(a), f(a) }
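Both cases can be handled by folding over the block's elements from right to left. The Python sketch below is hypothetical (`Quote`, `Call`, `Block`, `Then`, `has_bindings` and `desugar_block` are illustrative names) and simplifies by assuming that block elements never carry a block argument of their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """An explicit binding such as 'a."""
    name: str


@dataclass(frozen=True)
class Call:
    """A call such as =('a, x); plain strings are variables."""
    fn: str
    args: tuple


@dataclass(frozen=True)
class Block:
    """A desugared { ... } block with a single body expression."""
    body: object


@dataclass(frozen=True)
class Then:
    """`(_ ===> rest)(value)`: evaluate `value`, ignore it, continue with `rest`."""
    rest: object
    value: object


def has_bindings(expr):
    """True if `expr` contains an explicit binding.

    Simplification: assumes block elements do not carry their own block
    argument, so every binding found here is left for the enclosing block."""
    if isinstance(expr, Quote):
        return True
    if isinstance(expr, Call):
        return any(has_bindings(arg) for arg in expr.args)
    return False


def desugar_block(elements):
    """Desugar the (non-empty) elements of `{ e1, ..., en }` into one expression."""
    *init, result = elements
    for elem in reversed(init):
        if isinstance(elem, Call) and has_bindings(elem):
            # Binding construct: the rest of the block becomes its last argument.
            result = Call(elem.fn, elem.args + (Block(result),))
        else:
            # Side effect: evaluate it call-by-value, then continue.
            result = Then(result, elem)
    return result


# { 'a = x, print(a), f(a) }
print(desugar_block([
    Call("=", (Quote("a"), "x")),
    Call("print", ("a",)),
    Call("f", ("a",)),
]))
```

Under the same assumptions, a whole program would simply be passed to `desugar_block` as the list of its top-level elements.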

Lastly, a program is treated as a {...} block without the “{” and “}” at the start/end:

// sugar for `{ 'x = "foo", 'y = "bar", f(x, y) }`:
'x = "foo"
'y = "bar"
f(x, y)

Separators: Commas or Newlines

Function calls, [...] and {...} all assume that their elements are separated by commas. Let's treat commas and newlines interchangeably as separators, allow separators at the beginning and end of calls, [...] and {...}, and treat multiple successive separators as a single separator:

// sugar for `[Foo(Bar, { f, g }), Baz]`:
[
    Foo(
        Bar
        {
            f
            g
        }
    )
    Baz
]
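At the token level, these rules amount to a small normalization pass. A Python sketch, assuming a hypothetical tokenizer that emits brackets, commas and newlines as separate string tokens:

```python
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}
SEPARATORS = {",", "\n"}


def normalize_separators(tokens):
    """Treat commas and newlines alike, collapse runs of separators, and drop
    separators at the start/end and directly inside brackets."""
    out = []
    for tok in tokens:
        if tok in SEPARATORS:
            # Skip a separator at the very start, right after an opening
            # bracket, or right after another separator.
            if not out or out[-1] in OPENERS or out[-1] == ",":
                continue
            out.append(",")
        elif tok in CLOSERS:
            if out and out[-1] == ",":
                out.pop()  # separator right before a closing bracket
            out.append(tok)
        else:
            out.append(tok)
    if out and out[-1] == ",":
        out.pop()  # separator at the very end of the program
    return out


# The nested example above, as a hypothetical token stream.
tokens = ["[", "\n", "Foo", "(", "\n", "Bar", "\n", "{", "\n", "f", "\n", "g",
          "\n", "}", "\n", ")", "\n", "Baz", "\n", "]"]
print(" ".join(normalize_separators(tokens)))  # [ Foo ( Bar , { f , g } ) , Baz ]
```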

Putting it All Together

Using all of this syntactic sugar, we can see how...

match (x) [
  Pair('x, 'x) -> { f(x) }
  'x           -> { g() }
]

...is sugar for...

match(x)([
    ->(Pair('x)('x))({ 'x => f(x) }),
    ->('x)({ 'y => g(()) })
])

Using keyword calls, the code...

// user-defined if-else, `if` and `==` are just functions:
if (x == y) {
    f()
} else {
    g(x, y)
}

...is sugar for...

if(
    ==(x)(y)
)(
    { 'x => f(()) }
)([
    ["else", { 'z => g(x)(y) }]
])

Possible Extensions

While the above syntax makes up the core of the proposal, there are a couple of extensions that could be added while keeping the syntax minimal yet extensible: