These are a few of my favourite macros

Much of this post seems familiar to me, as if I've seen it somewhere else, perhaps on LL1-discuss or comp.lang.*. But I can't find the post I remember, so maybe I'm imagining someone else saying what I'm thinking.

Macros are flexible, and unfamiliar to most programmers, so they inspire a lot of confusion (more, in my opinion, than they deserve, but that's a topic for another day). Sometimes people try to make sense of this confusion by classifying them into a few categories. These classifications typically include:

  1. Macros that evaluate some arguments lazily, like if and and, or repeatedly, like while.
  2. Macros that pass some arguments by reference rather than by value, like the setf family.
  3. Binding macros that simply save a lambda: with-open-file. In languages with very terse lambda (like Smalltalk) these are not very useful, but in languages that require something like (lambda (x) ...), they're useful and common.
  4. Macros that quote some arguments (i.e. treat them as data, not expressions).
  5. Defining macros like defstruct.
  6. Unhygienic binding macros: op, aif.

The reasons for the classifications vary. Sometimes the point is that all of the categories are either trivial or controversial. (The people making this argument usually say the trivial ones should be expressed functionally, and the controversial ones should not be expressed at all.) Sometimes, as in this case, the point is that some of the categories are hard to express in any other way. Sometimes the point is that some categories are common enough that they should be built in to the language (e.g. laziness) or supported in some other way (e.g. terse lambda) rather than requiring macros.

These classifications aren't wrong, but they are misleading, because the most valuable macros don't fit any of these categories. Instead they do what any good abstraction does: they hide irrelevant details. Here are some of my favourites.

Lazy cons

If you want to use lazy streams in an eager language, you can build them out of delay and eager lists. But this is easy to get wrong. Do you cons an item onto a stream with (delay (cons a b))? (cons (delay a) (delay b))? (delay (cons (delay a) b))? Something else?

This is hard enough that there's a paper about which one is best and why. Even if you know (and regardless of whether you disagree with that paper), it's easy to make mistakes when writing the delays by hand. But the exact place where laziness is introduced is an implementation detail; code producing streams doesn't usually care about it. A lazy-cons macro can hide that detail, so you can use lazy streams without worrying about how they work. That's what any good abstraction should do.
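
To make the hidden detail concrete, here is a minimal sketch in Python, which is eager and has no macros, so the thunk has to be written by hand. The names Delay, lazy_cons, integers_from and take are all made up for illustration, and the placement used here (delay only the tail) is just one of the candidates above, not necessarily the one the paper recommends.

```python
class Delay:
    """A memoizing promise: runs its thunk at most once."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._thunk()
            self._forced = True
            self._thunk = None  # drop the closure once it has run
        return self._value


def lazy_cons(head, tail_thunk):
    """Build a stream cell as (head, Delay(tail)); None marks the end.

    Without macros the caller must wrap the tail in a lambda by hand;
    a lazy-cons macro would insert that wrapper automatically.
    """
    return (head, Delay(tail_thunk))


def integers_from(n):
    # The explicit lambda is the plumbing a macro would hide.
    return lazy_cons(n, lambda: integers_from(n + 1))


def take(k, stream):
    """Return the first k items of a stream as an ordinary list."""
    items = []
    while k > 0 and stream is not None:
        head, rest = stream
        items.append(head)
        k -= 1
        # Only force the tail if another item is actually needed.
        stream = rest.force() if k > 0 else None
    return items
```

With these definitions, take(3, integers_from(10)) gives [10, 11, 12] even though the stream is infinite. Every producer still has to remember the lambda, which is exactly the mistake-prone part.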

Sequencing actions

Haskell's do is not, officially, a macro, but this is only because standard Haskell doesn't have macros; in any case do is defined and implemented by macroexpansion. Its purpose is to allow stateful code to be written sequentially, in imperative style. Its expansion is a hideous chain of nested >>= and lambdas, which no one wants to write by hand (or read). Without this macro, IO actions would be much more awkward to use. Some of this awkwardness could be avoided through functions like sequence, but the use of actions to write in imperative style would be impractical. do hides the irrelevant functional plumbing and relieves the pain of something necessary but very un-Haskell-like. Really, would you want to use Haskell without it?
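
For readers who haven't seen the expansion, here is a rough Python sketch of its shape. It assumes a Maybe-style bind rather than IO (None stands for failure, a 1-tuple wraps a value), and unit, bind, lookup, safe_div and average_speed are hypothetical names chosen for the example. The do block in the comment is what you would write; the nested expression is what you would have to write without it.

```python
def unit(x):
    """Wrap a plain value (Haskell's return)."""
    return (x,)


def bind(m, f):
    """Maybe-style >>=: stop at the first failure, otherwise pass the value on."""
    return None if m is None else f(m[0])


def lookup(env, key):
    return unit(env[key]) if key in env else None


def safe_div(a, b):
    return None if b == 0 else unit(a / b)


def average_speed(env):
    # Hand-written equivalent of the do block:
    #   do d <- lookup env "distance"
    #      t <- lookup env "time"
    #      v <- safe_div d t
    #      return (round v)
    return bind(lookup(env, "distance"), lambda d:
           bind(lookup(env, "time"), lambda t:
           bind(safe_div(d, t), lambda v:
           unit(round(v)))))
```

Here average_speed({"distance": 100, "time": 4}) gives (25,), and a zero time or a missing key gives None. Three steps already need three nested lambdas; real code has many more.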

List comprehensions

Haskell's list comprehensions, like its do, express something that could be done with functions, but less readably. List comprehensions combine the functionality of map, mapcat, and filter in a binding construct that looks a lot like set comprehensions. They save having to mention those list functions or write any lambdas.
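
Python happens to have comprehensions too, so the comparison can be sketched directly. The example below is only an illustration: mapcat is a small helper defined here (Python has no built-in of that name), and pairs_comprehension and pairs_functions are hypothetical names for the same computation written both ways.

```python
from itertools import chain


def mapcat(f, xs):
    """Map f over xs and concatenate the resulting lists."""
    return list(chain.from_iterable(map(f, xs)))


def pairs_comprehension(n):
    """Pairs (x, y) with 1 <= x <= y <= n and an even sum."""
    return [(x, y)
            for x in range(1, n + 1)
            for y in range(x, n + 1)
            if (x + y) % 2 == 0]


def pairs_functions(n):
    """The same result spelled out with mapcat, map, filter and lambdas."""
    return mapcat(
        lambda x: list(map(
            lambda y: (x, y),
            filter(lambda y: (x + y) % 2 == 0, range(x, n + 1)))),
        range(1, n + 1))
```

Both return [(1, 1), (1, 3), (2, 2), (3, 3)] for n = 3, but only one of them can be read at a glance.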

I sometimes wish there were a way to get a fold in there too, but it's a good macro as it is.

Haskell list comprehensions wear a pretty syntactic skin over their macro structure, but this is not essential. Clojure's for demonstrates that a bare macro works as well.

Partial application

Goo's op (and its descendants like Arc's [... _ ...] and Clojure's #(... % ...)) is an unhygienic binding macro that abbreviates partial application and other simple lambdas by making the argument list implicit. It hides the irrelevant detail of naming arguments, which makes it much terser than lambda, and makes higher-order functions easier to use.
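
Python has no macros, so the closest thing I can show is a runtime imitation. This is a sketch, not how op is actually implemented: the function op and the HOLE marker below are hypothetical, and unlike the real macro they only handle holes at the top level of a single call.

```python
HOLE = object()  # stands in for the implicit-argument marker


def op(f, *template):
    """Return a function that fills each HOLE in template, left to right,
    with its own arguments and then calls f."""
    def call(*args):
        supplied = iter(args)
        filled = [next(supplied) if slot is HOLE else slot
                  for slot in template]
        # Any arguments left over are passed along after the template.
        return f(*filled, *supplied)
    return call
```

So op(pow, 2, HOLE) plays the role of lambda x: pow(2, x), and list(map(op(pow, HOLE, 2), [1, 2, 3])) gives [1, 4, 9]. The macro versions are terser still, because the hole can appear anywhere in an arbitrary expression.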

Language embedding

There is a class of macros that embed other languages, with semantics different from the host. The composition macro from my earlier posts is one such. A lazily macro that embeds a language with implicit laziness is another. The embedded languages can be very different from the host: macros for defining parsers, for example, often look nothing like the host language. Instead of function call, their important forms are concatenation, alternatives, and repetition. Macros for embedding Prolog look like the host language, but have very different semantics, which would be awkward to express otherwise.

Like do, these macros replace ugly, repetitive code (typically with a lot of explicit lambdas) with something simpler and much closer to pseudocode.
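
As a taste of the parser case, here is a small Python sketch built from plain functions, since Python has no macros to give these forms their own syntax. All the names (lit, seq, alt, many, matches) are invented for this example, and the parsers are only recognizers: they return the end position of a match, or None on failure.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers):
    """Concatenation: every parser must match, one after another."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def alt(*parsers):
    """Alternatives: the first parser that matches wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def many(parser):
    """Repetition: match zero or more times."""
    def parse(text, pos):
        while True:
            end = parser(text, pos)
            if end is None or end == pos:  # stop on failure or no progress
                return pos
            pos = end
    return parse


def matches(parser, text):
    """True if parser consumes the whole of text."""
    return parser(text, 0) == len(text)


digit = alt(*(lit(d) for d in "0123456789"))
number = seq(digit, many(digit))
sum_expr = seq(number, many(seq(lit("+"), number)))
```

With these, matches(sum_expr, "12+3+456") is True and matches(sum_expr, "12+") is False. The grammar lines at the bottom are the interesting part; a macro could make them look like a grammar rather than like nested function calls.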

The usual tricks

Most macros do fall into the simple categories: binding, laziness and other calling conventions, quotation, defining, etc. It's easy to think, of each of these uses, that it ought to be built into the language so you don't have to “fake” it using macros.

Fake? There's nothing wrong with using a language's expressive power to supply features it doesn't have! That's what abstraction is for!

The C preprocessor is a very useful thing, but of course it has given macros a bad name. I suspect this colors the thinking even of people who do know real (i.e. tree) macros, leading them to prefer a “proper” built-in feature to its macro implementation.

From my point of view, a macro is much better than a built-in feature. Each built-in feature complicates the language's kernel, making it harder to implement, and in particular harder to analyze. Macros cover all of these features, plus others the designers haven't thought of, with a single mechanism, and they don't even complicate analysis, because they disappear when expanded, so the analysis phase never sees them.

(To be fair, macros do require the language's runtime to be present at compile-time, and create the possibility of phasing bugs. But either interactive compilation or self-hosting requires the former anyway, and the latter only interferes with macros, so at worst it's equivalent to not having them. Neither is remotely as bad as being unable to express things the language designer didn't think of.)

So I see macros not as a weird, overpowered feature but as an abstractive tool nearly as important as functions and classes. Every language that aims for expressive power should have them.

4 comments:

  1. One of my beefs with macros (vs builtins) is exactly the analysis issue; how do you get good (i.e. semantic) error messages, warnings etc. out of features implemented as macros? IMO the (practical) analysis issue is rather harder for macros than builtins, without a whole lot of other machinery that starts dictating the design of your compiler.

    But my biggest beef is that they enable above-average programmers to create their own half-baked embedded language. A bunch of people do this for various libraries, and now you have to try and compose the lot of them. Or you have to hire new people to work on code that's effectively written in a proprietary language nobody wants to learn. Highly expressive macros enable too much weirdness, without enough enforcement of conventionality. A situation that's fine when you're working all on your own, but not sustainable in a broader ecosystem.

    I'm not arguing for Java, by any means, but there's a lot of power in a software ecosystem that has common constructs that everybody uses, such that you don't have to deal with type incompatibilities and write shims to join two libraries together. And a limited language defines the scope of a library's interface. But macros enable arbitrary enhancement of the language, and you'll end up with different groups creating different, incompatible implementations of common ideas, and a mess trying to glue them all together.

    Replies
    1. Producing source-level error messages is indeed hard (it requires keeping source locations in the AST, like Racket), but it's not very pressing, because macroexpansion-level error messages are (surprisingly) usually pretty transparent.

      Does unreadable overuse of macros really happen in the wild, or only in speculation? I've seen plenty of unmaintainable code-mazes made out of classes, and functions, and C++ templates, and mere conditionals — but never out of macros.

  2. How about fexprs? To me, they're macros done right: hygiene is trivial, phasing problems are non-existent, and the design is just much cleaner.

    Replies
    1. There's just that small compilation problem...

      Actually that's a problem with first-class special forms, not with fexprs per se. Second-class fexprs (restricted to being statically known, like ordinary macros, and using only statically determinable environments — did I miss any restrictions?) have expressive power equal to macros, and can always be compiled by inlining. I don't know if anyone has tried this yet.

      FWIW I'm not sure fexprs are really cleaner than macros — they force environments to exist at runtime, while macros are purely static, and wonderfully simple at heart. It's the hygiene and phasing apparatus that's unclean.

