A hazard of precedence

I recently ran into a bug caused by a precedence mistake. Someone had written if (a = b == c), intending if ((a = b) == c). But in C, == has higher precedence than =, so the parser read it as if (a = (b == c)). The typechecker didn't object, because the result type of == in C is int, so all was apparently well. None of the humans noticed either, perhaps because b was a large expression, so the two infix operators were far apart.
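Here is a minimal sketch of the two readings, for illustration only. The function names are hypothetical, and the operands are plain ints rather than the large expression in the real bug:

```c
/* What the author of the bug meant:
   assign b to a, then compare the new value of a with c. */
int assign_then_compare(int *a, int b, int c)
{
    return (*a = b) == c;
}

/* What C actually does with `a = b == c`:
   == binds tighter than =, so b is compared with c first,
   and the 0-or-1 result of the comparison is stored in a. */
int compare_then_assign(int *a, int b, int c)
{
    return *a = b == c;
}
```

With b and c both equal to 5, the first version leaves a equal to 5, while the second leaves a equal to 1, the result of the comparison. Both return nonzero in that case, which is part of why a bug like this can survive casual testing.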

Operator precedence saves lots of parentheses, but it does occasionally lead to bugs. It's one of the class of language features that work by resolving ambiguity, and these have a common hazard: when you don't notice the ambiguity, it will still be resolved, and not necessarily as you intend.

Short names are addictive

This morning I tried to do this in Common Lisp:

(defun neighbors (x y)
  (mapcar (fn (dx dy) (vector (+ x dx) (+ y dy)))
          '(-1 0 1 0)
          '(0 1 0 -1)))

Naturally, SBCL spit out half a page of warnings, starting with this:

;     (+ X DX)
; 
; caught WARNING:
;   undefined variable: DX

;     (DX DY)
; 
; caught STYLE-WARNING:
;   undefined function: DX
; 
; caught WARNING:
;   undefined variable: DY

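It took me an embarrassingly long time to figure out why. I have become so used to fn being lambda (in Arc, Clojure, and any number of other new Lisps, including my own) that I didn't notice anything out of the ordinary. Common Lisp has no fn, of course, so (fn (dx dy) ...) was compiled as an ordinary call to a function named fn, with (dx dy) as a call to a function named dx.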

I did at least remember that CL can't add vectors, which was how I wanted to write it. This is a hazard of learning new languages, and of designing them: improvements, even such minor ones as shorter names, are addictive. They're fun at first, but soon you become dependent on them, and the old ways seem unbearably clunky, if you can even remember them.

The value of extensible format strings

Common Lisp's version of printf is the famously overpowered format. In addition to everything printf does, it has conditionals, iteration, recursion, case-folding, tabulation and justification, pretty-printing, English plurals and number names (cardinal and ordinal), and two kinds of Roman numerals. Surprisingly, in a language where almost everything is user-accessible, most of these features are not available separately, only through format. The complexity is such that there's a compiler to make format strings execute faster. Sadly (or not, depending on your perspective) it's not Turing-complete, but that's only because it has no way to store information.

Of course there's a way around that. None of format's features is quite as overpowered as ~/, which calls arbitrary functions: (format stream "~/package:function/" x) does (package:function stream x nil nil). (The two nils say that the directive's colon and at-sign modifiers were not given. Yes, there are more features. Are you surprised? And yes, in practice the explicit package name is required.) User extensibility is a good principle, but this seems silly. Why would you ever want to call a function through format instead of doing so directly?

Well, the other day I had to generate some XML in C, and the obvious way was with printf:

fprintf(somefile, "<element attr=\"%s\" />", somestring);

But there might be XML-meaningful characters in the string, and I didn't want to mess around with fancy XML-generation libraries. So I had to either escape the string before printing...

char buffer[BIG_ENOUGH]; /* yeah, right */
escape_xml(buffer, BIG_ENOUGH, somestring);
fprintf(somefile, "<element attr=\"%s\" />", buffer);

...or give up on printf, which was what I ended up doing:

fprintf(somefile, "<element attr=\"");
print_xml_string(somefile, somestring);
fprintf(somefile, "\" />");
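For concreteness, here is a rough sketch of what the two helpers used above might look like. Only their names and argument order come from the calls above. The return conventions, the xml_entity helper, and the choice of which five characters to escape are my assumptions here, not a description of the actual code:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the entity for c, or NULL if c needs no escaping. */
static const char *xml_entity(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&apos;";
    default:   return NULL;
    }
}

/* Copy src into dst, escaping XML-meaningful characters.
   Assumed convention: returns 0 on success, -1 if the escaped text plus
   its terminator does not fit in size bytes. On failure dst holds a
   NUL-terminated prefix of the output (provided size > 0). */
int escape_xml(char *dst, size_t size, const char *src)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        const char *entity = xml_entity(*src);
        size_t len = entity ? strlen(entity) : 1;
        if (len >= size - used) {   /* need len bytes plus the '\0' */
            dst[used] = '\0';
            return -1;
        }
        if (entity)
            memcpy(dst + used, entity, len);
        else
            dst[used] = *src;
        used += len;
    }
    dst[used] = '\0';
    return 0;
}

/* Write s to out with the same escaping, no buffer required.
   Assumed convention: returns 0 on success, EOF on a write error. */
int print_xml_string(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        const char *entity = xml_entity(*s);
        int rc = entity ? fputs(entity, out) : fputc(*s, out);
        if (rc == EOF)
            return EOF;
    }
    return 0;
}
```

The buffer version has to report failure somehow, and the caller has to check for it, which is one more reason the streaming version was the easier one to live with.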

Either option destroys the clarity of the printf. What I really wanted was a custom printf operation, like this:

fprintf(somefile, "<element attr=\"%/escape_xml/\" />", somestring);

That's exactly what format's ~/ command does:

(format somefile "<element attr=\"~/xml:escape/\" />" somestring)

I guess it's not so silly after all. It is verbose (imagine repeating ~/xml:escape/ for each of a dozen attributes), but that's fixable. If there were an interface for defining new format commands, then any frequently used ~/ function could be given a one-character form, and all would be well, except possibly for readability. (Although in this case all of the obvious characters x e & < s are already taken.) Lisp being Lisp, you generally can get at the implementation's way of defining format commands, e.g. sb!format::def-format-directive, but depending on implementation internals is not usually a good idea. Exposing this interface would make format more malleable, like the rest of the language, and would also make its long feature list easier to swallow.
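Back in C, the %/escape_xml/ directive I wished for can be roughly approximated with a small wrapper. This is only a sketch under heavy assumptions: xfprintf, the directive table, and print_xml_escaped are hypothetical names, each directive consumes exactly one const char * argument, and none of printf's ordinary directives are supported (everything that is not %/name/ is copied through verbatim):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A directive handler prints one string argument to out; 0 means success. */
typedef int (*directive_fn)(FILE *out, const char *arg);

static int print_xml_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        const char *entity = NULL;
        switch (*s) {
        case '&':  entity = "&amp;";  break;
        case '<':  entity = "&lt;";   break;
        case '>':  entity = "&gt;";   break;
        case '"':  entity = "&quot;"; break;
        case '\'': entity = "&apos;"; break;
        default:   break;
        }
        if ((entity ? fputs(entity, out) : fputc(*s, out)) == EOF)
            return -1;
    }
    return 0;
}

/* The table of named directives; adding one is a one-line change. */
static const struct { const char *name; directive_fn fn; } directives[] = {
    { "escape_xml", print_xml_escaped },
};

static directive_fn find_directive(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof directives / sizeof directives[0]; i++)
        if (strlen(directives[i].name) == len &&
            memcmp(directives[i].name, name, len) == 0)
            return directives[i].fn;
    return NULL;
}

/* Copy fmt to out, replacing each %/name/ with the output of the named
   handler applied to the next argument. Returns 0 on success, -1 on an
   unknown or unterminated directive or a write error. */
int xfprintf(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int rc = 0;
    va_start(ap, fmt);
    while (*fmt != '\0' && rc == 0) {
        if (fmt[0] == '%' && fmt[1] == '/') {
            const char *name = fmt + 2;
            const char *end = strchr(name, '/');
            directive_fn fn =
                end ? find_directive(name, (size_t)(end - name)) : NULL;
            if (fn == NULL) {
                rc = -1;
                break;
            }
            if (fn(out, va_arg(ap, const char *)) != 0)
                rc = -1;
            fmt = end + 1;
        } else {
            if (fputc(*fmt, out) == EOF)
                rc = -1;
            fmt++;
        }
    }
    va_end(ap);
    return rc;
}
```

With that, xfprintf(somefile, "<element attr=\"%/escape_xml/\" />", somestring) reads the way I wanted. The cost is that the compiler can no longer check the arguments against the format string, and a real version would still have to reimplement or delegate everything printf already does.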

For new languages, though, I think I prefer string interpolation, which avoids the issue entirely:

(put somefile "<element attr=\"$(xml:escape somestring)\" />")

It would also be nice to have a choice of string delimiters, so the quotation marks don't require escaping. But that's a different, less interesting issue.