Superficial features affect style

While translating some code from Perl to a toy Lisp, I noticed I was treating something differently in the two languages: variables with one-letter names look much worse in Lisp than in Perl. $s obviously doesn't collide with anything predefined, but s looks like it might collide with something (perhaps the s combinator?), so I felt obliged to give it a longer name in Lisp, even though this didn't really help readability. But I was perfectly comfortable with the one-letter name in the original Perl.

There are some good reasons for worrying less about name collisions in Perl. Variable sigils reduce the risk of collisions by putting functions and scalars in separate namespaces. Perl doesn't have many predefined names, and entirely avoids one-letter names, so there's less need to worry about collisions with them. But I suspect my main reason was just that $s doesn't look like a one-letter name.

Closures of sets

Some operations are common in mathematical reasoning, but are not traditionally used in programming, even when they make sense. I stumbled on one of these recently: I described something as the closure of a set under an operation, and then sought some other way to compute it, because the mathematical description didn't sound like an implementation. After a few minutes I realized there was no reason it shouldn't be:

(defn closure-unary "Closure of a set under a function." [f xs]
  (loop [done #{}
         pending xs]
    (if (empty? pending)
      done
      (let [x (first pending)]
        (if (contains? done x)
          (recur done (rest pending))
          (recur (conj done x) (cons (f x) (rest pending))))))))

user=> (closure-unary - (range 3))
#{0 1 2 -2 -1}
user=> (closure-unary #(mod (inc %) 10) [0])
#{0 1 2 3 4 5 6 7 8 9}
user=> (defn collatz [n] (if (even? n) (quot n 2) (inc (* n 3))))
#'user/collatz
user=> (closure-unary collatz [27])
#{160 1 161 577 2 1154 1186 322 866 2051 35 4 2308 484 1732 5 3077
 325 4102 70 3238 71 103 167 263 8 40 488 4616 41 137 233 425 10
 6154 106 650 107 395 1132 364 46 2158 142 206 334 526 2734 47 175
 719 911 16 9232 80 976 433 593 82 242 274 466 754 850 1619 20 244
 1300 1780 53 182 214 310 502 566 790 23 1079 1367 7288 184 1336
 121 377 122 4858 890 27 91 155 251 283 92 124 1276 412 3644 668
 700 61 2429 445 62 94 350 1438 638 1822 958 31 319 479}

That was the version I wanted, but closure under a binary operator is probably more common:

(defn closure-binary "Closure of a set under a binary operator." [f xs]
  (loop [done #{}
         pending xs]
    (if (empty? pending)
      done
      (let [x (first pending)]
        (if (contains? done x)
          (recur done (rest pending))
          (recur (conj done x)
                 (concat (map #(f x %) done)
                         (map #(f % x) done) ;in case it's not commutative
                         (list (f x x))
                         (rest pending))))))))

user=> (closure-binary #(not (or %1 %2)) [false])
#{true false}
user=> (closure-binary #(mod (- %1 %2) 11) [1])
#{0 1 2 3 4 5 6 7 8 9 10}

The two variants are identical except in how they generate new elements. The common part could be factored out...

(defn closure* "Closure of a set under a function generating new elements:
(children element old-elements) should return a seq of elements to add."
  [children xs]
  (loop [done #{}
         pending xs]
    (if (empty? pending)
      done
      (let [x (first pending)]
        (if (contains? done x)
          (recur done (rest pending))
          (recur (conj done x) (into (rest pending) (children x done))))))))

(defn closure-unary [f xs]
  (closure* (fn [x done] (list (f x))) xs))

(defn closure-binary [f xs]
  (closure* (fn [x done]
              (concat (map #(f x %) done)
                      (map #(f % x) done)
                      (list (f x x))))
            xs))

...but passing done to children is not very intuitive.

Special behaviour for interactive redefinitions

Dynamism — allowing definitions to be replaced in a running program — is a hugely convenient feature, because it can shorten the edit-test loop by so much. But like many powerful features, it makes some errors harder to detect — in this case, name collisions. It's easy to catch these automatically in a static language: simply make redefinition of an existing variable an error. But in a dynamic language, redefinition is a feature, so we can't simply reject it. We can give a warning, so collisions won't go undetected, but this will annoy users whenever they interactively redefine something. Can we do better?

In some Lisps, defvar has a related problem, and illustrates a possible solution. Its purpose is to create a (global) variable with a given initial value. There's a problem with redefining: what do you do if the variable already exists, and has a different value?

In both Common Lisp and Emacs Lisp, defvar has the curious feature that it sets the variable only if it doesn't already have a value — if it already exists, it's not changed. This is intended to make re-loading sourcefiles safe: otherwise, it would reset your global variables.
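For example, at the REPL (with a hypothetical variable *x*):

CL-USER> (defvar *x* 1)
*X*
CL-USER> (defvar *x* 2) ; already bound, so the value is left alone
*X*
CL-USER> *x*
1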

If you think this is surprising behaviour, you're not alone. Alan Crowe once suggested that it would be less confusing if defvar were called something like def-perv-noclobber instead, because neither its pervasive specialness nor its clobber-avoidance is what you would expect from something called defvar. (defonce is a better name.)

When a user reevaluates a single defvar interactively, however, they probably do want to clobber the value, or they wouldn't bother. Both Emacs and SLIME therefore have a special case: when a defvar is evaluated interactively (as by C-M-x), it will clobber any existing value. This is ugly, of course, but like C-t's irregular behaviour at end-of-line, it does what the user wants.

Something similar might be useful for detecting errors of redefinition. Can we allow redefinitions when a definition is evaluated interactively, but not when it's loaded as part of a file? No, because it's normal to reload a source file after editing it, which replaces every definition in the file, but shouldn't produce any warnings. Nor should we disable the warning for all interactive evaluation, including load, because that's when the warnings are most useful. Interactivity alone is not a sufficient reason to suppress warnings.

What we really want is to warn about redefinition only within a single interactive operation. It's generally OK for a load to replace old definitions; it's only suspicious if a definition is made twice without user intervention. So if a name is defined twice in a single load (directly, or in any other files it loads), that's an error or warning, but when it's defined twice in two separate batches, that's OK. This is easy to implement (by e.g. recording each definition made in the current load), and gives much more precise warnings.
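
A minimal sketch of this in Clojure (the names here are invented, and a real system would hook note-definition! into def itself rather than calling it by hand):

(def ^:dynamic *defined-this-load* nil) ; set of names defined so far in the current load

(defn note-definition!
  "Record that name was just defined; warn if it was already defined in this load."
  [name]
  (when *defined-this-load*
    (when (contains? @*defined-this-load* name)
      (println "Warning:" name "is defined twice in one load"))
    (swap! *defined-this-load* conj name)))

(defn load-with-redefinition-warnings
  "Call thunk (which does the actual loading) with a definition record, so
duplicates within the load — including files it loads — are reported, but
redefinitions across separate loads are not."
  [thunk]
  (if *defined-this-load*
    (thunk) ; nested load: keep the outer record
    (binding [*defined-this-load* (atom #{})]
      (thunk))))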

If it's good enough for a real antivirus, it's good enough for a fake one

I encountered a piece of scareware recently — “Windows Vista Antispyware”, which claimed to be part of Windows and to have detected viruses and other malware, and demanded money to “remove” them.

It was slickly done. The UI looked like part of the Windows Security Center control panel, there were only a few typos and stylistic missteps, and the infections it reported were the (randomly selected) names of real malware. But it would have been more convincing if it hadn't claimed the machine was infected with the dreaded EICAR-Test-File.

State is not always to blame

Nelson Elhage describes a cute Linux vulnerability, the heart of which is that Linux sometimes temporarily turns off its pointer validity checks, and then does things that aren't safe without those checks.

My reaction (after the initial wow, Linux has stupid mistakes too, as if it should be a surprise) was interestingly wrong: I blamed the problem on the use of state, and briefly supposed it should have been done with something like dynamic scope instead. But actually state had nothing to do with it. The problem was in turning off the checks over an unnecessarily large scope, and in not being clear about what operations were safe with the checks turned off. That they were turned off statefully was irrelevant; the result would have been the same regardless of the mechanism.
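
To see why, imagine the checks controlled by dynamic scope instead of mutation (a sketch; *check-pointers?* and the operation inside are invented for illustration):

(def ^:dynamic *check-pointers?* true)

(binding [*check-pointers?* false] ; dynamic scope, no mutable state
  ;; everything called from here still runs unchecked, including
  ;; operations that were never meant to be safe without the checks
  (copy-to-user buf))

The scope is still too broad and the unsafe operation is still reachable inside it; only the bookkeeping has changed.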

Zealous functional puritans blame all kinds of problems on state, and they have an unwholesome influence on other functional programmers. We learn that state is an acceptable target, that it's the usual suspect for any problem, that no evidence is necessary to accuse it. And so we do so even in cases like this vulnerability, where a moment's thought would reveal that state was innocent, and for many other problems whose causes are not so obvious. How many really have anything to do with state?

Why use keywords as symbols?

From the beginning, Lisp has been associated with symbolic programming, for which symbols have always been the canonical type. So it's odd that many Lisps also have a separate :keyword type, which they use instead of symbols for all symbolic purposes except representing their own programs. Why bother? Why have two separate types that mean the same thing?

The motivation, in Common Lisp and (AFAIK) Clojure, is that these languages' symbols aren't pure symbols: in addition to identity and name, they include some other features, in particular a package or namespace. That's useful for representing programs, but most other applications of symbols don't want namespaces, so the languages also have keywords to serve as pure symbols. Lisps whose symbols don't have packages don't usually bother with keywords, but Racket and elisp have them anyway for use with keyword arguments. (In Racket, they need to be distinct from symbols because the function call form handles them specially rather than just passing them as part of the arglist. In elisp, they're only cosmetic; plain symbols could have been used instead.)

Avoiding namespaces is a good reason to use keywords, but I don't think it's the only reason, because users don't see keywords as a workaround; they seldom have to be told to avoid symbols because they're more complex than they appear. Instead, they overwhelmingly prefer keywords, not just for keyword arguments but as symbolic values. Even in elisp, where there's no reason not to use ordinary symbols, keywords are sometimes used instead. Why?

For one thing, they help distinguish programs from data. Humans, even Lispers who know better, tend to see these as two different things, and get confused when they mistake one for the other. When programs are written with symbols and data is written with keywords, it's easy to tell them apart.

Keywords are also more regular in appearance, because they're self-evaluating. It's tempting (for beginners, and as an unconscious heuristic for non-beginners) to think of symbols and keywords as differing only in their prefix character: 'x is the practical equivalent of :x, right? But they're not equivalent, because symbols have a quote only when they appear as forms. When they appear in a larger piece of quoted data, or as output from the REPL, or in non-evaluated contexts, they don't have the quote. This is only a superficial inconsistency, but it's annoying:

CL-USER> (list 'a 'b 'c)
(A B C)
CL-USER> (second '(a b c))
B
CL-USER> (defclass foo () ((a type integer initarg a accessor foo-a)))

(That last is not valid CL, obviously; it's what defclass would look like without keywords. I suspect users would perennially forget that most of the slot options aren't evaluated, even more than they already do, and put in quotes anyway, because they're not accustomed to writing unquoted symbols.)

Keywords, because they're self-evaluating, don't have this variation. They look the same whether they're in quoted data, or in the output of the REPL, or standing alone as forms, or as non-forms:

CL-USER> (list :a :b :c)
(:A :B :C)
CL-USER> (second '(:a :b :c))
:B
CL-USER> (defclass foo () ((a :type integer :initarg :a :accessor foo-a)))

More comfortable, isn't it? I wonder if this explains some of the popularity of keywords. (And also of Clojure's vectors, which while not strictly self-evaluating are similarly more regular in appearance than lists, because they don't usually require quoting.) If so, this is a disappointment to me, because it means this seemingly pointless distinction, like that between symbols and strings, is actually doing something important, and can't necessarily be simplified away.

An anaphoric-macro-defining macro

Attention conservation notice: this post is about a special-purpose macro-defining macro. I doubt it will be of much interest to anyone but macro fans.

A year ago, Vladimir Sedach asked for arguments about why macros are and aren't useful. (He didn't get many; most of the people who can make such arguments are tired of doing so and not confident they're right.) In the same post, he mentioned that he uses the Anaphora macro library. Anaphora itself provides an example of how macros are useful, not only because it defines some handy ones, but because many of their definitions are repetitive:

(defmacro awhen (test &body body)
  "Like WHEN, except binds the result of the test to IT (via LET) for the scope
of the body."
  `(anaphoric when ,test ,@body))

(defmacro swhen (test &body body)
  "Like WHEN, except binds the test form to IT (via SYMBOL-MACROLET) for the
scope of the body. IT can be set with SETF."
  `(symbolic when ,test ,@body))

(defmacro acase (keyform &body cases)
  "Like CASE, except binds the result of the keyform to IT (via LET) for the
scope of the cases."
  `(anaphoric case ,keyform ,@cases))

(defmacro aetypecase (keyform &body cases)
  "Like ETYPECASE, except binds the result of the keyform to IT (via LET) for
the scope of the cases."
  `(anaphoric etypecase ,keyform ,@cases))

This is the sort of repetition that could be addressed with a macro. We can generate these definitions automatically, docstrings and all:

(defmacro defanaphoric (name (firstarg &rest moreargs) original &key settable)
  "Define NAME as a macro like ORIGINAL, but binding IT to the result of FIRSTARG.
If SETTABLE is true, IT will be a symbol-macro which can be set with SETF."
  (let ((whole (gensym "WHOLE")))
    `(defmacro ,name (&whole ,whole ,firstarg ,@moreargs)
       ,(format nil "Like ~S, except binds the result of ~S to IT (via ~S) for the scope of the ~A.~A"
          original
          firstarg
          (if settable 'symbol-macrolet 'let)
          (if (and moreargs (member (car moreargs) '(&body &rest))) (cadr moreargs) "body")
          (if settable " IT can be set with SETF." ""))
       ,@(mapcan (lambda (a) (unless (member a lambda-list-keywords)
                               `((declare (ignore ,a)))))
                 (cons firstarg moreargs))
       `(,',(if settable 'symbolic 'anaphoric) ,',original ,@(cdr ,whole)))))

Then it's easy to define individual anaphoric macros:

(defanaphoric awhen (test &body body) when)
(defanaphoric swhen (test &body body) when :settable t)
(defanaphoric acase (keyform &body cases) case)
(defanaphoric aetypecase (keyform &body cases) etypecase)
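
The generated macros behave just like Anaphora's hand-written ones:

CL-USER> (awhen (find 2 '(1 2 3)) (* it 10))
20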

Note that almost half the code is about generating docstrings. Anaphoric macros, like many other things, would be much simpler to define if they didn't need documentation.

The argument list (firstarg &rest moreargs) also exists purely for documentation purposes. We don't need it to define the macros — a simple &rest would do — but we need to pass it to defmacro so tools like SLIME and describe can display it. But repeating it as a parameter is ugly; it would be better to get the original's argument list automatically. There's no standard way to do this, but every implementation supports it; if we have a portability wrapper like swank-backend:arglist (or some non-Swank-dependent equivalent), we can do away with the explicit argument list:

(defmacro defanaphoric (name original &key settable)
  "Define NAME as a macro like ORIGINAL, but binding IT to the result of the first argument.
If SETTABLE is true, IT will be a symbol-macro which can be set with SETF."
  (let ((whole (gensym "WHOLE"))
        (arglist (swank-backend:arglist original)))
    `(defmacro ,name (&whole ,whole ,@arglist)
       ,(format nil "Like ~S, except binds the result of ~S to IT (via ~S) for the scope of the ~A.~A"
          original
          (car arglist)
          (if settable 'symbol-macrolet 'let)
          (if (and (cdr arglist) (member (cadr arglist) '(&body &rest)))
              (caddr arglist)
              "body")
          (if settable " IT can be set with SETF." ""))
       ,@(mapcan (lambda (a) (unless (member a lambda-list-keywords)
                               `((declare (ignore ,a)))))
                 arglist)
       `(,',(if settable 'symbolic 'anaphoric) ,',original ,@(cdr ,whole)))))

This makes the definitions of anaphoric macros transparently simple:

(defanaphoric awhen when)
(defanaphoric swhen when :settable t)
(defanaphoric acase case)
(defanaphoric aetypecase etypecase)

This might be useful for simplifying Anaphora, or even as a tool to expose to its users, but it's not much good as an answer to Vladimir's request for arguments for and against macros — macro-defining macros do not look useful to anyone who doesn't already think macros are useful.