The curious attraction of one-argument typep

In Maclisp, the function typep takes one argument and returns its type in the form of a symbol. This is a fundamental operation, and it's egregiously misnamed, as it's not a predicate. (The -p convention is much older than typep, so the name was misleading even in Maclisp.) Lisp Machine Lisp preserved it for compatibility, but Common Lisp fixes the confusion by calling it type-of and using typep only for the two-argument type membership predicate.
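At a Common Lisp REPL (the exact type type-of returns is implementation-dependent; these results are as SBCL prints them):

(type-of 4.5)       ; => SINGLE-FLOAT   (the unary "what type is this?")
(typep 4.5 'float)  ; => T              (the two-argument predicate)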

I've never used Maclisp, nor any dialect with one-argument typep, so this misnaming should interest me only as historical trivia. But I find it oddly compelling. When I read typep, I think of the unary version first, and only remember the type membership test when context reminds me. When I refer to the “what type is this?” operation without having a specific language in mind, I tend to carelessly call it typep.

Why? I suspect it's because the most obvious name, type, is too bare — it sounds like it might collide with something else, although it doesn't in either Maclisp or Common Lisp — so I look for an affix to distinguish it. The -of in type-of works, but it's meaningless and irregular, and sounds wrong — more like function application than part of a name. -p is semantically inappropriate, but it's so common and familiar that it sounds right anyway. So I latch onto it and don't think about the meaning.

I can't be the only one with this problem. I've heard other people call it “one-argument typep”, and someone must have made the same mistake decades ago to give typep its misleading name in the first place. (Was it derived from a predicate? Did its designers have a different interpretation in mind, like “return the type predicate that's true of this value”?) If you are also drawn to this misnomer, or if you know more of its history, I'd like to hear about it.

Followup: How typep got its name

Formal comments and stylistic lag

If you've spent much time programming in industry, you've probably seen comments like this:

/***************************************
 * Function: Frobnicate
 * Parameters:
 *    frob - a pointer to a Frob
 *    options - a pointer to a FrobnicationOptions
 * Returns:
 *    a boolean: true if frobnication succeeded, false otherwise
 * Description:
 *    Frobnicate the given frob.
 * Logic:
 *    1. Check for valid frob
 *    2. Check for valid options
 *    3. Frobnicate the frob
 *    4. Clean up
 ******************************************/
bool Frobnicate(Frob *frob, FrobnicationOptions *options) {
  ...200 lines bearing no resemblance to the “Logic” section above...
}

Such formal comments could in principle contain useful information, but in practice they hardly ever do. Usually the name and parameters simply repeat information from the signature, the return value and description are easily guessable, and the Logic section is useless if not completely wrong. I don't think I've ever seen one that includes the sort of information the maintainer is most likely to want. (Does Frobnicate write to options->frobnication_output? What does it return if the options say to skip this frob? Does it lock frob->mutex, or does the caller have to do that?) Most simply waste space and push the code the reader actually wants to see off the screen.
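A comment that answered those questions would earn its keep. Something like this, with invented answers (Frobnicate is made up, after all):

/* Frobnicates *frob in place, writing diagnostics to
 * options->frobnication_output when it is non-NULL.  Returns true
 * without touching the frob if the options say to skip it.  Takes
 * frob->mutex itself; the caller must not already hold it. */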

Why are useless formal comments so common? I suppose they're often written by people who want to comment their code but don't understand what information is useful, and who, by following the form, can crank out lots of comments they don't realize are useless. But why do some people not only write them but even advocate them, and include them in style guides?

There is a context where these comments make sense: assembly code. When there are no parameter lists or type declarations or even variable names, the information they convey has to be recorded somewhere else, because it's vital to maintenance. When the body of a function consists of many pages of inscrutably low-level instructions, an introductory explanation of the logic can be a big help (although I still prefer having it sprinkled among the code, where it helps with navigation and is more likely to be kept up to date). So prescriptions to include these things in comments are not completely out of the blue. They're based on what made sense, decades ago, for languages that couldn't express this information in any other way. Naturally, they've become entrenched in some programmers' concepts of good style, and are applied even in languages where they make no sense.

I wonder how many of our widely accepted stylistic rules are similarly out of date or misapplied. The widespread misuse of Hungarian notation qualifies, but that's no longer popular. There are a lot of prescriptions for object-oriented style that make no sense in functional languages, but they're seldom applied there. There are some language-specific archaisms, like writing typedef struct Foo_ { ... } Foo; in C++, but I don't think that's actively promoted except for code that has to also be valid as C.

What stylistic prescriptions are still widely accepted today, even where they don't make sense?

Japanese Lisp, Forth, and historical contingency

Yusuke Shinyama speculates: what would Lisp look like if it had been invented by speakers of a consistently postfix language like Japanese? Might it be written postfix instead of prefix?

Maybe so. But the difference is only skin-deep. The car of a Lisp form is special because it's the head, not because it's first; as long as there are head and rest operators, it makes no difference whether the head is stored first, last, or even in the middle. A postfix Lisp would look different, but only in its syntax; Lisp would work the same way if it had been invented in Japan.
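A minimal sketch in Clojure, representing hypothetical postfix forms as vectors with the operator last:

;; head and rest operators for a postfix form like [1 2 +]:
(defn head [form] (peek form))   ; the operator is the last element
(defn args [form] (pop form))    ; the arguments are everything before it

;; Evaluation doesn't care where the head was stored:
(defn evaluate [form]
  (if (vector? form)
    (apply (resolve (head form)) (map evaluate (args form)))
    form))

(evaluate [[1 2 '+] 4 '*])   ; => 12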

Forth, though, might be dramatically affected — not in nature but in timing. Despite its simplicity, Forth appeared rather late: it was developed in the 1960s and not publicized until 1970, which was too late to become part of the cultural foundation of computing. I suspect this was an anomaly; Forth is so simple that it could easily have been discovered earlier, had anyone bothered to explore postfix notation. Speakers of a postfix natural language have an obvious example to encourage them. (Postfix mathematical notation would be an even stronger encouragement. IIUC Japanese arithmetic words are infix, so a Japanese-inspired notation would also be infix; postfix arithmetic could arise more naturally in a natlang where arithmetic operators are postpositions, and postpositional phrases follow their head noun, but this is not a common combination.)

If Forth had been known by the mid-1950s, it could have outcompeted Fortran to become the canonical high-level language. This would have exerted significant pressure on hardware: machines would be designed to run Forth, much as they're designed to run C today, so there would be a lot of stack machines. Since Forth makes such a good assembly language for such machines, there would be less pressure to develop other high-level languages. Programmers accustomed to its simplicity and flexibility and convenience would see all proposed alternatives as unusable and crippled and insanely complex, so other language families could go unexplored. Forth and its descendants might rule computing unchallenged for decades, until some long-delayed development made tree languages competitive.

History could be different. Lisp, Fortran, Algol — all the great old language families — might not exist, if the pioneers of computing had spoken a head-last natural language and found Forth first.

Infixifiers

Haskell allows using any function as an infix operator, without declaring it infix or giving it a nonalphanumeric name: x `mod` 3 is the same as mod x 3, for any mod. I used to think this was a silly extravagance of syntax, but I've come to like it, and to use it frequently in pseudocode. Like the pipe operator, it lets me write operations in a more natural order, and this is important enough to be worth a little syntactic complexity. With a suitably low precedence, it can also save a few parentheses, which is convenient, especially when writing code on paper, where balancing parentheses is hard.
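Both points, in Haskell (divides is a made-up example):

divides :: Int -> Int -> Bool
d `divides` n = n `mod` d == 0   -- backticks work in definitions too

infixl 1 `divides`   -- fixity declarations apply to backticked names

ok :: Bool
ok = 2 + 1 `divides` 12   -- parses as (2 + 1) `divides` 12; True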

The backquotes still seem odd to me, probably because I confuse them with the output-of-a-command operator in the Bourne shell and Perl. I currently prefer a colon suffix: x mod: 3, like Smalltalk. (However, I also sometimes use that for keyword arguments, as I prefer sort xs key: cost to sort xs :key cost.)

Scala almost does this without requiring an explicit infixifier at all, as it uses mere juxtaposition to mean method call: x mod 3 is x.mod(3) (which happens not to work, since there is no Integer.mod). However, this doesn't work for functions in general, as Scala distinguishes them from methods. And of course it conflicts with using juxtaposition as ordinary prefix function call, which is a far more important construct than infix, and a more general one, since it allows any number of arguments.
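A sketch of what does work, in Scala 2 at the REPL (mod here is invented, supplied by an implicit class, since Int has no such method):

implicit class IntMod(x: Int) {
  def mod(n: Int): Int = x % n   // remainder; close enough for a sketch
}

7 mod 3   // juxtaposition is method call: the same as 7.mod(3); => 1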

Update 25 March: R also has an infixifier: any function bound to a name of the form %f% can be used infix, so a %f% b is f(a, b).
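At the R prompt:

`%mod%` <- function(a, b) a %% b   # define an infix operator directly
7 %mod% 3                          # => 1

`%min%` <- min                     # or infixify an existing function
3 %min% 5                          # => 3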

Alternating cond

The traditional cond used in most Lisps, including Scheme and CL, has (test . body) pairs:

(define (signum x)
  (cond ((< x 0) -1)
        ((> x 0) 1)
        (else 0)))

That has Lots of Irritating Superfluous Parentheses, of a sort that's particularly annoying because the parenthesized clauses aren't forms. Arc and Clojure avoid this by alternating tests and bodies:

(defn signum [x]
  (cond (< x 0) -1
        (> x 0) 1
        :else 0))

Which is nice. But: when test and body no longer fit on one line, it becomes hard to indent, because the list structure doesn't match the semantic structure. Indenting normally, according to list structure, makes the body look like another test:

(defn fibonacci [n]
  (cond (not (and (integer? n) (>= n 0)))
        (throw (java.lang.IllegalArgumentException.
                (str "fibonacci's argument must be a nonnegative integer: "
                     (pr-str n))))
        (> n 1) (+ (fibonacci (- n 2)) (fibonacci (dec n)))
        :else n))

I tend to read conds by vertically scanning the conditions, which doesn't work with this indentation. Indenting according to sense solves this, but makes the body look like a subform of the test:

(defn fibonacci [n]
  (cond (not (and (integer? n) (>= n 0)))
          (throw (java.lang.IllegalArgumentException.
                  (str "fibonacci's argument must be a nonnegative integer: "
                       (pr-str n))))
        (> n 1) (+ (fibonacci (- n 2)) (fibonacci (dec n)))
        :else n))

...and your editor will probably “fix” it anyway the next time you autoindent.

I'm tempted to rewrite with nested ifs just to avoid the misleading indentation. Which is madness, because cond is common, and tests and bodies often outgrow one line, and I shouldn't change code structure for formatting reasons. But I don't like being unable to parse my code. The main value of S-expressions is that structure is obvious, and the structure of alternating cond isn't.
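For comparison, here's the fibonacci above with nested ifs. The indentation is unambiguous, at the cost of some rightward drift:

(defn fibonacci [n]
  (if (not (and (integer? n) (>= n 0)))
    (throw (java.lang.IllegalArgumentException.
            (str "fibonacci's argument must be a nonnegative integer: "
                 (pr-str n))))
    (if (> n 1)
      (+ (fibonacci (- n 2)) (fibonacci (dec n)))
      n)))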

The same problem can occur in any expression with alternating subforms, such as Clojure's let, or dictionaries, but in neither case is it as bad as in cond. Variable names are distinctive, and anyway tend to be short, so let is rarely formatted with initforms on their own lines. Dictionaries don't usually have large subforms, and often have distinctive :keywords as their keys, so it's harder to get the keys and values confused. cond is particularly vulnerable (probably along with friends like typecase), because it often has large subforms that look alike but don't work alike.

I think I prefer the old cond, despite the parentheses.

min-key and max-key should take lists

Clojure's min-key and max-key operators are variants of min and max that compare their arguments by a key function, returning the argument whose key is smallest or largest:

user=> (max-key count "Find" "the" "longest" "word")
"longest"

I used min-key a few times recently, and found it awkward every time, because the input arrived in a list, not in several separate expressions. Converting from one to the other is a simple matter of apply, but the resulting expression is not as clear as it ought to be:

user=> (apply min-key count
         (clojure.string/split "Find the shortest word" #" "))
"the"

It seems to me that while min and max are almost always used on separate arguments, min-key and max-key are much more likely to be used on lists. (In general, I think higher-order functions are much more likely to be used on lists than their first-order relatives.) Googling clojure min-key supports this: almost all uses are (apply min-key ...), not (min-key ...) — and all of the latter seem to be either artificial examples or mistakes. Even the spelling corrector that was the original motivation for adding max-key uses it with apply. So these functions should really be provided in list form (like Arc's most) rather than separate-argument form.
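The list form is a trivial wrapper (min-by and max-by are made-up names, not standard Clojure):

(defn min-by [f xs] (apply min-key f xs))
(defn max-by [f xs] (apply max-key f xs))

user=> (min-by count (clojure.string/split "Find the shortest word" #" "))
"the"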

This is a general problem, though. Any operator that takes an arbitrary number of arguments will probably be used sometimes on lists and sometimes on a few explicit arguments. apply or reduce are the obvious ways to transform one into the other, but they're rather disappointing when you're expecting a clear, convenient list function.

(Functions that aren't useful with only one argument (like min-key) can be overloaded to provide both: (min-key f xs) could operate on the elements of xs, while (min-key f a b) operates on a and b. This is what Python does. This is a potentially surprising irregularity, though ((min-key f a) no longer does what you expect), and it doesn't work for functions that can usefully take one argument. So I don't think it's a good idea.)
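Concretely, the overloading would look like this (min-key* is a hypothetical name); the one-argument case is where it bites:

(defn min-key*
  ([f xs]         (apply min-key f xs))         ; one collection
  ([f a b & more] (apply min-key f a b more))) ; separate arguments

(min-key* count ["Find" "the" "shortest" "word"])   ; => "the"
(min-key* count "Find" "the")                       ; => "the"
(min-key* count "word")   ; oops: spreads the string into characters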

Single-user Unix

There are a few basic security rules that every Unix newbie is taught, and one of them is: don't log in as root. So when I went to demonstrate something to a new coworker a while ago, and began by logging in as root, it was an awkward moment.

Was I being reckless? No, I was using Unix as a single-user OS. This was a shared machine, used for testing and little else, and there was no reason to distinguish multiple users; everyone needed root access anyway. Having multiple accounts would have been useless complexity; all we needed was a single-user system. Unix can be one, if you log in as root.

It's not a very good single-user OS, because most of its safety checks are user-based. When everything runs as root, a minor accident can destroy everything on the machine. But this machine had no valuable data on it — at worst, we'd be forced to reinstall everything, so we didn't even bother to use a shared non-root account. Safety just wasn't much of an issue. Security wasn't an issue at all; everyone who used that machine needed to know the root password anyway.

This isn't an unusual situation. Most personal computers have only a single user, and many others are shared among several users with no personal data. The user-based security model was developed for timeshared machines, which are now quite rare. Yet our orthodoxy still says user-based security is necessary, and systems without it are not to be taken seriously. Even after decades in which the most important computers have been personal ones, and even now that single-user smartphones have become the most fashionable platforms, we still judge operating systems by their ability to protect multiple users from each other, not their ability to protect a single user from malicious code. Security, we think, means inter-user security, whether or not that's a good model of the threats we face.

So I feel guilty every time I log in as root. I've been indoctrinated too thoroughly; even though I know it's not really a problem, I feel compelled to make excuses, and to point out that it wasn't me who set up that machine, and I wouldn't have done it that way. But I shouldn't. A single-user system is not necessarily a bad thing, even if it's Unix.