Arbitrary overloading isn't always confusing

I have always accepted the conventional wisdom about overloading: that while it's very nice for operations that are conceptually identical, overloading two unrelated operations, like + for catenation, is just an invitation to mistakes. But I have to revise this opinion, because there's a conspicuous counterexample in C++. It uses << and >> for both shifting and I/O, and I've never found this confusing. Why?
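To make the counterexample concrete, here's a contrived sketch of my own (not code from anywhere in particular) putting the two meanings side by side: shifts on integers, insertion and extraction on streams.

```cpp
// A contrived illustration: << and >> as bit shifts, and as stream I/O.
#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t flags = 1u << 5;    // << as left shift
    std::uint32_t low   = flags >> 2; // >> as right shift

    std::cout << "flags = " << flags  // << as stream insertion
              << ", low = " << low << '\n';
    std::cin >> flags;                // >> as stream extraction
}
```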

I suspect what's going on here is that the shifting and I/O operators are seldom used together, or even in similar contexts. I/O code doesn't look anything like bit-crunching code, so opportunities for confusion are few and far between. The problem with + may be that addition and concatenation have a vague semantic similarity (which is what inspired the overloading in the first place), which leads them to appear in similar contexts. (It seems to me that it's much easier to be uncertain about whether some expression is a number or a string than whether it's an integer or a stream.) It probably doesn't help that they're both very common operations found in all sorts of code. And programmers rely on the associativity of both operations, so the non-associativity of the overloaded combination is not just a theoretical problem: in JavaScript, for instance, (1 + 2) + "3" is "33" but 1 + (2 + "3") is "123". But C++'s shift-or-stream operators show that overloading unrelated operations isn't intrinsically confusing. It depends on how they're used.

Not that I'm advocating arbitrary overloading. But maybe we shouldn't be too paranoid about it - which is good because it turns up surprisingly often. Some familiar operations are traditionally overloaded in semantically dubious ways, including good old +. Integer and (fixed-precision) floating-point arithmetic have different mathematical and practical properties, because one rounds and the other doesn't. Theoretically they should be separate, as they are in OCaml, but most languages unify them anyway. Is this a problem? Maybe a little, when someone forgets they're doing floating-point. But for the most part the two (or more) +es coexist fine.
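Here's a small illustration of that difference (my example, not anything from the original discussion): integer + is exact, while double + silently rounds once the true sum needs more than 53 bits.

```cpp
// Integer addition is exact; double addition rounds to fit a 53-bit significand.
#include <cstdint>
#include <iostream>

int main() {
    std::int64_t i = 10000000000000000LL; // 10^16 as an integer
    double       d = 1e16;                // 10^16 as a double

    std::cout << std::boolalpha
              << (i + 1 == i) << '\n'  // false: the integer sum is exact
              << (d + 1 == d) << '\n'; // true: the 1 is lost to rounding
}
```

(OCaml spells the floating-point version +., which is the separation in question.)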

But addition and catenation don't coexist so well, while shifting and I/O do. What's the pattern here?

The cost of naive scheduling

The other day one of my coworkers expressed alarm at the condition of one of his servers: it was running at 100% CPU usage! Obviously, he said, something was wrong. I resisted the urge to say that what was wrong was that it was a Windows machine. He's a longtime Windows user, after all. And like a lot of Windows users, he thinks of a busy processor as a bad thing, a sign of a machine about to fail. On any other platform, it's a normal situation. Of course there's work to do!

But my coworker was right. Full processor usage is a bad sign on a Windows machine, because it performs remarkably badly under load. Where a loaded Unix machine simply feels slower, a loaded Windows machine is unpredictably unresponsive. Why?

I understand the Windows scheduler is pretty simpleminded - it just runs the highest-priority runnable process, and doesn't try very hard to schedule fairly or prevent starvation. The strange thing is that Windows has had this problem for years, even though there are many well-known solutions. Is there some other goal that conflicts with good scheduling?
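As a toy model - my own sketch, not a description of Windows internals - strict "run the highest-priority runnable task" scheduling looks like this, and the starvation falls straight out of it:

```cpp
// Toy strict-priority scheduler: always run the highest-priority runnable
// task, with no aging or fairness. A busy high-priority task starves the rest.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Task {
    std::string name;
    int priority;      // higher number = higher priority
    int ticks_run = 0;
};

int main() {
    std::vector<Task> tasks = {{"busy-service", 10}, {"remote-desktop", 5}};

    for (int tick = 0; tick < 1000; ++tick) {
        // Pick the runnable task with the highest priority - nothing else.
        auto &next = *std::max_element(
            tasks.begin(), tasks.end(),
            [](const Task &a, const Task &b) { return a.priority < b.priority; });
        ++next.ticks_run;
    }

    for (const auto &t : tasks)
        std::cout << t.name << " ran " << t.ticks_run << " ticks\n";
    // Output: busy-service ran 1000 ticks, remote-desktop ran 0 ticks.
}
```

Priority aging, decay, and fair-share quotas are the standard ways out - presumably among those well-known solutions.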

There seems to be a similar problem with I/O. One process reading a lot of files can starve all the others - even of virtual memory, because their page faults are not given higher priority than ordinary I/O. As it happens, one of the programs I maintain at work is memory-intensive and I/O-bound, so when it's running, other processes' memory gets paged out, and they can wait a long time for it to be paged back in. The result is to make my 2 GHz machine feel slower than my pocket calculator.

My coworker felt the same way about his server. He tried to stop the offending services, but the machine was so unresponsive that his Remote Desktop session was repeatedly disconnected before he could do so. I think he eventually gave up and walked over to the server room to hit the power switch. But as it turned out, there was nothing wrong with the machine. Only with the Windows scheduler.

Is there a better name for generic cons?

Any serious programming language has a variety of collections in its library. Most reduce the resulting complexity by overloading the common operations for multiple types. But few include cons among the generic operations. This is partly because it's unnatural for many types - only the functional collections like lists and trees support it efficiently. But it may also be a problem that the operation has no obvious name.

Dylan calls it add, which sounds like arithmetic. Clojure calls it conj, for conjoin, which is unfortunately confusable with the complex conjugate function. C++ has push (well, push_back), but that's implicitly imperative. Other possibilities: extend, augment (or aug?), with, or maybe a more abstract symbol like &. Does anyone have a better idea?
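For illustration, here's a hypothetical sketch of my own (no library I know of spells it this way) using with, one of the candidates above, as a non-destructive generic cons in C++. It also shows why the operation is only natural for list-like collections:

```cpp
#include <iostream>
#include <memory>
#include <utility>
#include <vector>

// A persistent singly linked list: prepending shares the old cells.
template <typename T>
struct Cell {
    T head;
    std::shared_ptr<const Cell> tail;
};
template <typename T>
using List = std::shared_ptr<const Cell<T>>;

// Hypothetical generic cons, spelled "with". For lists it's O(1):
// the old list is untouched and becomes the shared tail of the new one.
template <typename T>
List<T> with(List<T> xs, T x) {
    return std::make_shared<const Cell<T>>(Cell<T>{std::move(x), std::move(xs)});
}

// For vectors it's legal but O(n): the whole vector has to be copied
// to leave the original alone.
template <typename T>
std::vector<T> with(std::vector<T> xs, T x) {  // xs is a copy
    xs.push_back(std::move(x));
    return xs;
}

int main() {
    List<int> nil;
    auto xs = with(with(nil, 1), 2);            // (2 1), built by sharing
    auto v  = with(std::vector<int>{1, 2}, 3);  // {1, 2, 3}, built by copying
    std::cout << xs->head << ' ' << v.back() << '\n';  // prints: 2 3
}
```

The asymmetry is the point: a name like with (or add, or conj) can be made generic, but only the persistent, list-like structures support it cheaply.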

Mmm, tar!

Yossi Kreinin calls it Fun at the Turing tar-pit. It's another form of programming candy: the nastier a language, the more fun it is to solve simple problems in it. As he puts it:

I’ll tell you why it happens. When you write code in a full-featured programming language, clearly you can do a lot of useful things. Because, like, everybody can. So the job has to be quite special to give you satisfaction; if the task is prosaic, and most of them are, there’s little pride you’re going to feel. But if the language is crippled, it’s a whole different matter. “Look, a loop using templates!” Trivial stuff becomes an achievement, which feels good. I like feeling good.

It's just like a game - overcoming obstacles is fun! And a perverse language, like a low-level one, is a great source of obstacles. Artificial, unnecessary obstacles, sure, but that's fine - it means you don't have to feel bad about being defeated by them. And when you do overcome them, you have the reward of getting something done, so it's easy to forget the obstacles were of your own creation. Programming makes a great game!

But it's more than a game, or it should be. There's nothing wrong with solving puzzles for puzzles' sake, but programming has the additional reward of doing something useful. The trouble is that there are many puzzles available in any program, and only some of them make progress toward whatever the program is supposed to do. The "real" problems are not necessarily more fun than the obvious ones created by inadequate tools, and they're often harder to appreciate. This tarry candy may taste good, and provide something to chew on, but it's not very filling, not by itself. Here's a puzzle that matters: how can I, and the many other programmers in the same boat, find the pleasure of more interesting problems, rather than that of solving easy ones over and over?