Atomic file replacement and unpredictable primitives

Many programs need to update files atomically, so they don't corrupt them if they crash while writing. The usual primitive for this is an atomic replacement operation like POSIX rename, which allows programs to implement atomic updates by writing to a temporary file and then replacing the real file with it. Typical use is as in this C macro:

#define ATOMIC_WRITE(filevar, path, mode, body)         \
  do {                                                  \
    const char *realpath = path;                        \
    char temppath[PATH_MAX];                            \
    if (snprintf(temppath, PATH_MAX, "%s.temp", realpath) >= PATH_MAX) \
      die("path too long: %s", realpath);               \
    FILE *filevar = fopen(temppath, mode);              \
    if (!filevar)                                       \
      die("unable to write file: %s", temppath);        \
    body                                                \
    if (fclose(filevar)) {                              \
      remove(temppath);                                 \
      die("unable to write file: %s", temppath);        \
    }                                                   \
    if (rename(temppath, realpath)) {                   \
      remove(temppath);                                 \
      die("unable to replace file: %s", realpath);      \
    }                                                   \
  } while (0)

...but it's not usually written as a macro, because of a common problem in C: there's no good way for the macro to communicate errors to its caller, or to clean up when the caller has an error. It can be written as three functions (one to generate the temporary name and open the file, and two for successful and unsuccessful close), but this is complex enough that we seldom think of it. Instead we just write the same code over and over with different error handling, and different bugs, each time.
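For concreteness, here is roughly what the three-function version might look like. This is only a sketch: atomic_file, atomic_open, atomic_commit and atomic_abort are names I made up for illustration, and it assumes a POSIX system where rename replaces an existing file.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical handle for an in-progress atomic write. */
typedef struct {
  FILE *stream;            /* write to this; NULL when not open */
  char target[PATH_MAX];   /* the real file */
  char temp[PATH_MAX];     /* target + ".temp", on the same filesystem */
} atomic_file;

/* Generate the temporary name and open it. Returns 0, or -1 with errno set. */
int atomic_open(atomic_file *af, const char *path, const char *mode) {
  af->stream = NULL;
  int n = snprintf(af->target, sizeof af->target, "%s", path);
  if (n < 0 || (size_t)n >= sizeof af->target) { errno = ENAMETOOLONG; return -1; }
  n = snprintf(af->temp, sizeof af->temp, "%s.temp", path);
  if (n < 0 || (size_t)n >= sizeof af->temp) { errno = ENAMETOOLONG; return -1; }
  af->stream = fopen(af->temp, mode);
  return af->stream ? 0 : -1;
}

/* Successful close: close the temporary, then replace the real file with it.
   Returns 0, or -1 if the write or the rename failed (temporary removed). */
int atomic_commit(atomic_file *af) {
  assert(af->stream != NULL);
  int failed = ferror(af->stream);
  if (fclose(af->stream) != 0)   /* buffered writes can fail here */
    failed = 1;
  af->stream = NULL;
  if (failed || rename(af->temp, af->target) != 0) {
    int saved = errno;
    remove(af->temp);
    errno = saved;
    return -1;
  }
  return 0;
}

/* Unsuccessful close: discard the temporary and leave the real file alone. */
int atomic_abort(atomic_file *af) {
  if (af->stream) {
    fclose(af->stream);
    af->stream = NULL;
  }
  return remove(af->temp);
}
```

The caller writes to af.stream between atomic_open and one of the two closes, and has to remember to call atomic_abort on every error path, which is exactly the kind of bookkeeping that gets botched when it's rewritten each time.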

This makes it a good candidate for standard libraries, at least in languages that don't suffer C's error-handling deficiencies. It could be conveniently provided as an open mode (or a separate operation, if your language doesn't have modes) that writes to a temporary and atomically replaces the file when it's closed.

Common Lisp's :if-exists :supersede option to open sounds like it does this...

The existing file is superseded; that is, a new file with the same name as the old one is created. If possible, the implementation should not destroy the old file until the new stream is closed.

...but the replace-on-close behavior is optional, and not necessarily atomic. :supersede is also the only portable way to request that the file be truncated when opened, so AFAIK no implementation actually gives it a meaning beyond that.

Why is this so hard in Common Lisp?

I initially gave the example in Common Lisp instead of C, so it could handle errors properly. That part is easy, but it's much more complicated for other reasons:

(defun make-temp-pathname (path)
  "Append .temp to the name of a file, before the extension (if any).
Unlike /tmp, this keeps it on the same filesystem, so renames will be cheap."
  ;;Simply appending .temp to the namestring doesn't work, because
  ;;operations like rename-file “helpfully” misinterpret it as a file
  ;;type and use it for defaulting, so e.g. (rename-file "a.temp" "b")
  ;;renames a.temp to b.temp.
  (make-pathname :name (format nil "~A.temp" (pathname-name path))
                 :defaults path))

(defmacro with-atomic-output-file ((streamvar pathname) &body body)
  "Execute BODY with STREAMVAR bound to an output stream, like WITH-OPEN-FILE,
but update the file atomically, and only if BODY returns normally."
  (alexandria:with-gensyms (ok? tempfile realfile)
    `(let* ((,ok? nil)
            (,realfile ,pathname)
            (,tempfile (make-temp-pathname ,realfile)))
      (unwind-protect
        (with-open-file (,streamvar ,tempfile :direction :output :if-exists :supersede)
          ,@body
          (setf ,ok? t))
        (if ,ok?
          (rename-file ,tempfile ,realfile #+clisp :if-exists #+clisp :overwrite)
          #-sbcl (delete-file ,tempfile)))))) ;SBCL deletes it automatically and complains that it doesn't exist

It also isn't portable, because Common Lisp doesn't specify that rename-file will replace an existing file. SBCL does, but CLISP doesn't (even on Unix, surprisingly; it goes out of its way to break this) unless it's reassured with :if-exists :overwrite. Also, with-open-file might automatically delete the temporary on abnormal exit, and delete-file might complain if it doesn't exist. These unreliable semantics, together with the perverse conveniences of pathnames, make it harder to write atomic replace portably in CL than in C.
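For comparison, the native behaviour being relied on is small enough to check directly. The following is a minimal sketch, assuming a POSIX system: POSIX says rename replaces an existing destination, while ISO C on its own leaves that case implementation-defined. The helper names write_text and rename_replaces_existing are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: create PATH containing TEXT. Returns 0 on success. */
static int write_text(const char *path, const char *text) {
  assert(path != NULL && text != NULL);
  FILE *f = fopen(path, "w");
  if (!f)
    return -1;
  int bad = (fputs(text, f) == EOF);
  if (fclose(f) != 0)
    bad = 1;
  return bad ? -1 : 0;
}

/* Returns 1 if rename(src, dst) replaced an existing dst with src's contents,
   0 if it did not, and -1 if the test files couldn't be set up or read. */
int rename_replaces_existing(const char *src, const char *dst) {
  char buf[32] = "";
  if (write_text(src, "new") != 0 || write_text(dst, "old") != 0)
    return -1;
  if (rename(src, dst) != 0) {      /* the system refused to replace dst */
    remove(src);
    remove(dst);
    return 0;
  }
  FILE *f = fopen(dst, "r");
  if (!f)
    return -1;
  if (!fgets(buf, sizeof buf, f))
    buf[0] = '\0';
  fclose(f);
  FILE *leftover = fopen(src, "r"); /* src should be gone after the rename */
  int src_exists = (leftover != NULL);
  if (leftover) {
    fclose(leftover);
    remove(src);
  }
  remove(dst);
  return !src_exists && strcmp(buf, "new") == 0;
}
```

A wrapper that wants to refuse overwriting can always check first and call this underneath; a wrapper that refuses by default takes the atomic replacement away from everyone who needed it.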

So when you provide access to system primitives like rename, don't change their semantics. Users will not be surprised by the system's native behaviour, and sometimes they need it.
