Eliminating FORMAT from Lisp

Drew McDermott

February 14, 2003 (revised August 16, 2004)

The FORMAT statement started with Fortran, in the mid-50's. To print an integer and a real, interspersed with explanatory tags, one would write:

     PRINT 101, I, X
101  FORMAT (2HI=, I5, 6X, 2HX=, F6.2)
Here 2HI= is a specification of a Hollerith string; "2H" means the string has two characters ("I" and "="). I5 and F6.2 tell how to print the integer and real: the former has a field width of 5 characters, the latter of 6, with two to the right of the decimal point.

The C language took over the same idea, embodied in the printf function from the C standard I/O library. Instead of a separate FORMAT statement, the information about how to print the data was incorporated into a string argument to printf. So the example above became:

    printf("I=%5d      X=%6.2f\n", i, x);
This is an improvement. (Even IBM eventually got rid of Hollerith strings.)

Early versions of Lisp had builtin I/O procedures such as print, terpri, etc., but they were awkward for any purpose much more complex than printing a single S-expression. At some point someone had the bright idea of importing format into Lisp. The example above became

   (format t "I=~5d      X=~6,2F~%" i x)
which differs only in minor details from the C version.

In my opinion, this was a silly mistake. Lisp is a syntactically extensible language, meaning that it is quite easy, using macros, to create arbitrary language extensions, so long as they obey two basic rules: (1) A new statement must look like (op ...), where "..." has balanced parentheses; (2) the lexical conventions inside the new statement must be Lisp's (e.g., more characters (including '*', '+', and such) are ordinary symbol constituents, in contrast to their role in other languages, so adjacent symbols must be separated by whitespace; double quote starts a string; single quote, sharpsign, and a few other characters have special meanings). If you're used to Lisp, these rules are barely noticeable, so that Lisp hackers come to think of it as having the most flexible syntax in the world.

The format statement takes a completely different approach. format is implemented as a function, whose second argument is a string containing instructions on how to print the remaining arguments. This "format control string" is essentially a little program written in a special "format control language." This language doesn't obey any of Lisp's syntax rules. Over time, the format control language has evolved to the point where it contains conditionals, iteration, and even "goto"s. It even has its own compiler, the formatter macro.
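To make the point concrete, here is a small sketch in standard Common Lisp of the control-language features just mentioned: a conditional (~:[), an iteration (~{ ... ~}) with an early exit (~^), and a control string compiled by formatter. The function names status-mark, comma-list, and comma-list-compiled are made up for illustration.

```lisp
;; Conditional: ~:[ selects the first clause when its argument is NIL
;; and the second clause otherwise.
(defun status-mark (abnormalp)
  (format nil "~:[ok~;abnormal~]" abnormalp))

;; Iteration with an early exit: ~{ ... ~} loops over a list argument,
;; and ~^ leaves the loop when no elements remain, so the separator
;; is not printed after the last item.
(defun comma-list (items)
  (format nil "~{~a~^, ~}" items))

;; FORMATTER expands a literal control string into a function that
;; takes a stream followed by the format arguments.
(defun comma-list-compiled (items)
  (with-output-to-string (s)
    (funcall (formatter "~{~a~^, ~}") s items)))
```

None of the control flow in these strings is visible to the Lisp reader; it is all hidden inside string literals.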

If this language were particularly suited to I/O, it might be worth putting up with. But it is unbearably clumsy from the word go. Why should one have to write (format t "x = ~s, y = ~s, z = ~s~%" x y z), and then match up the occurrences of "~s" with the variables to see where x, y, and z are printed? Why not write this instead:

    (out "x = " x 
         ", y = " y 
         ", z = " z :%)
Now it is obvious at what points in the output the three results are to be printed. My alternative to format uses the out macro, described in greater detail in the YTools manual. I will describe more features of out as I go along. The arguments to out, for now, are strings that are princ'ed and expressions that are evaluated and printed.
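The basic behavior described so far can be sketched in a few lines. The macro below is a hypothetical mini-out, not the YTools implementation: it assumes that every argument is simply princ'ed to *standard-output*, and that the keyword :% means "start a new line."

```lisp
;; MINI-OUT is an illustrative name, not part of YTools.
(defmacro mini-out (&rest items)
  "Print ITEMS to *STANDARD-OUTPUT*.  The keyword :% emits a newline;
every other item is evaluated and PRINC'ed."
  `(progn
     ,@(mapcar (lambda (item)
                 (if (eq item :%)
                     '(terpri)
                     `(princ ,item)))
               items)
     (values)))

;; Usage: each value appears exactly where it is written.
(defun show-xyz (x y z)
  (mini-out "x = " x ", y = " y ", z = " z :%))
```

The real out macro does far more than this; the sketch only shows that interleaving strings and expressions needs nothing beyond an ordinary macro.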

As soon as control structures enter the picture, it becomes even more obvious how much better one can do than format. Suppose we have a structure f1 of type Foo, and we want the print-function for this type to print its label and each element of its contents. Furthermore, if its status is marked :abnormal, we want to print a question mark after the label. At this point the Lisp community had to choose between abandoning format and adding conditionals to it. Unfortunately, they chose to do the latter. Here is what I would like to write:

   (out "#<Foo " (Foo-label f1)
        (:q ((eq (Foo-status f1) ':abnormal)
             "?"))
        (:e (dolist (x (Foo-contents f1))
               (:o " " x)))
        ">")
We use the "guide symbols" :q, :e, and :o to signal that we are writing expressions whose meanings are specific to the out macro, not general-purpose Lisp expressions. These constructs allow us to mix evaluated Lisp forms, such as (Foo-contents f1), with forms in "out mode," such as "?". (:q --clauses--) is like cond, except that after the test in a clause, the remaining expressions are in out mode. So if f1 has abnormal status, a question mark is printed. The :e construct is more general: (:e --exps--) causes the exps to be evaluated and the results discarded, except that any subexpression of the form (:o --outstuff--) returns us to out mode.

The net result for printing f1 is that if f1 has label "tree", is not flagged as abnormal, and has a contents list (eenie meenie minie moe), it would be printed as

   #<Foo tree eenie meenie minie moe>
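For comparison, here is a sketch of how the same output might be produced with a format control string. The structure definition and the function name foo-string are assumptions made for illustration; only the format directives are standard.

```lisp
;; Hypothetical structure matching the accessors used in the text.
(defstruct foo label status contents)

(defun foo-string (f1)
  "Return the printed form of F1 as a string."
  ;; ~:[~;?~] prints nothing for NIL and "?" otherwise;
  ;; ~{ ~a~} prints each element of the list preceded by a space.
  (format nil "#<Foo ~a~:[~;?~]~{ ~a~}>"
          (foo-label f1)
          (eq (foo-status f1) :abnormal)
          (foo-contents f1)))
```

The reader has to count directives to discover that the second argument controls the question mark and the third supplies the contents.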

Here is a larger example, from Steele's "Common Lisp: The Language" (edition 2). He calls this the "hairiest format control string I have ever seen," although I believe I have seen worse. We have a datatype called a "xapping":
(defstruct 
  (xapping (:print-function print-xapping) 
           (:constructor xap 
             (domain range &optional 
              (default ':unknown defaultp) 
              (infinite (and defaultp :constant)) 
              (exceptions '())))) 
  domain 
  range 
  default 
  (infinite nil :type (member nil :constant :universal))
  exceptions)

where the print-function is defined thus:

(defun print-xapping (xapping stream depth)
   (declare (ignore depth))
   (format stream
	   ;; Are you ready for this one?
	   "~:[{~;[~]~:{~S~:[->~S~;~*~]~:^ ~}~:[~; ~]~
            ~{~S->~^ ~}~:[~; ~]~[~*~;->~S~;->~*~]~:[}~;]~]"
	   ;; Is that clear?
	   (xectorp xapping)
	   (do ((vp (xectorp xapping))
		(sp (finite-part-is-xetp xapping))
		(d (xapping-domain xapping) (cdr d))
		(r (xapping-range xapping) (cdr r))
		(z '() (cons (list (if vp (car r) (car d)) (or vp sp) (car r)) z)))
	       ((null d) (reverse z)))
	   (and (xapping-domain xapping)
		(or (xapping-exceptions xapping)
		    (xapping-infinite xapping)))
	   (xapping-exceptions xapping)
	   (and (xapping-exceptions xapping) (xapping-infinite xapping))
	   (ecase (xapping-infinite xapping)
	     ((nil) 0)
	     (:constant 1)
	     (:universal 2))
	   (xapping-default xapping)
	   (xectorp xapping))) 
I quote Steele's exegesis of the format control:
Here is a blow-by-blow description of the parts of this format string:
~:[{~;[~] Print ``['' for a xector, and ``{'' otherwise.
~:{~S~:[->~S~;~*~]~:^ ~} Given a list of lists, print the pairs. Each sublist has three elements: the index (or the value if we're printing a xector); a flag that is true for either a xector or xet (in which case no arrow is printed); and the value. Note the use of ~:{ to iterate, and the use of ~:^ to avoid printing a separating space after the final pair (or at all, if there are no pairs).
~:[~; ~] If there were pairs and there are exceptions or an infinite part, print a separating space.
~<newline> Do nothing. This merely allows the format control string to be broken across two lines.
~{~S->~^ ~} Given a list of exception indices, print them. Note the use of ~{ to iterate, and the use of ~^ to avoid printing a separating space after the final exception (or at all, if there are no exceptions).
~:[~; ~] If there were exceptions and there is an infinite part, print a separating space.
~[~*~;->~S~;->~*~] Use ~[ to choose one of three cases for printing the infinite part.
~:[}~;]~] Print ``]'' for a xector, and ``}'' otherwise.

Folks, you don't have to put up with this nonsense. Here is the civilized way to write the print-function.

(defun print-xapping (xapping stream depth)
   (declare (ignore depth))
   (out (:to stream)
      ;; Print ``['' for a xector, and ``{'' otherwise. 
      (:q ((xectorp xapping) "[")
	  (t "{"))

      ;; Print the pairs implied by the xapping.
      ;; Whether the element to the left of the arrow comes from
      ;; the list 'd' or the list 'r' depends on whether the
      ;; xapping is a xector.  An arrow is printed only if
      ;; xapping is not a xector or a xet.  The element to the
      ;; right of the arrow always comes from 'r'.
      ;; Each pair is followed by a space, except the last.
      (:e (do ((vp (xectorp xapping))
	       (sp (finite-part-is-xetp xapping))
	       (d (xapping-domain xapping) (cdr d))
	       (r (xapping-range xapping) (cdr r)))
	      ((null d))
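             (:o (if vp (car r) (car d))
                 ;; For a xector or xet only the first element is printed;
                 ;; otherwise print the arrow followed by the value.
                 (:q ((not (or vp sp)) "->" (car r)))
                 (:q ((not (null (cdr d))) " ")))))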

      ;; If there were pairs and there are exceptions or an infinite part,
      ;; print a separating space. 
      (:q ((and (xapping-domain xapping)
		(or (xapping-exceptions xapping)
		    (xapping-infinite xapping)))
	   " "))

      ;; Given a list of exception indices, print them.
      (:e (do ((el (xapping-exceptions xapping) (cdr el)))
	      ((null el))
             (:o (car el) "->"
                 (:q ((not (null (cdr el))) " ")))))

      ;; If there were exceptions and there is an infinite part,
      ;; print a separating space.
      (:q ((and (xapping-exceptions xapping) (xapping-infinite xapping))
	   " "))

      ;; The infinite part is omitted if nil, printed as "->k" if it's a
      ;; constant k, and printed as "->" if it's "universal"
      (:e (ecase (xapping-infinite xapping)
            ((nil))
            (:constant (:o "->" (xapping-default xapping)))
            (:universal (:o "->"))))

      ;; Print ``]'' for a xector, and ``}'' otherwise.
      (:q ((xectorp xapping) "]")
          (t "}"))))

Note that Steele's comments now have a place to go. Of course, now the comments can be about content, not the grubby details of the control string. Furthermore, we no longer have to squeeze the output data into a form intelligible to format, because we can use any Lisp control structure we like. This especially applies to the code for printing the arrow-separated pairs and the "infinite part" of the xapping.

Irritatingly, format control strings are used in more than one Lisp construct. The functions error, cerror, and break use them to control printout. So to eliminate all uses of format control strings we must provide new versions of these operators as well.
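As a rough sketch of what such a replacement could look like, the hypothetical macro error-out below builds the message from out-style arguments and hands the finished string to error. It is an illustration only, not the YTools mechanism, and it handles only items that are evaluated and princ'ed.

```lisp
;; ERROR-OUT is an illustrative name.  The one remaining control string,
;; "~a", is hidden inside the macro and never seen by the caller.
(defmacro error-out (&rest items)
  "Signal an error whose message is the PRINC'ed concatenation of ITEMS."
  `(error "~a"
          (with-output-to-string (*standard-output*)
            ,@(mapcar (lambda (item) `(princ ,item)) items))))

;; Usage: the offending value appears where it is written in the message.
(defun check-positive (n)
  (if (plusp n)
      n
      (error-out "Expected a positive number, but got " n)))
```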

Acknowledgements: The out macro I've given examples of is a descendant of the MSG macro described in

Eugene Charniak, Christopher Riesbeck, Drew McDermott, and James Meehan (1987). Artificial Intelligence Programming (2nd edition). Lawrence Erlbaum Associates,
and incorporated into Nisp, although I believe it was originally created at UC Irvine by Meehan and colleagues in the early 1980s.