BSDCan 2011
The Technical BSD Conference

Kristaps Dzonsons
Day Talks - 1 - 2011-05-13
Room Tutorial
Start time 15:00
Duration 01:00
ID 231
Event type Lecture
Track Hacking
Language used for presentation English

How the Cart Came to Draw the Ox: the Roff Tradition and Mandoc

How did it come about that something so simple ended up so complicated? Writing a utility takes little more than a compiler; why does its manual require the cooperation of preprocessors and formatters, and checks for portability?

The answer comes by way of software inertia. What started as a single type-setting language, Roff, became a complex set of simplified macro sets (Mdoc, Man, etc.), variously compatible across UNIX deployments. Roff evolved, was re-implemented, and was littered with extensions and reductions. The rest is history.

I propose surveying the field of UNIX manual formats -- Roff and its competitors -- from the perspective of Mandoc. Mandoc, once a naive utility accepting (non-idiomatic) Mdoc, has grown to accept more and more of the Roff legacy. I'll be critical of Mandoc's shortcomings and demonstrate its lesser-known strengths, such as CSS-customised HTML output. Finally, I'll consider the system's future, and the future of UNIX manuals in general, by way of the lessons of the past.

It may come as a surprise that there's just no better way to write UNIX manuals than as we do today: despite generations of evolution, the classical UNIX manual formats persist.

In fact, the ".TH" line, which begins UNIX manuals in the traditional Man format, predates me. The ".Dt" line, from the "modern" Mdoc format, dates to the dismantling of the Soviet Union. Both of these formats descend from Roff, a type-setting language, which was part of the first release of UNIX. Roff itself derived from Runoff at a time_t less than zero.

Over the years, Mdoc and Man have defied many attempts at replacement: it seems that each generation has mounted a fresh attack with the day's trends. Some gained niche acceptance, but in the end, none surpassed Roff. All the while, its Mdoc and Man formats were being enriched with extensions from the various and sundry Roff deployments and installations. So enriched, in fact, that the formats themselves have become dependent upon the tools built to compile them, which have grown to enormous size. The uncompressed source archive for GNU Troff, "the most common Roff system today", is 15 megabytes for some two-dozen utilities.

In this lecture I'll discuss how Mandoc, an Mdoc/Man compiler, came to handle (a small part of) the mess. Mandoc is at heart a cheater: instead of accommodating the full Roff language, it cherry-picks only those components specifically required for UNIX manuals. Mandoc consists largely of ad hoc workarounds (the personal effects of the forty-year history of Roff) built around a core set of common syntactic behaviours.

After surveying the history and structure of UNIX manuals, and then of Mandoc, I'll raise some questions about the future. What, given the tools we have and the conditions forced upon us, can we do better? Where have we gone wrong in the past in trying to do just that?