Mandoc in OpenBSD

Training a foal to replace a venerable workhorse

Ingo Schwarze
<schwarze@openbsd.org>

BSDCan 2011
Ottawa, May 13

Csikó - Foal. - Photo: Adam Tomkó @flickr (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 2: INTRO I - BSDCan 2011, May 13, Ottawa

What for?

Formatting man(1) pages.

→ See also the companion talk by Kristaps Dzonsons, "The roff tradition and mandoc".

$ mandoc /usr/src/share/man/man7/mdoc.7
$ man mdoc

Both commands produce:

$ man mdoc
MDOC(7)                  OpenBSD Reference Manual                  MDOC(7)

NAME
     mdoc - mdoc language reference

DESCRIPTION
     The mdoc language is used to format BSD UNIX manuals.  This reference
     document describes its syntax, structure, and usage.  The reference
     implementation is mandoc(1); the COMPATIBILITY section describes
     compatibility with other troff -mdoc implementations.

Make documentation easily accessible:

Thus, manuals are very important in OpenBSD.


Ingo Schwarze: Mandoc in OpenBSD - page 3: INTRO II - BSDCan 2011, May 13, Ottawa

What from?

→ See also Kristaps' talk.

mdoc(7) or man(7) source files - plus roff(7), tbl(7), ...

$ less /usr/src/share/man/man7/mdoc.7
.Dd $Mdocdate: April 17 2011 $          \" rcs(1) style Mdocdate ID
.Dt MDOC 7                              \" prologue: meta information
.Os
.Sh NAME
.Nm mdoc                                \" one word document title
.Nd mdoc language reference             \" one line document description
.Sh DESCRIPTION                         \" section structure
The                                     \" free text
.Nm mdoc                                \" logical markup
language is used to format
.Bx
.Ux
manuals.
This reference document describes its syntax, structure, and
usage.
The reference implementation is
.Xr mandoc 1 ;                          \" cross references to other manuals
the
.Sx COMPATIBILITY                       \" cross references inside one manual
section describes compatibility with other troff \-mdoc implementations.
.Pp
An
.Nm
document follows simple rules: lines beginning with the control
character
.Sq \.                                  \" block enclosures
are parsed for macros.
Other lines are interpreted within the scope of
prior macros:
.Bd -literal -offset indent             \" display blocks
\&.Sh Macro lines change control state. \" escaping
Other lines are interpreted within the current state.
.Ed
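
The rule quoted in the listing above (control lines start with a dot, everything else is text) can be illustrated with a minimal sketch. The function name classify() and the tuple layout are hypothetical and are not taken from mandoc; comment stripping and the handling of a leading \& escape are deliberately simplified.

```python
def classify(line):
    """Split one input line into a macro line or a text line (simplified)."""
    if line.startswith("."):
        # Drop a trailing \" comment, then split into macro name and arguments.
        body = line[1:].split('\\"', 1)[0]
        words = body.split()
        name = words[0] if words else ""
        return ("macro", name, words[1:])
    # A leading \& is a zero-width escape: it keeps a following "."
    # from being taken as the control character.
    if line.startswith("\\&"):
        line = line[2:]
    return ("text", line)


print(classify(".Sh DESCRIPTION"))
print(classify("\\&.Sh Macro lines change control state."))
```

A real parser additionally has to look for callable macros inside the arguments; this sketch stops at the first level.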

Ingo Schwarze: Mandoc in OpenBSD - page 4: INTRO III - BSDCan 2011, May 13, Ottawa

Table of contents


Ingo Schwarze: Mandoc in OpenBSD - page 5: Apr to Dec 2009 - BSDCan 2011, May 13, Ottawa

Grazing the foal's pasture

Half a year free of serious responsibilities.

Related timeline:

2008 Nov 22: first commit to mdocml.bsd.lv by kristaps@
2009 Mar 27: first direct commit by schwarze@ to OpenBSD (not mandoc)
2009 Apr 06: mandoc imported into OpenBSD by kristaps@
2009 Apr 15: first help from another OpenBSD developer (miod@)
2009 May 23: schwarze@ first talks to jmc@ about mandoc
2009 May 31: at c2k9 in Edmonton, schwarze@ talks to deraadt@
2009 Jun 09: kristaps@ agrees to work closely together
2009 Jun 14: merge to OpenBSD started by schwarze@
2009 Jun 15: first patches merged back from OpenBSD to bsd.lv
2009 Jun 21: mandoc usable in OpenBSD and in sync with bsd.lv
2009 Jun 23: bugfixing in OpenBSD started by schwarze@
2009 Jul 05: OpenBSD 4.6 release rolled without mandoc
2009 Jul 12: joerg@ sends his first patch from NetBSD
2009 Jul 18: uqs@ sends his first patch from FreeBSD
2009 Oct 27: start src/regress/usr.bin/mandoc
2010 Jan 02: start of systematic integration

Ingo Schwarze: Mandoc in OpenBSD - page 6: TRAIN I - Jan and Feb 2010 - BSDCan 2011, May 13, Ottawa

See the world!

Let mandoc build the whole tree.

Systematically investigate all fatal issues.

Most workarounds got fixed later:

Related timeline:

2010 Jan 02: first patches to mdoc(7) manuals to fix the build with mandoc
"Fine. Even if mandoc goes nowhere, it has found some bugs. ;)" jmc@
2010 Feb 17: first patch to a man(7) manual in order to fix the build
2010 Feb 20: found first manual bug caused by DocBook
2010 Feb 24: first non-fatal manual fix found by -Tlint
2010 Feb 25: tree now builds with mandoc
2010 Mar 18: OpenBSD 4.7 release rolled without mandoc

Ingo Schwarze: Mandoc in OpenBSD - page 7: TRAIN II - Feb 2010 - BSDCan 2011, May 13, Ottawa

Block nesting

An example of nice mandoc design

→ See also Kristaps' talk.

Example manual snippet using nested blocks:

$ mandoc chgrp.1
SYNOPSIS
     chgrp [-fh] [-R [-H | -L | -P]] group file ...

The corresponding mdoc(7) source code:

$ less chgrp.1
[...]
.Sh SYNOPSIS
.Nm chgrp
.Op Fl fh
.Oo
.Fl R
.Op Fl H | L | P
.Oc
.Ar group
.Ar
[...]

Mandoc represents this as the following syntax tree:

$ mandoc -Ttree chgrp.1   # much simplified
    Sh (block) SYNOPSIS
        Nm (block) chgrp
            Op (block)
                Fl (elem) fh
            Oo (block)
                Fl (elem) R
                Op (block)
                    Fl (elem) H
                    (text) |
                    Fl (elem) L
                    (text) |
                    Fl (elem) P
            Ar (elem) group
            Ar (elem) file ...
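
To make the idea of a block tree concrete, here is a small sketch that parses the chgrp SYNOPSIS lines into nested nodes and renders them again. It is an illustration under strong assumptions, not mandoc's parser: only Op, Oo, Oc, Fl, Ar and Nm are known, Nm is treated as a plain element rather than a block, and every argument word becomes its own node. All function and variable names are hypothetical.

```python
ELEMS = {"Fl", "Ar", "Nm"}   # in-line elements
DELIMS = {"|"}               # delimiters interrupt an element and stay plain text


def parse_words(words):
    """Parse the words of one macro line into a list of (name, value) nodes."""
    nodes = []
    elem = None              # element macro currently collecting arguments
    for i, w in enumerate(words):
        if w == "Op":
            # Line-scoped block: the rest of the line is its body.
            nodes.append(("Op", parse_words(words[i + 1:])))
            return nodes
        if w in ELEMS:
            elem = w
            if w == "Ar" and i + 1 == len(words):
                nodes.append(("Ar", "file ..."))   # a bare .Ar has a default
        elif w in DELIMS or elem is None:
            nodes.append(("text", w))
        else:
            nodes.append((elem, w))
    return nodes


def parse(lines):
    """Build a tree; .Oo opens a block that stays open until .Oc."""
    root = []
    stack = [root]           # bodies of the currently open explicit blocks
    for line in lines:
        words = line[1:].split()
        if not words:
            continue
        if words[0] == "Oo":
            body = parse_words(words[1:])
            stack[-1].append(("Oo", body))
            stack.append(body)
        elif words[0] == "Oc":
            if len(stack) > 1:
                stack.pop()
            stack[-1].extend(parse_words(words[1:]))
        else:
            stack[-1].extend(parse_words(words))
    return root


def render(nodes):
    """Turn the tree back into one line of SYNOPSIS text."""
    out = []
    for name, value in nodes:
        if name in ("Op", "Oo"):
            out.append("[" + render(value) + "]")
        elif name == "Fl":
            out.append("-" + value)
        else:
            out.append(value)
    return " ".join(out)


CHGRP = [".Nm chgrp", ".Op Fl fh", ".Oo", ".Fl R",
         ".Op Fl H | L | P", ".Oc", ".Ar group", ".Ar"]
print(render(parse(CHGRP)))
```

Because the closing bracket is produced by the renderer when it leaves a block node, no state has to be carried from the .Oo line to the .Oc line at output time.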

Traditional roff/mdoc design: no block structure!

low level: roff requests provide

high level: mdoc macros


Ingo Schwarze: Mandoc in OpenBSD - page 8: TRAIN III - Feb to Jun 2010 - BSDCan 2011, May 13, Ottawa

Blood, sweat, and tears

Badly nested blocks and the Xo macro

Explicit blocks, badly nested:

.Ao ao
.Bo bo
.No ac Ac
.No bc Bc

<ao [bo ac> bc]

Implicit block broken by explicit block:

.Aq aq Bo bo eol
.No bc Bc

<aq [bo eol> bc]

The most important case in practice: .It Xo

$ less find.1
[...]
.It Xo
.Ic -exec Ar utility
.Op argument ...
.No ;
.Xc

-exec utility [argument ...] ;

Our first thought: deprecate this abomination!

All mdoc(7) manuals build at this point!

Related timeline:

2010 Feb 23: remove .Oo .Xo .Oc .Xc mis-nesting from manuals (questionable)
2010 Feb 26: support .It Xo (good)
2010 Jun 29: support badly nested blocks in general (even better)
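
The explicit-block example from this slide (.Ao/.Bo closed in the wrong order) can be reproduced with a short sketch. It only shows why such input is awkward for a tree: the output is easy to produce, but the closing macro does not match the innermost open block. The function name and the bookkeeping are assumptions made for illustration and do not describe how mandoc resolves the problem internally.

```python
# opener -> (closer, opening delimiter, closing delimiter)
PAIRS = {"Ao": ("Ac", "<", ">"), "Bo": ("Bc", "[", "]")}
CLOSERS = {closer: opener for opener, (closer, _, _) in PAIRS.items()}


def render_enclosures(lines):
    """Return (text, broken): the output and the list of badly nested pairs."""
    out = []        # output words
    stack = []      # open blocks, innermost last
    broken = []     # (closed block, block it cut through)
    glue = False    # True right after an opening delimiter
    for line in lines:
        for w in line[1:].split():
            if w == "No":
                continue                       # .No only marks normal text
            if w in PAIRS:
                stack.append(w)
                out.append(PAIRS[w][1])
                glue = True
            elif w in CLOSERS:
                opener = CLOSERS[w]
                if opener in stack:
                    idx = max(i for i, n in enumerate(stack) if n == opener)
                    # Blocks opened later are still open: bad nesting.
                    broken.extend((opener, inner) for inner in stack[idx + 1:])
                    del stack[idx]
                if out:
                    out[-1] += PAIRS[opener][2]
                else:
                    out.append(PAIRS[opener][2])
            elif glue:
                out[-1] += w
                glue = False
            else:
                out.append(w)
    return " ".join(out), broken


text, broken = render_enclosures([".Ao ao", ".Bo bo", ".No ac Ac", ".No bc Bc"])
print(text)
print(broken)
```

With properly nested input the second return value stays empty, so it can serve as a cheap check for manuals that rely on mis-nesting.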

Ingo Schwarze: Mandoc in OpenBSD - page 9: TRAIN IV - Mar 2010 - BSDCan 2011, May 13, Ottawa

Hey, aren't you cheating?

Explicit pod2man(1) preamble support.

Example code from the standard pod2man preamble:

$ less perl.1
[...]
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'

This uses low-level roff requests:

No chance to get all that implemented quickly.

Quick and dirty solution:

Now all man(7) manuals build as well!

No fixes in man(7) manuals, just workarounds in mandoc.

Related timeline:

2010 Mar 01: implement pod2man(1) pseudo-macros
2010 Sep 20: mandoc can now handle the standard pod2man preamble
2010 Nov 28: remove the pod2man(1) pseudo-macros

Ingo Schwarze: Mandoc in OpenBSD - page 10: TRAIN V - conclusions - BSDCan 2011, May 13, Ottawa

Don't be lazy.

What went wrong while getting the tree to build?

Lessons learnt:

New Forest Foal. - Photo: Krista van der Voorden <Kris*V*@flickr> (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 11: DRAW I - Apr 2010 - BSDCan 2011, May 13, Ottawa

What are you waiting for?

Switch over the tree as soon as possible.

Related timeline:

2010 Mar 01: mandoc ready to build the tree
2010 Mar 18: OpenBSD 4.7 release rolled without mandoc
2010 Mar 20: Xenocara can now build with mandoc as well
2010 Apr 02: link mandoc to the OpenBSD build
2010 Apr 03: switch base build to mandoc, excepting tbl pages
2010 Apr 03: fix all fatal issues where mandoc kills ports builds
2010 Apr 04: first port maintainer explicitly switches to mandoc
2010 Apr 04: fix first mandoc bug that was found by ports usage
2010 Apr 05: espie@ implements USE_GROFF framework and groff-1.15 port
2010 Apr 07: first major merge from bsd.lv after the switch
2010 Apr 08: first mandoc bugfix found in ports propagates upstream
2010 Aug 12: OpenBSD 4.8 release rolled with mandoc

Photo: tiny_package @flickr (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 12: DRAW II - Apr and May 2010 - BSDCan 2011, May 13, Ottawa

If ifs & ands were docs & mans...

No way around some low-level roff requests.

Related timeline:

2010 Apr 25: implement roff conditional instructions, man(7) only
2010 May 15: mandoc roff library started by kristaps@
2010 May 19: the roff library replaces my preliminary support for conditionals
2010 Jul 03: rudimentary implementation of user-defined strings
2010 Sep 22: interesting commit message: no hope for .de
2010 Nov 25: implement the .de (define macro) roff instruction

Ingo Schwarze: Mandoc in OpenBSD - page 13: DRAW III - May 2010 - BSDCan 2011, May 13, Ottawa

Desperation led to success:

How an afterthought improved the design.

Original mandoc main program:

Modified mandoc main program:

Paradigmatic switch:

roff: expand high-level macros into low-level requests

mandoc: first handle low-level requests (preprocessor)

→ See Kristaps' talk for applications
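
The "preprocessor first" order can be sketched as a pass that resolves a few low-level requests and hands only the remaining lines to the macro parser. This is a toy under stated assumptions, not the mandoc roff library: only .ds, .if n, .if t and two-character \*(xx string references are handled, any other condition counts as false, and the function name, the mode parameter and the sample string name Os are made up for the example.

```python
import re

STRING_REF = re.compile(r"\\\*\((..)")   # \*(xx : interpolate string "xx"


def preprocess(lines, mode="n"):
    """Resolve .ds and .if n/.if t; return the lines left for the macro parser.

    mode is "n" for terminal (nroff-style) or "t" for typeset output.
    """
    strings = {}
    out = []
    for line in lines:
        # Expand user-defined strings; unknown names expand to nothing.
        line = STRING_REF.sub(lambda m: strings.get(m.group(1), ""), line)
        parts = line.split(None, 2)
        if parts and parts[0] == ".if" and len(parts) == 3:
            if parts[1] != mode:
                continue              # condition false: drop the line
            line = parts[2]           # condition true: keep only the body
            parts = line.split(None, 2)
        if parts and parts[0] == ".ds" and len(parts) >= 2:
            strings[parts[1]] = parts[2] if len(parts) == 3 else ""
            continue                  # definitions produce no output
        out.append(line)
    return out


SAMPLE = [".ds Os OpenBSD", ".if t .sp .5v", ".if n .sp",
          ".SH NAME", "runs on \\*(Os"]
print(preprocess(SAMPLE))
```

The macro parser behind this pass never sees .ds or .if, which is the point of the design: high-level macros keep their structure, and low-level requests are dealt with before parsing starts.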


Ingo Schwarze: Mandoc in OpenBSD - page 14: DRAW IV - May 2010 - BSDCan 2011, May 13, Ottawa

.if .ds .de

Examples of roff macros we need.

Very young zebra. - Photo: Tambako @flickr (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 15: DRAW V - Jun and Jul 2010 - BSDCan 2011, May 13, Ottawa

Invented here.

Home-grown non-standard features even in OpenBSD.

Almost all related to SYNOPSIS formatting:

Quite some work to get this right, for features of no theoretical importance that are not even documented.

Similar stuff would probably come up in other systems, when switching them over.

Related timeline:

2010 Jun 25: schwarze@ at the c2k10 hackathon in Edmonton
2010 Jun 26: basic implementation of .Bk/.Ek
2010 Jun 27: full .nr nS support, unbreaking the kernel manuals
2010 Jul 01: improve .Nm indentation in the SYNOPSIS

Ingo Schwarze: Mandoc in OpenBSD - page 16: DRAW VI - Summer 2010 - BSDCan 2011, May 13, Ottawa

Count your beans.

Automatic comparisons as part of the build system.

Iterative process, several cycles:

In parallel, tweak groff to reduce noise in comparisons:

A text file in CVS was the ideal bug tracker:

Related timeline:

2010 Aug 15: systematic bug hunting in /bin and /sbin
2011 Jan 23: systematic bug hunting in /usr/bin
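
An automatic comparison of two formatters' output might look roughly like the sketch below. It is not the OpenBSD build-system code; it assumes both outputs are already available as strings, and the whitespace normalization is just one example of reducing noise before diffing. The names normalize() and compare() and the file labels are hypothetical.

```python
import difflib


def normalize(text):
    """Strip trailing blanks, collapse blank-line runs, drop outer blank lines."""
    out = []
    for line in text.splitlines():
        line = line.rstrip()
        if line or (out and out[-1]):
            out.append(line)
    while out and not out[-1]:
        out.pop()
    return out


def compare(groff_text, mandoc_text, name="page"):
    """Return unified-diff lines; an empty list means no relevant difference."""
    return list(difflib.unified_diff(
        normalize(groff_text), normalize(mandoc_text),
        fromfile=name + ".groff", tofile=name + ".mandoc", lineterm=""))


same = compare("NAME\n     foo - bar  \n\n\n", "NAME\n     foo - bar\n")
print(same)
for line in compare("NAME\n     foo - bar\n", "NAME\n     foo - baz\n"):
    print(line)
```

Run over a whole directory of manuals, the pages with a non-empty diff form the work list for the next cycle.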

Ingo Schwarze: Mandoc in OpenBSD - page 17: DIGRESSIONS - BSDCan 2011, May 13, Ottawa

All work and no play...

Nice features along the way

Critical points:

However, fancy output is fun.

Related timeline:

2009 Oct 21: HTML and XHTML output modes by kristaps@
2009 Oct 24: hyphenation patch by schwarze@ (still unused)
2010 Mar 01: proper inter-sentence spacing for mdoc(7)
2010 Mar 05: at the end of lines, split words at existing hyphens
2010 Apr 22: proper tab handling
2010 Jun 10: PostScript output mode started by kristaps@
2010 Jul 31: PDF output mode completed by kristaps@
2010 Sep 09: use mandoc instead of groff to build PostScript manuals

Foals Just Wanna Have Fun. - Photo: Gary Tanner <gazzat@flickr> (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 18: REMOVE I - early Oct 2010 - BSDCan 2011, May 13, Ottawa

Only a dozen pages...

... are using tbl, the last obstacle to groff removal.

Related timeline:

2010 May: stand-alone implementation of tbl started by kristaps@
2010 Aug 12: OpenBSD 4.8 release rolled with mandoc
2010 Oct 15: import tbl parser and renderer written by kristaps@
2010 Oct 17: build tbl(1) pages with mandoc(1), not groff
2010 Oct 18: disconnect groff from the base build
2010 Oct 18: "I absolutely don't intend to merge tbl into mandoc" kristaps@
2011 Jan 04: clean tbl integration by kristaps@, remove mine

Ingo Schwarze: Mandoc in OpenBSD - page 19: REMOVE II - late Oct 2010 - BSDCan 2011, May 13, Ottawa

Not scared by one-way streets!

Groff emigrating to ports land.

Related timeline:

2010 Aug 12: OpenBSD 4.8 release rolled with mandoc
2010 Oct 18: disconnect groff from the base build
2010 Oct 19: switch default /etc/man.conf to mandoc
2010 Oct 23: schwarze@ at p2k10 hackathon in Budapest
2010 Oct 26: support .so (low-level roff "switch source file")
2010 Oct 27: OpenBSD ports FAQ section about mandoc and groff
2010 Oct 29: landry@ performs the first major USE_GROFF removal
2010 Oct 29: millert@ removes colcrt(1), checknr(1), soelim(1)
2011 Feb 07: use mandoc in Xenocara Imake builds
2011 Mar 02: OpenBSD 4.9 release rolled without groff
2011 Mar 12: cvs rm groff

Cart. - Photo: garycycles7 @flickr (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 20: CONCLUSIONS I - BSDCan 2011, May 13, Ottawa

Done!

Reached the stage: It just works.

To get there:

Other systems can now do the same, if they want.

Just mail us!


Ingo Schwarze: Mandoc in OpenBSD - page 21: RECURRING I - BSDCan 2011, May 13, Ottawa

Show what's relevant, not more.

Why it is critical to get errors and warnings right.

Related timeline:

2009 Jul 12: fewer knobs: remove -Wsyntax -Wcompat
2010 May 13: fewer knobs: remove -fno-ign-chars
2010 May 23: unified error and warning system by kristaps@
2010 Aug 19: simple, consistent user interface for error handling
2010 Oct 24: do not throw fatal errors when there is no need to
2010 Oct 26: downgrade nearly 20 errors to warnings
2011 Jan 16: downgrade yet another bunch of fatal errors
2011 Jan 22: check argument count validation for all in_line() macros
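
The idea behind a unified message system with few knobs can be sketched as follows: every diagnostic carries one level, a single threshold decides what is shown, and only the highest level stops processing. The class and level names are assumptions for illustration and are not mandoc's internal API.

```python
from enum import IntEnum


class Level(IntEnum):
    WARNING = 1   # formatting continues, output is probably fine
    ERROR = 2     # formatting continues, output may be wrong
    FATAL = 3     # formatting cannot continue


class FatalError(Exception):
    """Raised when a message of level FATAL is reported."""


class Messages:
    def __init__(self, threshold=Level.FATAL):
        self.threshold = threshold   # lowest level that gets shown
        self.shown = []
        self.worst = None            # highest level seen so far

    def report(self, level, line, text):
        if self.worst is None or level > self.worst:
            self.worst = level
        if level >= self.threshold:
            self.shown.append(f"{line}: {level.name}: {text}")
        if level == Level.FATAL:
            raise FatalError(text)


msgs = Messages(threshold=Level.ERROR)
msgs.report(Level.WARNING, 3, "macro without arguments")
msgs.report(Level.ERROR, 5, "unknown macro skipped")
print(msgs.shown)
```

Downgrading a diagnostic then means changing one level constant at the call site; nothing else in the program has to know about it.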

Ingo Schwarze: Mandoc in OpenBSD - page 22: RECURRING II - BSDCan 2011, May 13, Ottawa

Bogue déjà vue:

Collecting regression tests.

Related timeline:

2009 Oct 27: start src/regress/usr.bin/mandoc
2010 Jun 30: major update of the mandoc test suite
2010 Jul 01: enable mandoc regression tests; ok phessler@
2010 Dec 04: major additions to the regression suite
2011 Feb 05: commit many regression tests found in my trees

Horse Fly Portrait. - Photo: Jeff Burcher <VonShawn@flickr> (CC)


Ingo Schwarze: Mandoc in OpenBSD - page 23: RECURRING III - BSDCan 2011, May 13, Ottawa

Bad patches triggering good ones

Preliminary code put in and ripped out again.

That may seem inefficient, but actually it's a perfectly sane approach:

Related timeline:

This approach got used in at least five cases:

2010 Mar 01 - May 14: end of sentence detection
2010 Apr 25 - May 19: roff conditionals
2010 Mar 01 - Sep 20: pod2man preamble
2010 Jun 16 - Jun 27: roff registers
2010 Oct 15 - 2011 Jan 04: tbl integration

Ingo Schwarze: Mandoc in OpenBSD - page 24: CONCLUSIONS II - BSDCan 2011, May 13, Ottawa

Newly designed.

Clean implementation of dirty languages.

Shun complexity!

This is a third reason why simplicity is key.
The first two, correctness and security, are well known.

But flexibility is important as well!


Ingo Schwarze: Mandoc in OpenBSD - page 25: CONCLUSIONS III - BSDCan 2011, May 13, Ottawa

Move fast!

How a replacement project can succeed.

And keep the balance:


Ingo Schwarze: Mandoc in OpenBSD - page 26: FUTURE I - Mar and Apr 2011 - BSDCan 2011, May 13, Ottawa

The future is already past.

When I proposed this talk, groff-1.20 was a plan.

We now provide a groff-1.21 port.

According to the ChangeLog, groff-1.21 has:

Related timeline:

2011 Mar 19: update ports groff from 1.15 to 1.21
2011 Mar 20: rudimentary eqn support by kristaps@
2011 Apr 24: tweak mandoc to conform to newest groff habits
2011 Apr 26: fixed groff-1.21 invocation for Imake ports

Ingo Schwarze: Mandoc in OpenBSD - page 27: FUTURE II - BSDCan 2011, May 13, Ottawa

Not done...

Possible future directions.

Related timeline:

2011 Jul 02: c2k11 - worldwide main hackathon in Edmonton
2011 Sep 16: s2k11 - European general hackathon in Ljubljana

Ingo Schwarze: Mandoc in OpenBSD - page 28 - BSDCan 2011, May 13, Ottawa

Thanks!

For important bug reports and discussions:

David Coppa - Edd Barrett - Giovanni Bechis - Gleydson Soares - Ian Darwin - Janne Johansson - Jasper Lievisse Adriaanse - Kurt Miller - Matthew Dempsky - Matthias Kilian - Nicholas Marriott - Paul Irofti - Philipp Guenther - Remi Pointel - Ted Unangst - Thordur Bjornsson (all OpenBSD)

Anthony J. Bentley - Chris Bennett - Frantisek Holop - Igor Zinovik - James Jerkins - Maxim Belooussov - Mikolaj Kucharski - Nicolas Joly - Pascal Stumpf - Ryan Flannery - Tim van der Molen - Tristan Le Guern - Yuri Pankov

... and probably some I have forgotten (sorry for that)

For images under CC licenses:

http://www.flickr.com/photos/tomkoadam/4778126822/
http://www.flickr.com/photos/kristavandervoorden/4737488285/
http://www.flickr.com/photos/tiny_packages/5045038219/
http://www.flickr.com/photos/tambako/3578468294/
http://www.flickr.com/photos/gazzat/3495392530/
http://www.flickr.com/photos/garycycles7/5658207855/
http://www.flickr.com/photos/jimmy-jay/4672901414/
http://www.flickr.com/photos/66176388@N00/3436935367/

A Youngster on the Quantocks. - Photo: Mark Robinson <me'nthedogs'@flickr> (CC)