GNU groff: Powerful document formatting in a small package

Learn about groff's history and development in this interview with one of its developers.

The first Unix systems introduced a small but powerful document formatting system called "roff." Unix 2nd Edition updated this with nroff, or "new roff." Later Unix versions supported phototypesetter devices with troff ("typesetter roff") and ditroff ("device independent troff"). 

In the late 1980s, the GNU Project started work on groff, the GNU version of roff. Thirty years later, groff can generate output in a variety of formats, including HTML, PostScript, and PDF. I interviewed G. Branden Robinson from the GNU groff project about the latest developments in groff.

Tell us about your work in groff

I'm a software engineer with a graying beard who grew up using 8-bit Z80 and 6809 microcomputers. I discovered Unix and then the Linux kernel in the 1990s, found like-minded colleagues in the Debian Project, and took up maintenance of its XFree86 (X Window System) packages. I was elected Debian Project leader in 2005, and assembled a team to replace myself as the XFree86 package maintainer. After I tried to manage the cognitively dissonant task of leadership of an anarchist collective with yet another team, I faded from the project into family and professional life.

In 2017, I decided to get involved again with developing Free Software and making people's computing experience more satisfying.

Since 2020, I've been responsible for most of the GNU groff development in a quantitative sense, serving as lead developer and deputy maintainer. My emphasis is on writing and improving documentation, adding automated tests, fixing bugs, shepherding modest extensions and reforms to groff's suite of interlocking interfaces, and trying to make the system more comprehensible to users and developers from the source code level all the way up to the "10,000-foot view." You can see that in my huge commit count, visible in the groff 1.23.0 release notes.

How did groff get to where it is today?

The early history of groff is less clear than I would like it to be. The best resources I know of for the context of groff's origins are a 2006 email to the World Wide Web Consortium's "public-grddl-wg" email list and a 1996 post to the USENET newsgroup comp.text by Nils-Peter Nelson, a Bell Labs alum perhaps best known for developing the original C library's string-handling functions.

As I understand the history, the Free Software Foundation sponsored James Clark for a time, starting in around 1989, to develop a from-scratch replacement for Unix troff, which was then proprietary software. Clark was familiar with the documentation of the troff in SunOS 4.0.

There was also a conceptual cross-pollination between groff and sqtroff, a product of a Canadian company named SoftQuad. sqtroff was produced under license from AT&T and descended from AT&T's Documenter's Workbench (DWB) 2.0 product. AT&T continued to maintain DWB independently (as did Brian Kernighan in yet another line of troff descent) so sqtroff could be viewed as a fork.

SoftQuad had written a reportedly award-winning manual for its troff, which might have motivated Clark to reimplement many of SoftQuad's extensions to Unix troff in groff. I surmise that this aim was well in keeping with the GNU Coding Standards document's exhortation to avoid arbitrary limits. Unix troff, even after it was significantly refactored by Brian Kernighan in about 1980 to implement device-independent troff, or ditroff, still had many of those.

James Clark stayed on as GNU maintainer of groff until about 1995. After he released groff 1.10, he quietly retired and (as far as I know from personal attempts to reach him) does not respond to emails about groff. I'd love to interview him about it myself, if I could. Notwithstanding my commit messages' occasional gripes about the existing implementation, I regard his development of groff as an enviable achievement. I also think he was courageous to select C++ as an implementation language well before it was standardized, given the volatility of its development and erratic quality of vendors' compilers for it.

After Clark stepped away, groff development was quiet for a few years. In 1999, Werner Lemberg stepped in and proved no less capable a developer, sparking a new flurry of activity. Entries in groff's NEWS file from release 1.12 through 1.22.3 show how much he, with some important supporting players, advanced groff in virtually every respect.

Is the name pronounced like "gee-roff" or as one word, "groff"?

I think of "groff" as "giraffe" pronounced with a ludicrous accent. When I started contributing, I pronounced it as a one-syllable "groff." But as I speak to more people about the project and my work in it, I've found I've adjusted my pronunciation to match others.

Gavin Freeborn's YouTube channel hosts several videos featuring groff, and he uses the single-syllable "groff" pronunciation as well. I'm not complaining. I appreciate him for steering users our way!

How compatible is GNU groff with original Unix nroff, troff, and ditroff? Is there a standard?

This is a hard question to answer. No version of *roff has ever been specified, either by a standards body like the Austin Group (which produces POSIX) or with what I consider proper formality. The Bell Labs document CSTR #54, in its last revision by Kernighan in 1992, is the closest thing to a "spec" for Unix troff that exists. It's good, but not without errors; we've documented several corrections in the GNU Troff manual (scroll down to "CSTR #54 errata").

Ossanna's troff (the original Unix *roff) and Kernighan's ditroff ("device-independent" troff) are not perfectly compatible with each other. But they're pretty close for the features they have in common. GNU troff (the formatter program) is highly compatible with Kernighan's ditroff when called with the -C command line option to make it operate in "AT&T compatibility mode."

Even without the -C option, it is not difficult to write troff input that is portable to both implementations, and the GNU Troff manual offers many examples of this in cases where we aren't illustrating GNU extensions. Generally, and extensions aside, the more "readable" your troff input is, and the less you show off your cleverness by embedding invisible control characters or pretending that your space bar is broken, the more portable your document will be.

Do you use any troff or groff documents as a "test suite"?

Yes, but not in a rigorous way. On the groff mailing list, I've called multiple times for submissions of documents we might use as a test corpus. So far this has not been fruitful. I surmise that much of the problem is due to licensing; people who have written *roff documents have often done so as works for hire. Even if they'd like to contribute their documents, they don't have legal control to do so.

A similar misplaced fear might discourage other people. Thanks to the persistent and incorrect belief that the Free Software Foundation has a zero-tolerance policy for materials in its projects for which it doesn't administer the copyrights, some folks might assume that they'd have to sign over their rights to the FSF, and doing so is not worth it to them. But there is already such material in groff. Our LICENSES file discusses this. I'd be happy to collect such documents in our "contrib" directory.

I take two approaches to establishing compatibility with historical *roff:

First, I craft a small input file that exercises a feature of interest. Then I compare how several different implementations handle it: groff, Version 7 Unix troff and nroff (running on a SIMH PDP 11/45), DWB 3.3 troff, Heirloom Doctools troff, and mandoc for man pages. I often have reason to check groff against its own history, to determine if we have regressed something; I keep a few versions handy for that reason. I less often investigate the behavior of neatroff and Plan 9 from User Space troff, but I have them around.

Second, I do use some historical documents to measure groff's behavior. I check our memorandum macro package (mm) with examples from the DWB 3.3 mm manual and a couple of documents from historical BSD. This drove several bug fixes in groff 1.23.0 and more have landed in our Git repository since. Also, I specifically selected the Kernighan & Cherry paper "Typesetting Mathematics - User's Guide (Second Edition)," better known simply as the original eqn manual, as a basis for comparison. Apart from the fixes and changes that effort drove to our ms macro package, I documented the fruits of this work in a GitHub repository.
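
The first approach above (one small input, several formatters) can be sketched as a short Python helper. This is an illustration only, not tooling from the groff project: the function name group_by_output and the shape of the formatters mapping are assumptions made for this example.

```python
import subprocess
from collections import defaultdict


def group_by_output(formatters, roff_input):
    """Feed the same input to every formatter and group them by output.

    formatters maps a label to the argv list used to start that formatter;
    each one must read its document from standard input.  The result is a
    sorted list of label groups; formatters in the same group produced
    byte-for-byte identical standard output.
    """
    groups = defaultdict(list)
    for label, argv in formatters.items():
        result = subprocess.run(
            argv, input=roff_input, capture_output=True, text=True
        )
        groups[result.stdout].append(label)
    return sorted(sorted(labels) for labels in groups.values())
```

With groff installed, a call such as group_by_output({"groff": ["groff", "-Tascii"], "groff -C": ["groff", "-C", "-Tascii"]}, text) would show whether AT&T compatibility mode changes the rendering of a given test input. A formatter that is not installed raises FileNotFoundError.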

How does the user community use groff? What's the most common use of groff?

We hear from users regularly, several times a month, either via queries to the groff mailing list or via bug reports submitted to the Savannah issue tracker.

It's no secret that online manual or "man" pages are the tail that wags our typesetting dog. It was my frustration with the quality of man pages - the imprecision of documentation about writing them well, and the vagueness of the package's specification in its man page - that led me into groff development in the first place.

In Debian, judging by installations as counted by its opt-in popularity contest system, something like eight times as many people use groff for man page rendering alone as employ the full power of the typesetting system.

Beyond that, it is hard for me to answer; I don't solicit testimonials. My insights into use cases arise mostly from bug reports. And when addressing those, the focus is on minimal reproducible test cases to manage the effort to resolve the issues.

What's your favorite macro package in groff? Which macro set do you use the most?

I've come to appreciate each macro set that I've learned, and I've worked on all of those that groff ships except for mom(7), which is stewarded admirably by Peter Schaffter. mom(7) certainly produces nicer output than any of the others; that's a benefit of having a trained typographer instead of a Unix nerd drive macro package development.

If I have a favorite, it's man(7). That's the one I most want to improve and in which bugs alarm me most. I want software engineers to write high-quality documentation and I want the man package to help them do it.

I've pitched more reforms and innovations to the man(7) package than to any of the others, in spite of the considerable inertia of its user base. Time will tell if I'm tilting at windmills.

What kinds of documents do you write with groff?

I haven't done anything I consider visually arresting; my graphic design skills are nearly nonexistent.

For groff 1.23, I spent substantial effort resurrecting Larry Kollar's 20-year-old document introducing the groff version of the ms(7) package. I expanded it in practically every aspect, and got encouraging feedback on the mailing list. The package is small and simple enough that one can document it in about 26 pages - comprehensively in terms of technical detail and copious examples.

In Git, as a post-1.23.0 development, I've also added an example of pic(1) usage with ms(7). The diagram is not complex or impressive and is not intended to be. My objective, pedagogically, is to show people brief but non-trivial examples alongside the inputs that create them so that they can take them and progressively refine them toward their desired goal.

One trick I was pleased to pull off was in support of Deri James's new "sboxes" package. We thought it would be a good idea if the document describing it could include its own source code as an example. At the same time I'm a fervent adherent of the DRY (Don't Repeat Yourself) philosophy. Human-maintained copies of things get out of sync. A bit of make(1) and sed(1) got the job done, including a magic footnote that expands differently depending on whether it appears in the narrative part of the document or the quoted code.

What tools do you use to write groff documents? Do you use specific tools to find errors in your documents?

I'm a basic boy: vim(1) and make(1).

I'm a big believer in -w w. I'd like to make it groff's default, but doing so might attract torches and pitchforks in my direction.

I did add a sort of linter to our man(7) package. It's not documented because it's not designed properly; it is an ad hoc collection of validation checks that I needed to keep myself from making boneheaded mistakes in groff's own man pages. You use it by passing -rCHECKSTYLE=n to groff(1), where n is a small integer. Make it larger for more persnickety diagnostics.

The Linux man-pages project maintainer, Alex Colomar, uses it too, and on a much larger corpus of documents than groff provides. Alex proved helpful in the groff 1.23.0 development cycle with respect to man(7) and tbl(1) feature development and bug squashing; he has been attentive and demanding in the best ways.

I can't let this question pass without noting that I fixed a 30-year-old bug in groff's backtrace feature, one which sadly rendered it nearly useless when used with macro packages, which is when people most need it. If you use -w w, try -b (backtrace) as well.
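
The options mentioned in this answer can be combined into one checking run. Below is a minimal Python sketch that only assembles the command line. The function name groff_check_argv is hypothetical, and the -z option (which asks groff to suppress formatted output so that only diagnostics appear) is an addition for this example and is not mentioned in the interview.

```python
def groff_check_argv(man_page, checkstyle=1):
    """Build a groff command line that reports diagnostics for a man page.

    -man          load the man(7) macro package
    -w w          enable all warnings
    -b            print a backtrace with each warning or error
    -rCHECKSTYLE  set the style-check register; larger values are stricter
    -z            suppress formatted output (assumption: diagnostics only)
    """
    return [
        "groff", "-man", "-w", "w", "-b",
        f"-rCHECKSTYLE={checkstyle}", "-z", man_page,
    ]
```

Passing the resulting list to subprocess.run() on a system with groff installed prints any warnings to standard error.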

GNU groff just passed its 33rd anniversary (June 1990). What are some interesting milestones from groff history?

A handful of milestones occur to me:

1. In November 1991, Jörgen Hägg developed our implementation of the memorandum macros; see groff_mm(7). I don't think many people were still using the mv ("viewgraph") macro package or van Wyk's ideal(1) preprocessor by that time, so Jörgen's contribution made groff a practical replacement for probably at least a few "nines" of AT&T troff usage (that is, 99.9% or more of them, keeping in mind the outsized influence of man(7) and even mdoc(7), once the latter was developed).

2. In 1999, Gaius Mulley and Werner Lemberg started development of the grohtml(1) output driver. It has come in for some criticism, and certainly added some complexity to the formatter given the major differences in approach that *roffs and HTML take to document structure. Nevertheless it was necessary. A document generator to this day simply must have an HTML story, and their work gave groff one.

3. In 2011, Deri James contributed the gropdf(1) output driver, which was innovative in a couple of ways: First, it gave groff "a PDF story," which we needed even more than HTML. While we could generate PostScript and then convert it to PDF, I don't know of any other PDF generation tool chain that works that way or regards it as optimal. Second, gropdf is written in Perl, not C++. This has forced us to think more carefully about what the interface actually is between GNU troff, the formatter program, and the output drivers. Beyond that, Deri is a talented visual designer and his sensibilities help us retain our focus on mission. We don't want to merely ape what AT&T troff did; we want to exceed it in capability and quality, empowering people to solve problems today. His knowledge of PDF and the possibilities it affords have helped me to see avenues for enhancement.

We just saw the release of GNU groff 1.23.0 (July 7). What new features will help technical writers who use groff?

We have a release announcement that covers the new features, great and small. Speaking only for myself, I'm proud of quality-of-life improvements to the everyday man page reading experience. For many years, some terminal emulators like gnome-terminal have pattern-matched their inputs to contrive hyperlinks to man pages. But because the hyperlinks weren't really there, the emulators would make silly ones for inputs like "while(1)" and "time(0)." If a user clicked on one of those, they wouldn't be happy with the result. So, I arranged for grotty(1), our output driver for terminal devices, to emit OSC 8 escape sequences to put real hyperlinks in the data stream going to the terminal. This feature can be applied to text in general, so you can have hyperlinks without the URL appearing in the terminal window. That's nice because URLs tend to be long and ill-behaved with respect to typographical filling and adjustment.
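
To make the OSC 8 mechanism concrete, here is a small Python sketch of the escape sequence that terminal emulators supporting the convention recognize. It is not grotty's code; the function name osc8_link and the example URL are chosen for illustration.

```python
ESC = "\x1b"
ST = ESC + "\\"  # "String Terminator", which ends an OSC sequence


def osc8_link(url, text):
    """Wrap text in an OSC 8 hyperlink pointing at url.

    Layout: ESC ] 8 ; params ; URL ST  text  ESC ] 8 ; ; ST
    The params field is left empty here.  The second sequence, with an
    empty URL, closes the hyperlink.
    """
    return f"{ESC}]8;;{url}{ST}{text}{ESC}]8;;{ST}"


if __name__ == "__main__":
    # A supporting terminal shows only the word "groff", made clickable.
    print(osc8_link("https://www.gnu.org/software/groff/", "groff"))
```

Only the link text takes up space on the line, which is why filling and adjustment are unaffected by the length of the URL.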

Next, I wanted to expose this feature to man(7) document authors, so I brought an idea that is both old and new to groff man(7): a macro specifically for marking up man page cross references. DEC Ultrix man(7) had this under the name "MS" before there was a World Wide Web, and Plan 9 from User Space introduced an "MR" macro in 2020. I decided that groff man(7) should have it too. Having a semantic macro for these cross-references is also good for searchability and indexing. With a bit of sed scripting, I migrated all of groff's man pages to use it, and Alex Colomar is patiently awaiting my delivery of another sed script to take care of the 1,000+ man(7) documents he ships in the Linux man-pages project.
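
A migration of this kind might look like the following Python sketch. It is not the sed script mentioned above. It assumes that cross references were written in the common ".BR name (section)" style, and it handles only that one-line form. The helper name to_mr is hypothetical.

```python
import re

# Matches lines such as ".BR ls (1)," and captures the page name, the
# section, and any punctuation that directly follows the closing parenthesis.
_XREF = re.compile(r"^\.BR\s+(\S+)\s+\((\d[a-z0-9]*)\)(\S*)\s*$", re.IGNORECASE)


def to_mr(line):
    """Rewrite one ".BR name (section)" line as an MR macro call.

    Lines that do not look like a man page cross reference are returned
    unchanged.  The input is expected without its line terminator.
    """
    match = _XREF.match(line)
    if match is None:
        return line
    name, section, trailing = match.groups()
    # Build ".MR name section [trailing-text]", omitting empty trailing text.
    return " ".join(part for part in (".MR", name, section, trailing) if part)
```

For example, to_mr(".BR ls (1),") yields ".MR ls 1 ,", while a line such as ".B bold text" passes through untouched.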

How portable is GNU groff? What systems does it run on?

I don't have an exciting answer to that, because POSIX, gnulib, and Autoconf have made the world into a place where this isn't the sort of problem that it used to be. One still needs to be seasoned enough to recognize and anticipate areas where portability problems arise, so that one can check for and use portable interfaces.

I'd love to see groff running on the seL4 microkernel and on RISC-V hardware - but by the time that platform has a POSIX layer, groff will be just one of thousands of applications that runs without fanfare. And to be fair, POSIX isn't the most comfortable fit with seL4's highly compartmentalized and paranoid philosophy. But even Microsoft, with all its market power and resources, has (repeatedly) come to the conclusion that lack of a POSIX layer is a barrier to success.

groff 1.23.0 is already available for Windows in both Cygwin and MSYS2 environments, and that's about as exotic a deployment as I know about. Bruno Haible helped us tremendously, verifying our portability to numerous flavors of GNU/Linux and the Hurd and locating problems on proprietary Unices. Ingo Schwarze keeps us honest in the *BSD department. I tested our builds on macOS 12 and Solaris 10 and 11 myself, and learned all over again just how exciting a tool like sed(1) can be when written with only a grudging adherence to POSIX. Late in the day, Bruno discovered some exciting linker behavior on AIX that crops up only with groff's X clients, gxditview(1) and xtotroff(1), and I think the same issue might affect HP-UX. They already offer groff 1.23.0 for the HP-UX 11.31 release!

As far as I know, groff 1.23.0 is the most portable release yet. And with over 400 bug fixes relative to 1.22.4, I hope it is the most satisfying for our users as well.

Thanks to Branden for this insightful interview about GNU groff. You can find the latest release at gnu.org/software/groff.