4 ways to make HTML output with pandoc

I like to write the first draft of anything using Markdown. I find that Markdown makes it easier for me to focus on my content and not how it will appear later on.

When I’m done with the first draft, I convert it to whatever other format I need for my output. For that, I rely on pandoc. If I’m writing a web article, I’ll use pandoc to convert from Markdown to HTML. If I’m writing a book chapter, I’ll use a different command line to convert from Markdown to LibreOffice or Word format. In either case, I usually make my follow-up revisions in the other format.

When converting to HTML, I don’t always want to use the defaults. For example, pandoc converts to UTF-8 by default, which is appropriate for some projects, but not what I want for others. Here are four ways that I use pandoc to convert from Markdown to HTML.

Using the defaults

Let’s say I have this sample Markdown file as my starting point. To keep it simple, I’ve avoided using headings or special formatting; this is just a paragraph of body text. However, I intentionally used an apostrophe (don’t) and double quotes (“blank page”). I also used an em dash and an ampersand, even though I rarely do that in my writing:

I like to write the first draft of anything using
Markdown---I find that I can focus on my content &
I don't get the "blank page" problem.

To convert this into HTML, I use this pandoc command line:

$ pandoc --from markdown --to html sample.md -o sample.html

This command line uses several options, so allow me to briefly explain what they do. The --from option tells pandoc what format the input is in; in this case, it’s Markdown. The --to option sets the output format, here HTML. The input file is sample.md, and the output (-o) is saved to sample.html. This generates the following HTML file:

<p>I like to write the first draft of anything using Markdown—I find
that I can focus on my content &amp; I don’t get the “blank page”
problem.</p>

This is a very smooth conversion to HTML, and if I’m working on a website with a web content management system, I might copy and paste this content directly into the system.

Writing a complete page

However, this conversion generates an incomplete HTML page. That’s okay if I want to just copy and paste the output into a content management system. But because the generated HTML page is not complete, this can make it difficult to correctly preview the content in a web browser.

To generate a full HTML page, I often add the --standalone (or -s) option. From the pandoc manual, this will “Produce output with an appropriate header and footer” such as a standalone HTML file:

$ pandoc -s --from markdown --to html sample.md -o sample.html
[WARNING] This document format requires a nonempty <title> element.
  Defaulting to 'sample' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".

If your Markdown file doesn’t include metadata, you’ll get a warning that pandoc used a default value for the HTML document’s title. To avoid this, you can provide the title as metadata on the command line:

$ pandoc -s --metadata title='A sample Markdown file' --from markdown --to html sample.md -o sample.html

This generates a full HTML document, including the <!DOCTYPE html> declaration at the top, plus a stylesheet that should make the document more readable in a web browser. I’ve removed the generated CSS from this sample because it makes the page very long, but I’ve kept the rest of the HTML:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>A sample Markdown file</title>
  <style>
...
  </style>
</head>
<body>
<header id="title-block-header">
<h1 class="title">A sample Markdown file</h1>
</header>
<p>I like to write the first draft of anything using Markdown—I find
that I can focus on my content &amp; I don’t get the “blank page”
problem.</p>
</body>
</html>

The sample HTML output, as viewed in a web browser
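
The warning shown earlier also mentions setting 'title' in metadata. Pandoc's Markdown reader accepts a YAML metadata block at the top of the file, so the title can live in the document instead of on the command line. Here is a minimal sketch of that approach; the file names titled.md and titled.html are placeholders I made up for this example, and the conversion step only runs if pandoc is installed:

```shell
# Write a copy of the sample with a YAML metadata block at the top.
# The name titled.md is a placeholder for this sketch.
cat > titled.md <<'EOF'
---
title: A sample Markdown file
---

I like to write the first draft of anything using
Markdown---I find that I can focus on my content &
I don't get the "blank page" problem.
EOF

# Convert only if pandoc is available. The title now comes from the
# file itself, so no --metadata option is needed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -s --from markdown --to html titled.md -o titled.html
fi
```

Keeping the title in the file means the same command line works for every document, at the cost of a few extra lines at the top of each Markdown file.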


Only using ASCII characters

This conversion has some drawbacks. You might notice that the generated HTML uses UTF-8 encoding. Most websites today should use UTF-8, but for some projects I prefer “plain old ASCII” and would rather not use extended characters. In those cases, I add the --ascii=true option to force pandoc to use only ASCII characters in its output. For this example, I’ll go back to the simple conversion, not the standalone version:

$ pandoc --ascii=true --from markdown --to html sample.md -o sample.html

This generates output that encodes each non-ASCII symbol as a numeric HTML entity: U+2014 is an em dash, U+2019 is a right single quote, and U+201C and U+201D are left and right double quotes:

<p>I like to write the first draft of anything using Markdown&#x2014;I find
that I can focus on my content &amp; I don&#x2019;t get the &#x201C;blank page&#x201D;
problem.</p>
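
If you want to confirm that a generated file really is pure ASCII, one rough way is to delete every ASCII byte and see whether anything is left. This is only a sketch: is_ascii is a hypothetical helper name of my own, not something pandoc provides.

```shell
# Hypothetical helper: succeed only if the file named by $1 contains
# no bytes outside 7-bit ASCII.
is_ascii() {
  # Delete every ASCII byte (octal 000-177) and count what remains;
  # any leftover bytes are non-ASCII.
  leftover=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)
  [ "$((leftover))" -eq 0 ]
}

# Example use, assuming sample.html was generated as shown above:
#   is_ascii sample.html && echo "ASCII only"
```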

By default, pandoc uses UTF-8 character encoding for both input and output. I wish pandoc generated named HTML entities like &mdash; for an em dash, &rsquo; for a right single quote, and &ldquo; and &rdquo; for left and right double quotes, but the numeric forms are easily translated using a separate command like sed.
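
As a sketch of that sed translation, the function below rewrites the four numeric entities from this sample as their named equivalents. The name named_entities is hypothetical, and the patterns assume the hexadecimal digits are uppercase, as in the output above. Other symbols would need their own rules.

```shell
# Hypothetical helper: read HTML on standard input and replace the four
# numeric entities seen in this sample with named entities.
# In a sed replacement, a bare & means "the matched text", so each
# literal ampersand is escaped as \&.
named_entities() {
  sed -e 's/&#x2014;/\&mdash;/g' \
      -e 's/&#x2019;/\&rsquo;/g' \
      -e 's/&#x201C;/\&ldquo;/g' \
      -e 's/&#x201D;/\&rdquo;/g'
}

# Example use, assuming pandoc is installed and sample.md exists.
# Without -o, pandoc writes to standard output:
#   pandoc --ascii=true --from markdown --to html sample.md | named_entities > sample.html
```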

Using <q> tags for quotes

If your document includes a lot of inline quotes, you may prefer to use the HTML <q> tag instead of generating double quote characters. This can make your HTML documents more accessible, because the <q> tag marks the text as an inline quote in the markup itself. To do this with pandoc, add the --html-q-tags=true option to the command line:

$ pandoc --ascii=true --html-q-tags=true --from markdown --to html sample.md -o sample.html

While my sample Markdown file uses double quotes, this isn’t an actual inline quote, only a special phrase. But you can see how the quotes are translated to <q> in the HTML output:

<p>I like to write the first draft of anything using Markdown&#x2014;I find
that I can focus on my content &amp; I don&#x2019;t get the <q>blank page</q>
problem.</p>