4 ways to make HTML output with pandoc
I like to write the first draft of anything using Markdown. I find that Markdown makes it easier for me to focus on my content and not how it will appear later on.
When I’m done with the first draft, I convert it to whatever other format I need for my output. For that, I rely on pandoc. If I’m writing a web article, I’ll use pandoc to convert from Markdown to HTML. If I’m writing a book chapter, I’ll use a different command line to convert from Markdown to LibreOffice or Word format. In either case, I usually make my follow-up revisions in the other format.
When converting to HTML, I don’t always want to use the defaults. For example, pandoc converts to UTF-8 by default, which is appropriate for some projects, but not what I want for others. Here are four ways that I use pandoc to convert from Markdown to HTML.
Using the defaults
Let’s say I have this sample Markdown file as my starting point. To keep it simple, I’ve avoided using headings or special formatting; this is just a paragraph of body text. However, I intentionally used an apostrophe (don’t) and double quotes (“blank page”). I also used an em dash and an ampersand, even though I rarely do that in my writing:
I like to write the first draft of anything using
Markdown---I find that I can focus on my content &
I don't get the "blank page" problem.
To convert this into HTML, I use this pandoc command line:
$ pandoc --from markdown --to html sample.md -o sample.html
I’ve used several command line options here, so allow me to briefly explain what they do. The --from option tells pandoc what format the input is in; in this case, it’s Markdown. The --to option says what format pandoc should write as output, which is HTML here. The input file is sample.md and the output (-o) is saved to sample.html. This generates the following HTML file:
<p>I like to write the first draft of anything using Markdown—I find
that I can focus on my content &amp; I don’t get the “blank page”
problem.</p>
This is a very smooth conversion to HTML, and if I’m working on a website with a web content management system, I might copy and paste this content directly into the system.
Writing a complete page
However, this conversion generates an incomplete HTML page. That’s okay if I want to just copy and paste the output into a content management system. But because the generated HTML page is not complete, this can make it difficult to correctly preview the content in a web browser.
To generate a full HTML page, I often add the --standalone (or -s) option. From the pandoc manual, this will “Produce output with an appropriate header and footer” such as a standalone HTML file:
$ pandoc -s --from markdown --to html sample.md -o sample.html
[WARNING] This document format requires a nonempty <title> element.
Defaulting to 'sample' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
If your Markdown file doesn’t include metadata, you’ll get a warning that pandoc used a default value for the HTML document’s title. To avoid this, you can provide the title as metadata on the command line:
$ pandoc -s --metadata title='A sample Markdown file' --from markdown --to html sample.md -o sample.html
This generates a full HTML document, including the <!DOCTYPE html> declaration at the top, plus a stylesheet that should make the document more readable in a web browser. I’ve removed the generated CSS for this sample, because it makes the page very long, but I’ve kept the rest of the HTML:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>A sample Markdown file</title>
<style>
...
</style>
</head>
<body>
<header id="title-block-header">
<h1 class="title">A sample Markdown file</h1>
</header>
<p>I like to write the first draft of anything using Markdown—I find
that I can focus on my content &amp; I don’t get the “blank page”
problem.</p>
</body>
</html>
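The warning above also mentions setting 'title' in metadata. As a minimal sketch, assuming pandoc's default Markdown reader picks up a YAML metadata block at the top of the file, the title can live in the document itself. The file name sample-titled.md is my own placeholder for illustration:

```shell
# Write a copy of the sample with a YAML metadata block at the top.
# The file name sample-titled.md is hypothetical; use any name you like.
cat > sample-titled.md <<'EOF'
---
title: A sample Markdown file
---

I like to write the first draft of anything using
Markdown---I find that I can focus on my content &
I don't get the "blank page" problem.
EOF
```

With the title stored in the file, running pandoc -s --from markdown --to html sample-titled.md -o sample-titled.html should produce the standalone page without the --metadata option and without the title warning.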
Only using ASCII characters
This conversion isn’t right for every project. You might notice that the generated HTML uses UTF-8 encoding. Most websites today should use UTF-8, but in some cases I prefer “plain old ASCII” and would rather not use the extended characters. In those cases, I add the --ascii=true option to force pandoc to use only “plain ASCII” characters. For this example, I’ll go back to the simple conversion, not the standalone version:
$ pandoc --ascii=true --from markdown --to html sample.md -o sample.html
This generates output that encodes the Unicode symbols as numeric HTML entities: U+2014 is an em dash, U+2019 is a right single quote, and U+201C and U+201D are left and right double quotes:
<p>I like to write the first draft of anything using Markdown&#x2014;I find
that I can focus on my content &amp; I don&#x2019;t get the &#x201C;blank page&#x201D;
problem.</p>
By default, pandoc uses UTF-8 character encoding for both input and output. I wish pandoc generated standard HTML entities like &mdash; for an em dash, &rsquo; for a right single quote, and &ldquo; and &rdquo; for left and right double quotes, but these are easily translated using a separate command like sed.
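Here is one way that sed translation might look. This is only a sketch: it assumes pandoc writes the four characters as hexadecimal references such as &#x2014; (adjust the patterns if your version writes them differently), and the function name entities_to_names is my own invention:

```shell
# entities_to_names: hypothetical filter that rewrites four numeric
# character references as named HTML entities.
# Reads HTML on standard input and writes the result to standard output.
# In a sed replacement, a bare & means "the matched text", so each
# literal ampersand in the replacement is escaped as \&.
entities_to_names() {
    sed -e 's/&#x2014;/\&mdash;/g' \
        -e 's/&#x2019;/\&rsquo;/g' \
        -e 's/&#x201[Cc];/\&ldquo;/g' \
        -e 's/&#x201[Dd];/\&rdquo;/g'
}
```

Because pandoc writes to standard output when there is no -o option, the filter can sit at the end of a pipeline, for example: pandoc --ascii=true --from markdown --to html sample.md | entities_to_names > sample.html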
Using <q> tags for quotes
If your document includes a lot of inline quotes, you may prefer to use the HTML <q> tag instead of generating double quote characters. This makes your HTML documents more accessible, because the <q> tag indicates an inline quote. To do this with pandoc, add the --html-q-tags=true option to the command line:
$ pandoc --ascii=true --html-q-tags=true --from markdown --to html sample.md -o sample.html
The double quotes in my sample Markdown file mark a special phrase, not an actual inline quote, but you can still see how the quotes are translated to <q> in the HTML output:
<p>I like to write the first draft of anything using Markdown&#x2014;I find
that I can focus on my content &amp; I don&#x2019;t get the <q>blank page</q>
problem.</p>
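If you switch between these four conversions often, a small shell function can save some typing. This is a rough sketch and not part of pandoc: the name md2html and its mode names (fragment, page, ascii, qtags) are my own, and it only wraps the options shown in this article:

```shell
# md2html: hypothetical helper wrapping the four pandoc command lines.
# Usage: md2html MODE INPUT.md [OUTPUT.html]
#   MODE is one of: fragment, page, ascii, qtags
md2html() {
    mode=$1
    input=$2
    # Default output name: the input path with .md replaced by .html
    output=${3:-${input%.md}.html}

    # Reuse the positional parameters to hold the extra pandoc options.
    case $mode in
        fragment) set -- ;;
        page)     set -- --standalone ;;
        ascii)    set -- --ascii=true ;;
        qtags)    set -- --ascii=true --html-q-tags=true ;;
        *)
            echo "md2html: unknown mode '$mode'" >&2
            return 2
            ;;
    esac

    pandoc "$@" --from markdown --to html "$input" -o "$output"
}
```

For example, md2html qtags sample.md runs the last command in this article and writes sample.html. In page mode, pandoc will still warn about a missing title unless the Markdown file carries its own title metadata.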