XML as data

Technical writing is more than just writing the right words or editing a document to make it great. Technical writers need to rely on a variety of writing tools and technologies to get the job done.

With some digital writing technologies like LaTeX or groff, the presentation can be tightly bound to the markup language. Other technologies separate content from presentation, such as HTML for content and CSS for appearance. But HTML isn't the only markup language like this.

A brief history

HTML is derived from a larger, more general markup system called SGML, or Standard Generalized Markup Language. SGML became an industry standard in 1986 to provide a markup system for text processing. HTML was branched from SGML for web browsers in the 1990s.

XML, or the Extensible Markup Language, emerged as a subset of SGML in the late 1990s as a markup language to define data. Prior to XML, exchanging data between systems required an implicit agreement about line endings and data encoding. Line endings were an issue because in the 1990s, Windows, Mac, and Unix systems all used different codes to indicate the end of a line: Unix used newline, Mac used carriage return, and Windows used both carriage return and newline.

With XML, data lives inside tags; the line breaks and other whitespace between elements are generally ignored. This is similar to how multiple spaces and empty lines in HTML are treated the same as a single space. The XML specification also requires a parser to convert a carriage return + newline pair, or a lone carriage return, into a single newline, so an extra carriage return from a Windows system is simply absorbed when the file is read on a Unix system.
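To see the line-ending behavior in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree parser (an illustrative choice; the article does not name a parser). It feeds the same short document with Unix, classic Mac, and Windows line endings and prints the text each one yields.

```python
import xml.etree.ElementTree as ET

# Line-ending conventions described above.
line_endings = {
    "Unix": b"\n",        # newline (LF)
    "Mac": b"\r",         # carriage return (CR), classic Mac OS
    "Windows": b"\r\n",   # carriage return + newline (CR LF)
}

results = {}
for system, eol in line_endings.items():
    # Line breaks before and after <data> sit outside the element and carry no data.
    doc = eol + b"<data>first line" + eol + b"second line</data>" + eol
    results[system] = ET.fromstring(doc).text
    print(system, repr(results[system]))

# Every system prints 'first line\nsecond line': the parser turns
# CR LF and a lone CR into a single LF before handing over the text.
```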

XML data structure

XML can be used to describe any kind of data. Let's see how to assemble an XML file:

First, an XML file should begin with a declaration that tells whatever system receives the data that this is XML data:

<?xml version="1.0" encoding="UTF-8"?>

The version attribute indicates which version of XML the data follows. Two versions of XML are currently defined: XML 1.0 and XML 1.1. XML 1.1 contains several updates for Unicode and relaxes a few other rules; under XML 1.1, "names are designed so that everything that is not forbidden (for a specific reason) is permitted."

The encoding attribute tells the system reading the data how to interpret the bytes in the file, especially byte values above 127. The original American Standard Code for Information Interchange (ASCII) defined characters only for codes 0 to 127. UTF-8 keeps those ASCII codes as single bytes and represents every other Unicode character as a sequence of two to four bytes, each with a value above 127.
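The difference between ASCII and UTF-8 is easiest to see at the byte level. This minimal Python sketch (Python and its standard library are an illustrative choice, and the sample characters and the word "café" are arbitrary) prints how many bytes UTF-8 uses for a few characters, then parses a tiny document that declares encoding="UTF-8".

```python
import xml.etree.ElementTree as ET

# ASCII characters keep their one-byte codes (0-127) in UTF-8;
# every other character takes two to four bytes, each above 127.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(encoded), list(encoded))
# Prints:
# U+0041 1 [65]
# U+00E9 2 [195, 169]
# U+20AC 3 [226, 130, 172]
# U+1F600 4 [240, 159, 152, 128]

# A hypothetical one-word document, saved as UTF-8 bytes.
word = "caf\u00e9"
doc = f'<?xml version="1.0" encoding="UTF-8"?>\n<data>{word}</data>'.encode("utf-8")

# The encoding declaration tells the parser to read the two bytes
# of the accented letter as a single character.
text = ET.fromstring(doc).text
print(len(word.encode("utf-8")), "bytes ->", len(text), "characters")  # 5 bytes -> 4 characters
```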

After the XML declaration, the XML file contains a single outermost data block (the root element), which may contain other data blocks or elements. A minimal well-formed XML file that defines an empty data block called <data> might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<data>
</data>
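A parser can confirm that this small file is complete on its own. The following minimal sketch uses Python's standard xml.etree.ElementTree module (an illustrative choice) to read the file, then shows that a second top-level block is rejected, since an XML file holds exactly one outermost data block. The <more> element is a made-up example.

```python
import xml.etree.ElementTree as ET

# The minimal file above, as the bytes a system would receive.
doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<data>\n</data>\n'

root = ET.fromstring(doc)
print(root.tag)         # data
print(len(root))        # 0 (no child elements)
print(repr(root.text))  # '\n' (only the line break between the tags)

# A second top-level block (hypothetical <more>) is not allowed.
rejected = False
try:
    ET.fromstring(b"<data></data>\n<more></more>")
except ET.ParseError as err:
    rejected = True
    print("not well-formed:", err)
```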

Sample XML data file

A minimal well-formed XML file contains just that: a declaration and a single block of data. A more typical XML data file contains sub-elements that hold other data.

Consider an RSS file to syndicate news items from a website. An RSS file is just an XML file with an <rss> parent data block containing a single <channel> data block. The RSS data can capture lots of information about a news feed, but at a minimum it must include these elements, which can occur in any order:

  • <title> is usually the name of the website
  • <description> provides a user-readable description of the feed
  • <link> is a link to the website
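Because the order of these elements does not matter, a program checking a feed should look them up by name rather than by position. The sketch below does that with Python's standard ElementTree module; missing_channel_elements is a hypothetical helper and example.com is a placeholder URL, neither of which comes from the RSS specification.

```python
import xml.etree.ElementTree as ET

# The three channel elements listed above.
REQUIRED = {"title", "description", "link"}

def missing_channel_elements(rss_bytes):
    """Hypothetical helper: return required <channel> elements that are absent."""
    channel = ET.fromstring(rss_bytes).find("channel")
    if channel is None:
        return set(REQUIRED)
    present = {child.tag for child in channel}
    return REQUIRED - present

# Order does not matter: <link> comes first here.
ok = b"""<rss version="2.0"><channel>
<link>https://example.com/</link>
<title>Example</title>
<description>An example feed</description>
</channel></rss>"""
print(missing_channel_elements(ok))  # set()

incomplete = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
print(sorted(missing_channel_elements(incomplete)))  # ['description', 'link']
```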

A useful RSS channel also includes one or more news items, each captured as an <item> data block. Each <item> usually specifies its own <title> for the news item, <description> of the news item, and <link> to the news item; RSS 2.0 requires at least a title or a description. For example, an RSS file to describe this article on Technically We Write might look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>

<title>Technically We Write</title>
<description>A website about technical writing, technical editing,
and all things "technical communication."</description>
<link>https://technicallywewrite.com/</link>

<item>
<title>XML as data</title>
<description>XML can be used to describe any kind of data. Let's see
how to assemble an XML file:</description>
<link>https://technicallywewrite.com/2023/10/03/xmldata</link>
</item>

</channel>
</rss>
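A program that reads this feed only needs to walk the elements by name. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (any XML parser would do), this code loads the sample feed above (blank lines removed) and prints the channel title followed by each item's title and link.

```python
import xml.etree.ElementTree as ET

# The sample feed above, with the blank lines removed.
RSS = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title>Technically We Write</title>
<description>A website about technical writing, technical editing,
and all things "technical communication."</description>
<link>https://technicallywewrite.com/</link>
<item>
<title>XML as data</title>
<description>XML can be used to describe any kind of data. Let's see
how to assemble an XML file:</description>
<link>https://technicallywewrite.com/2023/10/03/xmldata</link>
</item>
</channel>
</rss>"""

channel = ET.fromstring(RSS).find("channel")
print("Feed:", channel.findtext("title"))  # Feed: Technically We Write
for item in channel.findall("item"):
    # findtext returns the text of the first matching child element.
    print("-", item.findtext("title"), "->", item.findtext("link"))
```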

In technical writing, XML can be a very useful tool. It is flexible enough to apply to many problems: XML has been used as a markup language (such as DocBook), as an image format (such as SVG), and as a document language (such as DITA). Learning a bit of XML will get you far in technical communication.