Counting words from the command line
How I used the Unix command line to count how many articles I wrote.
When we’re working in a word processor such as Google Docs, LibreOffice, or Word, it’s easy to find the current word count in the document: that’s usually displayed on screen somewhere or can be quickly displayed from a menu action. But the desktop word processor is a fairly recent invention, dating back to the late 1970s or early 1980s, depending on your definition of “desktop word processor.”
In the days of classic Unix, technical writers used Unix to format documents for printing, usually with a document preparation system like nroff, troff, or LaTeX. These files were saved as plain text, which makes it easy to process them with a program, or to gather statistics such as word counts.
Here’s one example that demonstrates how I used the Unix command line to count how many articles I wrote, and how many words in those articles, using classic Unix tools:
How many articles
I ran this example on the Technically We Write website. The back-end of this system stores all article content as files, organized in directories. Each article has a file called content that contains the article text, and another file called author that lists the people who contributed to the article. The files are organized by year, such as 2025 for all articles published in 2025.
To start, I need to separate what I wrote from what others wrote. I can do this using the standard Unix commands: find and grep to separate the content, and wc to count the words.
To get a list of articles that I wrote, I can run the find command to look for all of the author files, and use grep to search for my username. If my name is there, then I wrote it; if not, someone else wrote it.
$ find 2025 -type f -name author -exec grep -q jhall {} \; -print > my-articles
If you haven’t used find before, it may seem there’s a lot going on in this command line, so let’s walk through it:
- The -type f option says to look for files
- Adding the -name author option says to look for files called author
- The -exec option tells find what to do when it matches a file; in this case, it runs grep with the -q option, which prints nothing and only reports whether the search succeeded
- The {} braces are a placeholder for the matching filename
- Use ; to terminate the -exec statement (because this is a special character to Bash, I’ve “escaped” it as \;)
- The -print option prints the matching filename; since this comes after an -exec statement, the filename will only be printed if the grep command succeeds
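To try the command without touching real data, the walk-through above can be reproduced on a throwaway directory tree. This is only a minimal sketch: the article directory names (first-post, second-post, guest-post) and the second username (asmith) are made up for illustration. Only the author file name, the 2025 directory, and the jhall username come from the command above.

```shell
#!/bin/sh
# Build a small, hypothetical article tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1

for article in first-post second-post guest-post; do
    mkdir -p "2025/$article"
done

# Hypothetical author files: jhall wrote two of the three articles.
echo 'jhall'  > 2025/first-post/author
echo 'jhall'  > 2025/second-post/author
echo 'asmith' > 2025/guest-post/author

# Same command as above: print each author file that mentions jhall.
find 2025 -type f -name author -exec grep -q jhall {} \; -print > my-articles

# Sort for a stable display order; find does not guarantee one.
sort my-articles
```

In this sketch, the sorted list contains the author files for first-post and second-post, and the guest-post file is left out because grep did not find jhall in it.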
After this command, the my-articles file has a list of entries; each is a separate path to an author file, and that author file has my username in it. I can use the wc command with the -l option to count the lines in this file, to see that I wrote 34 articles:
$ wc -l my-articles
34 my-articles
How many words
I’m also curious to see how much I wrote. For that, I need to examine the content file for each article. Counting words in this file will be close to the article’s actual word count, although not exact, because wc also counts any HTML markup in the file. For articles with paragraphs and simple formatting, the difference is small, and for my needs this is close enough.
To count the words that I wrote, I need to run the wc command for every article written by me. I don’t have that list of article content, but I can get it by editing the list I already have.
The body text for each article is stored in the content file. The my-articles file contains a list of paths to the author files, for articles that I wrote. If we replace the word author with content, we will end up with a list of the HTML content files. The sed command can make that replacement for us, using the s edit instruction to replace or “swap” the string author (the $ means “at the end of a line”) with content, for each line in the file:
$ sed -e 's/author$/content/' my-articles > my-content
To count the words in each content file, I could name every file on the wc command line. But a very long list might “overload” the command line with too many files. Instead, use the xargs command, which reads the file names from the list and passes them to wc as arguments:
$ xargs wc -w --total=only < my-content
40611
The --total=only option is a GNU wc extension to only print the total, and nothing else. Without it, wc would also print the word count for each file in the list.
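Because --total is a GNU extension, older GNU releases and the wc on BSD or macOS may not support it. One portable alternative, offered here as a sketch and not as the method used above, is to concatenate the listed files and count the words once. The article names and the sample text below are made up for illustration:

```shell
#!/bin/sh
# Build a tiny, hypothetical article tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1

mkdir -p 2025/first-post 2025/second-post
echo 'jhall' > 2025/first-post/author
echo 'jhall' > 2025/second-post/author
echo 'one two three' > 2025/first-post/content
echo 'four five'     > 2025/second-post/content

# The list of author files, as the find command would produce it.
printf '%s\n' 2025/first-post/author 2025/second-post/author > my-articles

# Swap the trailing "author" for "content" on every line.
sed -e 's/author$/content/' my-articles > my-content

# Concatenate every listed file and count the words once.
# tr strips the padding that some wc implementations add.
total=$(xargs cat < my-content | wc -w | tr -d ' ')
echo "$total"
```

This prints 5 for the sample files. Since every cat invocation writes to the same pipe, the total stays correct even if xargs splits a very long list into several batches. Note that xargs splits its input on whitespace, so this sketch assumes paths without spaces.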
Using the command line
With just a handful of commands (find, grep, sed, xargs, and wc), I can quickly count how much I wrote, both by article count and word count. I wrote 34 articles, and over 40,000 words.
Running the commands took only a few moments to identify all 34 articles and count all of the words. By comparison, imagine doing the same for 34 word processor documents. To count the total words across these documents, you would need to load each document into your word processor, find the word count in each document, and then manually tally the counts for all articles. That’s a lot of work. I’d rather let the system do the work for me, by running a few classic Unix commands.
*Adapted from Counting files and words from the command line by Jim Hall, with the author's permission
