Generating your own random text
Use this Bash script to make your own placeholder text.
There are many reasons you might need to create placeholder text. For example, if you are building a new website, you may not have all of the content ready as you're creating the design; placeholder text helps you see what the design will look like after you've added the content.
For years, my "go-to" to generate sample content for documents was the lipsum.com website, to insert Latin-like meaningless text. Most people are able to ignore the placeholder content if they immediately recognize that it's just meaningless words, and Lorem Ipsum can do that very well. If I want placeholder text in English, I sometimes use other placeholder generators to do the same job, by inserting random content from Star Wars, Doctor Who, and Star Trek.
Depending on what I'm working on, I might also use scribble fonts with placeholder text such as the Flow font or Redacted Script font. Using a combination of "dummy" text with a scribble font hides the placeholder content, so it doesn't matter what text you use on the sample website or in the sample document. Instead, people only "see" lines and squiggles to suggest text on the page. And that's great if I'm just trying to work out a design without getting bogged down by the details of what specific text will be on the page.
If you want to insert placeholder text like Lorem Ipsum, you can install any number of text generators for your system. Some text editors can also insert Lorem Ipsum placeholder text for you. But there's another way to create placeholder text without installing an app or copying from a website: you can make your own text generator.
Scripting with Bash
I wrote my own Bash script on Linux to generate a few paragraphs of random text. Here's how it works.
Word lists
Most Linux systems include a default dictionary of correctly spelled words, usually saved in /usr/share/dict/words. These words are in sorted order, and the file contains both uppercase and lowercase words. If you use the head command to print the first ten lines of the words file, you will see "words" that start with numbers:
$ head /usr/share/dict/words
1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2
20-point
If you add the grep command to search for all lines that start with a lowercase letter a, you can see the first ten examples:
$ grep '^a' /usr/share/dict/words | head
a
a'
a-
a.
a1
aa
aaa
aah
aahed
aahing
The grep command is a classic Unix command that searches text using regular expressions. You can think of a regular expression as search text that can include special characters to indicate the start of a line (^), the end of a line ($), or other markers. You can match repeated text by using + for one or more, or * for zero or more, of the previous character; in GNU grep's default basic syntax, the one-or-more operator is written with a backslash, as \+. If you want to specify certain classes of characters, you can use special brackets like [[:upper:]] to mean the uppercase letters A to Z, or [[:lower:]] for the lowercase letters.
This is a very flexible command that makes it possible to search for all kinds of text in a file. For example, to print all lines that start with an uppercase letter followed by one or more lowercase letters, you would use this regular expression:
grep '^[[:upper:]][[:lower:]]\+$' /usr/share/dict/words
However, this can find some very long words, if they are in the words file. On my system, the longest words that start with an uppercase letter followed by one or more lowercase letters are Prorhipidoglossomorpha, Pseudolamellibranchia, and Pseudolamellibranchiata. Those are too long if I want to generate some random placeholder text for a website. I think good placeholder text is a reasonable length, maybe 2 to 8 letters long for lowercase words, or 4 to 8 letters for uppercase words.
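To try the regular expression on something smaller than the full dictionary, here is a minimal sketch that runs it against a few made-up entries. The sample entries and the cap_words function name are assumptions for illustration. The sketch uses grep -E (extended syntax), where + is written without the backslash.

```shell
# cap_words: hypothetical helper that keeps only lines made of one uppercase
# letter followed by one or more lowercase letters
cap_words() {
    grep -E '^[[:upper:]][[:lower:]]+$'
}

# a made-up sample list, one entry per line
printf '%s\n' 10th a Apple apple NASA "O'Neil" Zed | cap_words
```

Only Apple and Zed should pass: the other entries start with a digit or a lowercase letter, or contain an apostrophe or a second uppercase letter.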
To limit the length of the words, I can send the output of the grep command to another classic Unix command called awk, implemented as gawk (GNU awk) on most Linux systems. The awk command takes pairs of patterns and actions; for each line that matches a pattern, it executes the action. In my case, to print just the words that start with an uppercase letter followed by one or more lowercase letters, and are more than 2 letters and less than 8 letters long, I would use this command:
grep '^[[:upper:]][[:lower:]]\+$' /usr/share/dict/words | gawk 'length($0)>2 && length($0)<8 {print}'
That's a long line, but it's just a grep command to find lines of text, sending its output to the gawk command.
But a gawk pattern can also be a regular expression, using basically the same syntax as the grep command. That allows us to rewrite the command to search the /usr/share/dict/words file for all words that start with an uppercase letter followed by one or more lowercase letters, more than 2 letters and less than 8 letters, as a single gawk command:
gawk '/^[[:upper:]][[:lower:]]+$/ {if ((length($0)>2) && (length($0)<8)) {print}}' /usr/share/dict/words > upper.tmp
This moves the length test inside the action, using if to determine if the word's length is greater than 2 and less than 8. Other than using a redirector (>) to save the output to a temporary file called upper.tmp, the command is essentially the same, but doing it all inside gawk instead of using grep then gawk.
We can generate a list of all lowercase words of a certain length with a similar gawk command:
gawk '/^[[:lower:]]+$/ {if ((length($0)>2) && (length($0)<8)) {print}}' /usr/share/dict/words > lower.tmp
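The length test is easy to misread, so here is a small sketch that applies it to a few sample words. The words and the length_filter function name are made up for illustration, and the sketch calls plain awk rather than gawk so that it runs wherever either one is installed.

```shell
# length_filter: hypothetical helper that keeps lines longer than 2
# and shorter than 8 characters
length_filter() {
    awk 'length($0)>2 && length($0)<8 {print}'
}

# sample words of 2, 3, 7, and 8 characters
printf '%s\n' at cat letters alphabet | length_filter
```

More than 2 and less than 8 means 3 to 7 characters, so this should print cat and letters, but not at (2 characters) or alphabet (8 characters).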
Loops
The script generates 5 paragraphs of text, each consisting of a random number of sentences, each with a random number of words. I do this with several for loops, to iterate over a set of values. For example, to print out the text "Hello" 4 times, I would write this for loop:
for word in 1 2 3 4; do echo "Hello"; done
If you type this at the Bash command line, or save it to a "script" file and run it, you should see "Hello" printed back to you 4 times:
Hello
Hello
Hello
Hello
At every "pass" through the loop, the variable word is assigned the value 1, 2, 3, or 4. You can print out the value of the word variable by writing it with a "dollar sign" in front, like this to print the numbers 1, 2, 3, and 4:
for word in 1 2 3 4; do echo $word; done
If you run this at the Bash command line, Bash will print the values 1, 2, 3, and 4 on separate lines:
1
2
3
4
You can also put one for loop "inside" another; this is called nested loops. It's easiest to show nested loops by writing them in a script, where I can split up the lines to make the instructions more clear. For example, this prints the values A1, A2, B1, and B2 to the screen using nested loops:
for letter in A B ; do
  for number in 1 2 ; do
    echo $letter$number
  done
done
I've also added some extra spacing so you can see the nested loops in action, and to make clear what is "inside" each loop. When I write for loops like this, I usually write the ; with spaces on either side. This is just a personal style; you don't need to use the extra space.
If you save this to a script and run it, you should see the values A1, A2, B1, and B2 printed to the screen. That's because the "outer" loop iterates through the letters A and B; for each "letter" loop, the "inner" loop iterates through the numbers 1 and 2. The effect is the loop generates the four values in order:
A1
A2
B1
B2
Random lines
To generate random words, either all lowercase words or words that start with an initial uppercase letter, we need to print random lines from a word file. We can use gawk to find the words we need; the next step is to pick random words from the temporary file. Linux provides a command called shuf that can shuffle the lines of a text file and print them in a random order. For example, let's print the numbers 1, 2, 3, and 4 in a random order with the shuf command:
for num in 1 2 3 4; do echo $num; done | shuf
Every time you run this command, Bash will print the list of 4 numbers, and "send" them to the shuf program, which prints the lines in a random order. But there's an easier way to generate a few numbers, using the seq command to print a sequence of numbers. For example, to print the sequence from 1 to 4, but in a random order, send the output from seq into the shuf command:
seq 4 | shuf
The random order changes every time you run this command, but it might create a list that looks like this:
4
1
2
3
If you have a longer list, but only want to see the first few lines from the shuffled list, send the output to the head command. This prints only the first ten lines by default; use a hyphen with a number to print that many lines, such as this to shuffle a list of ten numbers but print only 4 lines of output:
seq 10 | shuf | head -4
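GNU shuf can also limit its own output with the -n option, which saves the extra head command. A minimal sketch, with pick_lines as a hypothetical helper name:

```shell
# pick_lines: hypothetical helper that prints $1 randomly chosen lines
# from standard input, using shuf's own -n option instead of head
pick_lines() {
    shuf -n "$1"
}

# shuffle the numbers 1 to 10 and print only 4 of them
seq 10 | pick_lines 4
```

The 4 numbers change from run to run, but no number appears twice, because shuf picks lines without repeating them.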
Putting it all together
With these Bash scripting commands, plus a few extra Bash features that I'll show you, you can generate a few paragraphs of random text. Each paragraph contains a random number of sentences, between 3 and 7. Each sentence starts with one capitalized word, followed by a random number of lowercase words, between 3 and 8.
#!/bin/bash
words=/usr/share/dict/words
lower=/tmp/lower.tmp
upper=/tmp/upper.tmp
gawk '/^[[:lower:]]+$/ {if ((length($0)>2) && (length($0)<8)) {print}}' $words > $lower
gawk '/^[[:upper:]][[:lower:]]+$/ {if ((length($0)>2) && (length($0)<8)) {print}}' $words > $upper
for para in $(seq 5) ; do
  s=$((RANDOM % 5 + 3))
  for sent in $(seq $s) ; do
    w=$((RANDOM % 6 + 3))
    ( shuf -n 1 $upper ; shuf -n $w $lower ) | tr '\n' ' ' | sed 's/ $/. /'
  done
  echo -e '\n'
done
rm -f $lower $upper
On my system, I saved this script to a file called mkwords.bash. Let's look at this in more detail to understand how it works:
The first few lines save some values to a few variables; a variable is just a way to access a value later on. In this case, I've saved the path to the word list in a words variable, the path to a list of lowercase words in the lower variable, and a list of uppercase words in the upper variable. I can use these at any time in the Bash script with a "dollar sign" like $words to get the full path to the word list, at /usr/share/dict/words.
After that, the script runs the two gawk commands to generate the list of all-lowercase words and the list of words that start with an uppercase letter.
Then, the script uses a nested for loop to print 5 paragraphs. The outer loop also sets a variable called s to a random number between 3 and 7. That works because the $(( )) brackets create an arithmetic expansion, so Bash can do simple arithmetic. You probably know the basic arithmetic operators like add (+), subtract (-), multiply (*) and divide (/). You can also use % to mean modulo, or the remainder after division. For example, 9 % 4 is 1, because 9 divided by 4 is 2 with 1 left over. The arithmetic expansion that assigns a value to s uses RANDOM, a Bash variable that gives a different random integer each time you read it, and taking that number modulo 5 gives 0, 1, 2, 3, or 4. That means s can be in the range 3 (0 + 3) to 7 (4 + 3).
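To check that range for yourself, here is a minimal sketch that repeats the same arithmetic many times and reports the smallest and largest results. The random_count function name and the choice of 1000 draws are assumptions for illustration; the expression inside is the one from the script.

```shell
# the modulo example from the text: 9 divided by 4 leaves a remainder of 1
echo $((9 % 4))

# random_count: hypothetical helper that mirrors the script's
# s=$((RANDOM % 5 + 3)) and prints the result
random_count() {
    echo $((RANDOM % 5 + 3))
}

# draw 1000 values, sort them numerically, and print the first and last lines
for i in $(seq 1000) ; do random_count ; done | sort -n | sed -n '1p;$p'
```

With that many draws, the last two lines of output will almost certainly be 3 and 7, and they can never fall outside that range.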
The next loop generates that many random sentences, from 1 to s, using a similar trick to pick a random number of words (w) between 3 and 8.
The last line inside the "inner" loop uses two shuf commands to print 1 random word (-n 1) from the uppercase words, then the random number of words (-n $w) from the list of lowercase words. The random words are printed one per line, so I've added the tr command to translate each newline (\n) to a space. The sed command then replaces the final space with a period and a space. These commands generate a series of "sentences" that begin with an uppercase word followed by a random number of lowercase words, plus a period.
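The tr and sed steps are easier to follow with fixed input instead of random words. In this minimal sketch, the make_sentence function name is hypothetical and the four words are borrowed from the sample output; the tr and sed commands are the ones from the script.

```shell
# make_sentence: hypothetical helper that joins the lines it reads into one
# "sentence": newlines become spaces, then the final space becomes ". "
make_sentence() {
    tr '\n' ' ' | sed 's/ $/. /'
}

# four words, one per line, standing in for the output of the shuf commands
printf '%s\n' Kona upcrawl coadmit toronto | make_sentence
```

This should print "Kona upcrawl coadmit toronto. " without a newline at the end, which is why the sentences in one paragraph run together on a single line.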
After each paragraph, the script uses an echo command to print an extra newline. Actually, the echo command itself generates a newline, so this command effectively prints 2 newlines.
The last line in the script cleans up my temporary files by deleting them.
Random placeholder text
Whenever I need to generate some placeholder text for a project, I can just run this Bash script to print out a few paragraphs. Every time I run the script, it prints 5 paragraphs of a few sentences, each with a reasonable number of words. This is somewhat representative of text that I might include in a document.
The script prints each paragraph on a single line. To make it more readable, I'll send the output through the fmt program to "wrap" the lines:
$ bash mkwords.bash | fmt
Birkett lainer palmus sault momo platten. Jonas besugo rerake hisself
pont mappers rethaw. Eurasia mousees unlobed unogled plumcot pit conchal
bohawn. Duparc ova ethnic fainty peened erbiums.
Gargan arustle memoria tertio goonies violins chablis raku fautor. Gustin
markery sabalo garvock therms ginete darshan. Murdock ead shaving
lifeway atveen aswash woorali bauckie bruiser. Nolitta culture luteway
lehrs hostel gelofer ahmedi thecla fondish. Kona upcrawl coadmit toronto
devilet cropper jinni.
Pinto hearten toit cazimi upmast mastmen. Lynnell altho kitties twains
coden zaffirs enflesh hippic. Lati math sledger rivets. Alphons achene
vexers sago oops nookery. Palilia desugar orguil laddie. Bascom kowtows
milage rematch. Buber frizzy remue giddily micas chays.
Kula pans hatch shucked murly. Nivre cucupha fizzed boding villein
tortive. Esidrix pagan grinnie muggily. Templia weste unget shellum
fungoid mansard aequor slinker dernly.
Jan cockeye elfwife erosely. Hendel yett mouched isaac. Seami iocs cedar
adonite junt rappage doatish.
Every time I run the mkwords.bash script, it generates new random words, sentences, and paragraphs:
$ bash mkwords.bash | fmt
Anomura sardel asks nongod abacus postage belder. Haydn nad munjeet
nth subcool galjoen sipper defiber. Zapus nasutus honkie slour. Joshuah
alunite sideman colarin opiate pound mibound. Mozart snivy tchu storied
opine flated darnix calx.
Aquilid somaten gaucher pedro glaucus skellum. Luhe justles break
mattins mids baryte. Smoos buirdly oenone perrier terefah kuskus autarky
barms. Parris dishpan yoicks juddock lab bechalk aidful. Lorado sieur
croft civets ritards lezzies deaned.
Mikana rowdy strond taxicab braye leprosy beachy opiner. Qld dowse
morally tissual. Disa goad alinit sirex soddier.
Rozelle helloes shamash ideally truced ascitb lobo. Suzanna teetee trull
didymis spumier tackety joul pilule. Guaymas lochy gurgeon hyson. Concho
chivies adfix urazole whumped teca seor tamein. Unkelos balded jambul
pudenda. Dunlo subsign atheize beroll reduced smelted urunday regloss
gingers.
Achsah beastly notcher gifted pows tilaka. Dobb neascus tardy goos rads
sarum unlived creped pucksey. Anselmi lizard hamline mandala archy indigen
vac. Kaiser rhet verdins bustian grigs scotale. Herald solein nilghau
rigel smarm vena. Jourdan sakeber apologs simians hlqn seenie cubhood
keened glime. Dannica agnails pulser ungum sedent awns transom visit.
This script works well for me, but you can still improve it. For example, every time the script runs, it rebuilds the same two temporary word lists from the /usr/share/dict/words file. Since the system's word list doesn't change very often, you can make this script run faster if you save the filtered word lists somewhere in your home directory, and only regenerate them if they are not there.
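One way to do that is sketched below. The build_lists function name and the lower.txt and upper.txt file names are assumptions, not part of the original script, and the filter is written with grep and awk so the sketch does not depend on gawk being installed; the gawk commands from the script would do the same job.

```shell
# build_lists: hypothetical helper. $1 is the word file, $2 is a cache directory.
# It writes lower.txt and upper.txt into the cache directory, but only if
# they are missing or empty.
build_lists() {
    local words=$1 cache=$2
    mkdir -p "$cache"
    if [ ! -s "$cache/lower.txt" ] ; then
        grep -E '^[[:lower:]]+$' "$words" \
            | awk 'length($0)>2 && length($0)<8' > "$cache/lower.txt"
    fi
    if [ ! -s "$cache/upper.txt" ] ; then
        grep -E '^[[:upper:]][[:lower:]]+$' "$words" \
            | awk 'length($0)>2 && length($0)<8' > "$cache/upper.txt"
    fi
}
```

You might call it as build_lists /usr/share/dict/words "$HOME/.cache/mkwords" (the cache path is just an example), then point the shuf commands at the two cached files and drop the rm line at the end of the script.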
Also, the /usr/share/dict/words file contains some words that are not work-friendly. So instead of using the system's word list, you might make your own list of words to use. One way to create a list like this is to use the words from other documents you have already written, and use that word list as the starting point.
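As a rough sketch of that idea, the pipeline below turns any text into a sorted list of distinct lowercase words, one per line. The words_from_text function name and the sample sentence are made up for illustration.

```shell
# words_from_text: hypothetical helper that reads text on standard input and
# prints each distinct word once, in lowercase, one per line
words_from_text() {
    tr -cs '[:alpha:]' '\n' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -v '^$' \
        | sort -u
}

# a sample sentence standing in for one of your own documents
echo "The quick fox. The lazy dog!" | words_from_text
```

For the sample sentence, this should print dog, fox, lazy, quick, and the on separate lines. Because everything is converted to lowercase, a list built this way can replace the lowercase word list; you would still need a separate list of capitalized words to start each sentence.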
But if you just want to generate a few paragraphs of random-length sentences with random words, this script will do the job. And you can do it on your own with a Bash script.