Make writing easier with speech recognition
Tired of typing? Try speech recognition with this open source tool.
I love interviews. They can be a great way to get to know someone and often a great way to learn. One of the most challenging aspects of interviews is capturing exactly what the interview subject had to say. I have used a few methods to capture a subject’s voice, including my mobile phone and Audacity. In both cases, I am left to transcribe that content into a written document.
Now the paradigm is changing with the advent of Whisper, an openly licensed speech recognition program developed by OpenAI. According to OpenAI’s website introducing Whisper:
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
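It’s amazing software, and it’s easy to install on Linux, which is my daily driver. I used Pop!_OS, but you can install Whisper just as easily on other Linux distributions, such as Fedora. The apt commands below are for Debian- and Ubuntu-based systems like Pop!_OS; on other distributions, use your own package manager instead. First, make sure that Python is installed, which you can check by entering the following command: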
$ python3 --version
In my case the result was:
Python 3.10.6
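If you would rather check from inside Python, for example to make a script stop early on an interpreter that is too old, the standard sys module exposes the version. This is a minimal sketch. The 3.8 minimum is an assumption, so check Whisper’s README for the versions it currently supports, and python_is_supported is a hypothetical helper name:

```python
import sys

# Assumed minimum Python version for Whisper; confirm against the project's README.
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the assumed minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "older than the assumed minimum"
    print(f"Python {sys.version.split()[0]}: {status}")
```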
With Python on the system, you then need to install the Python venv module, which creates virtual environments:
$ sudo apt install python3.10-venv
Next, you need to install pip3, the package installer for Python:
$ sudo apt install python3-pip
Initialize the Python virtual environment for Whisper with this command:
$ python3 -m venv whisper
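I then changed into the new whisper directory, activated the virtual environment so that pip3 installs into it rather than into the system Python, and installed Whisper. Note that the package is published on PyPI as openai-whisper; the package named plain whisper is an unrelated project:
$ cd whisper
$ source bin/activate
$ pip3 install openai-whisper
Whisper also relies on the ffmpeg command-line tool to decode audio, so install it if it isn’t already on your system:
$ sudo apt install ffmpeg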
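Because pip3 installs into whichever Python environment is active, it is worth confirming that the virtual environment really is active before installing anything. Inside an environment created by the venv module, sys.prefix points at the environment directory while sys.base_prefix still points at the system installation. Here is a small standard-library sketch of that check; in_virtualenv is a hypothetical helper name:

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter prefix differs from the base installation."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Inside a virtual environment: {sys.prefix}")
    else:
        print("Not in a virtual environment; run 'source bin/activate' in the whisper directory.")
```

Running it with the environment’s own python3 after activation should report the whisper directory as the prefix.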
Now, with all the pieces in place, I was ready to use this amazing new tool to transcribe mp3 and mp4 files into readable text. If you don’t have any audio files and would like to try out Whisper, you can download a free audiobook, or part of one, from LibriVox. I chose Robert Frost’s “Mending Wall” as a test and used Whisper from the command line to convert the mp3 file into a text document:
$ whisper 04_mending_wall_frost_bc.mp3 --model base
In a little over a minute, Whisper converted the mp3 into five files. One of them is a plain-text transcript of the audio; the others hold the same transcript in subtitle and data formats. Here are the first few lines Whisper produced from the file 04_mending_wall_frost_bc.mp3:
Mending Wall by Robert Frost, read for libravox.org by Becky Crackle, November 16, 2006, Canal Winchester, Ohio. Something there is that doesn’t love a wall that sends the frozen groundswell under it and spills the upper boulders in the sun, and makes gaps even too can pass abreast. The work of hunters is another thing. I have come after them and made repair where they have left not one stone on a stone, but they would have the rabbit out of hiding to please the yelping dogs.
As you can see, the results are largely accurate, with only small slips such as “libravox” for “librivox” and “even too” for “even two.”
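To put a number on how accurate a transcript is, speech recognition output is commonly scored with word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the transcript into a reference text, divided by the number of words in the reference. The sketch below is a plain-Python illustration, not part of Whisper. It only lowercases and splits on whitespace, so strip punctuation from both texts first. The reference line is from the published poem, and the hypothesis is Whisper’s version of it:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion: reference word missing
                          curr[j - 1] + 1,     # insertion: extra transcript word
                          prev[j - 1] + cost)  # substitution, or match when cost is 0
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "and makes gaps even two can pass abreast"   # line from the poem
    hypothesis = "and makes gaps even too can pass abreast"  # Whisper's transcription
    print(word_error_rate(reference, hypothesis))  # 0.125: one substitution in eight words
```

Scoring a whole recording this way needs a full reference text, which for public-domain LibriVox readings is usually easy to find.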
You can create a Python script to automate the process:
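import whisper

# Load the "base" model (downloaded automatically on first use)
model = whisper.load_model("base")
# Transcribe the audio file; the result is a dictionary
result = model.transcribe("04_mending_wall_frost_bc.mp3")
# Print the full transcript as one block of text
print(result["text"])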
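The script prints the transcript as one continuous block of text, rather than the timestamped segments the command-line tool displays as it works, so the output is much cleaner: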
Mending Wall by Robert Frost, read for Librevox.org by Becky Crackle, November 16th, 2006, Canal Winchester, Ohio. Something there is that doesn’t love a wall that sends the frozen groundswell under it and spills the upper boulders in the sun, and makes gaps even too can pass abreast. The work of hunters is another thing. I have come after them and made repair where they have left not one stone on a stone, but they would have the rabbit out of hiding to please the yelping dogs. The gaps I mean, no one has seen them made or heard them made, but at spring-mending time we find them there. I let my neighbor know beyond the hill, and on a day we meet to walk the line and set the wall between us once again.
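The dictionary returned by transcribe() also contains a "segments" list, where each segment carries "start" and "end" times in seconds along with its "text". Those fields are enough to build subtitles or to process several recordings in one go. The sketch below is a hand-rolled illustration only, since Whisper’s command-line tool already writes .srt files itself. The helper names format_timestamp, segments_to_srt, and transcribe_dir are hypothetical, and transcribe_dir takes an already loaded model so that you can pass it the result of whisper.load_model("base"):

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segment dicts (start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates numbered cues with a blank line
    return "\n".join(blocks)


def transcribe_dir(model, directory: str) -> None:
    """Transcribe every .mp3 in directory, writing .txt and .srt files next to each one.

    model is anything with a Whisper-style transcribe() method,
    for example the object returned by whisper.load_model("base").
    """
    for audio in sorted(Path(directory).glob("*.mp3")):
        result = model.transcribe(str(audio))
        audio.with_suffix(".txt").write_text(result["text"].strip() + "\n", encoding="utf-8")
        audio.with_suffix(".srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8")
        print(f"Transcribed {audio.name}")
```

For example, after import whisper, calling transcribe_dir(whisper.load_model("base"), "audiobooks") would process every mp3 in a hypothetical audiobooks folder. Passing the model in, rather than loading it inside the loop, means the model is loaded only once for the whole batch.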
Whisper is open source software, available under the MIT license.
This article is adapted from Introducing Whisper and is republished with permission from Don Watkins.