How to Easily Create Audiobooks From Text Files in OS X

Generate Spoken Samples using Voiceover on OS X

The primary use of the Voiceover system on the Mac is accessibility for blind or partially sighted users. But it may have occurred to you that it would be a fine idea to use the Voiceover system to produce audiobooks and spoken word prompts for your media, videos, presentations and music.

The problem is that the voice is generated in real time and cannot be saved directly, so if you want to use it in any other program, you have to capture it somehow. The good news is that a facility for this is built in. The bad news is that it’s hidden unless you know where to look.

In this article, we show you how to capture Voiceover speech and use it in your own projects.

Getting Started

Every program on the Mac has a Program Menu (the menu named after the program itself, next to the Apple menu), and within it there is a rarely used menu item called “Services”. Normally when you select it, it has nothing of any interest to offer.

The 'Services' menu in Mac OS X.

But this menu is contextual. Select an area of text and go to the menu again. In many programs, it will now have a new range of options, one in particular offering: “Add to iTunes as a Spoken Track.”

Say, for example, you want to make a spoken word audiobook version of this article. Follow these steps:

1. Select the text of this article, then press “Command + C” to copy the text to the clipboard.

2. Open a text program like TextEdit and press “Command + V” to paste it in.

3. If it’s not already selected, select the text you want to have as a sound sample.

4. Go to “Program Menu -> Services -> Add to iTunes as a Spoken Track”.

Go to Add to iTunes as a Spoken Track.

5. A panel will pop up asking which voice you would like to use, where you want to store the temporary AIFF file, and what you want to call it.

Save your temporary AIFF file.

The location defaults to the Music folder, but you can change it to somewhere more convenient, such as the Desktop.

Choose where to save the AIFF file.

Choosing a Voice

The most advanced and natural-sounding voice is Alex (it even includes breath sounds), but the choice is yours. Be sure to preview the other voices before use, however, because some have a foreign accent when they are fed English text. (Obviously, if English is not your first language, you will already know which international Mac voice is best for your text language.)

Choose a voice; Alex is the most natural.

6. Press Continue.
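
The voice names offered in the panel can also be inspected from Terminal. As a minimal sketch, assuming the macOS `say` command-line tool is available and that `say -v '?'` prints one voice per line as a name, a locale code and a sample sentence after a `#`, the following Python narrows that listing down to the English voices. The function names are our own, not part of any Apple API, and the line format is an assumption based on typical output.

```python
import re
import shutil
import subprocess

# Assumed shape of each line printed by `say -v '?'`:
#   NAME   LOCALE   # sample sentence
VOICE_LINE = re.compile(
    r"^(?P<name>.+?)\s+(?P<locale>[a-z]{2,3}[_-][A-Za-z0-9]+)\s+#\s*(?P<sample>.*)$"
)


def parse_voices(listing):
    """Turn the text printed by `say -v '?'` into (name, locale) pairs."""
    voices = []
    for line in listing.splitlines():
        match = VOICE_LINE.match(line.strip())
        if match:
            voices.append((match.group("name"), match.group("locale")))
    return voices


def english_voices(voices):
    """Keep only the names of voices whose locale starts with 'en'."""
    return [name for name, locale in voices if locale.startswith("en")]


if __name__ == "__main__":
    if shutil.which("say"):  # the tool only exists on macOS
        result = subprocess.run(
            ["say", "-v", "?"], capture_output=True, text=True, check=True
        )
        print(english_voices(parse_voices(result.stdout)))
    else:
        print("The `say` command was not found; this sketch needs macOS.")
```

A voice picked this way is still worth previewing, since a locale code says nothing about how natural the voice sounds.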

Once in progress, the Voiceover engine pronounces the words silently rather than aloud in real time, and redirects the sound to a file stored where you chose in the previous step. A cog in the menu bar revolves while this is in progress.

Once the file is saved to disk, it is automatically imported into iTunes and, unless an error occurs, the temporary AIFF file is deleted. This may take some time, depending on the length of the file.

The file will be automatically imported into iTunes.
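
The text-to-file part of this process can also be approximated outside the Services menu. The sketch below assumes the macOS `say` command-line tool, using its `-v` (voice), `-f` (input text file) and `-o` (output audio file) options. The file names `article.txt` and `article.aiff` are hypothetical, and unlike the Services item, this does not import anything into iTunes.

```python
import shutil
import subprocess
from pathlib import Path


def build_say_command(text_file, audio_file, voice="Alex"):
    """Return the argument list that asks `say` to render a text file to audio."""
    return ["say", "-v", voice, "-f", str(text_file), "-o", str(audio_file)]


def render_spoken_track(text_file, audio_file, voice="Alex"):
    """Run `say` if it is installed; return True when the audio file exists afterwards."""
    if shutil.which("say") is None:
        return False  # not on macOS, nothing to do
    subprocess.run(build_say_command(text_file, audio_file, voice), check=True)
    return Path(audio_file).exists()


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_say_command("article.txt", "article.aiff"))
```

Keeping the command construction separate from the call makes it easy to check what would be run before committing to a long render.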

The system works reliably on short and even very long texts, and the amount of text you can convert to usable audio is remarkably large. For example, a text file of Moby Dick took only a few hours to process, made a 3.76 GB temporary AIFF file and generated an audiobook 23 hours and 31 minutes long.
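
To judge how much disk space a long text will need, the size of an uncompressed AIFF file can be estimated from its duration. The sketch below assumes 22.05 kHz, 16-bit, mono audio. That format is our assumption, not something the panel reports, and the estimate ignores the small file header. For the Moby Dick running time it lands close to the roughly 3.76 gigabyte temporary file mentioned above.

```python
def aiff_size_bytes(hours, minutes, sample_rate=22050, bits_per_sample=16, channels=1):
    """Estimate uncompressed audio size: seconds x samples/second x bytes/sample x channels."""
    seconds = hours * 3600 + minutes * 60
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return seconds * bytes_per_second


# 23 hours 31 minutes of audio under the assumed format.
estimate = aiff_size_bytes(23, 31)
print(f"{estimate / 1e9:.2f} GB")  # prints "3.73 GB"
```

By this estimate each hour of speech needs a little under 160 MB of free space for the temporary file.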

Finally, Import

The only thing to bear in mind with very long files is that you may get an error message once the enormous file has been processed and before it is imported into iTunes. The process will usually complete despite the warning, though, so have patience and let it finish.

While the import to iTunes completes, the “song” will be grayed out with the word “incomplete” beside it. Once it’s done, the completed track on iTunes can then be exported, either by dragging it out of the iTunes window onto the desktop or by right-clicking with the mouse and selecting “Reveal in Finder.” It will be an M4A file.

Export the completed iTunes track.

If you need the file to be in another format, you can always load it into a free sound editor like Audacity and convert it to MP3, WAV, etc.

You can convert the file to MP3, WAV, etc.
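
For WAV specifically, the conversion can also be sketched without a sound editor. This example assumes the `afconvert` command-line tool that ships with macOS, using `-f WAVE` for the container and `-d LEI16` for 16-bit little-endian samples. The file names are hypothetical, and MP3 is left to an editor such as Audacity as described above.

```python
import shutil
import subprocess


def build_wav_command(source, target):
    """Return the `afconvert` argument list for converting an audio file to 16-bit WAV."""
    return ["afconvert", "-f", "WAVE", "-d", "LEI16", str(source), str(target)]


def convert_to_wav(source, target):
    """Run the conversion if `afconvert` is installed; return True when it was run."""
    if shutil.which("afconvert") is None:
        return False  # not on macOS, nothing to do
    subprocess.run(build_wav_command(source, target), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_wav_command("article.m4a", "article.wav"))
```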

Note: When doing text to speech, remember to keep capital letters at the beginning of words that use them, and to keep normal punctuation. Although these elements are not spoken, they affect the way the voice pronounces the surrounding words.

Phil South

Phil South has been writing about tech subjects for over 30 years, starting out with Your Sinclair magazine in the 80s, and then MacUser and Computer Shopper. He's designed user interfaces for groundbreaking music software, been the technical editor on film making and visual effects books for Elsevier, and helped create the MTE YouTube Channel. He lives and works in South Wales, UK.