
Narration in eLearning July 20, 2007

Posted by B.J. Schone in eLearning.

I often use narration in my eLearning courses because I think it adds an element of interest, and also because there’s evidence that “presenting words in audio rather than onscreen text results in significant learning gains.”1 However, I ran into some issues recently that made me reconsider how I obtain the narration audio in the first place. Below is a recap of what happened, and what I decided to do about it.

I was in the habit of asking (begging) co-workers to record audio for eLearning courses. We would usually go into a conference room, record audio using a microphone and Audacity, and then I would import the audio files into Flash, Captivate, or whatever eLearning development tool I was using. This worked well, but I eventually ran into problems. First, if content changed a few days later, we would have to re-record the audio. There are only so many times you can ask a favor from a co-worker before it becomes an issue. Second, our voices are more inconsistent than we realize; they change depending on the time of day, whether we’re getting over a cold, etc. Our audio recordings would often sound like two different people (when it was only one person), especially if we recorded with several days in between sessions. Third, the process takes longer than I would like. If tiny content changes had to be made, it often took an hour or two to work through the process. Finally, I was concerned about turnover. If I have a co-worker record hours of voiceover work, and then they leave the company, I would probably have to re-record an entire course’s audio the next time changes were made. I decided to look for other options.

I discovered several text-to-speech programs, and they seem like they will do the trick for me. These programs allow you to enter (type in) text and then they output audio files of a person (the computer) reading the text out loud. These programs generally aren’t very expensive, and they output to .wav and/or .mp3 format.

A few text-to-speech programs (most have free demos on their web site):

The programs above come with a standard set of voices, but you can purchase higher-quality voices and plug them into your text-to-speech program:

Voices are generally available for these languages: U.S. English, U.K. English, Spanish, Canadian French, Parisian French, German, Italian, Japanese, Korean, and Chinese.

I did notice some oddities when it comes to pronunciation. Every once in a while the text-to-speech program would stumble on a word, or pronounce a word differently than I wanted. To remedy this, try spelling words phonetically or just try to think differently about pronunciation in general. For example, I had the year "1939" in one of my courses, and the text-to-speech program read it as "one thousand, nine hundred thirty-nine." That’s not what I wanted. So, I changed "1939" to "19 39" (adding a space between the numbers), and it read it properly after that ("nineteen thirty-nine").
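If you have a lot of script text to prepare, the year workaround could probably be automated before the text goes into the text-to-speech program. Here is a minimal sketch in Python. The function name split_years is made up for illustration, and the rule it applies is an assumption on my part: it only touches standalone four-digit numbers from 1110 to 1999 whose last two digits are 10 or higher, because something like "1905" or "2007" would likely need a different treatment. It also can't tell a year from any other four-digit number, so check the output before recording.

```python
import re

# Hypothetical helper: matches standalone four-digit numbers such as 1939.
# First pair is 11-19, second pair is 10-99 (assumed rule, see note above).
YEAR_PATTERN = re.compile(r"\b(1[1-9])([1-9]\d)\b")


def split_years(text: str) -> str:
    """Insert a space in the middle of year-like numbers ("1939" -> "19 39")."""
    return YEAR_PATTERN.sub(r"\1 \2", text)


if __name__ == "__main__":
    print(split_years("The course covers events from 1939 to 1945."))
```

Run the narration script through this first, then paste the result into the text-to-speech program as usual.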

I doubt that these programs will ever be able to output speech that sounds as natural as a real (live) person. However, considering how much time they save and how easy they are to use, it is a pretty good solution. Give them a shot if you’re ever in a similar situation.

1 From e-Learning and the Science of Instruction, by Ruth Clark and Richard E. Mayer, page 83.



1. Joel Harband - August 9, 2007

B. J.,

On the topic of adding narration with text-to-speech voices, I’d like to add our product Speech-Over (www.speechover.com) to your list. Speech-Over is an add-in to PowerPoint that easily adds voice clips directly to animated objects for a perfectly-timed voice track in an animated presentation. The voice track converts to Flash for e-learning with standard PPT2Flash converters.

Speech-Over also lets you use microphone recording and prerecorded files – as well as any of the SAPI 5 TTS voices.

The TTS voices seem to have found their niche in in-house training where you have a “captive audience” and the main thing is a clear explanation – which TTS voices can deliver.

Joel Harband
Tuval Software Industries

2. Philippe Pernelle - August 20, 2007


Try this text to speech engine :

We have been using it for one year, it is pretty good (and not so expensive !)

Philippe Pernelle

3. Ryan - September 21, 2007

Do you have any follow up with this? I think it’s an excellent idea, and am considering this route myself.


4. Using Audio in eLearning « eLearning Weekly - November 16, 2007

[…] post about using audio and narration in eLearning. It’s worth a read. I’ve discussed narration in eLearning before, but Cathy does a better job of breaking down the different ways audio and narration can […]

5. Chris - April 22, 2008

B.J., I know this post is old, but I just found it while surfing.

Personally, I think using synthetic voices for eLearning is a bad, bad idea. Sure, it’s fast. Sure, it’s easy. Sure, your learners will hate it.

Not worth it.


6. B.J. Schone - April 22, 2008

Hi Chris,

Thanks for your feedback; I certainly understand your points. I agree that synthetic voices aren’t the ideal solution, but they’re a solution that can help in some cases… I just wanted to make people aware of the technology so that they can evaluate it for their particular situation. But if your organization has a decent budget and is willing to spend the money, I’d use professional voice talent.


7. Steve Anthony - April 23, 2008

Let me preface this by admitting that I’m a voiceover person. Most of my work is for elearning and web based training projects. With that said, for in-house material, a text to speech program may work fine. But keep in mind that adding voice adds not only sound, but personality and credibility to a project. I truly believe that effective communication is not just about the words, but the message behind them, the way they’re delivered, and any passion put behind them. We can speak the words to Hamlet’s famous soliloquy, ‘To be, or not to be . . . ‘ But when Kenneth Branagh performs them, I get the message. I understand. And I believe him. I think that’s an important part . . . not what’s being said, but what the learner is hearing and understanding. If the teacher sounds detached, what subtle message is being sent to the learner?


8. Vanessa Rose - March 8, 2009

Hi B.J.,

I appreciate your balanced analysis of text to speech use in eLearning. Let me preface my comment as well; I am with iSpeech.org, a text to speech Web service. That being said, I believe that text to speech is the perfect solution for many occasions, but it does not completely replace a human narrator (however, our voices are the best available, and are nearly human).

The benefits of text to speech are:
Fast, inexpensive, helps auditory learners
The cons of text to speech are:
Imperfect pronunciation, imperfect inflection, lack of emotion

Text to speech Web services like iSpeech.org are not meant to replace a human voice talent, but rather provide an alternative. It is simply not time or cost effective in many cases to pay for voice talent to record audio for eLearning.

It is also important to note that many learners do not benefit from any audio. According to one of our studies, approximately 1/3rd of learners consider themselves audio learners, 1/3rd consider themselves visual and the remaining are unsure or a combination (multi-sensory learners).

On the other end, our CEO developed iSpeech because he benefited from text to speech audio versions during his time at Rutgers University. There are a few articles on the Web if you search for iSpeech and Rutgers. So, to respond to the people who say it is not useful for learning, that is simply untrue. If you still don’t believe it is helpful, I would happily forward you some of the hundreds of customer comments we receive via the iSpeech.org feedback form.

Could you suggest any features that our company could add for eLearning? Our CEO is very eager to develop software that helps people learn. Please email me if you have any suggestions.

-V. Rose

Ps. B.J., you can speech enable your blog with our WordPress widget.

9. Viral Notebook | Michael M. Grant, Ph.D. - February 1, 2010

[…] conditions discussed above might not be effective if the quality of recording is bad. In his blog, Narration in eLearning, Schone describes some of the issues faced in producing narration.    The same applies to a […]

10. Jim Kinneer - December 29, 2010

Are you aware of any research that compares human voiceover to TTS? It would be interesting to know if and how it impacts learning and engagement.
