Having received a record two comments, I thought I would continue my thoughts on merging eBook and Text-to-Speech technology. (Rest assured this has nothing to do with my inability to think of a new topic at the moment.)
After a little more searching, I’ve found what I consider to be the best Text-to-Speech voice engine: Neospeech. It only barely edges out Apple’s Alex from my previous post, but I believe the Neospeech voices sound a bit less robotic. Better yet, these voices aren’t exclusive to Mac users. Go ahead and paste part of this blog entry into the free online demo. (IE only.) Also of note, and perhaps amusement, are the foreign language voices. I know a bit of Japanese and those voices sound just as good as the English ones. What’s fun, though, is having one of the foreign-language voices read English text. While a few words are unintelligible, the majority of them sound as if they’re spoken by someone with a heavy accent. To me, this speaks volumes about how sophisticated and flexible the technology is becoming.
Bonnie mentioned how boring it would be to listen to a single droning voice read a work of fiction. I do agree it would be rather unpleasant with the current technology. I've been using it a bit when reading articles online and multitasking, and it's serviceable for that. It's especially nice for more snooze-worthy readings. I find I can absorb more of the article than I would if I attempted to just plow through the reading.
As for fiction, it would be possible to have different voices for a Text-to-Speech audiobook, but the text file would need to be tagged to indicate when different characters are talking. While this would be a simple solution, it would also greatly hinder the flexibility of generating an audiobook automatically from any text file. Of course, it might be possible for the software to detect different characters automatically one day, but this seems unlikely to me. One mistake would be quite distracting, or even confusing, to the listener. Not to mention having to assign the appropriate gender to each character. How much can we really trust a program to figure out if “Alex” is a man or a woman?
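To make the tagging idea a little more concrete, here is a rough Python sketch of how a tagged text file might be split up by speaker. The `[name]` tag syntax, the function names, and the voice labels are all made up for illustration; no real ebook format or voice engine is assumed, and the actual speech synthesis step is left out entirely.

```python
import re

# Hypothetical markup: a line starting with "[name]" is spoken by that
# character; any untagged line is read by the narrator.
TAG = re.compile(r"^\[(\w+)\]\s*(.*)$")

def split_by_speaker(text, narrator="narrator"):
    """Return a list of (speaker, line) pairs from tagged text."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = TAG.match(line)
        if match:
            segments.append((match.group(1).lower(), match.group(2)))
        else:
            segments.append((narrator, line))
    return segments

def assign_voices(segments, voices, default_voice):
    """Swap each speaker for a voice label, falling back to a default."""
    return [(voices.get(speaker, default_voice), line)
            for speaker, line in segments]

sample = """Alice was beginning to get very tired.
[alice] Oh dear, what nonsense I'm talking!
[rabbit] I shall be too late!"""

# Placeholder voice labels; a real engine would use its own voice names.
voices = {"narrator": "voice_a", "alice": "voice_b"}
plan = assign_voices(split_by_speaker(sample), voices, default_voice="voice_a")
```

Notice that the rabbit has no entry in the voice table, so his line quietly falls back to the narrator's voice. That is the kind of small slip that would be distracting in a finished audiobook.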
Another problem is the large storage requirement for a high-quality voice. It looks like the best voices currently take up about 200MB each. This definitely hurts the idea of producing full audiobooks from relatively small text files. Still, even with large voice files, the text itself would only take up a fraction of the disk space a full audio recording would. An audiobook of Alice in Wonderland weighs in at 86MB, while an ebook is only 160KB. After the initial space investment with the voice file, it rapidly becomes much more efficient to use ebooks to generate the audio. And let’s not forget the huge savings in server costs this would mean for libraries!
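For the curious, the break-even point can be worked out with a few lines of Python. This is only back-of-the-envelope arithmetic using the figures above, and it assumes every book is roughly the size of Alice in Wonderland and that 1MB is 1000KB; the function names are my own.

```python
import math

VOICE_MB = 200.0        # one high-quality voice
AUDIOBOOK_MB = 86.0     # recorded Alice in Wonderland
EBOOK_MB = 160 / 1000   # 160KB ebook, assuming 1MB = 1000KB

def recorded_total(n_books):
    """Storage in MB for n recorded audiobooks."""
    return n_books * AUDIOBOOK_MB

def tts_total(n_books, n_voices=1):
    """Storage in MB for the voice files plus n ebooks."""
    return n_voices * VOICE_MB + n_books * EBOOK_MB

def break_even_books(n_voices=1):
    """Smallest number of books where the ebook approach uses less space."""
    saved_per_book = AUDIOBOOK_MB - EBOOK_MB
    return math.floor(n_voices * VOICE_MB / saved_per_book) + 1

print(break_even_books())                       # first library size that wins
print(recorded_total(100), tts_total(100))      # totals for a 100-book library
```

With a single voice, the ebook approach pulls ahead at the third book under these assumptions. Adding more voices (say, one per character) pushes that point out, which is why `n_voices` is a parameter.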
For now, it seems the practical applications are a bit limited, but my interest is certainly piqued. Now if you’ll excuse me, I believe a friend of mine is about to receive a hilarious call from a confused Chinese man. ;-)
-Joe