Tuesday, September 18, 2007

A Follow-up on eAudiobooks

Having received a record two comments, I thought I would continue my thoughts on merging eBook and Text-to-Speech technology. (Rest assured this has nothing to do with my inability to think of a new topic at the moment.)

After a little more searching, I’ve found what I consider to be the best Text-to-Speech voice engine: Neospeech. It only barely edges out Apple’s Alex from my previous post, but I believe the Neospeech voices sound a bit less robotic. Better yet, these voices aren’t exclusive to Mac users. Go ahead and paste part of this blog entry into the free online demo. (IE only.) Also of note, and perhaps amusement, are the foreign language voices. I know a bit of Japanese and those voices sound just as good as the English ones. What’s fun, though, is having one of the foreign language voices read English text. While a few words are unintelligible, the majority of them sound as if they’re spoken by someone with a heavy accent. To me, this speaks volumes on how sophisticated and flexible the technology is becoming.

Bonnie mentioned how boring it would be to listen to a single droning voice read a work of fiction. I do agree it is rather unpleasant with the current technology. I've been using it a bit when reading articles online and multitasking, and it's serviceable for that. It's especially nice for more snooze-worthy readings. I find I can absorb more of the article than I would if I attempted to just plow through the reading.

As for fiction, it would be possible to have different voices for a Text-to-Speech audiobook, but the text file would need to be tagged to indicate when different characters are talking. While this would be a simple solution, it would also greatly hinder the flexibility of generating an audiobook automatically from any text file. Of course, it might be possible for the software to detect different characters automatically one day, but this seems unlikely to me. One mistake would be quite distracting, or even confusing, to the listener. Not to mention having to assign the appropriate gender to each character. How much can we really trust a program to figure out if “Alex” is a man or a woman?
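To make the tagging idea concrete, here is a minimal sketch in Python. The bracketed `[Name]` tag format, the function names, and the voice names are all made up for illustration, and no real Text-to-Speech engine is called; the code only decides which voice would read each line.

```python
import re

# Hypothetical tag format: a line starting with "[Name]" is spoken by that character.
TAG = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")


def parse_tagged(text, default="Narrator"):
    """Split tagged text into (speaker, line) pairs; untagged lines go to the narrator."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = TAG.match(line)
        if match:
            segments.append((match.group("speaker"), match.group("text")))
        else:
            segments.append((default, line))
    return segments


def assign_voices(segments, voice_map, fallback="voice_default"):
    """Pair each line with a voice name; unknown speakers get the fallback voice."""
    return [(voice_map.get(speaker, fallback), line) for speaker, line in segments]


sample = "It was a quiet morning.\n[Alice] Curiouser and curiouser!\n\n[Rabbit] I'm late!"
plan = assign_voices(parse_tagged(sample), {"Narrator": "voice_a", "Alice": "voice_b"})
for voice, line in plan:
    print(voice, "->", line)
```

Note that the gender problem doesn't go away here: someone still has to fill in the voice map by hand, which is exactly the extra work that makes this less automatic.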

Another problem is the large storage requirement for a high-quality voice. It looks like the best voices currently take up about 200MB each. This definitely hurts the idea of producing full audiobooks from relatively small text files. Still, even with large voice files, the text itself would only take up a fraction of the disk space a full audio recording would. An audiobook of Alice in Wonderland weighs in at 86MB, while an ebook is only 160KB. After the initial space investment with the voice file, it rapidly becomes much more efficient to use ebooks to generate the audio. And let’s not forget the huge savings in server costs this would mean for libraries!
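The break-even point can be worked out from those figures. This is a rough sketch, assuming 1MB = 1000KB, a single shared voice, and that every book matches the Alice in Wonderland sizes (real books will vary, and the function names are just for illustration):

```python
VOICE_MB = 200.0        # one high-quality voice file
AUDIOBOOK_MB = 86.0     # recorded audiobook of Alice in Wonderland
EBOOK_MB = 160 / 1000   # 160KB ebook, assuming 1MB = 1000KB


def tts_total_mb(n_books):
    """Storage for one voice file plus n ebooks."""
    return VOICE_MB + n_books * EBOOK_MB


def audio_total_mb(n_books):
    """Storage for n recorded audiobooks."""
    return n_books * AUDIOBOOK_MB


def break_even_books():
    """Smallest number of books at which voice + ebooks beats recorded audio."""
    n = 1
    while tts_total_mb(n) >= audio_total_mb(n):
        n += 1
    return n


print(break_even_books())  # prints 3
```

Under these assumptions the voice file pays for itself by the third book: roughly 200.5MB for the voice plus three ebooks, against 258MB for three recorded audiobooks.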

For now, it seems the practical applications are a bit limited, but my interest is certainly piqued. Now if you’ll excuse me, I believe a friend of mine is about to receive a hilarious call from a confused Chinese man. ;-)

-Joe

2 comments:

Mary Alice Ball said...

Joe, you are so quiet in class but when you get your hands on a keyboard you have a very strong voice. It's so interesting then that you are into text-to-speech translation. This is a fascinating topic and one I would really like to pursue more. It has such wide application when you think of reaching populations that for whatever reason have problems with traditional library information resources and technologies.

Joe Murray said...

Heh, yeah I'm told that sometimes. I'm sure you'll hear me more once we start working on the project more directly. I'm more of an observer in larger groups. Not sure why that is, it's just my nature. In small group work I joke around and get right into the thick of things with everyone, often even inadvertently assuming a leadership role.

As for my TTS obsession, it's a combination of my love of technology and multi-tasking. I'm interested in the technology behind it but I also love the freedom it gives me. As I said previously, I've been listening to audiobooks in all manner of situations now when I wouldn't have found the time to read directly. While the TTS programs aren't quite that handy yet, I do use them to read articles and even homework assignments on occasion, when I'm otherwise occupied.