The text-to-speech (TTS) capabilities of Alexa, Google Assistant, Bixby, etc. are impressive, to say the least. The voice assistants also provide the ability to use different voices in your applications, but at this time, artificially generated voices still tend to sound, well, artificial. They are not yet capable of easily making the subtle changes in intonation that humans take for granted.
Human voices improve user experience.
While it is possible to tweak artificial voices pretty significantly using SSML tags, that can be quite a bit of work. Amazon is enabling emotive features for Alexa, but it may take some time for those features to become part of a normal conversation with the virtual assistant. Why is this important?
You’ve decided to go for it and create your first voice application. You’ve started to write the copy for the audio that the voice application will present to the user and are pretty happy listening to the voice assistants read it back, but for one of your messages something just feels like it’s missing.
You can’t quite put your finger on it, but something about the delivery of the message by the voice assistant just doesn’t seem to be working. You try playing with other voices the assistants provide and even try playing with SSML tags, but for some reason, the delivery just doesn’t feel right.
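To give a feel for what "playing with SSML tags" involves, here is a minimal Python sketch that assembles an SSML string from a few standard elements (`<prosody>`, `<emphasis>`, `<break>`). The helper names and the sample copy are our own illustrations, not part of any voice platform's SDK, and each platform supports its own subset of SSML, so check its documentation before relying on a particular tag or value.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape plain text so characters like & and < are valid inside SSML."""
    return escape(text)


def with_prosody(text, rate="medium", pitch="medium"):
    """Wrap text in <prosody> to adjust speaking rate and pitch."""
    return (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"
    )


def with_emphasis(text, level="moderate"):
    """Wrap text in <emphasis>; common levels are strong, moderate, reduced."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(milliseconds):
    """Insert a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*fragments):
    """Join already-built SSML fragments inside a single <speak> root."""
    return "<speak>" + "".join(fragments) + "</speak>"


# Placeholder copy, purely for illustration.
ssml = speak(
    plain("We build voice applications. "),
    pause(300),
    with_prosody("And we would love to build yours.", rate="slow", pitch="low"),
)
print(ssml)
```

Even in this small example, every change in delivery means another tag to choose and tune by ear, which is where the work adds up.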
Who needs a celebrity voice?
We ran into just this scenario when we were creating a voice application for our company. For most of the messaging, the virtual assistant voices were fine, but for a message we wanted to present about the company, we decided that instead of using the standard virtual assistant’s voice, we would deliver the message using the voice of one of the founders.
Sure, you can use one of the trendy celebrity voices. But why not make the experience more personal with someone from the company / brand behind the voice app instead?
Following is an example of the original message played back using Alexa’s voice. Note the use of third person.
Next is an example of the audio we used. In this case, the same message is delivered by one of the founders. Note that it now uses first and second person.
Notice any differences? Beyond the change in person, that message now stands out from the rest of the voice application – humanizing the message and the overall application. Now, when a user asks for information about our company, they hear it personally from one of the founders.
Maintenance of Voice Apps with Human Recordings
Before you go all-in and use recorded audio for your entire voice application, consider that if you want to change a message later, you will need to re-record the audio, rather than just changing the text and letting TTS take care of the rest. That being said, proper use of recorded voices in a voice application can dramatically change a user’s experience.
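One way to keep that maintenance cost contained is to use recordings only for selected messages and let TTS handle everything else. The Python sketch below shows the idea under some assumptions: the message keys, the copy, and the `example.com` URL are placeholders, and `render_ssml` is a hypothetical helper, not a function from any voice platform's SDK. It relies only on the standard SSML `<audio>` element; hosting and audio format requirements vary by platform.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical message keys mapped to hosted recordings (placeholder URL).
RECORDINGS = {
    "about_company": "https://example.com/audio/about-company.mp3",
}

# Text copy for every message; TTS reads it when no recording is registered.
TTS_COPY = {
    "welcome": "Welcome! What would you like to know?",
    "about_company": "Our company builds voice applications.",
}


def render_ssml(key, recordings=RECORDINGS, copy=TTS_COPY):
    """Return SSML that plays a recording if one exists, else the TTS copy."""
    url = recordings.get(key)
    if url is not None:
        # Recorded human voice for this message.
        return f"<speak><audio src={quoteattr(url)}/></speak>"
    # No recording registered: let the assistant's voice read the text.
    return f"<speak>{escape(copy[key])}</speak>"


print(render_ssml("welcome"))
print(render_ssml("about_company"))
```

With this split, updating a TTS-only message is a text edit, while updating a recorded message means re-recording and replacing the audio file, so it pays to reserve recordings for the messages that benefit most from a human voice.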
Another consideration is the availability of celebrity voices that replace the AI voice on a smart device. How will your recorded voice sound with the variety of celebrities a consumer could set as their default voice?
Have further questions on this sort of topic or voice applications in general? Feel free to reach out to Whetstone Technologies via our Contact page to schedule a free one-hour consult.