What is SSML in Voice Technology?

While the text-to-speech capabilities of virtual assistants are quite amazing, every once in a while there’s a hiccup, and Alexa or Google Assistant will mispronounce a word by default. Fortunately, we have Speech Synthesis Markup Language (SSML) as a fallback.

Alexa, Phoneme Home

SSML is an XML-based standard that virtual assistants use to take additional instructions for delivering a good audio experience. SSML tags can control pitch, play sound effects or audio files, select which voice to use, and, in the case of faulty pronunciation, provide hints on how to pronounce a problematic word.
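To make that concrete, here is a minimal sketch of what an SSML response can look like, held in a Python string and parsed as XML to confirm it is well-formed. The audio URL and the helper name `tags_used` are placeholders of my own, and whether a given tag or attribute value is honored depends on the assistant, so treat this as an illustration and not as a tested response for either platform.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML only: the audio URL is a placeholder, and support for
# each tag and attribute value varies by assistant.
SSML = (
    "<speak>"
    '<audio src="https://example.com/chime.mp3"/>'
    'Welcome back. <prosody pitch="high">You have new messages.</prosody>'
    "</speak>"
)


def tags_used(ssml):
    """Parse the SSML as XML and list its element names in document order.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(ssml)
    return [element.tag for element in root.iter()]


if __name__ == "__main__":
    print(tags_used(SSML))
```

Because SSML is XML, a forgotten closing tag is a parse error, so a check like this can catch broken markup before it ever reaches the assistant.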

I ran into this recently while working on our company’s voice application: under certain circumstances, both Alexa and Google Assistant had trouble pronouncing the name of our platform, “Sonibridge”. The name is pronounced with a short “o” as in “top”, which both assistants generally handled perfectly well, but in a couple of cases they used a long “o” as in “sole”.

I was able to patch this on both platforms using SSML tags. Now, although SSML is supposed to be a standard, the actual support by virtual assistants varies.

<cough> Anyone remember writing cross-platform JavaScript for IE and other browsers in the early 2000s? </cough>

On Alexa, there is support for a phoneme tag: you wrap the word or phrase whose pronunciation you want to override, then specify a phonetic alphabet and the symbols that tell the assistant how to pronounce it. To get the proper pronunciation of “Sonibridge”, I used the following tag:

<phoneme alphabet="ipa" ph="sɔnɪbrɪd͡ʒ">SoniBridge</phoneme>

On Google Assistant, though, as of this writing, there is no support for phoneme. In hopes of getting the same effect, I used the “sub” tag. You wrap the word or phrase just as you would with a phoneme tag, but instead specify an alias: substitute text, spelled phonetically, that the assistant speaks in place of the original. In this case, to attempt to correct the pronunciation of “Sonibridge”, I used the following tag:

<sub alias="saunabridge">SoniBridge</sub>
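If one code base serves both assistants, the two workarounds can live behind a single helper. The sketch below is one way to do that in Python, not the code from our application: the `PRONUNCIATIONS` table, the platform keys "alexa" and "google", and the function names are all hypothetical, and only the phoneme and alias values come from the tags above.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical override table: platform -> word -> (tag, value).
# The values are the ones used in the tags above.
PRONUNCIATIONS = {
    "alexa": {"SoniBridge": ("phoneme", "sɔnɪbrɪd͡ʒ")},
    "google": {"SoniBridge": ("sub", "saunabridge")},
}


def pronounce(word, platform):
    """Return SSML markup for `word`, or the escaped word if no override exists."""
    override = PRONUNCIATIONS.get(platform, {}).get(word)
    if override is None:
        return escape(word)
    tag, value = override
    if tag == "phoneme":
        return (
            f'<phoneme alphabet="ipa" ph={quoteattr(value)}>'
            f"{escape(word)}</phoneme>"
        )
    return f"<sub alias={quoteattr(value)}>{escape(word)}</sub>"


def build_speech(text, platform):
    """Escape `text`, swap in pronunciation overrides, and wrap it in <speak>."""
    body = escape(text)
    for word in PRONUNCIATIONS.get(platform, {}):
        body = body.replace(escape(word), pronounce(word, platform))
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    for platform in ("alexa", "google"):
        print(build_speech("Welcome to SoniBridge.", platform))
```

Keeping the overrides in one table means a newly mispronounced word becomes a one-line addition per platform, and any platform without an entry simply gets the plain text.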

Sometimes, despite your best efforts, the voice assistants will find it difficult to pronounce a word correctly. Just remember, with SSML as a fallback, it doesn't have to be as challenging as dancing on skates. And, you don't have to call the whole thing off. Watch the video...


