Sanjeev Surati

How are Voice Apps for Alexa and Google Assistant Made?

As brands experiment with ways to interact with their customers through virtual assistants such as Amazon Alexa and Google Assistant, understanding the basic building blocks of a voice application can help them design it properly. This article describes some of the key components of Alexa Skills and Google Actions.



At the core of every voice interaction and voice application is Natural Language Processing (NLP). When a person interacts with a traditional graphical user interface (GUI) application, they press buttons, swipe, enter text into text boxes and select items from lists. NLP applications offer a new way to accomplish the same results, but through the user's voice. Nearly every voice application is built on three basic concepts that define its language model: intents, utterances and slots.




Intents

At the heart of any user interaction is "intent": what is the user trying to accomplish? A user may be supplying a "yes" or "no" answer, or attempting to find information on your company, e.g. "Tell me about Whetstone Technologies". Users can express the same intent in many different ways; the NLP engine takes in what the user says and, based on hints provided by your voice application, attempts to translate it into an intent.
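As a minimal, hypothetical sketch (shown here as a Python dictionary rather than the exact JSON schema either platform uses), a voice app's language model declares each intent by name. AMAZON.YesIntent and AMAZON.NoIntent are Alexa built-ins; "FindInformation" is a custom intent used as a running example below.

# Hypothetical fragment of a voice app's language model, mirrored as a Python dict.
# Intent names are declared up front; the NLP engine maps what the user says to one of them.
language_model = {
    "intents": [
        {"name": "AMAZON.YesIntent"},   # Alexa built-in for "yes" answers
        {"name": "AMAZON.NoIntent"},    # Alexa built-in for "no" answers
        {"name": "FindInformation"},    # custom intent: "tell me about ..."
    ]
}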


Utterances

There are many different ways a person can express an intent. For example, "yeah", "yes", "yep" and "sure" are all different ways of expressing a "Yes" intent in response to a yes/no question. These different phrasings are called utterances. Users may also select what they want from a short list of choices, e.g. Whetstone or SoniBridge. Note: a long list of choices does not work as well in a voice interface as it does in written form, on a website for example.
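To make this concrete, here is a hypothetical sketch of several phrasings grouped under the intents they express. The utterance lists are illustrative examples, not a complete set.

# Hypothetical sample utterances, grouped by the intent they express.
# Built-in intents (e.g. AMAZON.YesIntent) already understand common phrasings;
# for custom intents, the voice app supplies its own sample utterances as hints.
sample_utterances = {
    "AMAZON.YesIntent": ["yes", "yeah", "yep", "sure"],  # illustrative only
    "FindInformation": [
        "tell me about Whetstone Technologies",
        "tell me about SoniBridge",
        "what is Whetstone Technologies",
    ],
}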


Slots

Slots are placeholders in an utterance that represent different values and provide more context for the intent. For example, a voice application may have an intent called "FindInformation", meaning the user is trying to learn more about the company. The application may cover a few major topics, and rather than define a separate "FindInformation" intent for each topic, we define a slot, "CompanyData", that can have multiple possible values. If the CompanyData slot has two values, "Whetstone Technologies" and "SoniBridge", we can define a single utterance, "Tell me about {CompanyData}", where {CompanyData} can be either value. The NLP engine then passes our application a "FindInformation" intent along with the "CompanyData" slot set to the appropriate value, and we can return a response with data appropriate to "Whetstone Technologies", "SoniBridge" or any other slot value we add later, as sketched below.
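Putting the three concepts together, here is a minimal hypothetical sketch, again as Python rather than the exact Alexa or Google schema: the FindInformation intent with its CompanyData slot and slot type, plus a tiny handler showing how an application might use the resolved slot value. The names CompanyDataType and handle_find_information, and the placeholder response text, are made up for illustration.

# Hypothetical sketch: the FindInformation intent, its CompanyData slot,
# and the slot type listing the values the NLP engine should try to match.
find_information_intent = {
    "name": "FindInformation",
    "slots": [{"name": "CompanyData", "type": "CompanyDataType"}],
    "samples": [
        "tell me about {CompanyData}",
        "what is {CompanyData}",
    ],
}

company_data_type = {
    "name": "CompanyDataType",
    "values": ["Whetstone Technologies", "SoniBridge"],
}

# Placeholder responses; a real skill would return actual company information.
COMPANY_INFO = {
    "Whetstone Technologies": "Here is some information about Whetstone Technologies ...",
    "SoniBridge": "Here is some information about SoniBridge ...",
}

def handle_find_information(intent_name, slots):
    """Return the spoken response for a resolved FindInformation request."""
    if intent_name != "FindInformation":
        return "Sorry, I didn't catch that."
    topic = slots.get("CompanyData")
    return COMPANY_INFO.get(topic, "I don't have information on that topic yet.")

For example, handle_find_information("FindInformation", {"CompanyData": "SoniBridge"}) would return the SoniBridge response, and adding a new slot value later only requires a new entry in the slot type and the response table, not a new intent.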


There’s more to it, of course, but by understanding these three concepts, it becomes possible to create a language model that can then be used to back a voice application for your company or brand.


Bottom line: it's not only what users say, but how they say it, in what context, and what they are trying to get out of the voice interaction. Taking these into consideration when designing your voice app makes it easier for developers to code and for the NLP engine to understand. Most importantly, it creates an accurate and enjoyable user experience that customers will share with others.


#voicetechnology #amazonalexa #googleassistant #voicefirst #voiceassistants #voicetechexperts #voicetechnologyexperts #conversationaldesign #SoniBridge #WhetstoneTechnologies
