Introducing Amazon's Alexa - How does Alexa work?

18 November 2016
Artificial Intelligence

At Kin + Carta, we have been experimenting with and building a multitude of Skills, both to increase our own knowledge and to validate our thinking about how Alexa, and the Echo, could help our clients better serve their customers. You can see a sneak preview of one such experiment: prototyping Skills for personal banking.

Rolling up our sleeves and building Skills has been critical: after all, there are some things Alexa can’t tell us herself - things that our strategists, designers, and engineers needed to discover.

Among other things, we found that some services and interactions fit this new channel perfectly, while others are simply not suited to an invisible interface at all. Along the way, we have also learned a great deal about what works for Alexa, what does not, and how to build a robust, well-designed Skill.

In an upcoming post, I am going to share the ten things that Alexa didn’t tell us but that we have uncovered in the course of our work. However, before we share some of these lessons with you, let’s introduce ourselves to Alexa properly. What is Alexa? And how does Alexa work?

The Amazon Echo is a hands-free, voice-controlled speaker for the home. The Echo connects to the cloud-hosted Alexa Voice Service (AVS), which lets you complete a range of smart tasks simply by starting a question or command with the Wake Word, ‘Alexa’. Think of Alexa as a voice-enabled personal assistant, and the Amazon Echo as one of her physical ‘homes’. Alexa also lives inside two smaller speakers, the Amazon Tap (unavailable in the UK at the time of writing) and the Echo Dot, as well as a growing range of devices from other manufacturers.

Alexa is the core of the service, and Amazon has been keen to make it accessible to as wide a range of use cases as possible, offering software libraries that can be embedded in any suitable hardware, from an iPhone to a Raspberry Pi.

How does the Amazon Echo work?

If you take an Echo apart, you will find a large, powerful speaker, seven small microphones, and… well, that’s about it. When the microphones pick up the Wake Word, ‘Alexa’, the device wakes up, records the sentence that follows, and sends it to Amazon’s Alexa Voice Service (AVS).

At the heart of the service, the user’s voice passes through a sophisticated speech recognition engine, which trains itself on the user’s voice over time. What sets the service apart, however, is how it turns a user’s spoken sentence into a command for a particular Skill.

In AVS, every Skill has a Voice Interaction Model. AVS uses this model to interpret what the user is saying and break it down into a command, plus parameters, that the triggered Skill understands.

The ‘brain’ of each Skill lives in a separate web service, whose location is configured as part of the Skill. This is where the Skill processes the command and its parameters and decides what the response should be. The response is then passed back to the voice service and on to the Amazon Echo. This workflow means that while an Amazon Echo lives in your kitchen, Alexa herself lives in the cloud.
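To make that round trip more concrete, here is a minimal Python sketch of the web-service side. It assumes the Alexa Skills Kit JSON request format, trimmed to the fields used here (real requests carry more, such as a timestamp and locale). The intent name `GetWeatherIntent`, the slot `City` and both helper functions are hypothetical. `parse_intent` extracts the command and its parameters, and `decide_response` plays the part of the Skill’s ‘brain’.

```python
import json

# A trimmed IntentRequest of the kind AVS sends to a Skill's web service.
# Field names follow the Alexa Skills Kit JSON format; the intent name
# "GetWeatherIntent" and the slot "City" are hypothetical examples.
SAMPLE_REQUEST = """
{
  "version": "1.0",
  "session": {"new": true, "sessionId": "example-session"},
  "request": {
    "type": "IntentRequest",
    "requestId": "example-request",
    "intent": {
      "name": "GetWeatherIntent",
      "slots": {"City": {"name": "City", "value": "Seattle"}}
    }
  }
}
"""


def parse_intent(raw_body):
    """Return (intent_name, {slot_name: value}) from an IntentRequest body."""
    body = json.loads(raw_body)
    request = body["request"]
    if request["type"] != "IntentRequest":
        return None, {}
    intent = request["intent"]
    # A slot the user did not fill can arrive without a "value" key.
    slots = {
        name: slot.get("value")
        for name, slot in intent.get("slots", {}).items()
    }
    return intent["name"], slots


def decide_response(intent_name, slots):
    """Hypothetical 'brain': map a command and its parameters to a reply."""
    if intent_name == "GetWeatherIntent" and slots.get("City"):
        # A real Skill would call a weather service at this point.
        return f"Here is the forecast for {slots['City']}."
    return "Sorry, I didn't catch that."


name, slots = parse_intent(SAMPLE_REQUEST)
print(decide_response(name, slots))  # Here is the forecast for Seattle.
```

The text returned here would then be wrapped in a response JSON document and sent back through AVS to the Echo.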

Understanding the syntax of an Alexa Skill

Alexa can complete a wide variety of tasks for you, from adjusting your smart lighting to playing your favourite Spotify playlist or calling an Uber. She uses ‘Skills’ to achieve this: you can think of a Skill as Amazon’s version of a voice-only app, and each feature of a particular Skill is called an ‘Intent’. Because Alexa understands natural language, the same question or command can often be phrased in several ways. Each spoken phrase that can be used to activate a single Intent is called an ‘Utterance’.

Importantly, the Echo doesn’t actually run any of these Skills locally: it’s Alexa who runs them, in the cloud. This means there is no installation process. Instead, the user enables a new Skill with a single tap in the Alexa app on their smartphone, and from then on the Skill is available by voice through the Echo.

Some Skills come built into the Echo, and Amazon provides the Alexa Skills Kit (ASK), essentially an SDK, for third-party developers to create their own. Users activate Skills through Utterances, each of which is simply a sentence that invokes a command via AVS.

Confused? OK, let’s break down what each part of the diagram above means:

An ‘Intent’ is the action a user asks Alexa to complete, like a feature in an app.
Each spoken phrase that can be used to activate a single Intent is called an ‘Utterance’.
The Wake Word activates Alexa and can be ‘Alexa’, ‘Echo’ or ‘Amazon’ (chosen by the user).
The Invocation Name is a trigger which tells Alexa to use a specific Skill.
The Slot is a defined place in the sentence where Alexa needs to capture user input (the city Seattle, in this example).
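The parts listed above can be illustrated with a deliberately naive Python sketch that breaks a hypothetical phrase, ‘Alexa, ask Weather Bot what the weather is in Seattle?’, into its Wake Word, Invocation Name, Utterance and Slot. This is a toy. AVS relies on speech recognition and language understanding rather than exact template matching, so it copes with far more variation in phrasing than this does. The invocation name, sample Utterance and slot name are all assumptions.

```python
import re

# Toy illustration only: AVS does not match phrases with regex templates.
# The invocation name "weather bot", the sample Utterance and the slot
# "City" are hypothetical.
WAKE_WORDS = ("alexa", "echo", "amazon")
INVOCATION_NAME = "weather bot"
SAMPLE_UTTERANCE = "what the weather is in {City}"


def template_to_regex(template):
    """Turn 'in {City}' into a regex with a named group for each Slot."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        # Even indices are literal text, odd indices are slot names.
        pattern += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
    return re.compile(pattern + "$", re.IGNORECASE)


def break_down(phrase):
    """Split a phrase into Wake Word, Invocation Name, Utterance and Slots."""
    text = phrase.strip().rstrip("?.").lower()
    wake, _, rest = text.partition(", ")
    if wake not in WAKE_WORDS:
        return None
    prefix = f"ask {INVOCATION_NAME} "
    if not rest.startswith(prefix):
        return None
    utterance = rest[len(prefix):]
    match = template_to_regex(SAMPLE_UTTERANCE).match(utterance)
    if match is None:
        return None
    return {"wake_word": wake, "invocation_name": INVOCATION_NAME,
            "utterance": utterance, "slots": match.groupdict()}


result = break_down("Alexa, ask Weather Bot what the weather is in Seattle?")
print(result["slots"])  # {'City': 'seattle'}
```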

Additionally, there are two types of responses that Alexa can give back to the user:

Tell Response: Alexa says one phrase, then the session ends and its context is lost.
Ask Response: Alexa says one phrase back to the user and awaits an answer. In this case, the session is held open and context is retained.
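As a sketch of how the two response types differ in practice, assuming the Alexa Skills Kit JSON response format: the `shouldEndSession` flag is what separates a Tell from an Ask, and `sessionAttributes` is one way a Skill can carry context into the user’s next turn. The helpers `tell`, `ask` and `build_response`, and the attribute name `pending_intent`, are hypothetical.

```python
# Hypothetical helpers producing Alexa Skills Kit style response JSON.


def build_response(speech, end_session, session_attributes=None):
    return {
        "version": "1.0",
        # While the session stays open, these attributes come back in the
        # next request, which is how a Skill can retain context.
        "sessionAttributes": session_attributes or {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


def tell(speech):
    """Tell Response: speak once and close the session."""
    return build_response(speech, end_session=True)


def ask(speech, reprompt, session_attributes=None):
    """Ask Response: speak, keep the session open and wait for an answer."""
    response = build_response(speech, end_session=False,
                              session_attributes=session_attributes)
    # Spoken again if the user does not answer.
    response["response"]["reprompt"] = {
        "outputSpeech": {"type": "PlainText", "text": reprompt}}
    return response


goodbye = tell("Goodbye.")
question = ask("Which city?", "Please tell me a city.",
               {"pending_intent": "GetWeatherIntent"})
print(goodbye["response"]["shouldEndSession"],
      question["response"]["shouldEndSession"])  # True False
```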

In addition to spoken responses, a Skill can ask AVS to display a ‘card’ in the companion Amazon Alexa app for iOS and Android. A card can be thought of as a sticky note for the answer that was provided. For example, if a Skill let you find the cheapest flight for Christmas, the card could give you extended details of the date, time, price and flight number, along with a link to make a booking on a relevant website.
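A card travels in the same response as the speech. The sketch below, again assuming the Alexa Skills Kit JSON format, adds a Simple card (a title plus plain-text content) for the flight example. The flight details, the example.com URL and the helper name `flight_response` are made up for illustration, and the booking link is included as plain text in the card content.

```python
# Hypothetical helper: a spoken answer plus a Simple card with the details.


def flight_response(flight):
    summary = f"The cheapest flight costs {flight['price']}."
    details = (f"Flight {flight['number']}\n"
               f"Departs: {flight['date']} at {flight['time']}\n"
               f"Price: {flight['price']}\n"
               f"Book at: {flight['booking_url']}")
    return {
        "version": "1.0",
        "response": {
            # Alexa speaks only the short summary...
            "outputSpeech": {"type": "PlainText", "text": summary},
            # ...while the card in the companion app holds the details.
            "card": {"type": "Simple",
                     "title": "Cheapest Christmas flight",
                     "content": details},
            "shouldEndSession": True,
        },
    }


# Made-up data for illustration only.
example = flight_response({"number": "XX123", "date": "23 December",
                           "time": "09:40", "price": "£89",
                           "booking_url": "https://example.com/book/XX123"})
print(example["response"]["card"]["content"])
```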

Developing Skills for Alexa

The process of developing Skills is split into two parts.

Firstly, you write the code for the Skill’s core functionality and deploy it to a web service. We have been using AWS Lambda, Amazon’s serverless compute platform, which works very well with Alexa.
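A Lambda-hosted Skill can be as small as a single entry function. This sketch assumes the Python runtime on AWS Lambda, which invokes a configured handler, conventionally named `lambda_handler(event, context)`, with the request JSON already decoded into a dict. It routes on the request type (`LaunchRequest`, `IntentRequest` or `SessionEndedRequest`). The intent name, slot name, replies and the `speak` helper are hypothetical.

```python
# Hypothetical Skill backend for the AWS Lambda Python runtime.


def speak(text, end_session=True):
    return {"version": "1.0",
            "response": {"outputSpeech": {"type": "PlainText", "text": text},
                         "shouldEndSession": end_session}}


def lambda_handler(event, context):
    request = event["request"]
    request_type = request["type"]
    if request_type == "LaunchRequest":
        # The user opened the Skill by its Invocation Name with no command.
        return speak("Welcome. Which city would you like the weather for?",
                     end_session=False)
    if request_type == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "GetWeatherIntent":
            city = intent.get("slots", {}).get("City", {}).get("value")
            if city:
                return speak(f"Here is the forecast for {city}.")
            # Slot not filled: keep the session open and ask for it.
            return speak("Which city?", end_session=False)
        return speak("Sorry, I can't help with that yet.")
    # SessionEndedRequest (or anything else): Alexa does not speak a reply
    # here, so return an empty response.
    return {"version": "1.0", "response": {}}
```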

Secondly, you need to set it up as an Alexa Skill on the Amazon Developer Services portal. This is where you define the voice interaction model: your Intents, and the Slots they will use. There are a number of built-in Slot types for recognising common concepts such as numbers and dates, but you can define custom ones as well.
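For a sense of what defining Intents and Slots involves, here is a hypothetical flight-search model written as a Python dict that mirrors the Intent Schema JSON the portal accepted at the time of writing (the current developer console uses a newer JSON layout). `AMAZON.DATE` and `AMAZON.NUMBER` are built-in slot types and `AMAZON.HelpIntent` is a built-in intent; `FindFlightIntent` and the custom type `LIST_OF_AIRPORTS` are assumptions. `undeclared_slot_types` is a hypothetical pre-upload check, not part of the portal.

```python
import json

# Hypothetical model for a flight-search Skill, in the Intent Schema style.
INTENT_SCHEMA = {
    "intents": [
        {
            "intent": "FindFlightIntent",
            "slots": [
                {"name": "Destination", "type": "LIST_OF_AIRPORTS"},  # custom
                {"name": "TravelDate", "type": "AMAZON.DATE"},        # built-in
                {"name": "Passengers", "type": "AMAZON.NUMBER"},      # built-in
            ],
        },
        # Built-in intent for "help" requests; it needs no slots.
        {"intent": "AMAZON.HelpIntent"},
    ]
}

# Values for the custom slot type, entered separately in the portal.
CUSTOM_SLOT_TYPES = {"LIST_OF_AIRPORTS": ["Heathrow", "Gatwick", "Edinburgh"]}


def undeclared_slot_types(schema, custom_types):
    """List slot types that are neither built-in (AMAZON.*) nor defined."""
    missing = []
    for intent in schema["intents"]:
        for slot in intent.get("slots", []):
            slot_type = slot["type"]
            if not slot_type.startswith("AMAZON.") and slot_type not in custom_types:
                missing.append(slot_type)
    return missing


print(json.dumps(INTENT_SCHEMA, indent=2))
```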

You also need to define sample Utterances for your Skill in the portal. These are examples of the phrases people are expected to use to trigger each Intent.
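At the time of writing, sample Utterances were entered as plain lines: the Intent name, a space, then the phrase, with `{SlotName}` marking where a Slot’s value appears. The sketch below lists a few for a hypothetical `FindFlightIntent` and adds a small, equally hypothetical check that every Slot an Utterance references is declared for its Intent, a mistake that is easy to make by hand.

```python
import re

# Hypothetical sample Utterances in the line-based "IntentName phrase" style.
SAMPLE_UTTERANCES = """\
FindFlightIntent find me a flight to {Destination}
FindFlightIntent find me a flight to {Destination} on {TravelDate}
FindFlightIntent book {Passengers} seats to {Destination}
"""

# Slots declared for each Intent (assumed to match the Skill's model).
INTENT_SLOTS = {"FindFlightIntent": {"Destination", "TravelDate", "Passengers"}}


def check_utterances(text, intent_slots):
    """Return problems: unknown Intents, or Slots not declared for an Intent."""
    problems = []
    for line in text.splitlines():
        if not line.strip():
            continue
        intent, _, phrase = line.partition(" ")
        if intent not in intent_slots:
            problems.append(f"unknown intent: {intent}")
            continue
        for slot in re.findall(r"\{(\w+)\}", phrase):
            if slot not in intent_slots[intent]:
                problems.append(f"{intent}: undeclared slot {slot}")
    return problems


print(check_utterances(SAMPLE_UTTERANCES, INTENT_SLOTS))  # []
```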

Wrapping it up

So, now we’ve outlined how Alexa works: the fundamentals of Alexa, Skills and the Echo itself. Keep an eye out for our second post on the technical lessons we have gathered during our experimentation, and further thoughts on how to approach developing Skills for Alexa.
