...
Wake word detection will not be supported. As of today, AGL Audio 4a does not yet support a persistent audio input buffer that can be shared between multiple consumers — in this case, the wake word module and the high-level voice service. We discussed the audio design needed to support wake word use cases in detail, but there is no timeline yet for when this support will be baked into the AGL Audio framework.
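To make the missing piece concrete, the sketch below shows one way a persistent capture buffer with independent read cursors could let a wake word engine and the high-level voice service consume the same microphone stream. All names here are illustrative assumptions; none of this is AGL Audio 4a API.

```python
# Hypothetical shared capture buffer: one writer (the microphone),
# multiple consumers, each with its own read cursor. Illustrative only.
from collections import deque

class SharedCaptureBuffer:
    def __init__(self, max_frames=16000 * 5):  # ~5 s of audio at 16 kHz
        self.frames = deque(maxlen=max_frames)
        self.base = 0       # absolute index of frames[0]
        self.cursors = {}   # consumer name -> absolute read index

    def write(self, frame):
        if len(self.frames) == self.frames.maxlen:
            self.base += 1  # oldest frame is about to be dropped
        self.frames.append(frame)

    def register(self, consumer):
        # New consumers start reading at the current write position.
        self.cursors[consumer] = self.base + len(self.frames)

    def read(self, consumer):
        # Return every frame this consumer has not yet seen.
        pos = max(self.cursors[consumer], self.base)
        unread = list(self.frames)[pos - self.base:]
        self.cursors[consumer] = self.base + len(self.frames)
        return unread
```

In this scheme the wake word module polls continuously, while the voice service only registers (and starts consuming) once the wake word fires, without either consumer stealing frames from the other.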
Major Tasks
| # | Component | Ownership |
|---|-----------|-----------|
| 1 | Voice Service High Level (VSHL) Development | Amazon, to deliver a first draft that can be open sourced and submitted to the AGL repository |
| 2 | Alexa Voice Agent Development | Amazon |
| 3 | Qt-Based App for Template Rendering | Amazon |
| 4 | Native App Development and Integration with Voice Service High Level | Linux Foundation / IOT.BZH |
| 5 | Audio Input/Output Support | Linux Foundation / IOT.BZH |
| 6 | Application Framework Support | Linux Foundation / IOT.BZH |
External Dependencies
Applications should be able to launch themselves when they receive intents from the Voice Service Interaction Manager.
Audio High Level needs to create four audio roles for Alexa's audio output.
Audio High Level needs to create one audio role for the High Level Voice Service's audio input.
A Speech Chrome application needs to be implemented to display the different dialog states (IDLE, LISTENING, THINKING, SPEAKING) of the voice agent.
A Template Runtime application is needed to show the templates delivered as responses by each voice agent. If we can't standardize the template language, then as a workaround Amazon will implement an Alexa UI Template Runtime application that can render Alexa templates for the CES 2019 demo.
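A minimal sketch of how the Speech Chrome application might track the four dialog states listed above. The transition table and method names are assumptions for illustration; the actual events the voice agent emits are not yet defined.

```python
# Hypothetical dialog-state tracker for the Speech Chrome app.
from enum import Enum

class DialogState(Enum):
    IDLE = "IDLE"
    LISTENING = "LISTENING"
    THINKING = "THINKING"
    SPEAKING = "SPEAKING"

# Assumed legal transitions for driving the chrome animation; the real
# voice agent may allow others (e.g. barge-in from SPEAKING back to LISTENING
# is included here as an assumption).
TRANSITIONS = {
    DialogState.IDLE: {DialogState.LISTENING},
    DialogState.LISTENING: {DialogState.THINKING, DialogState.IDLE},
    DialogState.THINKING: {DialogState.SPEAKING, DialogState.IDLE},
    DialogState.SPEAKING: {DialogState.IDLE, DialogState.LISTENING},
}

class SpeechChrome:
    def __init__(self):
        self.state = DialogState.IDLE

    def on_state_event(self, new_state: DialogState) -> bool:
        """Apply a dialog-state event; ignore illegal transitions."""
        if new_state in TRANSITIONS[self.state]:
            self.state = new_state
            return True
        return False
```

Keeping the transition rules in one table makes it easy to adjust once the voice agent's actual state events are specified.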
...
User starts speaking and says, "Alexa, set an alert for one minute" or "Set an alert for one minute."
The Alexa Voice Agent will play a TTS prompt confirming that the alert is set.
The Alexa Voice Agent will call the Voice Interaction Manager's Alerts::Publish API to publish the new alert state.
The Alexa Voice Agent will play the alert audio after one minute.
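The alert flow above boils down to two published events: a state update when the alert is set, and another when it fires a minute later. A sketch of what an Alerts::Publish payload could look like; every field name here is an assumption, since the Voice Interaction Manager API is not yet defined.

```python
# Hypothetical payload builder for Voice Interaction Manager's Alerts::Publish.
import json
import time

def make_alert_event(alert_id, state, trigger_ts):
    # Illustrative JSON shape; field names are assumptions.
    return json.dumps({
        "voiceagent": "alexa",
        "alert": {
            "id": alert_id,
            "state": state,
            "trigger_time": trigger_ts,
        },
    })

trigger = time.time() + 60
# Published when the alert is set.
set_event = make_alert_event("alert-001", "SET", trigger)
# Published one minute later, when the agent starts playing the alert audio.
fired_event = make_alert_event("alert-001", "STARTED", trigger)
```

Publishing a distinct state value per lifecycle step lets downstream consumers (e.g. a UI showing pending alerts) stay in sync without polling the agent.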
...
Phone Call Control
User starts speaking and says, "Alexa, call Bob."
The Alexa Voice Agent will play a TTS prompt to disambiguate the contact request.
The Alexa Voice Agent will call the Voice Interaction Manager's Call::Publish API to publish a DIAL event.
The Dialer app on the AGL reference platform will pick up the event and initiate a call based on the event payload.
The Dialer app will call the Voice Interaction Manager's Call::Publish API to publish a CALL_ACTIVATED downstream event so the Alexa Voice Agent can update its context.
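The call-control steps above form a round trip through the Voice Interaction Manager. The toy event bus below traces that round trip; only the DIAL and CALL_ACTIVATED event names come from the flow described, while the bus class, payload fields, and callbacks are assumptions for illustration.

```python
# Toy pub/sub bus standing in for the Voice Interaction Manager's
# Call::Publish channel. Illustrative only — not the real VSHL API.
class VoiceInteractionManager:
    def __init__(self):
        self.subscribers = {}  # event name -> list of callbacks

    def subscribe(self, event, callback):
        self.subscribers.setdefault(event, []).append(callback)

    def publish(self, event, payload):
        for cb in self.subscribers.get(event, []):
            cb(payload)

vim = VoiceInteractionManager()
call_log = []

# Dialer app: picks up DIAL, places the call, reports back downstream.
def dialer_on_dial(payload):
    call_log.append(("dialing", payload["callee"]))
    vim.publish("CALL_ACTIVATED", {"callee": payload["callee"]})

# Alexa Voice Agent: updates its context when the call goes active.
def agent_on_activated(payload):
    call_log.append(("context_updated", payload["callee"]))

vim.subscribe("DIAL", dialer_on_dial)
vim.subscribe("CALL_ACTIVATED", agent_on_activated)

# "Alexa, call Bob" -> after disambiguation, the agent publishes DIAL.
vim.publish("DIAL", {"callee": "Bob"})
```

The downstream CALL_ACTIVATED event is what closes the loop: without it, the voice agent would have no way to know the Dialer actually placed the call.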