
Background

Automotive Grade Linux (AGL) is a collaborative open source project that brings together automakers, suppliers, and 
technology companies to accelerate the development and adoption of a fully open software stack for the connected car. 
As a member of the Speech Expert Group, the Amazon (Alexa Automotive) team intends to help define the voice 
service APIs for the AGL platform.

...

This design makes no assumptions about the mode in which the high-level voice service component is configured and running.

1)

...

agl-service-

...

voice-

...

high

This binding has the following responsibilities.

  • Structurally follows a bridge pattern to abstract the functioning of the specific voice agent software from the application layer.
  • The request arbitrator is the main entry point to the system. It is responsible for routing the utterance to the correct voice agent based on parameters such as configuration and wake word detection.
  • Registers for dialog, connection, auth, and other events from voice agents, and maintains the latest state of each voice agent.
  • Audio/visual focus management. Provides an interface through which voice agents can request audio or visual focus before actually rendering content. When multiple voice agents are active, each agent competes for audio and visual focus. Based on the priority of the content, the core grants or denies focus to an agent. When it grants focus, it must inform the agent currently rendering content to duck or stop. The core makes audio and visual focus decisions on behalf of the voice agents it manages.
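
The focus management responsibility can be illustrated with a minimal sketch. It is not the binding's implementation: the class names, the numeric priority convention (larger means more important), and the "duck"/"stop" notification strings are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FocusHolder:
    agent_id: int
    priority: int  # assumed convention: larger number = more important content


class FocusArbiter:
    """Arbitrates one focus channel (audio or visual) between voice agents."""

    def __init__(self) -> None:
        self.holder: Optional[FocusHolder] = None
        # (agent_id, action) pairs the core would send to the displaced agent.
        self.notifications: List[Tuple[int, str]] = []

    def request_focus(self, agent_id: int, priority: int, on_preempt: str = "duck") -> bool:
        """Grant focus if the channel is free or the request outranks the holder."""
        current = self.holder
        if current is not None and current.agent_id != agent_id:
            if priority <= current.priority:
                return False  # deny: current content is at least as important
            # Tell the agent currently rendering content to duck or stop.
            self.notifications.append((current.agent_id, on_preempt))
        self.holder = FocusHolder(agent_id, priority)
        return True

    def release_focus(self, agent_id: int) -> None:
        """Free the channel, but only if the caller actually holds it."""
        if self.holder is not None and self.holder.agent_id == agent_id:
            self.holder = None
```

In this sketch a request with equal priority is denied, so the agent already rendering keeps focus; a real policy might break ties differently.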

API

vshl/startListening


Starts listening for speech input. Common configuration information is passed as part of the request.

Note: The config inputs below are only examples, not the final list of configurations.

"permission": "urn:AGL:permission:speech:public:audiocontrol"

Request: {
  "language":"string",
  "location":"string",
  "preferred_network_mode":"string" // online, offline, hybrid
}

Responses: {
  "jtype":"afb-reply",
  "request": {
    "status":"string" // success or bad-state or bad-request
  },
  "response":{
    "request_id": "string", // Request created by this call.
    "agent_id": "string" // Agent to which the request has been proxied.
  }
}
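
As a rough client-side illustration of the request and reply shapes above, the following sketch builds a startListening request and unpacks the reply. The helper names are hypothetical, the request fields are only the example inputs listed above (which the note says are not final), and how the message is actually sent through the binder is out of scope here.

```python
import json

# The three modes named in the request comment above.
NETWORK_MODES = {"online", "offline", "hybrid"}


def build_start_listening_request(language, location, preferred_network_mode):
    """Return the request body as a dict, rejecting unknown network modes."""
    if preferred_network_mode not in NETWORK_MODES:
        raise ValueError(f"unknown network mode: {preferred_network_mode!r}")
    return {
        "language": language,
        "location": location,
        "preferred_network_mode": preferred_network_mode,
    }


def parse_start_listening_reply(raw):
    """Return (request_id, agent_id) from a JSON reply, or raise on failure."""
    reply = json.loads(raw)
    status = reply["request"]["status"]
    if status != "success":
        # The documented non-success values are bad-state and bad-request.
        raise RuntimeError(f"startListening failed: {status}")
    response = reply["response"]
    return response["request_id"], response["agent_id"]
```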

Current Architecture

The following diagrams dive a little deeper into the low-level components of the high-level voice service (VSHL) and their dependencies, depicted by directional arrows. A dependency here is either an association of objects between components or an interface implementation relationship.

For example:

A depends on B if A aggregates or composes B.

A depends on B if A implements an interface that is used by B to talk to A.

[Diagram: VSHL Architecture]

VoiceAgents Module

[Diagram: VoiceAgentsModule]

Core Module

[Diagram: VSHL Core Module]


Capabilities Module

[Diagram: VSHL Capabilities Module]

Sequence Diagrams

OnLoad

On load, the controller instantiates the entry-level classes of each module and injects their dependencies. For example, the Core module observes changes to voice agent data in the VoiceAgents module.
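
The wiring described above might look roughly like the sketch below. Every class and method name here is hypothetical; the sketch only illustrates the controller creating each module's entry-level object and registering the core as an observer of voice agent data.

```python
class VoiceAgentsDataManager:
    """Entry-level class of the (hypothetical) VoiceAgents module."""

    def __init__(self):
        self._agents = {}
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def add_agent(self, agent_id, name):
        self._agents[agent_id] = name
        # Notify every observer through the interface it implements.
        for observer in self._observers:
            observer.on_voice_agent_added(agent_id, name)


class Core:
    """Entry-level class of the (hypothetical) Core module."""

    def __init__(self):
        self.known_agents = {}

    def on_voice_agent_added(self, agent_id, name):
        # Observer callback: keep the core's view of agents up to date.
        self.known_agents[agent_id] = name


class Controller:
    def on_load(self):
        # Instantiate each module's entry point, then inject dependencies.
        self.voice_agents = VoiceAgentsDataManager()
        self.core = Core()
        self.voice_agents.add_observer(self.core)
```

Here Core depends on the VoiceAgents module in the second sense listed under Current Architecture: it implements an interface that the other module uses to talk to it.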

[Image: OnLoad sequence diagram]


StartListening to Audio Input & Events

[Image: StartListening to Audio Input & Events sequence diagram]



State Diagram

[Image: state diagram]

API

vshl/startListening


Starts listening for speech input. Common configuration information is passed as part of the request.

Note: The config inputs below are only examples, not the final list of configurations.

Request: {

}

Responses: {
  "jtype":"afb-reply",
  "request": {
    "status":"string" // success or bad-state or bad-request
  },
  "response":{
    "request_id": "string", // Request created by this call.
    "agent_id": "string" // Agent to which the request has been proxied.
  }
}

vshl/cancel


Cancels the speech recognition processing in the chosen agent.
If agent id is not passed then the cancel request is sent to the default voice agent.

"permission": "urn:AGL:permission:speech:public:audiocontrol"


Request:
{
  "agent_id" : "integer"
}

Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  }
}
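
The default-agent fallback described for cancel can be sketched as follows. The function name and the idea of validating the agent ID against a set of known agents are assumptions for illustration; only the status strings and the fallback rule come from the description above.

```python
def handle_cancel(request, known_agent_ids, default_agent_id):
    """Return (reply, target_agent_id) for a cancel request.

    If the request carries no agent_id, the default voice agent is targeted.
    """
    agent_id = request.get("agent_id", default_agent_id)
    if agent_id not in known_agent_ids:
        # Assumed policy: an unknown agent is reported as a bad request.
        return {"jtype": "afb-reply", "request": {"status": "bad-request"}}, None
    return {"jtype": "afb-reply", "request": {"status": "success"}}, agent_id
```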


vshl/cancelListening


Cancels the speech recognition processing in the chosen agent.
If agent id is not passed then the cancel request is sent to the default voice agent.


vshl/isAvailable

Checks whether the voice agents are available and running on the platform.

"permission": "urn:AGL:permission:speech:public:accesscontrol"

Request:
{

}

Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  },
  "response": {
    "available":"boolean"
  }
}



vshl/subscribe

Subscribes to or unsubscribes from high-level voice service events.

"permission": "urn:AGL:permission:speech:public:accesscontrol"

Request:
{
  {
    "type":"array",
    "items" : [{
        "type":"string" // List of events to subscribe to
      }
    ]
  },
  {
    "subscribe":"boolean"
  }  
}

Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  }
}

...

vshl_dialogstate_event

Dialog state describes the state of the currently active voice agent's dialog interaction. 

Event Data:
{
  "name" : "voice_dialogstate_event",
  "state":"string",
  "agent_id": "integer"
}


Values for state are
1) IDLE
High level voice service is ready for speech interaction.

2) LISTENING
High level voice service is currently listening.

3) THINKING
A customer request has been completed and no more input is accepted. In this state, Voice service is working on a response.

4) SPEAKING
Responding to a request with speech.
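
A subscriber might turn the event data above into a typed value along these lines. This is a sketch only: the enum and helper names are hypothetical, and it assumes the event arrives as an already-decoded dict with the documented "state" and "agent_id" fields.

```python
from enum import Enum


class DialogState(Enum):
    IDLE = "IDLE"            # ready for speech interaction
    LISTENING = "LISTENING"  # currently listening
    THINKING = "THINKING"    # request complete, working on a response
    SPEAKING = "SPEAKING"    # responding with speech


def parse_dialog_state_event(event):
    """Return (agent_id, DialogState); raises ValueError for an unknown state."""
    return event["agent_id"], DialogState(event["state"])


def is_ready_for_speech(state):
    """Only IDLE is documented as ready for a new speech interaction."""
    return state is DialogState.IDLE
```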

...

vshl_connectionstate_event

Connection state describes the state of the voice agent along with errors. 

Event Data:
{
  "name" : "voice_connectionstate_event",
  "state":"string",
  "agent_id": "integer"
}

1) DISCONNECTED
Voice agent is not connected to its voice service endpoint.

2) PENDING
Voice agent is attempting to establish connection to its endpoint.

3) CONNECTED
Voice agent is connected to its endpoint.

4) CONNECTION_TIMEDOUT
Voice agent connection attempt failed due to excessive load on its server endpoint.

5) CONNECTION_ERROR
Captures other network related errors.

...

vshl_authstate_event

Auth state describes the state of the authorization of the voice agent with its cloud endpoint. 

Event Data:
{
  "name" : "voice_authstate_event",
  "state":"string",
  "agent_id": "integer"
}

1) UNINITIALIZED
Authorization not yet acquired.


2) REFRESHED
Authorization has been refreshed.

3) EXPIRED
Authorization has expired.


4) ERROR
Authorization error has occurred.
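
Since the dialog, connection, and auth events share the same three fields, a listener could keep the latest state per agent with a small cache like the one below. The class is hypothetical and simply keys on whatever "name" each event carries; it is a sketch, not the binding's bookkeeping.

```python
from collections import defaultdict


class AgentStateCache:
    """Remembers the most recent state per (agent_id, event name)."""

    def __init__(self):
        self._states = defaultdict(dict)

    def on_event(self, event):
        # Later events for the same agent and event name overwrite earlier ones.
        self._states[event["agent_id"]][event["name"]] = event["state"]

    def latest(self, agent_id, event_name):
        """Return the last seen state, or None if nothing was received yet."""
        return self._states.get(agent_id, {}).get(event_name)
```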

...

vshl/phonecontrol/subscribe  -  For subscribing to phone control messages below.


Messages

Upstream

{
  Topic   : "phonecontrol"
  Action : "dial"
  Payload : {
      "callId": "{{STRING}}", // A unique identifier for the call

...

  }

}


Upstream

{
  Topic   : "phonecontrol"
  Action : "call_activated"
  Payload : {
      "callId": "{{STRING}}", // A unique identifier for the call
      "required": [ "callId"]
  }
}


{
  Topic   : "phonecontrol"
  Action : "call_failed"
  Payload : {
      "callId": "{{STRING}}", // A unique identifier for the call

...

4xx range: Validation failure for the input from the @c dial() directive
500: Internal error on the platform unrelated to the cellular network
503: Error on the platform related to the cellular network
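
A receiver of call_failed could bucket the error codes listed above as in this sketch. The function name and the category labels are hypothetical, and the field that carries the code is elided above, so the sketch takes the code as a plain integer.

```python
def classify_call_error(code):
    """Map a call_failed error code to a coarse, illustrative category."""
    if 400 <= code <= 499:
        return "validation_failure"      # bad input from the dial() directive
    if code == 500:
        return "internal_error"          # platform error, not the cellular network
    if code == 503:
        return "cellular_network_error"  # platform error related to the network
    return "unknown"
```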


{
  Topic   : "phonecontrol"
  Action : "call_terminated"
  Payload : {
      "callId": "{{STRING}}", // A unique identifier for the call
      "required": [ "callId"]
  }
}


{
  Topic   : "phonecontrol"
  Action : "connection_state_changed"
  Payload : {
      "callId": "{{STRING}}", // A unique identifier for the call

...

vshl/navigation/subscribe  -  For subscribing to navigation messages.


Messages

Upstream

{
  Topic   : "navigation"
  Action : "set_destination"
  Payload : {
      "destination": {
          "coordinate": {
              "latitudeInDegrees": {{DOUBLE}},
              "longitudeInDegrees": {{DOUBLE}}
          },
          "name": "{{STRING}}",
          "singleLineDisplayAddress": "{{STRING}}",
          "multipleLineDisplayAddress": "{{STRING}}"
      }
  }
}
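
For illustration, a publisher could assemble the set_destination message as a dict like this. The function name is hypothetical, and representing Topic, Action, and Payload as top-level dictionary keys is an assumption, since the wire encoding is not specified here. The coordinate range checks reflect the usual bounds for degrees of latitude and longitude.

```python
def build_set_destination(latitude, longitude, name, single_line_address, multi_line_address):
    """Build a set_destination message, validating the coordinate ranges."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90] degrees")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180] degrees")
    return {
        "Topic": "navigation",
        "Action": "set_destination",
        "Payload": {
            "destination": {
                "coordinate": {
                    "latitudeInDegrees": latitude,
                    "longitudeInDegrees": longitude,
                },
                "name": name,
                "singleLineDisplayAddress": single_line_address,
                "multipleLineDisplayAddress": multi_line_address,
            }
        },
    }
```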


{
  Topic   : "navigation"
  Action : "cancel_navigation"
}


GuiMetadata
API

vshl/guimetadata/publish      -  For publishing UI metadata messages for rendering.

vshl/guimetadata/subscribe  -  For subscribing to UI metadata messages for rendering.

Messages

Upstream

{
  Topic   : "guimetadata"
  Action : "render_template"
  Payload : {
     <Yet to be standardized>
  }
}


{
  Topic   : "guimetadata"
  Action : "clear_template"
}

Template Rendering
API

vshl/templates/publish      -  For publishing template rendering messages.

vshl/templates/subscribe  -  For subscribing to template rendering messages.

Messages

Downstream


{
  Topic   : "guimetadata"
  Action : "render_player_info"
  Payload : {
     <Yet to be standardized>
  }
}


{
  Topic   : "guimetadata"
  Action : "clear_player_info"
  Payload : {
     <Yet to be standardized>

...

vshl/enumerateVoiceAgents


"permission": "urn:AGL:permission:vshl:voiceagents:public:accesscontrol"


Enumerates and returns an array of voice agents running in the system. Applications such as Settings might need this to present a UI with a list of agents to enable or disable, show status, and so on.


Request: 
{ } 


Responses: { 
  "jtype":"afb-reply",
  "request": { 
    "status":"string" // success or bad-state or bad-request
  },
  "response": { 
    "type":"array", 
    "items" : 
      [ 
        { 
          "name":"string", 
          "description":"string", 
          "agent_id":"integer", // Voice agent ID 
          "status":"string" // enabled, disabled 
        } 
      ] 
  }
}

...

vshl/setDefaultVoiceAgent

Activate or deactivate a voice agent.

"permission": "urn:AGL:permission:vshl:voiceagents:public:accesscontrol"


Request:
{ 
  "agent_id":"integer",
  "is_active":"boolean"
}


Responses: { 
  "jtype":"afb-reply",
  "request":
  {
    "status":"string" // success or bad-state or bad-request
  }
} 

afb-voiceservice-wakeword-detector

  • Provides an interface, primarily for the core afb-voiceservice-highlevel, to listen for wake word detection events and make request routing decisions.

  • Internally, this binding talks to or hosts voice assistant vendor-specific wake word solutions to enable wake word detection.


...