The diagram below summarizes the discussion at the AGL Santa Clara F2F (Sept 2018) about wakeword handling. The feasibility of this proposed flow has not yet been ascertained.

[Gliffy diagram: wakeword-interaction-draft]


Some of the open questions:

  • How do we arbitrate control of the audio buffer between voice agents, so that voice agent X can access the buffer only when it is supposed to? (Currently: the startListening API call sent by VSHL.)
  • Different voice agents may have different requirements for the duration of silence before speech that ASR calibration needs. We need an established configuration for that.
  • Wakeword detection, caching, and voice agent ASR recognition happen in three separate processes. How do we make sure all three processes are in sync in terms of buffer position? For example, ahl-softmixer needs to know the exact wakeword position to make sure the wakeword is not included when ASR recognition begins.
  • How do we accommodate voice barge-in in this scenario?
  • Do we need to accommodate the scenario in which a voice agent also needs access to the uttered wakeword as part of the cached buffer?
  • The event subscription flow and other definitions need to be formalized.
  • We need to decide whether it is safe for the wakeword engine to close the audio buffer on wakeword detection without the risk of ahl-softmixer dropping audio packets.
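
To make the buffer-position and access-control questions above more concrete, here is a minimal single-process sketch in Python. It is not the AGL, VSHL, or ahl-softmixer API: the class AudioCache and the methods write, grant, revoke, and read are hypothetical names invented for illustration. It assumes that every sample is addressed by an absolute stream position, that the wakeword engine reports the position where the wakeword ends, and that a grant step (standing in for startListening) names the one agent allowed to read.

```python
from typing import List, Optional


class AudioCache:
    """Hypothetical bounded cache of samples addressed by absolute position."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._samples: List[int] = []
        self._start = 0                      # absolute position of _samples[0]
        self._reader: Optional[str] = None   # agent currently allowed to read
        self._read_from = 0                  # absolute position of next read

    @property
    def end(self) -> int:
        """Absolute position one past the newest cached sample."""
        return self._start + len(self._samples)

    def write(self, samples: List[int]) -> None:
        """Append samples and evict the oldest ones beyond capacity."""
        self._samples.extend(samples)
        overflow = len(self._samples) - self.capacity
        if overflow > 0:
            del self._samples[:overflow]
            self._start += overflow

    def grant(self, agent_id: str, wakeword_end: int) -> None:
        """Assumed stand-in for startListening: allow one agent to read,
        starting at the first sample after the wakeword."""
        self._reader = agent_id
        self._read_from = wakeword_end

    def revoke(self) -> None:
        """Withdraw read access from whichever agent holds it."""
        self._reader = None

    def read(self, agent_id: str) -> List[int]:
        """Return all samples the granted agent has not yet consumed."""
        if agent_id != self._reader:
            raise PermissionError(agent_id + " has no access to the buffer")
        if self._read_from < self._start:
            raise LookupError("audio after the wakeword was already evicted")
        chunk = self._samples[self._read_from - self._start:]
        self._read_from = self.end
        return chunk


cache = AudioCache(capacity=8)
cache.write([0, 1, 2, 3])               # wakeword occupies positions 0..2
cache.grant("agent-x", wakeword_end=3)  # reading starts after the wakeword
cache.write([4, 5])
speech = cache.read("agent-x")          # samples at positions 3, 4, 5
```

Because positions are absolute rather than relative to each process's own read pointer, the wakeword end reported by one component means the same thing to the others. Whether this can be made to hold across three separate processes, and how eviction should interact with barge-in, remain the open questions listed above.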