Currently, an Alexa skill's model consists of intents. Each intent can be invoked by predefined utterances.
It doesn't matter to Alexa at which point the user is in the dialog with the skill. Alexa's NLU searches all existing intents for the best match and does not favor the intents that were actually created to handle the response to Alexa's immediately preceding prompt.
Larger skills can contain many intents, utterances, custom slot types, and slot type values.
We observe that the mapping from a user's speech input to intents and slot values degrades as the model grows.
At some point Alexa even fails to match user utterances to the correct intent, preferring intents with merely similar utterances even though an exact match exists in the model.
Example 1:
A skill with a couple of intents. Two of them are:
- ManOrWomanIntent with the single slot utterance: "{manOrWoman}"
where manOrWoman is of the custom slot type MAN_OR_WOMAN which has the values "man" and "woman".
- LocationIntent with the utterance: "{country}"
where country is of the built-in slot type AMAZON.Country.
At some point the skill asks a question to which the only meaningful answers are "man" or "woman". But when the user says "man", Alexa maps it to the LocationIntent's country slot with the value "mann", even though that is completely out of context. (Side note: this also raises the questions of why the word "mann" is preferred over the word "man", and which country is called "mann".)
Workaround:
In this case, or wherever we observe the same intent being invoked with the same slot value, we add a handler for the undesired intent to the current "dialog state" and handle the invocation as if it were an invocation of the actually expected intent. In the case of example 1, we implement a handler for LocationIntent which checks whether the {country} slot value is "mann" and, if so, handles it as if ManOrWomanIntent had been invoked.
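This workaround can be sketched in plain Python without the ASK SDK; the dispatch function, the `dialog_state` values, and the handler responses below are illustrative assumptions, not actual skill code:

```python
# Minimal sketch of the workaround from example 1 (no ASK SDK).
# Intent requests are modeled as plain dicts; all names are illustrative.

def handle_man_or_woman(value):
    # The handler for the actually expected ManOrWomanIntent.
    return f"You said: {value}"

def handle_location(country):
    # The regular handler for LocationIntent.
    return f"Country: {country}"

def dispatch(request, dialog_state):
    intent = request["intent"]
    slots = request.get("slots", {})
    if intent == "LocationIntent":
        country = slots.get("country")
        # Workaround: while waiting for a man/woman answer, Alexa
        # sometimes maps "man" to LocationIntent with the slot value
        # "mann". Re-route that case to the expected handler.
        if dialog_state == "AWAITING_MAN_OR_WOMAN" and country == "mann":
            return handle_man_or_woman("man")
        return handle_location(country)
    if intent == "ManOrWomanIntent":
        return handle_man_or_woman(slots.get("manOrWoman"))
    return "Sorry, I didn't understand that."
```

In the `AWAITING_MAN_OR_WOMAN` state, a misrouted LocationIntent carrying the slot value "mann" is thus handled exactly as if ManOrWomanIntent had been invoked.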
Example 2:
A skill with a couple of intents. Two of them are:
- RememberIntent with the utterance: "merken" (German for "remember")
- CityIntent with the utterance: "{city}"
where city is of the custom slot type CITY, which contains the names of nearly all German cities, towns, and villages.
When we added the CITY type to the skill's model, Alexa stopped recognizing the utterance "merken" and therefore failed to invoke the RememberIntent. In many attempts to trigger the RememberIntent we observed that Alexa mapped the user's speech to various city names, which in many cases aren't even similar to the word "merken".
Workaround:
Unlike in example 1, we don't know what to expect. Since we observe many different values for {city} when we say "merken", we can't check for all of them; we might miss some. And we should probably not default every invocation of the CityIntent to the "merken" behavior, as the user might have intended something else that was, for some reason, mapped to the CityIntent.
In effect, we replace the utterance "merken" with something that doesn't collide with other slot values. But that can lead to rather unnatural dialogs.
Suggestion for improved speech recognition
A great improvement would be the ability to inform Alexa which intents we expect in the next user response. Alexa could then either favor those intents over the others, or not consider other intents at all. (That should probably not affect built-in intents like AMAZON.StopIntent.) This is more or less what we already effect in the code: when an unexpected intent is received, we handle it as if it were unknown to the skill in that situation, and only handle invocations of the expected intents.
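What we effect in code today could be sketched like this (plain Python, no ASK SDK; the state table, intent names, and return strings are illustrative assumptions): each dialog state whitelists the intents we expect, and anything else is treated as unknown, except for built-ins like AMAZON.StopIntent.

```python
# Sketch of per-dialog-state intent whitelisting (illustrative, no ASK SDK).

# Built-in intents that should be handled regardless of dialog state.
ALWAYS_ALLOWED = {"AMAZON.StopIntent", "AMAZON.CancelIntent", "AMAZON.HelpIntent"}

# Which intents we expect in each dialog state (assumed example states).
EXPECTED = {
    "AWAITING_MAN_OR_WOMAN": {"ManOrWomanIntent"},
    "AWAITING_CITY": {"CityIntent"},
}

def dispatch(intent_name, dialog_state):
    if intent_name in ALWAYS_ALLOWED:
        return f"handled built-in {intent_name}"
    if intent_name in EXPECTED.get(dialog_state, set()):
        return f"handled {intent_name}"
    # Unexpected in this state: treat the intent as if it were unknown
    # to the skill, instead of letting an out-of-context match
    # hijack the dialog.
    return "unknown"
```

If Alexa's NLU accepted such a whitelist per response, this filtering could happen before intent resolution instead of after it.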
Source: amazon.com