ASR Management
The information on this help page applies to both CXone Studio and Desktop Studio.
This section provides information about managing your automatic speech recognition (ASR)-enhanced IVR menu system in CXone. An IVR is an automated phone menu that allows callers to interact through voice commands, key inputs, or both, to obtain information, route an inbound voice call, or both.
An existing understanding of automatic speech recognition and the Nuance ASR engine is crucial for creating an effective ASR-enabled IVR system. Complete documentation for using this engine is available from Nuance.
Tuning
Required permissions: ASR Tuning Report On
Tuning allows you to improve your ASR system based on data about how ASR Studio actions are currently performing. It's an important part of developing and maintaining your ASR system.
The ASR Tuning report provides information you can use in your tuning process. It provides response rates for ASR actions that fire in a script and lets you view a list of utterances the ASR couldn't recognize. This report is broken down by action and each confidence branch setting.
If you have tuning enabled, you can expand these sections and listen to recorded audio files from that segment. This gives you information about the responses that the ASR system wasn't able to understand. You can add these to your grammar files and phrase lists.
When you're tuning your ASR system, you can:
- View the ASR Tuning report and evaluate the data it provides.
- Listen to recordings from the ASR Tuning report to understand what the interactions have in common.
- Identify what your contacts are saying and how they're saying it.
- Update grammar files based on what you learn.
- Adjust confidence values, if needed.
Enable Tuning
If your IVR captures any personally identifiable information (PII), carefully choose which sections of your IVR to record during tuning. This helps you avoid capturing personal data. For example, if you have an Asrdigits action that collects a sensitive ID number, you can start the tuning after that action. This keeps the ID number from being recorded.
Disable tuning when you're finished actively tuning your IVR. Leaving tuning on consumes significant storage and places unnecessary load on the server, because each interaction creates a new audio file.
- In Studio, open your ASR script and add a Voiceparams action. It should be located before the ASR actions you want to work with while tuning.
- If the only purpose of this action in your script is to turn tuning on and off, change the Caption to indicate this purpose. For example, Tuning On and Off. If you also use this action to change the language of your IVR, you might need a different caption.
- In the Voiceparams action, set the ASRTuningEnabled property to True.
- When you're finished tuning, set the ASRTuningEnabled property to False.
Tuning Parameters
You can assign script-specific tuning parameters for your Nuance ASR actions. To do this, set a dynamic data object in a Snippet action. Call the object nuanceTuningParamsJson. Its value must be a valid JSON string containing the Nuance parameters to be changed from their defaults. For example:
DYNAMIC asrParams
ASSIGN asrParams.sensitivity = "87"
ASSIGN asrParams.Speech_Complete_Timeout = "1000"
ASSIGN asrParams.Speech_Incomplete_Timeout = "1000"
ASSIGN asrParams.No_Input_Timeout = "1000"
ASSIGN global:nuanceTuningParamsJson = "{asrParams.asjson()}"
If any parameters are set to invalid values, the invalid value is replaced with the default for that parameter, and a variable called invalidParamsList is returned listing those values that were changed.
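The fallback behavior described above can be modeled outside Studio. The following is a minimal Python sketch, not Studio Snippet code and not the platform's implementation. The defaults come from the supported-parameters table in this section. The validation rules (timeouts are non-negative integer strings, sensitivity is 0-100) and the names build_tuning_params and _is_valid are assumptions made for illustration.

```python
import json

# Defaults taken from the supported-parameters table in this section.
DEFAULTS = {
    "Speech_Complete_Timeout": "0",
    "Speech_Incomplete_Timeout": "1500",
    "No_Input_Timeout": "7000",
    "sensitivity": "50",
}


def _is_valid(name, value):
    # Assumed rule: values are strings holding non-negative integers,
    # and sensitivity must additionally fall on the 0-100 scale.
    if not isinstance(value, str) or not (value.isascii() and value.isdigit()):
        return False
    if name == "sensitivity":
        return int(value) <= 100
    return True


def build_tuning_params(overrides):
    """Return (json_string, invalid_params) for the requested overrides.

    Invalid values are replaced with the parameter's default and the
    parameter name is reported, mirroring the documented behavior.
    Names that are not in DEFAULTS are reported and left out.
    """
    params = {}
    invalid = []
    for name, value in overrides.items():
        if name not in DEFAULTS:
            invalid.append(name)
        elif _is_valid(name, value):
            params[name] = value
        else:
            params[name] = DEFAULTS[name]
            invalid.append(name)
    return json.dumps(params), invalid


payload, invalid_params = build_tuning_params(
    {"sensitivity": "87", "No_Input_Timeout": "soon"}
)
print(payload)
print(invalid_params)
```

Checking a payload this way before pasting values into a Snippet action can catch typos early, but the platform remains the authority on which values it accepts.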
The following tables provide the tuning parameters that are supported in Studio:
Nuance Parameter | Description | Studio Support |
---|---|---|
Speech_Complete_Timeout | How long to wait before concluding that a caller is finished speaking. | Supported using nuanceTuningParamsJson. Default: "Speech-Complete-Timeout": "0" |
Speech_Incomplete_Timeout | Duration of silence to determine that callers have finished speaking. | Supported using nuanceTuningParamsJson. Default: "Speech-Incomplete-Timeout": "1500" |
No_Input_Timeout | How long to wait for speech after a prompt ends. Note: This parameter starts counting when the prompt starts playing. This may mean that the script reaches the timeout too soon. The TimeoutSeconds property of the | Supported using nuanceTuningParamsJson. Default: "No-Input-Timeout": "7000" |
sensitivity | Sensitivity of the speech detector when looking for speech. | Default: 50 (scale of 0-100) |
The following Nuance parameters are not supported in CXone because the Studio Asr action, rather than Nuance, plays the prompts.
Nuance Parameter | Description | Default Value |
---|---|---|
swiep_suppress_barge_in_time | Disables barge-in briefly at the beginning of a prompt. | 0 (no delay) |
swiep_in_prompt_sensitivity_percent | Controls how loudly callers must speak to interrupt prompts (barge-in) and detect speech. | 50 (percent) |
swirec_barge_in_mode | Sets special recognition modes in the recognizer. | normal |
Grammar Files
Grammar files allow you to list many possible utterances that a contact might speak in response to a prompt. The Nuance ASR engine attempts to match the contact's response with an entry in the grammar file. Because the ASR engine must find a match for the whole utterance, grammar files give Nuance a focused list of utterances to choose from.
A grammar file is one of the most effective ways to increase the accuracy of your ASR-enhanced IVR. ASR analyzes actual human interactions, so there are countless options that the system must recognize. This makes an ASR system much more complex than one that only responds to DTMF tones (signaling tones that are generated when a user presses or taps a key on their telephone keypad). DTMF recognizes 12 tones, which means there are 12 options the IVR system must recognize. Human speech contains exponentially more options and combinations of sounds, words, and phrases the ASR system must recognize.
For example, a contact might respond to a prompt asking for their member number with this sentence: "My member number is 123456789." An ASR-enhanced script would recognize the entire phrase, but other scripts would fail when the contact began with "My member number is..." instead of only the number.
When updating a grammar file, rename the file before using it in production scripts. This helps avoid conflicts during the update process. It also leaves the original file as a backup in case you need to revert for any reason. You can use variable substitution when specifying the grammar file name in ASR actions in your scripts.
Enhanced Accuracy
Grammar files enhance the accuracy of ASR systems. You can add words and phrases to a grammar file that contacts are likely to say in addition to the expected information. For example, if the prompt asks the contact for a member number, you can add phrases to the grammar file such as "my member number is", "I think it's", "hang on, let me find my card", and so on.
The focused list in a grammar file helps limit the number of permutations in utterances. The longer an expected response is, the more possible responses there could be. Grammar files help limit the scope of possible responses by including the ones that are common and most likely to be used.
You don't need to think of all the possible responses to add. Use the tuning process to learn how contacts actually speak. You can add to your grammar files based on what you learn while tuning. Creating grammar files should be an iterative process as contacts use the system and you learn from the places where the ASR fails to understand responses.
Pronunciation Variations
When tuning your ASR system, listen for variations in pronunciation. It can be helpful to add multiple entries to your phrase lists and grammar files with various phonetic spellings.
This can be especially helpful if the prompt could elicit responses that are often mispronounced or have alternate pronunciations. An example could be "fungi" (plural of fungus). You could add the following additional phonetic entries in addition to the "fungi" entry: "fun guy", "fun gee", "fun jee".
Multiple Language Support
ASR supports multiple languages. Grammars are language-specific. Reference the name of the language in the header of the file so that the engine specifically looks for utterances in that language.
In any grammar file, the entries must use the same alphabet, sentence structure, and so forth as the referenced language. For example, if you were to use the word "piñata" for a Spanish-specific grammar, your entry must use the tilde symbol (~) over the "n" so that the entry is "piñata" and not "pinata."
ASR versus Natural Language Processing
ASR and grammar files can create a result that is similar to a natural language processing (NLP) system, but they are not the same. NLP understands human speech or text and responds with human-like language. ASR is like a bridge between DTMF and NLP. It's not meant to capture everything, but it can capture most things. This is why grammar files are so important. The better the grammar file is, the more responses the ASR system can successfully recognize.
Key Facts About Grammar Files
- Grammar files should be used for most ASR Studio actions.
- The Asralphanum, Asrcurrency, Asrdate, Asrdigits, Asrnumber, Asrtime, and Asryesno actions have built-in grammar files. You can create and use your own grammar files in addition to the built-in ones.
- The Asr and Asrmenu actions do not have built-in grammar files. You must create your own for these actions.
- The Asrcompile and Asrsql actions allow you to build custom grammar files from an existing database.
- Symbols cannot be used in the utterance of a grammar file, but can be returned with the value.
- Creating grammar files should be an iterative process. Each time you tune your ASR system, you discover new items to add to your grammars.
Example Grammar Files
Three example grammar files are provided for you to download:
Color_Grammar_Example.grxml (in a ZIP file)
Digits_Grammar_Example.grxml (in a ZIP file)
Format_Grammar_Example.grxml (in a ZIP file)
These examples illustrate the "rule approach" for creating the structure of a grammar file. This approach uses three rules: a prefix, the main grammar, and a suffix. Prefixes are utterances people often say before giving the main body of info, like "it is", "um", or "I think it is." Suffixes are little additions at the end of an utterance, like "I guess" or "maybe." The middle rule is the actual grammar where you can define all of the possible entries for the data that you want to collect, like colors, numbers, or models.
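The rule approach can be sketched with a short Python script that generates a grammar in the W3C SRGS XML form (the format behind .grxml files). This is an illustration only and is not taken from the downloadable example files: the rule ids, the phrases, and the function names are hypothetical, and semantic return values are left out. The prefix and suffix are wrapped in an item with repeat="0-1" so that they are optional.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(tag):
    """Qualify a tag name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{tag}"


def add_one_of(parent, phrases):
    """Append a <one-of> list with one <item> per phrase."""
    one_of = ET.SubElement(parent, q("one-of"))
    for phrase in phrases:
        ET.SubElement(one_of, q("item")).text = phrase


def add_optional_ref(parent, rule_id):
    """Append an optional reference to another rule in the same file."""
    optional = ET.SubElement(parent, q("item"), {"repeat": "0-1"})
    ET.SubElement(optional, q("ruleref"), {"uri": f"#{rule_id}"})


def build_grammar(language, prefixes, entries, suffixes):
    ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace
    grammar = ET.Element(
        q("grammar"),
        {"version": "1.0", "mode": "voice", "root": "main", XML_LANG: language},
    )
    # Rule 1: things contacts say before the main body of information.
    add_one_of(ET.SubElement(grammar, q("rule"), {"id": "prefix"}), prefixes)
    # Rule 2: the main grammar, optionally surrounded by prefix and suffix.
    main = ET.SubElement(grammar, q("rule"), {"id": "main", "scope": "public"})
    add_optional_ref(main, "prefix")
    add_one_of(main, entries)
    add_optional_ref(main, "suffix")
    # Rule 3: small additions at the end of an utterance.
    add_one_of(ET.SubElement(grammar, q("rule"), {"id": "suffix"}), suffixes)
    return ET.tostring(grammar, encoding="unicode")


grxml = build_grammar(
    "en-US",
    prefixes=["it is", "um", "I think it is"],
    entries=["red", "green", "blue"],
    suffixes=["I guess", "maybe"],
)
print(grxml)
```

Generating the file from lists keeps the main entries, prefixes, and suffixes easy to extend after each tuning pass. Always validate the result against the Nuance documentation before using it in a production script.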
ASR Parameters and Settings
This section provides information about some important ASR parameters and settings.
Confidence Parameters
When the ASR engine recognizes a phrase spoken by a caller, it returns a percentage that indicates how confident it is in its matching of the utterance to an item in the phrase list or grammar file. The confidence percentage can be used to route calls to different branches in your ASR-enabled IVR script.
The confidence levels used in CXone are:
- High: High confidence; typically 75% or greater. Set the confidence value with the HighConfidence property in ASR actions. The contact can be routed through the OnHighConfidence branch without any further confirmation of the utterance.
- Medium: Mid-range confidence, in between high and minimum. The contact can be routed through the OnMedConfidence branch and asked to confirm the utterance. This category doesn't have a property. Everything that falls between the configured minimum and high levels can be routed by this branch.
- Minimum: The minimum acceptable level of confidence. Set the confidence value with the MinConfidence property in ASR actions. This value sets the number for the lower range of the OnMedConfidence branch.
- No Confidence: The utterance was unrecognizable and the ASR engine cannot interpret it. Anything that is less than the MinConfidence value falls in this range. The contact can be routed through the OnNoConfidence branch and asked to repeat the utterance.
Most ASR actions have branches for different confidence levels. This allows you to customize the user experience and deal with variability in accuracy. Confidence variables are system variables and therefore do not appear in a script trace unless you enable system variables to appear in the trace.
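The relationship between the two configurable thresholds and the three branches can be sketched as a small Python function. This models the ranges described in the list above and is not how Studio evaluates them internally. The function name is hypothetical, and the default of 50 for the minimum is an assumption (the source only states that high is typically 75% or greater).

```python
def route_by_confidence(confidence, min_confidence=50, high_confidence=75):
    """Return the branch name for a confidence percentage (0-100).

    min_confidence and high_confidence stand in for the MinConfidence and
    HighConfidence properties; the default of 50 is illustrative only.
    """
    if not 0 <= min_confidence <= high_confidence <= 100:
        raise ValueError("expected 0 <= min_confidence <= high_confidence <= 100")
    if confidence >= high_confidence:
        return "OnHighConfidence"  # no further confirmation needed
    if confidence >= min_confidence:
        return "OnMedConfidence"   # ask the contact to confirm
    return "OnNoConfidence"        # ask the contact to repeat


for score in (92, 75, 60, 50, 49):
    print(score, route_by_confidence(score))
```

Raising the minimum sends more utterances to the OnNoConfidence branch, while lowering the high threshold lets more utterances through without confirmation. The tuning report is the place to check which trade-off fits your callers.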
Confidence is affected by factors like background noise or conversations, accents, or spelling of grammar file entries.
MAX offers a way to customize sensitivity if an agent is assigned a Personal Connection skill: the voice threshold setting helps measure and filter out background noise, tune detection of the agent's voice, and so forth.
Timeout Setting
This is the length of time that the action listens for an utterance and attempts to find a match. The default duration is 10 seconds.
Intervoice Timeout Setting
This is the amount of time that the system will wait after a contact stops speaking. The system waits to make sure that the contact does not continue speaking. It's similar to the InterDigitTimeout setting for DTMF (signaling tones that are generated when a user presses or taps a key on their telephone keypad).
For example, when providing an account number, contacts generally group the numbers together with pauses in between: "123 <pause> 456 <pause> 789 <pause>". The <pauses> in the preceding example represent the intervoice timeout. The default value is 3 seconds. When creating or tuning a script, remember to account for the time it takes for the contact to speak, the intervoice timeout time, and a small amount of time for processing. Too many timeout settings may stack on top of each other to result in a failed action.
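The time budget described above can be checked with simple arithmetic. The Python sketch below uses the documented defaults (10-second timeout, 3-second intervoice timeout). The speech and pause durations, the 0.5-second processing allowance, the function name, and the assumption that the final intervoice wait counts against the action's timeout are all illustrative, not documented behavior.

```python
def timing_budget(speech_segments, pauses, timeout=10.0,
                  intervoice_timeout=3.0, processing=0.5):
    """Estimate whether an utterance fits within the action's timeout.

    speech_segments: seconds the contact spends speaking each group.
    pauses: seconds of silence between consecutive groups.
    """
    if len(pauses) != max(len(speech_segments) - 1, 0):
        raise ValueError("expected one pause between each pair of speech segments")
    # A pause as long as the intervoice timeout would end the utterance early.
    cut_off_early = any(pause >= intervoice_timeout for pause in pauses)
    # Speaking time + pauses + the final intervoice wait + processing time.
    total = sum(speech_segments) + sum(pauses) + intervoice_timeout + processing
    return {
        "total_seconds": total,
        "fits_timeout": total <= timeout,
        "cut_off_early": cut_off_early,
    }


# "123 <pause> 456 <pause> 789" spoken at a moderate pace.
typical = timing_budget([1.2, 1.2, 1.2], [0.8, 0.8])
# A slower contact who pauses longer between groups.
slow = timing_budget([2.0, 2.0, 2.0], [1.5, 1.5])
print(typical)
print(slow)
```

In this model the slower contact needs 12.5 seconds, which exceeds the 10-second default even though no single pause reaches the intervoice timeout. Recordings from the tuning report can supply realistic durations to plug in.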
Errors
Error | Description |
---|---|
ASR Initialization Failed | The media server is unable to contact the ASR server. This could be caused by several reasons, including the ASR service not running or ports that are not open. |
Grammar File Error: Grammar could not be compiled. Please check your grammar for syntax errors. | Typically caused by XML issues with the grammar. |
URL Failure. Recognizer was unable to access the specified URL. | Grammar does not exist, was not referenced correctly, or the file server could not be reached. |
ASRRESULT | Determines if ASR was detected. |
ASRCONF | The resulting ASR confidence value, 0-100. |
ASRCOMPLETIONCAUSECODE | Indicates ASR completion. |
ASRERRORMESSAGE | A textual description of the error as reported by Nuance. |
ASRSTATUSCODE | Indicates the status with one of the following values: |