NiCE CXone Transcription
NiCE CXone Transcription is a native option for transcription. It converts spoken words into text and has two modes:
- Real-time transcription: Provides transcription in real time as the interaction happens. This mode is used with applications such as Automated Summary (AutoSummary) and NiCE CXone Agent applications. Setup is required to use this mode, and it supports vocabulary customizations.
- Post-call transcription: Provides a complete transcript at the end of an interaction. Interaction Analytics (CXone) uses this mode. No transcription-specific setup is required to enable post-call transcription, and it also supports vocabulary customizations.
Both modes of Transcription provide the following benefits:
- The current version of Transcription is v11. It uses an AI-driven large language model (LLM) when processing spoken audio, which allows it to produce higher-quality output than previous versions of Transcription, with improved entity recognition, lower word error rates, and better readability.
- Transcription produces non-deterministic results: the output may differ when the Transcription engine processes the same audio multiple times. This is expected behavior for LLM-based models.
- The Transcription engine removes words and sounds uttered while speaking that don't contribute to meaning, such as um, uh, and repeated words. These unproductive words and sounds make transcripts difficult to read; removing them improves the user experience by making transcripts more readable.
- Contacts may speak more than one language in a single interaction, or even in the same sentence. Transcription supports this code switching as long as both languages are supported. If a non-supported language is spoken during the interaction, it is not transcribed.
- Transcription provides offset data. An offset measures when a specific word or phrase is spoken relative to the start of the audio. Transcription measures offsets at the word level and at the utterance level. An utterance is what the speaker says during one turn in the conversation.
- Support for custom vocabulary tuning.
- Support for all languages available for NiCE CXone Transcription.
You can access Continuous Stream Transcription transcripts with the Analyzed Transcript API.
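The disfluency removal described above can be pictured with a minimal sketch. This is an illustration only: the actual engine uses its LLM rather than a fixed word list, and the filler set below is an assumption for the example.

```python
import re

# Illustrative filler-word list; the real engine is not word-list based.
FILLERS = {"um", "uh", "er", "hmm"}

def remove_disfluencies(text: str) -> str:
    """Drop filler words and immediately repeated words from a raw transcript."""
    cleaned = []
    for word in text.split():
        bare = re.sub(r"\W", "", word).lower()
        if bare in FILLERS:
            continue  # skip fillers like "um" and "uh"
        if cleaned and bare == re.sub(r"\W", "", cleaned[-1]).lower():
            continue  # skip an immediate repetition of the previous word
        cleaned.append(word)
    return " ".join(cleaned)

print(remove_disfluencies("um I I want uh to to check my order"))
# -> "I want to check my order"
```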
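Word- and utterance-level offsets can be visualized with a small sketch. The field names below are hypothetical and do not reflect the actual NiCE CXone data schema; they only illustrate the idea of offsets measured from the start of the audio.

```python
# Hypothetical offset data for one utterance; field names are illustrative,
# not the actual NiCE CXone payload. Offsets are relative to audio start.
utterance = {
    "speaker": "contact",
    "text": "I need help with my order",
    "offset_ms": 4200,  # utterance-level offset: start of this turn
    "words": [
        {"word": "I",     "offset_ms": 4200},
        {"word": "need",  "offset_ms": 4350},
        {"word": "help",  "offset_ms": 4600},
        {"word": "with",  "offset_ms": 4900},
        {"word": "my",    "offset_ms": 5050},
        {"word": "order", "offset_ms": 5200},
    ],
}

def word_offset_seconds(utt, word):
    """Return the word-level offset (in seconds) of the first match of `word`."""
    for w in utt["words"]:
        if w["word"].lower() == word.lower():
            return w["offset_ms"] / 1000.0
    return None

print(word_offset_seconds(utterance, "order"))  # -> 5.2
```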
Language and Code-Switching Behavior
NiCE Transcription supports code switching between the configured transcription language and other supported languages.
Code switching behavior depends on the character set used by the configured language:
- When the configured language uses the Latin character set (for example, English or Spanish), transcription supports code switching only with other Latin-based languages. Non-Latin character sets are not supported in this case.
- When the configured language uses a non-Latin character set (for example, Japanese), that character set is enabled for transcription. In this case, code switching is supported between:
  - The configured non-Latin language, and
  - All supported Latin-based languages.
This design ensures accurate recognition while preventing unintended character-set mixing that could degrade transcription quality.
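The character-set rule above can be sketched as a small decision function. The language-to-script mapping is an assumption for the example only; it is not an exhaustive or authoritative list.

```python
# Illustrative sketch of the character-set rule described above.
# The language/script sets below are assumptions for the example only.
LATIN = {"english", "spanish", "french", "german", "dutch",
         "italian", "portuguese", "welsh"}
NON_LATIN = {"japanese"}

def code_switch_supported(configured: str, other: str) -> bool:
    """Return True if `other` can be code-switched with the configured language."""
    configured, other = configured.lower(), other.lower()
    if configured in LATIN:
        # Latin-configured: only other Latin-based languages are allowed.
        return other in LATIN
    if configured in NON_LATIN:
        # Non-Latin-configured: that language plus any supported Latin language.
        return other == configured or other in LATIN
    return False  # unsupported configured language

print(code_switch_supported("english", "spanish"))   # -> True
print(code_switch_supported("english", "japanese"))  # -> False
print(code_switch_supported("japanese", "english"))  # -> True
```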
Post-Call Transcription
Post-call transcription provides high-accuracy speech-to-text (STT) transcription after a call ends. The transcript includes both the agent and contact sides of the call. Agents can use these transcripts in the agent application to confirm details from past calls. Supervisors can use them for training and quality checks.
Post-call transcription uses NiCE Transcription v11, the latest and most accurate transcription engine in NiCE CXone.
NiCE CXone does not support third-party transcription services for post-call transcription. However, if a real-time transcript is generated by a third-party transcription service, it can be used in post-call applications like Interaction Analytics (CXone).
Confidence Scores
Transcription provides confidence scores at the utterance and word level. The confidence score indicates how certain the transcription engine is in its overall transcription of what the contact or agent said. Confidence scores are available in the transcription data, but are only visible to users if the applications that use the transcripts can display them.
The confidence scores for Transcription:
- Are ordinal: a greater score only indicates a greater level of confidence. A score of 0.4 does not imply twice the confidence of a score of 0.2.
- Cannot be used to determine the accuracy of the transcription system. A transcript with a generally low confidence score does not imply a high word error rate.
- Cannot be used to filter or threshold a transcript. Removing words or utterances with low confidence scores will not make a transcript more accurate.
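An appropriate, ordinal-only use of confidence scores is to prioritize utterances for human review rather than to filter them. This sketch uses hypothetical data; only the order of the scores matters, never their ratios.

```python
# Hypothetical utterance data with confidence scores (illustrative only).
utterances = [
    {"text": "I'd like to change my flight", "confidence": 0.91},
    {"text": "the reference is XK42",        "confidence": 0.37},
    {"text": "thanks, that's all",           "confidence": 0.78},
]

# Ordinal use: sort so the least-confident utterances are reviewed first.
# Do NOT drop low-scoring utterances; that would not improve accuracy.
review_queue = sorted(utterances, key=lambda u: u["confidence"])

for u in review_queue:
    print(f'{u["confidence"]:.2f}  {u["text"]}')
```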
Custom Vocabulary
Transcription uses an ASR (automatic speech recognition) model that has been trained on large data sets. However, every organization uses words that are unique, or that are used in unique contexts. This can affect the accuracy of transcription results.
You can customize the model that Transcription uses. This allows you to configure it to recognize terminology that's unique to your organization or that has a unique context in your line of business.
Custom vocabulary configurations are made in Interaction Analytics (CXone), but they do not require an Interaction Analytics license. You only need a license for custom vocabulary.
Custom vocabulary configurations are shared across both real-time and post-call transcription; the same settings are applied in both transcription modes.
Custom vocabulary is available for all languages that Transcription supports.
Supported Languages
Transcription handles many dialects within each of the supported languages. The transcribed output for each conversation targets a single dialect, but may include aspects of more than one dialect.
Transcription supports the following languages:
- Dutch
- English
- French
- German
- Italian
- Japanese
- Portuguese
- Spanish
- Welsh