NiCE CXone Transcription
NiCE CXone Transcription is a native option for transcription. It converts spoken words into text and has two modes:
- Real-time transcription: Provides transcription in real time as the interaction happens. This mode is used with applications such as Automated Summary (AutoSummary) and NiCE CXone Agent applications. Setup is required to use this mode, and it supports vocabulary customizations.
- Post-call transcription: Provides a complete transcript at the end of an interaction. Interaction Analytics (CXone) uses this mode. No transcription-specific setup is required to enable post-call transcription, and it also supports vocabulary customizations.
Both modes of Transcription provide the following benefits:
- The current version of Transcription is v11. It uses an AI-driven large language model (LLM) when processing spoken audio, which allows it to produce higher-quality output than previous versions of Transcription, with improved entity recognition, lower word error rates, and better readability.
- Transcription produces non-deterministic results: the output may differ when the Transcription engine processes the same audio multiple times. This is expected behavior for LLM-based models.
- The Transcription engine removes words and sounds uttered while speaking that don't contribute to meaning, such as um, uh, and repeated words. These unproductive words and sounds make transcripts difficult to read; removing them improves the user experience by making transcripts more readable.
- Contacts may speak more than one language in a single interaction, or even in the same sentence. Transcription supports this code switching as long as both languages are supported. If a non-supported language is spoken during the interaction, it is not transcribed.
- Transcription provides offset data. An offset measures when a specific word or phrase is spoken relative to the start of the audio. Transcription measures offsets at the word level and at the utterance level. An utterance is what the speaker says during one turn in the conversation.
- Support for custom vocabulary tuning.
- Support for all languages available for NiCE CXone Transcription.
You can access Continuous Stream Transcription transcripts with the Analyzed Transcript API.
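The disfluency removal described above can be pictured with a minimal sketch. This is an illustration only: the actual engine uses its LLM rather than a fixed word list, and the filler set below is an assumption for the example.

```python
import re

# Illustrative filler-word list; the real engine is not word-list based.
FILLERS = {"um", "uh", "er", "hmm"}

def remove_disfluencies(text: str) -> str:
    """Drop filler words and immediately repeated words from a raw transcript."""
    cleaned = []
    for word in text.split():
        bare = re.sub(r"\W", "", word).lower()
        if bare in FILLERS:
            continue  # skip fillers like "um" and "uh"
        if cleaned and bare == re.sub(r"\W", "", cleaned[-1]).lower():
            continue  # skip an immediate repetition of the previous word
        cleaned.append(word)
    return " ".join(cleaned)

print(remove_disfluencies("um I I want uh to to check my order"))
# -> "I want to check my order"
```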
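Word- and utterance-level offsets can be visualized with a small sketch. The field names below are hypothetical and do not reflect the actual NiCE CXone data schema; they only illustrate the idea of offsets measured from the start of the audio.

```python
# Hypothetical offset data for one utterance; field names are illustrative,
# not the actual NiCE CXone payload. Offsets are relative to audio start.
utterance = {
    "speaker": "contact",
    "text": "I need help with my order",
    "offset_ms": 4200,  # utterance-level offset: start of this turn
    "words": [
        {"word": "I",     "offset_ms": 4200},
        {"word": "need",  "offset_ms": 4350},
        {"word": "help",  "offset_ms": 4600},
        {"word": "with",  "offset_ms": 4900},
        {"word": "my",    "offset_ms": 5050},
        {"word": "order", "offset_ms": 5200},
    ],
}

def word_offset_seconds(utt, word):
    """Return the word-level offset (in seconds) of the first match of `word`."""
    for w in utt["words"]:
        if w["word"].lower() == word.lower():
            return w["offset_ms"] / 1000.0
    return None

print(word_offset_seconds(utterance, "order"))  # -> 5.2
```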
Language and Code-Switching Behavior
NiCE Transcription supports code switching between the configured transcription language and other supported languages.
Code switching behavior depends on the character set used by the configured language:
- When the configured language uses the Latin character set (for example, English or Spanish), transcription supports code switching only with other Latin-based languages. Non-Latin character sets are not supported in this case.
- When the configured language uses a non-Latin character set (for example, Japanese), that character set is enabled for transcription. In this case, code switching is supported between:
  - The configured non-Latin language, and
  - All supported Latin-based languages.
This design ensures accurate recognition while preventing unintended character-set mixing that could degrade transcription quality.
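The character-set rule above can be sketched as a small decision function. The language-to-script mapping is an assumption for the example only; it is not an exhaustive or authoritative list.

```python
# Illustrative sketch of the character-set rule described above.
# The language/script sets below are assumptions for the example only.
LATIN = {"english", "spanish", "french", "german", "dutch",
         "italian", "portuguese", "welsh"}
NON_LATIN = {"japanese"}

def code_switch_supported(configured: str, other: str) -> bool:
    """Return True if `other` can be code-switched with the configured language."""
    configured, other = configured.lower(), other.lower()
    if configured in LATIN:
        # Latin-configured: only other Latin-based languages are allowed.
        return other in LATIN
    if configured in NON_LATIN:
        # Non-Latin-configured: that language plus any supported Latin language.
        return other == configured or other in LATIN
    return False  # unsupported configured language

print(code_switch_supported("english", "spanish"))   # -> True
print(code_switch_supported("english", "japanese"))  # -> False
print(code_switch_supported("japanese", "english"))  # -> True
```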
Post-Call Transcription
Post-call transcription provides high-accuracy speech-to-text (STT) transcription after a call ends. The transcript includes both the agent and contact sides of the call. Agents can use these transcripts in the agent application to confirm details from past calls. Supervisors can use them for training and quality checks.
Post-call transcription uses NiCE Transcription v11, the latest and most accurate transcription engine in NiCE CXone.
NiCE CXone does not support third-party transcription services for post-call transcription. However, if a real-time transcript is generated by a third-party transcription service, it can be used in post-call applications like Interaction Analytics (CXone).
Confidence Scores
Transcription provides confidence scores at the utterance and word level. The confidence score indicates how certain the transcription engine is in its overall transcription of what the contact or agent said. Confidence scores are available in the transcription data, but are only visible to users if the applications that use the transcripts can display them.
The confidence scores for Transcription:
- Are ordinal: a greater score only indicates a greater level of confidence. A score of 0.4 does not imply twice the confidence of a score of 0.2.
- Cannot be used to determine the accuracy of the transcription system. A transcript with a generally low confidence score does not imply a high word error rate.
- Cannot be used to filter or threshold a transcript. Removing words or utterances with low confidence scores will not make a transcript more accurate.
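An appropriate, ordinal-only use of confidence scores is to prioritize utterances for human review rather than to filter them. This sketch uses hypothetical data; only the order of the scores matters, never their ratios.

```python
# Hypothetical utterance data with confidence scores (illustrative only).
utterances = [
    {"text": "I'd like to change my flight", "confidence": 0.91},
    {"text": "the reference is XK42",        "confidence": 0.37},
    {"text": "thanks, that's all",           "confidence": 0.78},
]

# Ordinal use: sort so the least-confident utterances are reviewed first.
# Do NOT drop low-scoring utterances; that would not improve accuracy.
review_queue = sorted(utterances, key=lambda u: u["confidence"])

for u in review_queue:
    print(f'{u["confidence"]:.2f}  {u["text"]}')
```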
Custom Vocabulary
Transcription uses an ASR (automatic speech recognition) model that has been trained on large data sets. However, every organization uses words that are unique, or that are used in unique contexts. This can affect the accuracy of transcription results.
You can customize the model that Transcription uses. This allows you to configure it to recognize terminology that's unique to your organization or that has a unique context in your line of business.
Custom vocabulary configurations are made in Interaction Analytics (CXone), but they do not require an Interaction Analytics license. You only need a license for custom vocabulary.
Custom vocabulary configurations are shared across both real-time and post-call transcription; the same settings are applied in both transcription modes.
Custom vocabulary is available for all languages that Transcription supports.
Supported Languages
Transcription handles many dialects within each of the supported languages. The transcribed output for each conversation targets a single dialect, but may include aspects of more than one dialect.
Transcription supports the following languages:
- Dutch
- English
- French
- German
- Italian
- Japanese
- Portuguese
- Spanish
- Welsh