Transcription and TTS

The AI-based applications used in CXone Mpower work with text from interactions with contacts (the people interacting with an agent, IVR, or bot in your contact center). The audio from interactions on voice channels must be converted to text so the AI applications can work with it. This conversion is done using transcription services, also known as speech-to-text (STT). After analyzing the text, the AI applications can provide the responses they're designed to give.

The AI applications' responses are provided in text format. However, virtual agents need to convert this text to audio that can be played for the contact. This allows the virtual agents to "speak" with contacts. This conversion is done using text-to-speech (TTS) services.

Working with transcription and TTS in CXone Mpower requires custom Studio scripting. The script manages the capture of interaction audio and sends it to the transcription service and the destination application. The script also manages the application's responses, including sending them to the TTS service, if needed. The required scripting varies by use case. It's described in the online help for setting up each virtual agent or agent assist integration.
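The flow that the script manages for a single voice turn can be sketched in a few lines. The following Python sketch is illustrative only and is not Studio script code. Every name in it (handle_voice_turn, TurnResult, and the fake_* stand-in services) is hypothetical, and the stand-ins simply treat bytes as text so the example can run without any real transcription, AI, or TTS service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str     # text produced by the transcription (STT) step
    reply_text: str     # text response from the AI application
    reply_audio: bytes  # audio produced by the TTS step


def handle_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_application: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TurnResult:
    """Run one contact utterance through STT, the AI application, then TTS."""
    transcript = transcribe(audio)            # audio -> text
    reply_text = ask_application(transcript)  # text -> text response
    reply_audio = synthesize(reply_text)      # text response -> audio
    return TurnResult(transcript, reply_text, reply_audio)


# Hypothetical stand-ins for real services (assumptions for illustration).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_application(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = handle_voice_turn(
    b"check my balance", fake_transcribe, fake_application, fake_synthesize
)
```

Passing the three services in as functions mirrors the idea that each step can be handled by a different provider. In an agent assist use case, the TTS step would typically be skipped because the response is shown to the agent as text.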

Transcription

Transcription services convert audio to text, creating transcripts of spoken language. They use AI to interpret audio accurately as human language, even when speech patterns, accents, and background noise change the way the audio sounds. AI also helps by applying Natural Language Understanding (NLU), which expands on Natural Language Processing (NLP) to make decisions or take action based on what it understands. NLU improves decisions about which word the speaker used when the audio is indistinct or when words sound alike. In CXone Mpower, transcription converts the contact's speech into text for the AI engine to analyze. It is used for automatic speech recognition (ASR) menus, which allow contacts to respond to prompts by speaking, pressing phone keys, or both, and for integrations with agent assist applications and virtual agents.

CXone Mpower supports two options for transcription. The first is Turn-by-Turn Transcription. This option provides transcription utterance by utterance during an interaction, where an utterance is what a contact says or types. Audio is transcribed to text, then sent to the AI-based application. Virtual agent integrations use this type of transcription. Additionally, some virtual agent providers offer transcription services you can use instead. When you use a provider's transcription service, the interaction audio is sent to the provider, which converts it to text.

The second transcription option is Continuous Stream Transcription. This option sends a continuous stream of transcription in small segments. The AI application receives the transcribed text in real time, so it can provide responses that are relevant to the current conversation. Agent assist applications use this type of transcription service.
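The difference between the two options is mainly in when text reaches the AI application. The following Python sketch simulates that difference on words that are assumed to be already recognized. The function names and the segment_size parameter are hypothetical. In a real deployment, the transcription service decides how the stream is segmented.

```python
from typing import Iterable, Iterator, List


def turn_by_turn(utterances: Iterable[List[str]]) -> Iterator[str]:
    """Yield one complete transcript per utterance."""
    for words in utterances:
        yield " ".join(words)


def continuous_stream(
    utterances: Iterable[List[str]], segment_size: int = 2
) -> Iterator[str]:
    """Yield small transcript segments without waiting for an utterance to end."""
    for words in utterances:
        for start in range(0, len(words), segment_size):
            yield " ".join(words[start:start + segment_size])


# Two utterances from a contact, as lists of recognized words.
conversation = [["I", "need", "help"], ["with", "my", "bill"]]

turns = list(turn_by_turn(conversation))
segments = list(continuous_stream(conversation))
```

Here turns holds two complete utterances, while segments holds four shorter pieces that would arrive sooner. That earlier arrival is what lets an agent assist application react while the contact is still speaking.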

Both options support third-party transcription services. CXone Mpower also offers a native continuous stream transcription service called CXone Mpower Transcription.

TTS

Text-to-speech converts written words to audio in the form of computer-generated voices. AI helps the computer-generated output sound more human by reproducing natural-sounding intonation, stress, pacing, and pronunciation. In CXone Mpower, TTS is used in interactive voice response (IVR) menus, which are automated phone menus that contacts use through voice or key inputs to obtain information, route an inbound voice call, or both. TTS is also used in integrations with virtual agents, which are software applications that handle customer interactions in place of a live human agent.

For TTS, you can use third-party TTS services or the native TTS service.