Building your next-generation customer service with speech recognition

For many decades, human-machine interaction depended solely on pushing buttons. The recent speech revolution, driven by technological breakthroughs, digital transformation and A.I., has made it possible to interact with machines using our most basic yet most evolved way of communication: speech.

Speech technologies such as TTS (text to speech) and STT (speech to text) have become mainstream in customer service applications. They are used to save time and cut costs by routing callers to the right agents or departments, to enhance customer service by monitoring quality and the design of communication flows, to enable self-service by generating automated answers and databases and, last but not least, to identify customers through user authentication.

In a series of blogs we will dive into each of these applications and share how Zoom Media’s speech-to-text models help customer service teams maximize the value of communication for customers, employees and the organization.

Press #1 for Dutch – inbound IVR

Let’s start with the very first customer-machine-business interaction over the phone: IVR. IVR (interactive voice response) made it possible to communicate with a system using the keypad, which triggered pre-programmed actions in the host’s system. We all recognize phrases like “Press #1 for English or press #9 if you have questions about your account”. That is inbound IVR. For a small, confined set of possible outcomes, keypad-based IVR is quite efficient. The drawback of such a system is that it doesn’t support large menus. After all, for a complex set of actions one needs to define all possible outcomes, pre-program the answers and route the customer to the right service. You can imagine this is quite a time-consuming endeavor. Moreover, the system doesn’t support outcomes beyond the scope of the menu, the ‘black swans’. In systems without a breakout option to a human agent, customers soon end up in an endless loop of frustration and dissatisfaction.
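To make the keypad idea concrete, here is a minimal sketch of such a menu in Python. It is not taken from any real IVR product: the queue names, the `MAX_ATTEMPTS` limit and the `route_keypress` function are all assumptions made for illustration. It shows why every outcome has to be defined up front, and how a breakout to a human agent avoids the endless loop.

```python
# Hypothetical keypad menu: keys and queue names are illustrative only.
MENU = {
    "1": "english",
    "9": "account_questions",
}

HUMAN_AGENT = "human_agent"  # assumed breakout queue
MAX_ATTEMPTS = 3             # assumed limit before breaking out


def route_keypress(key, attempts=1):
    """Return the queue for a pressed key.

    Returns None when the key is not in the menu and the caller
    should hear the menu again. After MAX_ATTEMPTS failed attempts
    the caller is handed to a human agent instead of looping forever.
    """
    if key in MENU:
        return MENU[key]
    if attempts >= MAX_ATTEMPTS:
        return HUMAN_AGENT
    return None
```

Anything the caller wants that is not a key in `MENU` simply cannot be expressed, which is the limitation described above.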

Pinpointing pain points

Nowadays speech recognition models like those Zoom Media offers are being applied in customer service as an upgrade to old-fashioned IVR. It’s easier to answer the question “How can I help you?” than to press an endless series of buttons. Speech recognition in IVR systems allows the business to efficiently ask the customer the right question, receive the right answer, and route or escalate to the right agent. Furthermore, extracting phrases like “I already called yesterday” or “I am still not helped” allows the host system to pinpoint pain points, find solutions for the customer and learn from the conversations. The data extracted and processed in real time is used to trigger automated text-to-speech answers or to retrieve crucial information from the host’s database to satisfy the customer’s need. Analysis of recorded audio is used to train individual agents and to improve the host system as a whole.
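As a rough illustration of what happens after the speech-to-text step, the Python sketch below takes a transcript that has already been produced (the call to the speech recognition service itself is left out, since its interface is not described here) and decides where to route the call. The keyword lists, queue names and the `analyze_transcript` function are assumptions for the example; a production system would likely use a trained intent model rather than simple keyword matching.

```python
import re

# Hypothetical routing keywords per queue; illustrative only.
ROUTES = {
    "billing": ("invoice", "bill", "payment"),
    "technical_support": ("internet", "router", "connection"),
}

# Phrases that suggest the customer has an unresolved problem.
PAIN_POINT_PHRASES = ("i already called", "still not helped")


def analyze_transcript(transcript):
    """Pick a queue and flag pain points in a speech-to-text transcript."""
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", transcript.lower())
    text = " ".join(cleaned.split())
    words = set(text.split())

    # First queue whose keywords appear in the transcript, else "general".
    queue = next(
        (name for name, keywords in ROUTES.items() if words & set(keywords)),
        "general",
    )
    pain_points = [phrase for phrase in PAIN_POINT_PHRASES if phrase in text]
    return {
        "queue": queue,
        "escalate": bool(pain_points),
        "pain_points": pain_points,
    }
```

For example, `analyze_transcript("I already called yesterday about my invoice.")` selects the `billing` queue and sets `escalate` to `True`, because the caller both mentions an invoice and signals a repeat call.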

Conversational Analytics in IVR and data security

Modern IVR systems also analyze emotions in real time. Audio can contain words that red-flag a conversation, such as “lawsuit”, “angry” or “legal”. In reality, just like in human-to-human communication, it’s often not a matter of what was said but how it was said. Following up on this type of conversation can thus be done more proactively, by routing the call to a “peace-keeper” within the team. The opportunities speech technology offers for IVR are huge. But where audio is recorded and voice is analyzed, privacy issues come into play. Legislation in most countries is strict, which is exactly why Zoom Media doesn’t store any data customers process on its API. In addition, Zoom Media is bound by security requirements from its partner Microsoft when customers purchase the solution through Azure Marketplace.

In this blog we have focused on IVR systems and touched on some topics that intertwine with IVR. In the next blog we will dive into the topic of building customer databases using speech recognition.

Want to learn more about the application of speech-to-text in your customer service application? Please feel free to reach out to me if you encounter any challenges in this regard. You can reach me at
