Now Available: Deepgram Aura’s WebSocket Interface for Faster Text-to-Speech Input Streaming


tl;dr:
Deepgram Aura now offers a WebSocket interface to support fast input streaming.
Optimized for AI voice agents and real-time conversational AI applications, Aura WebSocket TTS is 3 times faster at generating speech than ElevenLabs Turbo 2.5.
Be among the first to build it into your product today! Sign up now to get started and receive $200 in credits absolutely free!
Announcing Deepgram Aura's WebSocket Text-to-Speech: Optimized for Real-Time Conversational AI
At Deepgram, we’ve heard feedback from users developing conversational AI agents about the challenges of assembling an end-to-end solution from discrete modules for streaming speech-to-text (STT) input, large language model (LLM) processing, and text-to-speech (TTS) output. These voice-powered AI agents handle customer service, sales calls, appointment booking, food ordering, and more over the phone, the web, and other devices. The pain points our customers surfaced inspired our newly released WebSocket interface for the Deepgram Aura Text-to-Speech API, which solves for:
Minimizing latency: Waiting for the entire sentence to be generated before sending it to a batch API leads to frustrating delays. With our WebSocket TTS, you can send tokens from the LLM to the TTS as soon as they’re generated, reducing latency and creating a smoother conversation experience.
Taking inputs from any LLM: Forget about building additional logic to manage sentence chunking. Our WebSocket TTS accepts any partial text or token, eliminating this step and simplifying your workflow.
Seamlessly handling interruptions: With real-time interruption handling, you can stop the TTS as soon as a human interrupts. This ensures your conversational bot can immediately process new input and generate a relevant response without missing a beat.
Scaling simultaneous conversations: Handling multiple conversations at once? Our WebSocket TTS supports 40+ concurrent WebSocket connections, meaning you can scale without worrying about hitting concurrency limits for individual TTS requests.
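Concretely, each point above maps to a small JSON control message sent over the socket. Here is a minimal sketch of helpers that build them; the message names (`Speak`, `Flush`, `Clear`) follow our understanding of Deepgram’s TTS WebSocket docs, so verify against the current API reference:

```python
import json

def speak(text: str) -> str:
    # Queue a text fragment (a partial sentence or even a single LLM token)
    # for synthesis -- no sentence chunking required.
    return json.dumps({"type": "Speak", "text": text})

def flush() -> str:
    # Force synthesis of whatever text is buffered so far.
    return json.dumps({"type": "Flush"})

def clear() -> str:
    # Drop buffered text and audio, e.g. when the caller interrupts the bot,
    # so the agent can immediately respond to the new input.
    return json.dumps({"type": "Clear"})
```

For example, an interruption handler would send `clear()` the moment the STT side detects the human speaking, then begin streaming `speak(...)` messages for the new response.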
“Latency is crucial in our experience of building real-time video-based virtual assistants for online e-commerce. We reached out to Deepgram looking for a solution that can allow us to send text token by token from any LLM to Deepgram’s TTS. The biggest latency blocker we’re seeing building video voice-assistant agents is not from video rendering, but from the LLM. Using streaming as much as possible helps lower our latency and we’ve been satisfied with the latency of Deepgram’s websocket API.”
– Edwin Chiu, Chief Architect at FireworkHQ
Our new WebSocket API offers the following advantages over a custom solution:
Speed: On average, save 70% in LLM-to-TTS latency with token-by-token transmission, ensuring your conversational agents are more responsive than ever.
Naturalness: Enjoy consistent, low-latency, and natural-sounding voice outputs without the hassle of managing tokens.
Flexibility: Whether it’s handling interruptions or scaling conversations, our WebSocket TTS adapts to your needs, supporting all voices in Aura.
Simplicity: Easy integration with a straightforward setup process.
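To illustrate that setup, here is a minimal connection sketch using the third-party `websockets` package. The endpoint, query parameters (`model`, `encoding`, `sample_rate`), and `Token` auth scheme follow Deepgram’s TTS WebSocket docs as we understand them; treat the exact values as assumptions and check the current API reference:

```python
import json
import os

def speak_endpoint(model: str = "aura-asteria-en",
                   encoding: str = "linear16",
                   sample_rate: int = 24000) -> str:
    # Build the TTS WebSocket URL; query-parameter names per Deepgram's docs.
    return ("wss://api.deepgram.com/v1/speak"
            f"?model={model}&encoding={encoding}&sample_rate={sample_rate}")

async def say(text: str) -> bytes:
    """Open a TTS socket, synthesize one phrase, and return the raw audio."""
    import websockets  # third-party: pip install websockets

    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    audio = bytearray()
    async with websockets.connect(speak_endpoint(), extra_headers=headers) as ws:
        await ws.send(json.dumps({"type": "Speak", "text": text}))
        await ws.send(json.dumps({"type": "Flush"}))  # synthesize buffered text
        await ws.send(json.dumps({"type": "Close"}))  # then close cleanly
        async for frame in ws:
            if isinstance(frame, bytes):  # binary frames carry the audio
                audio.extend(frame)
    return bytes(audio)
```

With `DEEPGRAM_API_KEY` set, `asyncio.run(say("Hello from Aura!"))` returns raw PCM audio ready for playback. Note that recent releases of the `websockets` package renamed `extra_headers` to `additional_headers`.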
“The websocket is a great feature. It takes me 15 mins to set up my conversational bot with the websocket TTS to get it up and running. Without the TTS websocket, I would have to batch the tokens from the LLM myself. Using Deepgram Websocket TTS gives me faster results than OpenAI’s TTS. With the Websocket TTS, it’s faster and more convenient than ever.”
– Brandon Wheat, CEO of B.J. Wheat Co., creator of BarBot
Our WebSocket interface saves users 70% or more in response time by allowing immediate token transmission from the LLM, compared to using our REST API without sentence chunking, for conversations averaging 50-150 characters. Most of the time saved comes from eliminating the wait for the LLM to fully generate its text before sending it to the TTS API. The longer the text, the greater the time savings. Fig. 1 plots benchmark results of the time savings for different conversation lengths when using our new WebSocket interface with GPT-4o compared to our original REST API without sentence chunking:
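The forwarding loop that realizes this saving is short: relay each token to the socket the moment the LLM emits it. A sketch, where `token_stream` is a hypothetical stand-in for any streaming LLM client (e.g. deltas from a streaming chat-completion call) and `send` transmits one WebSocket text frame:

```python
import json

def forward_tokens(token_stream, send):
    # Relay LLM output to the TTS socket token by token -- no buffering,
    # no sentence-chunking logic on the application side.
    for token in token_stream:
        send(json.dumps({"type": "Speak", "text": token}))
    # Flush once the LLM is done so any remaining buffered text is synthesized.
    send(json.dumps({"type": "Flush"}))

# Example with a fake token stream standing in for a real LLM client:
sent = []
forward_tokens(["Your ", "order ", "is ", "confirmed."], sent.append)
```

Because synthesis starts on the first token rather than after the full response, the saving grows with the length of the LLM’s reply.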


Fig. 1: Deepgram Aura's WebSocket interface provides faster response times compared to sentence chunking with the REST API and yields increasing time savings as the length of the text response from the LLM grows.


We also benchmarked the performance of Aura’s WebSocket interface against the ElevenLabs (Turbo v2.5) WebSocket, using GPT-4o to provide the text input. As seen in Fig. 2, Deepgram’s WebSocket interface delivers spoken output 3 times faster than ElevenLabs Turbo.


Fig. 2: Deepgram Aura's WebSocket API is 3 times faster than ElevenLabs' WebSocket interface.


“Deepgram's text-to-speech websocket has improved our real-time voice agents' conversation experience significantly. Testing Aura websocket was exciting; ability to stream LLM output directly to TTS resulting in lower latency and natural sounding voices are perfect for our AI agents. The many voice options, competitive pricing, and excellent support from Deepgram’s team have made the journey of building our voice agents much easier. Deepgram continues to be a crucial partner for us in building top-notch AI solutions for our customers.”
– Tarun Rathore, Co-founder Hooman Labs
Getting started
Ready to build with Deepgram Aura’s WebSocket TTS? Dive into our Getting Started Guide and see how easy it is to revolutionize your conversational AI.
Plus, check out the documentation and sample code of our Twilio Example with STT + TTS Streaming WS to create your own end-to-end conversational demo using Deepgram and Twilio.
If you have any feedback about this post, or anything else regarding Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions, Discord community, or contact us to talk to one of our product experts for more information today.
Unlock language AI at scale with an API call.
Get conversational intelligence with transcription and understanding on the world's best speech AI platform.