Text-to-Speech API for Voice Agents

Aura-2 delivers sub-200ms streaming text-to-speech built for voice agents with domain-specific accuracy and secure, scalable deployment across cloud and on-prem environments.

Thalia: Feminine, English (US) 🇺🇸
Odysseus: Masculine, English (US) 🇺🇸
Harmonia: Feminine, English (US) 🇺🇸
Theia: Feminine, English (AU) 🇦🇺
Electra: Feminine, English (US) 🇺🇸
Arcas: Masculine, English (US) 🇺🇸
Amalthea: Feminine, English (PH) 🇵🇭
Helena: Feminine, English (US) 🇺🇸
Hyperion: Masculine, English (AU) 🇦🇺
Apollo: Masculine, English (US) 🇺🇸
Luna: Feminine, English (US) 🇺🇸

Aura-2 Text-to-Speech features

Aura-2 is engineered for the demands of real-time voice agents: low latency, cost-effective scale across thousands of concurrent sessions, and the reliability production workloads require.

Domain-tuned pronunciation

Ensures accurate pronunciation for industry-specific terminology in healthcare, finance, legal, and beyond.

Authentic, natural voices

Features 40+ English voices with localized accents, delivering natural, business-appropriate speech for professional settings.

Context-aware delivery

Adjusts pacing, tone, and expression to ensure smooth, coherent communication in any context.

Real-time performance

Delivers sub-200ms latency for ultra-responsive interactions, while efficiently handling thousands of concurrent requests.
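
One way to check a latency figure like this in your own environment is to time how long a streaming response takes to yield its first audio chunk. The sketch below is illustrative only: `time_to_first_chunk` and `fake_audio_stream` are hypothetical names, and the fake stream is a stand-in for a real streaming response rather than a call to any actual API.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first chunk arrived, iterator over all chunks)."""
    start = time.perf_counter()
    iterator = iter(chunks)
    try:
        first = next(iterator)  # blocks until the first chunk is available
    except StopIteration:
        raise ValueError("stream produced no audio chunks") from None
    elapsed = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        # Hand the first chunk back so the caller still sees the full stream.
        yield first
        yield from iterator

    return elapsed, replay()


def fake_audio_stream(delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response; not a real API call."""
    time.sleep(delay_s)  # simulated time to first audio
    for i in range(n_chunks):
        yield bytes([i]) * 4


elapsed, stream = time_to_first_chunk(fake_audio_stream())
audio = b"".join(stream)
print(f"first chunk after {elapsed * 1000:.0f} ms, {len(audio)} bytes total")
```

In a real measurement, start the timer when the request is sent, so that network and connection setup time are included in the figure.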

Cost-effectiveness at scale

Delivers enterprise-grade speech at $0.030 per 1,000 characters, with no hidden fees and volume discounts for large deployments.
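
To see what the per-character rate means for a given workload, a rough estimate can be computed as below. This is a minimal sketch that assumes the list price of $0.030 per 1,000 characters applies uniformly. It ignores volume discounts, and the call volume and characters per call are hypothetical figures chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# List price stated on this page: $0.030 per 1,000 characters.
RATE_PER_1000_CHARS = Decimal("0.030")


def estimate_tts_cost(total_chars: int,
                      rate_per_1000: Decimal = RATE_PER_1000_CHARS) -> Decimal:
    """Return the estimated list-price cost in USD, rounded to the cent."""
    if total_chars < 0:
        raise ValueError("total_chars must be non-negative")
    cost = (Decimal(total_chars) / Decimal(1000)) * rate_per_1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical workload: 10,000 calls per month, 1,200 synthesized characters each.
monthly_chars = 10_000 * 1_200
print(estimate_tts_cost(monthly_chars))  # prints 360.00
```

`Decimal` is used instead of floating point so that currency amounts round predictably.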

Flexible deployment options

Supports public cloud, private cloud, and on-premises deployments to help meet compliance and security requirements.

Enterprise-ready AI voices

Voice agents don't need cinematic range. They need clarity, consistency, and low listener fatigue across thousands of turns. Aura-2's 40+ voices are tuned for professional conversations in support, sales, healthcare, and finance, with consistent pacing and enunciation that builds trust on every call.

Scalable infrastructure for Text-to-Speech

Powered by the Deepgram Enterprise Runtime, Aura-2 delivers real-time text-to-speech on the same infrastructure that powers our trusted speech-to-text and speech-to-speech capabilities. Builders get the control, adaptability, and performance needed to deploy and scale production-grade voice AI.

Speech-to-Text leadership enhances Text-to-Speech

When STT and TTS run on the same streaming infrastructure, the entire speech loop gets faster: fewer handoffs, lower latency, and consistent pronunciation across what the agent hears and what it says. Deepgram's unified architecture means improvements in speech recognition directly sharpen text-to-speech accuracy.

Start building with Aura-2 today

Real-time, streaming-first text-to-speech that's ready for production voice agents. From first prototype to thousands of concurrent calls.