Article·Dec 15, 2025

Aura-2 (TTS) Now Speaks Dutch, French, German, Italian, and Japanese

Deepgram expands its high-precision text-to-speech API to bring natural, business-ready voice infrastructure to global markets.

5 min read

By Hasan Jilani

Director of Product Marketing

Text-to-speech has become a cornerstone of voice AI, powering everything from scheduling assistants to customer support agents and multilingual automation. Aura-2 delivers the natural and realistic speech users expect, but it is also engineered for the rigorous demands of business use cases. Beyond human-like intonation, it prioritizes clarity, precision, and responsiveness. These qualities are essential for real-time voice agents where accuracy and speed are just as critical as voice quality.

Today, we are excited to announce that Aura-2 now supports five additional languages:

  • 🇳🇱 Dutch
  • 🇫🇷 French
  • 🇩🇪 German
  • 🇮🇹 Italian
  • 🇯🇵 Japanese

These join our existing English and Spanish models to provide a robust infrastructure for global voice applications.

This expansion helps developers deliver consistent, multilingual voice experiences via our API without sacrificing naturalness, pronunciation accuracy, or low-latency performance.

Why These Languages Matter for TTS

Each new Aura-2 language presents distinct phonological and prosodic challenges that make high-quality TTS difficult. Getting these details right is critical for business applications, where clarity prevents costly errors.

Here is why these languages are impactful additions:

Dutch Sample (nl)

Sample: “Uw reserveringsnummer is 8 4 9 2 B.”

EN: Your reservation number is 8492B.

🇳🇱 Dutch: Vowel richness and compound-heavy words

Dutch has long vowels, diphthongs, and ultra-long compound nouns. Natural TTS must handle stress placement and smooth glides between complex vowel forms while keeping numeric data clear.


French Sample (fr)

Sample: “Veuillez appeler le 01 23 45 67 89 pour confirmer.”

EN: Please call 01 23 45 67 89 to confirm.

🇫🇷 French: Liaison, elision, and continuous flow

French uses fluid connected speech (liaisons) and dropped sounds (elisions). This requires a TTS engine to master subtle transitions without sounding choppy, especially when reading strings of numbers like contact information.


German Sample (de)

Sample: “Ihr Termin ist für den 24. Mai um 14:30 Uhr bestätigt.”

EN: Your appointment is confirmed for May 24th at 14:30.

🇩🇪 German: Precision with long compounds and consonant clusters

German’s long words, clustered consonants, and consistent stress rules demand a voice model that enunciates clearly without sounding robotic. This is vital for complex scheduling and time-based data.


Italian Sample (it)

Sample: “Il numero di riferimento è 1250.”

EN: The reference number is 1250.

🇮🇹 Italian: Open vowels and musical intonation

Italian TTS must preserve melody and vowel openness without exaggeration. Aura-2 maintains the natural rhythm of Italian speech even when delivering transactional updates such as reference numbers or currency amounts.


Japanese Sample (ja)

Sample: “認証コード 7 3 9 1 を入力してください。”

EN: Please enter authentication code 7391.

🇯🇵 Japanese: Politeness markers, pitch accent, and mixed scripts

Japanese blends kanji, kana, and loanwords while relying heavily on pitch accent. Aura-2 ensures smooth phrasing and consistent tone. This is also a strong example of structured-speech correctness, where the model must seamlessly switch between Japanese script and alphanumeric codes.
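
The Dutch and Japanese samples above write their codes with a space between each character (“8 4 9 2 B”, “7 3 9 1”). One way to prepare input like that is a small helper that spaces out a code before it goes into the request text. This is a minimal sketch: the helper name `spell_out` is hypothetical, and treating spaced characters as a cue for character-by-character reading is an assumption drawn from the samples, not documented Aura-2 behavior.

```bash
# Hypothetical helper: insert a space between every character of a code
# so that "8492B" becomes "8 4 9 2 B". Intended for ASCII codes only.
spell_out() {
  printf '%s\n' "$1" | sed 's/./& /g; s/ $//'
}

# Example: build the Dutch sample sentence from a raw reservation code.
code="8492B"
echo "Uw reserveringsnummer is $(spell_out "$code")."
```

Whether a code should be read character by character or as a whole number depends on the use case, so a helper like this is best applied only to fields you know are identifiers.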


Unified Infrastructure for Global Scale

Aura-2 continues to evolve as a unified voice synthesis infrastructure for global products and workflows. Instead of applying a single prosodic pattern to every language, Aura-2 adapts to each language’s unique phonology, whether it involves pitch accents, liaisons, or compound stress rules.

For developers and enterprise teams, this means:

  • Consistent performance across diverse global markets via a single API.
  • High-precision pronunciation for structured data like IDs, currency, and times.
  • Sub-200ms latency that ensures fluid, real-time conversational flow.
  • High reliability under streaming loads with stable performance even during high-volume concurrency.
  • Simplified infrastructure that eliminates the need to stitch together different vendors for different languages.
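
To check the latency bullet above against your own network path, curl can report the time to first byte. The sketch below is only a starting point under stated assumptions: `to_ms`, `within_budget`, and `measure_ttfb` are hypothetical helper names, the request body mirrors the example in Getting Started, and the `DEEPGRAM_API_KEY` environment variable is assumed to hold your key. Client-side timing includes DNS, TLS, and network transit, so it will not necessarily match the figure quoted above.

```bash
# Convert a time in seconds (as printed by curl, e.g. 0.187) to whole milliseconds.
to_ms() {
  awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 + 0.5 }'
}

# Succeed when the given time in seconds is under a 200 ms budget.
within_budget() {
  [ "$(to_ms "$1")" -lt 200 ]
}

# Print the seconds elapsed until the first response byte arrives.
# $1 is the JSON request body. Nothing is sent until this function is called.
measure_ttfb() {
  curl -s -o /dev/null -w '%{time_starttransfer}' \
    https://api.deepgram.com/v1/speak \
    -H "Authorization: Token $DEEPGRAM_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}

# Example (needs network access and a valid key):
# t=$(measure_ttfb '{"model": "aura-2", "language": "de", "text": "Guten Tag."}')
# within_budget "$t" && echo "first byte after $(to_ms "$t") ms"
```

A single request is a noisy measurement, so repeating the call and looking at the spread gives a fairer picture than one number.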

Getting Started

Switching to any of the newly supported languages is simple. Update your API request with the appropriate language code.

Bash

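# Send the text to the speak endpoint and save the returned audio to a file.
# The output file name is illustrative; match the extension to the audio encoding you request.
curl https://api.deepgram.com/v1/speak \
  -H "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "aura-2",
        "language": "fr",
        "text": "Bonjour, comment puis-je vous aider ?"
      }' \
  --output speech.mp3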

New language codes:

nl, fr, de, it, ja
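
Since the language is just a field in the request body, a thin wrapper can reject unknown codes before a request is sent. This sketch assumes the payload shape from the curl example above. `is_supported_lang` and `build_payload` are hypothetical helpers, and `en` and `es` are assumed codes for the existing English and Spanish models.

```bash
# Succeed only for language codes named in this article.
# "en" and "es" are assumptions for the existing English and Spanish models.
is_supported_lang() {
  case "$1" in
    nl|fr|de|it|ja) return 0 ;;
    en|es) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a JSON request body for the given language code and text.
# No JSON escaping is done here: the text must not contain double quotes
# or backslashes. Use a JSON tool for arbitrary input.
build_payload() {
  if ! is_supported_lang "$1"; then
    echo "unsupported language code: $1" >&2
    return 1
  fi
  printf '{"model": "aura-2", "language": "%s", "text": "%s"}\n' "$1" "$2"
}

# Example: print the body for a German request.
build_payload de "Guten Tag."
```

The printed body can be passed to curl with `-d "$(build_payload de "Guten Tag.")"` in place of the inline JSON.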

For a complete list of available voices and to hear audio samples for each region, refer to the TTS Voices and Languages documentation. You can also visit the Deepgram Playground to enter your own text and test performance in real time.

Looking Ahead

With five new languages now live, Aura-2 continues its progress toward full global coverage. Accuracy, adaptability, and real-time reliability continue to improve across language families and acoustic environments.

The goal is clear: voice AI that works everywhere, for everyone. We are building text-to-speech that sounds natural, responds instantly, and works globally, regardless of the complexity of the business use case.

Unlock Enterprise-Grade Voice AI Today

Sign up free and unlock $200 in credits, enough to generate over 13 million characters of synthesis. Explore details on our TTS Voices and Languages page and hear Aura-2’s natural, high-precision performance for yourself.
