By Stephen Oladele
Audio transcription powers modern products (e.g., podcast platforms, customer support analytics, accessibility features, knowledge search) and usage is rarely steady. Some hours are quiet; others spike with uploads. Traditional, server‑based pipelines force you to provision for peaks, maintain machines, and pay for idle time.
Enter serverless. By pairing AWS Lambda with Deepgram's speech-to-text (STT) API, you get a push-button, event-driven workflow that scales to meet bursts and drops to zero when idle. Instead of running a fleet, you wire an S3 upload to trigger a Lambda function; that function calls Deepgram for accurate, low-latency transcription and saves the results right back to S3.
In this guide, you’ll deploy a production‑ready foundation: when audio lands in S3, Lambda sends a secure presigned URL to Deepgram’s /v1/listen endpoint and writes both the raw JSON response and the cleaned transcript text to a transcripts/ prefix.
Along the way, you’ll see how to keep costs predictable, add retries and dead‑letter queues for resilience, and extend the pipeline for search or analytics (all without maintaining servers or paying for idle compute time).
What you’ll build: A production‑ready pattern: S3 (incoming audio) → S3 Event → Lambda → Deepgram REST → S3 (transcripts), plus tips for costs, observability, and hardening.
Who it’s for: Platform engineers and developers who want to build a hands‑off, speech‑to‑text serverless transcription app on AWS.
>> 💻 Here’s the code for this technical guide in this repository.
Why Serverless STT on AWS (Deepgram + Lambda)?
Modern audio pipelines must transcribe at burst scale and sleep at idle. Yet most teams don’t want to babysit servers, autoscaling groups, or Kubernetes clusters just to move bytes from A to B.
Here are the key advantages in practice:
1️⃣ Event-Driven by Design
- S3 Event Notifications fire the moment a file lands (no polling loops or cron jobs).
- Each invocation handles one object, so concurrency naturally matches workload.
2️⃣ Predictable, Minimal Cost
- AWS free tier: 1 M Lambda requests + 400k GB-s monthly.
- Typical 5-min MP3 (≈5 MB) ≈ $0.022.
3️⃣ Built-in Resilience and Burst Control
- Optional SQS buffer smooths sudden floods; a DLQ captures hard failures for replay.
- Automatic retries on Lambda errors; you can add exponential back-off for Deepgram 429/5xx responses.
4️⃣ Zero-Ops Scaling
- Lambdas launch in < 100 ms warm start; cold starts are minor for I/O-bound jobs.
- No autoscaling rules or idle EC2 instances to watch.
5️⃣ Single-Purpose Functions = Clean Code
- One Lambda = one responsibility: fetch audio, call Deepgram, persist transcript.
- Easy to swap languages (Python, Node.js, Go) or hand off to Step Functions if you bolt on post-processing.
6️⃣ Deepgram Developer Experience
- /v1/listen REST accepts URLs or byte streams, returns JSON you can drop into S3.
- Choose models (e.g., nova-2, nova-3), languages, smart formatting, summarisation (all via query params).
- Generous (200 USD) trial credits let you test thousands of minutes for free.
Scenarios and Cost Assumptions for Serverless STT on AWS (Deepgram + Lambda)
Pricing References:
- Lambda duration: ~$0.0000166667 per GB‑second; requests ~$0.20 per 1M. Free tier: 1M requests + 400k GB‑s/month. (Source: AWS)
- Deepgram (Nova‑3, pre‑recorded): ~$0.0043/minute (varies by plan/volume). (Source: Deepgram)
- S3 egress (to internet): starts ~$0.09/GB in us‑east‑1; S3 request costs are tiny (PUT ~$0.005/1k, GET ~$0.0004/1k). (Source: AWS)
📝 Important: For clips ≥ ~1–2 minutes, prefer async Deepgram + webhook so Lambda runs for hundreds of ms (submit job) rather than seconds/minutes (wait for transcript).
Case A: Async Workflow (recommended for 5‑minute files)
- Assumptions: Lambda memory 1024 MB, runtime 600 ms to create a presigned URL + submit async job; Deepgram processes in the background and posts results to your webhook (or you poll).
- Lambda compute: 1.0 GB × 0.6 s × $0.0000166667 ≈ $0.000010 per file (plus $0.0000002 request).
- Deepgram: 5.0 min × $0.0043/min ≈ $0.0215 per file.
- S3 egress (example): 5‑min MP3 @ 128 kbps ≈ ~4.8 MB → 0.0048 GB × $0.09/GB ≈ $0.00043.
👉 Estimated total per 5‑minute file: ≈ $0.022
Case B: Sync Workflow (okay for short clips)
- Assumptions: 30‑second clip; Lambda memory 1536 MB; Lambda waits for synchronous /v1/listen to return—~2 s Lambda time end‑to‑end.
- Lambda compute: 1.5 GB × 2.0 s × $0.0000166667 ≈ $0.000050 per file (plus $0.0000002 request).
- Deepgram: 0.5 min × $0.0043/min ≈ $0.00215 per file.
- S3 egress (example): 30‑sec MP3 @ 128 kbps ≈ 0.5 MB → 0.0005 GB × $0.09/GB ≈ $0.000045.
Estimated total per 30‑sec file: ≈ $0.00225.
👉 Takeaway: In both workflows, Deepgram minutes are the dominant cost; Lambda duration and S3 request charges are negligible at this scale, and data egress is tiny for compressed audio but non-zero. For multi-minute files, prefer the async + webhook flow so Lambda stays sub-second.
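If you want to sanity-check these estimates against your own clip lengths and memory settings, here's a small back-of-the-envelope helper in Python using the rates quoted above (illustrative only; confirm current AWS and Deepgram pricing):
# Rough per-file cost model using the published rates above (USD).
LAMBDA_PER_GB_SECOND = 0.0000166667
LAMBDA_PER_REQUEST = 0.0000002
DEEPGRAM_PER_MINUTE = 0.0043   # Nova-3, pre-recorded
S3_EGRESS_PER_GB = 0.09

def per_file_cost(audio_minutes, lambda_gb, lambda_seconds, egress_gb):
    lambda_cost = lambda_gb * lambda_seconds * LAMBDA_PER_GB_SECOND + LAMBDA_PER_REQUEST
    deepgram_cost = audio_minutes * DEEPGRAM_PER_MINUTE
    egress_cost = egress_gb * S3_EGRESS_PER_GB
    return lambda_cost + deepgram_cost + egress_cost

print(per_file_cost(5.0, 1.0, 0.6, 0.0048))   # Case A: ~0.0219 per 5-minute file
print(per_file_cost(0.5, 1.5, 2.0, 0.0005))   # Case B: ~0.0022 per 30-second clip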
Architecture Overview
When a client uploads an audio file to Amazon S3 (for example, under audio-incoming/), S3 emits an ObjectCreated event. That event invokes an AWS Lambda function, which generates a presigned S3 URL for the object and calls Deepgram's /v1/listen REST API with that URL.
Deepgram transcribes the audio; the function then writes both the raw JSON response and a clean text transcript to a transcripts/ prefix in the same bucket.
Serverless Transcription on AWS (S3 → Lambda → Deepgram Nova-3 STT → S3)
Why presigned URLs? The audio stays in your private bucket; Deepgram fetches it securely via a time-limited URL. That keeps Lambda fast and memory-light and avoids base64 overhead.
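In code, that core exchange looks roughly like the sketch below (example bucket/key names and a placeholder API key; the full Stage 5 handler adds retries, idempotency checks, and S3 persistence):
import boto3
import requests

s3 = boto3.client("s3")

# Time-limited URL so Deepgram can fetch the private object directly
presigned_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "serverless-audio-transcription", "Key": "audio-incoming/my_audio.mp3"},
    ExpiresIn=300,
)

# Feature options ride on the query string; the JSON body only carries the URL
resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"model": "nova-3", "smart_format": "true"},
    headers={"Authorization": "Token YOUR_DEEPGRAM_API_KEY"},
    json={"url": presigned_url},
    timeout=30,
)
transcript = resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]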
What happens, step by step
| Step | Action | Why it matters |
| --- | --- | --- |
| 1. Upload | An audio file lands in audio-incoming/ in Amazon S3 (e.g., audio-incoming/my_audio.mp3). | S3 gives you 11 nines of durability, versioning, and effectively infinite scale. |
| 2. Event trigger | The ObjectCreated event fires. You configure S3 to push that event either directly to Lambda or through SQS if you need burst smoothing. | Purely event-driven; nothing polls for work. |
| 3. Lambda orchestration | Lambda receives the object key, generates a presigned S3 URL (good for a few minutes), and calls Deepgram's /v1/listen REST endpoint with JSON { "url": "<presigned-url>", … }. | Keeps the bucket private (no public URLs), avoids loading the audio into Lambda memory, and presigned URLs cost $0.00. |
| 4. Deepgram STT | Deepgram streams the object, transcribes it (sync for short clips, async + webhook for multi-minute audio), and returns JSON results. | Off-loads GPU/ASR complexity; you pay only per audio minute. |
| 5. Persist outputs | Lambda writes two artifacts back to S3: transcripts/<basename>.json (full Deepgram response) and transcripts/<basename>.txt (best-alternative plain text). | Makes downstream search/analytics simple; the idempotency check is "if transcript exists, skip." |
| 6. (Optional) Error path | On unhandled exceptions or after the maximum retries, the S3 event is sent to a dead-letter SQS queue for later replay. | Gives ops a clear backlog instead of silent failure. |
Why this Architecture?
- Event-driven + pay-per-use: scales to spikes, drops to zero at idle.
- Low latency and cost: no double-handling of bytes in Lambda; Deepgram fetches directly.
- Operationally simple: S3 stores, Lambda orchestrates, Deepgram transcribes; SQS/DLQ make bursts and failures easy to manage.
>> 💻 Here’s the code for this technical guide in this repository.
Get Prerequisites (S3 Bucket, API Keys, Tooling)
Before wiring events and code, make sure you have the following in place.
Account Setup and Tools
| Tool | Why you need it | Quick-start link/note |
| --- | --- | --- |
| AWS account with permission to create S3, Lambda, IAM, (optional) SQS | You'll provision the entire pipeline in your own account. | Sign up or log in at https://aws.amazon.com/ |
| S3 bucket (or two) | Stores incoming audio and finished transcripts. Create it in the Region you'll run Lambda, enable block public access, and turn on server-side encryption (SSE-S3 or SSE-KMS). | AWS Console → S3 → Create bucket |
| Deepgram account and API key | Authenticates requests to /v1/listen. New accounts get $200 free credit. | https://console.deepgram.com/signup → API Keys → Create Key |
| IAM role for Lambda | Grants least-privilege access to S3 (and SQS if used). | See sample policy below |
| Local dev tooling (optional) | AWS CLI, or AWS SAM/Terraform if you want IaC; curl or Postman to test webhooks; a short audio clip (.mp3, .wav, or .m4a) to verify the flow. | AWS CLI install: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html |
| Runtime choice | This guide uses Python 3.10, but the pattern is identical in Node.js, Go, etc. | Pick the language your team supports. |
💡Tips:
- Throughout this guide, we use us-east-1; use a consistent region for all your resources.
- Keep Lambda outside a private VPC unless you have a NAT gateway. The function must reach Deepgram over the public internet.
Deepgram Credentials
Create an API key and keep it server-side only. We’ll pass it to Lambda via:
- Simple: Encrypted environment variable (good for demos)
- Best: AWS Secrets Manager (Lambda reads it at runtime)
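For the Secrets Manager route, a one-time setup sketch looks like this (the secret name deepgram/api-key is just an example; the Stage 5 handler can read it if you expose it through a DEEPGRAM_SECRET_NAME environment variable instead of DEEPGRAM_API_KEY):
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

# Store the key as a small JSON document; the handler accepts either a raw
# string or a JSON doc containing a DEEPGRAM_API_KEY field.
secrets.create_secret(
    Name="deepgram/api-key",   # example name; reference it from the Lambda env
    SecretString=json.dumps({"DEEPGRAM_API_KEY": "dg_your_key_here"}),
)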
Stage 1: Set Up S3 Bucket with Source Audio and Transcript Prefixes/Folders
From the Console
Create an S3 bucket. You can use one bucket (e.g., serverless-audio-transcription) with separate prefixes or two buckets—your choice.
- Input prefix: audio-incoming/ (where you upload audio)
- Output prefix: transcripts/ (where the function writes results)
Here's the naming convention this guide uses:
s3://serverless-audio-transcription/
└── audio-incoming/ # <-- S3 Event Notification source
└── transcripts/ # <-- Lambda writes JSON + TXT here
📝 Notes:
- You don't need a bucket policy for presigned URLs to work. Do not add Deny rules that restrict GetObject by VPC endpoint or IP, or Deepgram won't be able to fetch via the presigned link.
- We’ll filter S3 events to only trigger on audio types and the audio-incoming/ prefix so transcript writes don’t re-trigger Lambda.
Or Use Your CLI
# set your vars
BUCKET=serverless-audio-transcription
REGION=us-east-1 # change as needed; just ensure you use a consistent region throughout
# 1) create the bucket
if [ "$REGION" = "us-east-1" ]; then
aws s3api create-bucket --bucket "$BUCKET"
else
aws s3api create-bucket --bucket "$BUCKET" \
--create-bucket-configuration LocationConstraint="$REGION" \
--region "$REGION"
fi
# (optional but recommended) lock it down a bit
aws s3api put-public-access-block --bucket "$BUCKET" \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
aws s3api put-bucket-encryption --bucket "$BUCKET" \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
aws s3api put-bucket-versioning --bucket "$BUCKET" \
--versioning-configuration Status=Enabled
# 2) create the “folders” (prefixes)
aws s3api put-object --bucket "$BUCKET" --key "audio-incoming/"
aws s3api put-object --bucket "$BUCKET" --key "transcripts/"
>> 💻 Here’s the code for this technical guide in this repository.
Stage 2: Create and Use Amazon SQS as an S3 Buffer and Add DLQ
Step 1: Create two queues
Head over to Amazon SQS to create two queues:
- Dead-letter queue: audio-dlq (Standard is fine). Failed messages end up here for inspection and replay.
- Main queue: audio-incoming-queue (Standard queue for max throughput)
  - Visibility timeout: set > your Lambda timeout (e.g., Lambda 60 s ⇒ visibility 120–180 s)
  - Redrive policy (DLQ): send to audio-dlq with MaxReceiveCount (start with 5)
📝 Note: Queue visibility timeout > your Lambda timeout because if your function runs near its timeout or you add retries/backoff inside, SQS needs enough time to keep the message hidden from other pollers. Too short and the same message can get delivered to another Lambda while the first is still working.
Step 2: Allow S3 to send messages to the main queue
S3 can only publish to SQS if the queue access policy allows it. In the SQS console → your main queue → Access policy → add a statement like:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowS3SendMessage",
"Effect": "Allow",
"Principal": { "Service": "s3.amazonaws.com" },
"Action": "sqs:SendMessage",
"Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Condition": {
"ArnEquals": { "aws:SourceArn": "arn:aws:s3:::YOUR_BUCKET" }
}
}
]
}
> Replace REGION, ACCOUNT_ID, YOUR_BUCKET with the actual values.
Step 3: Point S3 Event Notifications to the SQS queue
- S3 bucket → Properties → Event notifications → Create event
- Event types: All object create events (s3:ObjectCreated:*), or just PUT and CompleteMultipartUpload.
- Prefix: audio-incoming/
- Suffix: .mp3, .wav, .m4a (you can make one per suffix or one that’s broad)
- Destination: SQS queue → select audio-incoming-queue
This step sends “S3 event JSON” into your SQS queue as each audio file lands.
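Each SQS message body is the standard S3 event notification JSON. Here's a minimal sketch of pulling the bucket and key back out (the Stage 5 handler does this unwrapping for you):
import json
from urllib.parse import unquote_plus

def bucket_and_key(sqs_record):
    """Unwrap an SQS record whose body is an S3 event notification."""
    s3_event = json.loads(sqs_record["body"])              # body is a JSON string
    s3_part = s3_event["Records"][0]["s3"]
    return s3_part["bucket"]["name"], unquote_plus(s3_part["object"]["key"])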
CLI Option
1) Create queues
aws sqs create-queue --queue-name audio-dlq
aws sqs create-queue --queue-name audio-incoming-queue \
  --attributes '{"VisibilityTimeout":"180","RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:REGION:ACCOUNT_ID:audio-dlq\",\"maxReceiveCount\":\"5\"}"}'
2) Allow S3 to send to SQS
MAIN_Q_URL=$(aws sqs get-queue-url --queue-name audio-incoming-queue --query 'QueueUrl' --output text)
cat > /tmp/sqs-access.json <<'JSON'
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"AllowS3SendMessage",
"Effect":"Allow",
"Principal":{"Service":"s3.amazonaws.com"},
"Action":"sqs:SendMessage",
"Resource":"arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Condition":{"ArnEquals":{"aws:SourceArn":"arn:aws:s3:::YOUR_BUCKET"}}
}
]
}
JSON
# Wrap the policy document as the value of the "Policy" attribute (it must be a JSON-encoded string)
python3 -c "import json; print(json.dumps({'Policy': open('/tmp/sqs-access.json').read()}))" > /tmp/sqs-attributes.json
aws sqs set-queue-attributes --queue-url "$MAIN_Q_URL" --attributes file:///tmp/sqs-attributes.json
3) Wire S3 → SQS
cat > /tmp/s3-to-sqs.json <<'JSON'
{
"QueueConfigurations": [
{
"Id": "audio-incoming",
"QueueArn": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Events": ["s3:ObjectCreated:*"],
"Filter": { "Key": { "FilterRules": [
{ "Name": "prefix", "Value": "audio-incoming/" }
]}}
}
]
}
JSON
aws s3api put-bucket-notification-configuration \
--bucket YOUR_BUCKET \
--notification-configuration file:///tmp/s3-to-sqs.json
Stage 3: Set Up AWS Lambda for Serverless Transcription
Create the AWS Lambda function that orchestrates audio transcription requests to Deepgram.
Step 1: Log in and Navigate to Lambda
- Log in to your AWS Console.
- Search for "Lambda" in the top search bar and open the Lambda service page.
Step 2: Create a New Function
- Click "Create function."
- Choose "Author from scratch."
- Set the following parameters:
- Function Name: audio-transcriber
- Runtime: Choose Python 3.10 (or later)
- Architecture: Choose arm64 (typically ~20% cheaper per GB-second) or x86_64.
- Under Execution Role, select "Create a new role with basic Lambda permissions." (You'll attach the S3 and SQS permissions in Stage 4.)
- Click "Create function."
Step 3: Configure Function Settings
Once created, adjust these important settings under the Configuration tab:
| Setting | Recommended Value | Why? |
| --- | --- | --- |
| Memory | 1024–1536 MB | More memory also means more CPU, keeping presigning and JSON handling fast |
| Timeout | 60 seconds | Headroom for the Deepgram call plus retries |
| Ephemeral Storage | 512 MB (default) | Sufficient for temporary runtime storage |
| Reserved Concurrency | Optional (leave blank initially) | Set a cap later to control costs during spikes |
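If you'd rather script these settings than click through the console, a boto3 sketch (assuming the audio-transcriber function name from Step 2) would look like:
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Mirror the recommended console settings; more memory also means more CPU per invocation.
lambda_client.update_function_configuration(
    FunctionName="audio-transcriber",
    MemorySize=1024,                    # MB
    Timeout=60,                         # seconds
    EphemeralStorage={"Size": 512},     # MB (default)
)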
Step 4: Add Environment Variables (Lambda)
Under Configuration → Environment variables, add:
| Name | Required | Example | Purpose |
| --- | --- | --- | --- |
| INPUT_PREFIX | Yes | audio-incoming/ | S3 folder for source audio uploads |
| TRANSCRIPTS_PREFIX | Yes | transcripts/ | S3 folder to save output |
| DEEPGRAM_API_KEY | Yes | (from Secrets/Env) | Auth for Deepgram REST API |
| DG_MODEL | Optional | nova-3 | Deepgram model to use |
| DG_LANGUAGE | Optional | en | Language code (if needed) |
| DG_SMART_FORMAT | Optional | true | Improved readability of transcriptions (punctuation, casing, numbers, etc.) |
📝 Note: We'll set sensible defaults in the Lambda code if these aren't provided.
Step 5: Connect the main SQS queue to your Lambda Function (event source mapping)
- Lambda → Add trigger → SQS → choose audio-incoming-queue
- Batch size: start with 5
- Batch window: 0–1s
- Maximum concurrency (per trigger): set a cap (e.g., 10) to throttle burst spend
- Visibility timeout (on the queue) should be greater than Lambda timeout × expected retries. Ensure it’s 2–3× Lambda timeout.
That’s it. Now S3 drops notifications in SQS; Lambda polls SQS at your chosen rate, and failures get retried up to MaxReceiveCount then land in audio-dlq for inspection.
Or use the CLI to connect the main SQS queue to your Lambda func:
aws lambda create-event-source-mapping \
--function-name ADD_YOUR_FUNCTION_NAME \
--event-source-arn arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue \
--batch-size 5 \
--maximum-batching-window-in-seconds 1 \
--maximum-concurrency 10
Stage 4: Add Required Permissions to IAM Execution role for Lambda
Step 1: Get your inline policies ready
Below are the inline policies you'll attach to the Lambda execution role, scoped to least privilege for this guide:
CloudWatch Logs (basic Lambda logging):
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow", "Action": ["logs:CreateLogGroup"], "Resource": "arn:aws:logs:*:*:*" },
{ "Effect": "Allow", "Action": ["logs:CreateLogStream","logs:PutLogEvents"], "Resource": "arn:aws:logs:*:*:log-group:/aws/lambda/*" }
]
}
S3 input (read audio for presigned auth and optional bytes path):
{
"Version": "2012-10-17",
"Statement": [
{ "Sid": "ReadInputAudio",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:GetObjectTagging"],
"Resource": "arn:aws:s3:::YOUR_BUCKET/audio-incoming/*"
}
]
}
S3 outputs (read for idempotency + write transcripts):
{
"Version": "2012-10-17",
"Statement": [
{ "Sid": "ReadWriteTranscripts",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:GetObjectTagging", "s3:PutObject","s3:PutObjectTagging"],
"Resource": "arn:aws:s3:::YOUR_BUCKET/transcripts/*"
}
]
}
SQS (buffer or DLQ):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:YOUR_QUEUE_NAME"
    }
  ]
}
👉 Shortcut: you can also attach AWS’s managed policy AWSLambdaSQSQueueExecutionRole instead of writing this inline.
(Optional) Secrets Manager
If you store the Deepgram API key in Secrets Manager:
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:YOUR_SECRET_NAME*"
}
]
}
Step 2: Go to the AWS Console → Lambda → open your function.
Step 3: Configuration → Permissions → under Execution role, click the blue role link (e.g., audio-transcriber-role-abc123).
Step 4: You're now on the IAM Role page. Click Add permissions → Create inline policy.
Step 5: Click the JSON tab.
Step 6: Paste the policy JSON you need (e.g., the "CloudWatch Logs" block or one of the S3 blocks) and Save.
- If you already have an inline policy with a "Statement": [ ... ] array, you can add another statement to that array instead of creating a separate policy.
Step 7: Wait ~30–60 seconds for IAM to propagate, then re-test your Lambda.
💡 Common issues to watch out for:
- Wrong role: Always click the role link from the Lambda page to edit the correct one.
- Wrong ARN: S3 ARNs are arn:aws:s3:::BUCKET/PREFIX/* (no region in S3 ARNs).
- Policy shape: JSON must have a single "Version" and a "Statement" array; don’t paste multiple top-level objects.
- Inline policy limit: Keep each inline policy under the IAM size limits; create multiple policies if needed.
- Bucket policy Deny beats Allow: If there’s an S3 bucket policy with a Deny, it will override your role’s Allow (not needed for this guide; avoid Deny rules that restrict GetObject on input).
- Propagation delay: IAM changes can take ~30–60 seconds to take effect. Refresh and retry.
Stage 5: Add Handler Code to Lambda Function
This handler code (see repo):
- Accepts S3 events directly or SQS→S3 wrapped events.
- Generates a presigned URL and calls Deepgram /v1/listen.
- Writes JSON and TXT to transcripts/.
- Is idempotent (checks existing outputs).
- Includes useful logs and retry on 429/5xx.
Paste the code in the Lambda Code editor and click Deploy.
"""
Lambda: transcribe S3 audio via Deepgram
Runtime: Python 3.12
Memory: 1024–1536 MB
Timeout: 60 s (actual runtime ~0.6 s for async submit)
"""
import os, json, time, logging
import boto3, botocore
import requests
from urllib.parse import unquote_plus
# ---------- Logging ----------
log = logging.getLogger()
log.setLevel(logging.INFO)
# ---------- AWS clients ----------
s3 = boto3.client('s3')
secrets = boto3.client('secretsmanager')
# ---------- Config (via env vars) ----------
DG_URL = "https://api.deepgram.com/v1/listen"
TRANSCRIPTS_PREFIX = os.environ.get("TRANSCRIPTS_PREFIX", "transcripts/")
INPUT_PREFIX = os.environ.get("INPUT_PREFIX", "audio-incoming/") # changeable without code edits
DG_MODEL = os.environ.get("DG_MODEL") # e.g., "nova-3"
DG_LANGUAGE = os.environ.get("DG_LANGUAGE") # e.g., "en"
DG_SMART_FORMAT = os.environ.get("DG_SMART_FORMAT", "true").lower() in ("1", "true", "yes")
# Optional: bypass HEAD check while fixing IAM/bucket policy
SKIP_HEAD = os.environ.get("SKIP_HEAD_CHECK", "false").lower() in ("1", "true", "yes")
# ---------- Helpers ----------
def _get_api_key():
"""
Resolve Deepgram API key from Secrets Manager (preferred) or env var.
Supports a secret that is either a raw string or a small JSON doc with a key.
"""
secret_name = os.environ.get("DEEPGRAM_SECRET_NAME")
if secret_name:
resp = secrets.get_secret_value(SecretId=secret_name)
val = resp.get("SecretString") or resp.get("SecretBinary")
try:
j = json.loads(val)
return j.get("DEEPGRAM_API_KEY") or j.get("deepgram_api_key")
except Exception:
return val
key = os.environ.get("DEEPGRAM_API_KEY")
if not key:
raise RuntimeError("Deepgram API key not configured. Set DEEPGRAM_API_KEY or DEEPGRAM_SECRET_NAME.")
return key
def _presigned_url(bucket, key, expires=300):
"""Create a short-lived URL so Deepgram can fetch the private S3 object directly."""
return s3.generate_presigned_url(
"get_object",
Params={"Bucket": bucket, "Key": key},
ExpiresIn=expires
)
def _transcript_keys(input_key):
"""
Derive output object keys from the input key.
hello.mp3 -> transcripts/hello.json + transcripts/hello.txt
(Keeps it flat; swap for a date-partitioned prefix if you prefer.)
"""
base = input_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
return (f"{TRANSCRIPTS_PREFIX}{base}.json",
f"{TRANSCRIPTS_PREFIX}{base}.txt")
def _exists(bucket, key):
"""Check if an S3 object exists (for idempotency)."""
try:
s3.head_object(Bucket=bucket, Key=key)
return True
except botocore.exceptions.ClientError as e:
code = e.response.get("Error", {}).get("Code")
if code in ("404", "NoSuchKey"):
return False
if code in ("403", "AccessDenied"):
# Treat as not present so we attempt to write; PutObject will fail if still denied.
log.warning(f"HEAD access denied for s3://{bucket}/{key}; assuming not exists")
return False
raise
def _post_with_retries(url, headers=None, json_body=None, params=None, max_retries=3, backoff=0.5):
"""
Call Deepgram with basic exponential backoff on 429/5xx.
Keep Lambda under a small timeout; for multi-minute audio use Deepgram async+webhook.
"""
last_err = None
for i in range(max_retries + 1):
try:
            resp = requests.post(url, headers=headers, params=params, json=json_body, timeout=30)
# Retry on throttling or server errors
if resp.status_code >= 500 or resp.status_code == 429:
raise requests.RequestException(f"Retryable status {resp.status_code}: {resp.text[:256]}")
if resp.status_code >= 400:
# Log a small slice so we can see Deepgram's diagnostic
log.error(f"DG {resp.status_code} body={resp.text[:500]}")
resp.raise_for_status()
return resp
except requests.RequestException as e:
last_err = e
if i == max_retries:
break
time.sleep(backoff * (2 ** i))
raise last_err or RuntimeError("Retries exhausted")
def _iter_s3_records(event):
"""
Yield (bucket, key) tuples regardless of trigger type:
- Direct S3 -> Lambda: event['Records'][*]['s3']...
- SQS -> Lambda: event['Records'][*]['body'] contains the S3 event JSON
- (Defensive) If SNS wrapped inside SQS, unwrap the 'Message' JSON too.
"""
records = event.get("Records", [])
for rec in records:
# Case 1: Direct S3 event
if "s3" in rec:
s3rec = rec["s3"]
yield (s3rec["bucket"]["name"], s3rec["object"]["key"])
continue
# Case 2: SQS message containing S3 event JSON
body = rec.get("body")
if body:
try:
inner = json.loads(body)
except json.JSONDecodeError:
log.error("SQS body was not valid JSON; skipping")
continue
# Some pipelines wrap the S3 event JSON as an SNS 'Message'
if isinstance(inner, dict) and "Message" in inner:
try:
inner = json.loads(inner["Message"])
except Exception:
log.error("SNS Message not valid JSON; skipping")
continue
for inner_rec in inner.get("Records", []):
if "s3" in inner_rec:
s3rec = inner_rec["s3"]
yield (s3rec["bucket"]["name"], s3rec["object"]["key"])
continue
log.warning("Record had neither 's3' nor 'body'; skipping")
# ---------- Handler ----------
def lambda_handler(event, context):
api_key = _get_api_key()
# Helpful startup log (don’t log secrets/presigned URLs)
log.info(json.dumps({
"stage": "start",
"has_records": "Records" in event,
"records_count": len(event.get("Records", [])) if isinstance(event.get("Records", []), list) else None
}))
found_any = False
for bucket, key in _iter_s3_records(event):
found_any = True
key = unquote_plus(key)
# Defense-in-depth: only process objects under the configured input prefix
if not key.startswith(INPUT_PREFIX):
log.info(f"Skipping non-input key: {key}")
continue
json_key, txt_key = _transcript_keys(key)
# Idempotency: skip if transcript already exists (unless SKIP_HEAD_CHECK=true)
if not SKIP_HEAD and _exists(bucket, json_key):
log.info(f"Transcript exists, skipping: s3://{bucket}/{json_key}")
continue
# Generate presigned URL and call Deepgram
url = _presigned_url(bucket, key)
headers = {"Authorization": f"Token {api_key}",
"Content-Type": "application/json"}
        payload = {"url": url}
        # Deepgram reads feature options from the query string, not the JSON body
        params = {}
        if DG_MODEL:
            params["model"] = DG_MODEL
        if DG_LANGUAGE:
            params["language"] = DG_LANGUAGE
        if DG_SMART_FORMAT:
            params["smart_format"] = "true"
        t0 = time.time()
        try:
            resp = _post_with_retries(DG_URL, headers=headers, json_body=payload, params=params)
dg = resp.json()
except Exception as e:
# If we got a response, log a small slice for debugging (safe)
if 'resp' in locals():
log.error(f"Deepgram error status={resp.status_code} body={resp.text[:300]}")
log.error(f"Deepgram request failed for key={key}: {e}")
raise
# Extract best transcript text defensively
alt = (dg.get("results", {})
.get("channels", [{}])[0]
.get("alternatives", [{}])[0])
transcript_text = alt.get("transcript", "")
# Persist outputs (same bucket). Add SSE-KMS if you require it.
s3.put_object(Bucket=bucket, Key=json_key,
Body=json.dumps(dg).encode("utf-8"),
ContentType="application/json")
s3.put_object(Bucket=bucket, Key=txt_key,
Body=transcript_text.encode("utf-8"),
ContentType="text/plain; charset=utf-8")
log.info(json.dumps({
"stage": "done",
"audio_key": key,
"json_key": json_key,
"txt_key": txt_key,
"dg_request_ms": int((time.time() - t0) * 1000)
}))
if not found_any:
# You invoked with an event that didn’t contain S3 records (e.g., empty Test event).
log.info(json.dumps({"note": "no S3 records found in event", "event_keys": list(event.keys())}))
return {"statusCode": 200}
💡 Bytes fallback (optional): If your org enforces very strict bucket policies that block presigned URLs, add a temporary env var UPLOAD_BYTES=true and send bytes instead. Replace the Deepgram call in the handler code to post raw S3 bytes (Content-Type from the file extension) instead of sending a {"url": ...} JSON payload.
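Here's a minimal sketch of that variant, reusing the s3 client, requests import, and DG_URL from the handler above (transcribe_bytes is a hypothetical helper, not part of the repo code):
import mimetypes

def transcribe_bytes(bucket, key, api_key, params=None):
    """Fallback: pull the object into Lambda memory and POST raw bytes to Deepgram."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    audio = obj["Body"].read()                              # keep clips small; this uses Lambda memory
    content_type = mimetypes.guess_type(key)[0] or "audio/mpeg"
    return requests.post(
        DG_URL,
        params=params,                                      # model, language, smart_format, etc.
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
        data=audio,
        timeout=30,
    )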
Stage 7: Monitor and Alert (5 minutes)
What to watch
- Lambda: Invocations, Errors, Throttles, Duration (avg/p95), IteratorAge (if SQS)
- SQS: Visible messages, AgeOfOldestMessage
- DLQ: Visible messages
Alarms (typical thresholds)
- Lambda Errors ≥ 1 for 2×5-min periods
- Lambda p95 Duration > 5s for 2×15-min periods
- SQS AgeOfOldestMessage > 60s for 2×5-min
- DLQ Messages ≥ 1 (alert immediately)
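As an example, the last of these (the DLQ alarm) can be created with a few lines of boto3; the SNS topic ARN is a placeholder for wherever you route alerts:
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Page someone as soon as anything lands in the dead-letter queue.
cloudwatch.put_metric_alarm(
    AlarmName="audio-dlq-has-messages",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "audio-dlq"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:ACCOUNT_ID:ops-alerts"],   # placeholder SNS topic
)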
Useful Logs Insights query (p50/p95 over time)
fields @timestamp, @message
| parse @message /"dg_request_ms":\s*(?<dg_request_ms>\d+)/
| filter ispresent(dg_request_ms)
| stats count() as requests, avg(dg_request_ms) as avg_ms, pct(dg_request_ms, 50) as p50_ms, pct(dg_request_ms, 95) as p95_ms by bin(5m)
| sort @timestamp desc
💡 Tips for hardening the app for production
- Idempotency: already implemented via the HEAD check; keep it enabled (fix IAM so no warnings appear).
- Security: use Secrets Manager for the API key; keep the bucket private.
- KMS: use SSE-KMS for transcripts if required (add kms:Encrypt/GenerateDataKey to the role).
- Lifecycle: transition old transcripts to IA; expire when appropriate.
- Separate buckets for input vs. transcripts if you prefer stricter access boundaries.
- Async transcription for multi-minute files: submit the job and receive results via webhook to keep Lambda sub-second (see the sketch after this list).
- Reserved concurrency: set a cap to protect spend during unexpected spikes.
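Here's a minimal sketch of that async submit, assuming api_key and a presigned URL as in the handler above (the callback URL is a placeholder for a webhook receiver, such as API Gateway + Lambda, which this guide doesn't build):
import requests

resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={
        "model": "nova-3",
        "smart_format": "true",
        "callback": "https://example.com/deepgram-webhook",   # Deepgram POSTs results here when done
    },
    headers={"Authorization": f"Token {api_key}"},
    json={"url": presigned_url},
    timeout=10,
)
request_id = resp.json().get("request_id")   # correlate the eventual callback with the source object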
Troubleshooting Tips for the Serverless Transcription App
- No Lambda logs after upload? S3 event miswired; check bucket Properties → Event notifications and Lambda Resource-based policy.
- KeyError 's3': You’re using SQS; unwrap record.body (the handler in the repo does this).
- ModuleNotFoundError: requests: add a Lambda layer that bundles requests, or vendor the dependency into your deployment package.
- 403 on HeadObject (idempotency): add s3:GetObject on transcripts/*.
- Deepgram REMOTE_CONTENT_ERROR 403: Lambda role lacks s3:GetObject on audio-incoming/* so presigned URL isn’t authorized; fix IAM.
- Timeouts/no network: keep the presigned pattern, make sure the function isn't in a VPC without a NAT gateway, increase memory (faster CPU), or switch to the SQS trigger and raise max concurrency.
- Trigger creation error (SQS visibility < Lambda timeout): Set visibility to ≥ 2× Lambda timeout (e.g., 120–180s).
>> 💻 Here’s the code for this technical guide in this repository.
Conclusion: Serverless Audio Transcription with AWS Lambda and Deepgram STT
You just stood up a production-ready, pay-per-use transcription pipeline: S3 catches audio, Lambda orchestrates, Deepgram transcribes, and results land back in S3; no servers to babysit, no idle spend. The architecture scales from a trickle to a flood, and every moving part is observable, permissioned with least privilege, and easy to extend.
A few takeaways worth underlining:
- Cost tracks usage. Lambda stays hot for ~sub-second submit work while Deepgram minutes do the heavy lifting. You’re not paying for idle CPUs or overprovisioned nodes.
- Operationally boring (in a good way). S3 events or SQS buffering handle bursts; DLQs and CloudWatch alarms give you fast feedback loops; idempotency prevents dupes.
- Security stays first-class. Private buckets + presigned URLs keep audio in your account; IAM scopes exactly what the function can read/write. (If you tighten bucket policies later, bytes-upload fallback is a safe escape hatch.)
- Developer-friendly. The code is small, explicit, and testable with sample events. Swapping models or languages is an env-var edit, not a rewrite.
Where to go next
- Long files/live traffic: Switch to Deepgram’s async + webhook flow to keep Lambda < 1s end-to-end for multi-minute content or batch jobs.
- Post-processing: Enrich transcripts with timestamps/diarization, push summaries/keywords to DynamoDB or OpenSearch, or run Comprehend for entities and sentiment.
- Lifecycle & analytics: Partition transcripts by date, add S3 lifecycle rules, and query with Athena for reporting.
- Hardening: Add WORM/versioning on transcripts, KMS for outputs, and formalize everything with Terraform/SAM and a CI pipeline.
- Observability: Pin the Log Insights query + p95 duration on a dashboard; keep DLQ alarms on at all times.
If you’re integrating this into an existing product, start by pointing your current upload path at audio-incoming/, enable the SQS buffer, and let the pipeline shoulder the load. From there, it’s a few env vars to tune accuracy, language, and formatting—and you’ve got accurate, fast, and massively scalable speech-to-text without the ops tax.
Ready to ship it? Sign up for Deepgram and start building with a developer-focused STT API—and you’ll be transcribing reliably at serverless scale in minutes.