By Stephen Oladele
Audio transcription powers modern products (e.g., podcast platforms, customer support analytics, accessibility features, knowledge search) and usage is rarely steady. Some hours are quiet; others spike with uploads. Traditional, server‑based pipelines force you to provision for peaks, maintain machines, and pay for idle time.
Enter serverless. By pairing AWS Lambda with Deepgram's speech-to-text (STT) API, you get a push-button, event-driven workflow that scales to meet bursts and drops to zero when idle. Instead of running a fleet, you wire an S3 upload to trigger a Lambda function; that function calls Deepgram for accurate, low-latency transcription and saves the results right back to S3.
In this guide, you’ll deploy a production‑ready foundation: when audio lands in S3, Lambda sends a secure presigned URL to Deepgram’s /v1/listen endpoint and writes both the raw JSON response and the cleaned transcript text to a transcripts/ prefix.
Along the way, you’ll see how to keep costs predictable, add retries and dead‑letter queues for resilience, and extend the pipeline for search or analytics (all without maintaining servers or paying for idle compute time).
What you’ll build: A production‑ready pattern: S3 (incoming audio) → S3 Event → Lambda → Deepgram REST → S3 (transcripts), plus tips for costs, observability, and hardening.
Who it’s for: Platform engineers and developers who want to build a hands‑off, speech‑to‑text serverless transcription app on AWS.
>> 💻 Here’s the code for this technical guide in this repository.
Why Serverless STT on AWS (Deepgram + Lambda)?
Modern audio pipelines must transcribe at burst scale and sleep at idle. Yet most teams don’t want to babysit servers, autoscaling groups, or Kubernetes clusters just to move bytes from A to B.
Here are the key advantages in practice:
1️⃣ Event-Driven by Design
- S3 Event Notifications fire the moment a file lands (no polling loops or cron jobs).
- Each invocation handles one object, so concurrency naturally matches workload.
2️⃣ Predictable, Minimal Cost
- AWS free tier: 1 M Lambda requests + 400k GB-s monthly.
- Typical 5-min MP3 (≈5 MB) ≈ $0.022.
3️⃣ Built-in Resilience and Burst Control
- Optional SQS buffer smooths sudden floods; a DLQ captures hard failures for replay.
- Automatic retries on Lambda errors; you can add exponential back-off for Deepgram 429/5xx responses.
4️⃣ Zero-Ops Scaling
- Lambdas launch in < 100 ms warm start; cold starts are minor for I/O-bound jobs.
- No autoscaling rules or idle EC2 instances to watch.
5️⃣ Single-Purpose Functions = Clean Code
- One Lambda = one responsibility: fetch audio, call Deepgram, persist transcript.
- Easy to swap languages (Python, Node.js, Go) or hand off to Step Functions if you bolt on post-processing.
6️⃣ Deepgram Developer Experience
- /v1/listen REST accepts URLs or byte streams, returns JSON you can drop into S3.
- Choose models (e.g., nova-2, nova-3), languages, smart formatting, summarisation (all via query params).
- Generous (200 USD) trial credits let you test thousands of minutes for free.
Scenarios and Cost Assumptions for Serverless STT on AWS (Deepgram + Lambda)
Pricing References:
- Lambda duration: ~$0.0000166667 per GB‑second; requests ~$0.20 per 1M. Free tier: 1M requests + 400k GB‑s/month. (Source: AWS)
- Deepgram (Nova‑3, pre‑recorded): ~$0.0043/minute (varies by plan/volume). (Source: Deepgram)
- S3 egress (to internet): starts ~$0.09/GB in us‑east‑1; S3 request costs are tiny (PUT ~$0.005/1k, GET ~$0.0004/1k). (Source: AWS)
📝 Important: For clips ≥ ~1–2 minutes, prefer async Deepgram + webhook so Lambda runs for hundreds of ms (submit job) rather than seconds/minutes (wait for transcript).
Case A: Async Workflow (recommended for 5‑minute files)
- Assumptions: Lambda memory 1024 MB, runtime 600 ms to create a presigned URL + submit async job; Deepgram processes in the background and posts results to your webhook (or you poll).
- Lambda compute: 1.0 GB × 0.6 s × $0.0000166667 ≈ $0.000010 per file (plus $0.0000002 request).
- Deepgram: 5.0 min × $0.0043/min ≈ $0.0215 per file.
- S3 egress (example): 5‑min MP3 @ 128 kbps ≈ ~4.8 MB → 0.0048 GB × $0.09/GB ≈ $0.00043.
👉 Estimated total per 5‑minute file: ≈ $0.022
Case B: Sync Workflow (okay for short clips)
- Assumptions: 30‑second clip; Lambda memory 1536 MB; Lambda waits for synchronous /v1/listen to return—~2 s Lambda time end‑to‑end.
- Lambda compute: 1.5 GB × 2.0 s × $0.0000166667 ≈ $0.000050 per file (plus $0.0000002 request).
- Deepgram: 0.5 min × $0.0043/min ≈ $0.00215 per file.
- S3 egress (example): 30‑sec MP3 @ 128 kbps ≈ 0.5 MB → 0.0005 GB × $0.09/GB ≈ $0.000045.
Estimated total per 30‑sec file: ≈ $0.00225.
👉 Takeaway: In both workflows, Deepgram minutes are the dominant cost; Lambda duration and S3 request charges are negligible at this scale, and data egress is tiny for compressed audio but non-zero. For multi-minute files, prefer the async + webhook flow so Lambda stays sub-second.
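If you want to sanity-check these estimates against your own clip lengths and memory settings, here's a small back-of-the-envelope helper in Python using the rates quoted above (illustrative only; confirm current AWS and Deepgram pricing):
# Rough per-file cost model using the published rates above (USD).
LAMBDA_PER_GB_SECOND = 0.0000166667
LAMBDA_PER_REQUEST = 0.0000002
DEEPGRAM_PER_MINUTE = 0.0043   # Nova-3, pre-recorded
S3_EGRESS_PER_GB = 0.09

def per_file_cost(audio_minutes, lambda_gb, lambda_seconds, egress_gb):
    lambda_cost = lambda_gb * lambda_seconds * LAMBDA_PER_GB_SECOND + LAMBDA_PER_REQUEST
    deepgram_cost = audio_minutes * DEEPGRAM_PER_MINUTE
    egress_cost = egress_gb * S3_EGRESS_PER_GB
    return lambda_cost + deepgram_cost + egress_cost

print(per_file_cost(5.0, 1.0, 0.6, 0.0048))   # Case A: ~0.0219 per 5-minute file
print(per_file_cost(0.5, 1.5, 2.0, 0.0005))   # Case B: ~0.0022 per 30-second clip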
Architecture Overview
When a client uploads an audio file to Amazon S3 (for example, under audio-incoming/), S3 emits an ObjectCreated event. That event invokes an AWS Lambda function, which generates a presigned S3 URL for the object and calls Deepgram's /v1/listen REST API with that URL.
Deepgram transcribes the audio; the function then writes both the raw JSON response and a clean text transcript to a transcripts/ prefix in the same bucket.
Serverless Transcription on AWS (S3 → Lambda → Deepgram Nova-3 STT → S3)
Why presigned URLs? The audio stays in your private bucket; Deepgram fetches it securely via a time-limited URL. That keeps Lambda fast and memory-light and avoids base64 overhead.
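In code, that core exchange looks roughly like the sketch below (example bucket/key names and a placeholder API key; the full Stage 5 handler adds retries, idempotency checks, and S3 persistence):
import boto3
import requests

s3 = boto3.client("s3")

# Time-limited URL so Deepgram can fetch the private object directly
presigned_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "serverless-audio-transcription", "Key": "audio-incoming/my_audio.mp3"},
    ExpiresIn=300,
)

# Feature options ride on the query string; the JSON body only carries the URL
resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"model": "nova-3", "smart_format": "true"},
    headers={"Authorization": "Token YOUR_DEEPGRAM_API_KEY"},
    json={"url": presigned_url},
    timeout=30,
)
transcript = resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]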
What happens, step by step
| Step | Action | Why it matters |
| --- | --- | --- |
| 1. Upload | An audio file lands in audio-incoming/ in Amazon S3 (e.g., audio-incoming/my_audio.mp3). | S3 gives you 11 nines of durability, versioning, and effectively infinite scale. |
| 2. Event trigger | The ObjectCreated event fires. You configure S3 to push that event either directly to Lambda or through SQS if you need burst smoothing. | Purely event-driven; nothing polls for work. |
| 3. Lambda orchestration | Lambda receives the object key, generates a presigned S3 URL (good for a few minutes), and calls Deepgram's /v1/listen REST endpoint with JSON { "url": "<presigned-url>", … }. | Keeps the bucket private (no public URLs), avoids loading the audio into Lambda memory, and presigned URLs cost $0.00. |
| 4. Deepgram STT | Deepgram streams the object, transcribes it (sync for short clips, async + webhook for multi-minute audio), and returns JSON results. | Off-loads GPU/ASR complexity; you pay only per audio minute. |
| 5. Persist outputs | Lambda writes two artifacts back to S3: transcripts/<basename>.json (full Deepgram response) and transcripts/<basename>.txt (best-alternative plain text). | Makes downstream search/analytics simple; the idempotency check is "if transcript exists, skip." |
| 6. (Optional) Error path | On unhandled exceptions or after the maximum retries, the S3 event is sent to a dead-letter SQS queue for later replay. | Gives ops a clear backlog instead of silent failure. |
Why this Architecture?
- Event-driven + pay-per-use: scales to spikes, drops to zero at idle.
- Low latency and cost: no double-handling of bytes in Lambda; Deepgram fetches directly.
- Operationally simple: S3 stores, Lambda orchestrates, Deepgram transcribes; SQS/DLQ make bursts and failures easy to manage.
>> 💻 Here’s the code for this technical guide in this repository.
Get Prerequisites (S3 Bucket, API Keys, Tooling)
Before wiring events and code, make sure you have the following in place.
Account Setup and Tools
| Tool | Why you need it | Quick-start link/note |
| --- | --- | --- |
| AWS account with permission to create S3, Lambda, IAM, (optional) SQS | You'll provision the entire pipeline in your own account. | Sign up or log in at https://aws.amazon.com/ |
| S3 bucket (or two) | Stores incoming audio and finished transcripts. Create it in the Region you'll run Lambda, enable block public access, and turn on server-side encryption (SSE-S3 or SSE-KMS). | AWS Console → S3 → Create bucket |
| Deepgram account and API key | Authenticates requests to /v1/listen. New accounts get $200 free credit. | https://console.deepgram.com/signup → API Keys → Create Key |
| IAM role for Lambda | Grants least-privilege access to S3 (and SQS if used). | See sample policy below |
| Local dev tooling (optional) | AWS CLI, or AWS SAM/Terraform if you want IaC; curl or Postman to test webhooks; a short audio clip (.mp3, .wav, or .m4a) to verify the flow. | AWS CLI install: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html |
| Runtime choice | This guide uses Python 3.10, but the pattern is identical in Node.js, Go, etc. | Pick the language your team supports. |
💡Tips:
- Throughout this guide, we use us-east-1; use a consistent region for all your resources.
- Keep Lambda outside a private VPC unless you have a NAT gateway. The function must reach Deepgram over the public internet.
Deepgram Credentials
Create an API key and keep it server-side only. We’ll pass it to Lambda via:
- Simple: Encrypted environment variable (good for demos)
- Best: AWS Secrets Manager (Lambda reads it at runtime)
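For the Secrets Manager route, a one-time setup sketch looks like this (the secret name deepgram/api-key is just an example; the Stage 5 handler can read it if you expose it through a DEEPGRAM_SECRET_NAME environment variable instead of DEEPGRAM_API_KEY):
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

# Store the key as a small JSON document; the handler accepts either a raw
# string or a JSON doc containing a DEEPGRAM_API_KEY field.
secrets.create_secret(
    Name="deepgram/api-key",   # example name; reference it from the Lambda env
    SecretString=json.dumps({"DEEPGRAM_API_KEY": "dg_your_key_here"}),
)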
Stage 1: Set Up S3 Bucket with Source Audio and Transcript Prefixes/Folders
From the Console
Create an S3 bucket. You can use one bucket (e.g., serverless-audio-transcription) with separate prefixes or two buckets—your choice.
- Input prefix: audio-incoming/ (where you upload audio)
- Output prefix: transcripts/ (where the function writes results)
Here's the naming convention this guide uses:
s3://serverless-audio-transcription/
└── audio-incoming/ # <-- S3 Event Notification source
└── transcripts/ # <-- Lambda writes JSON + TXT here
📝 Notes:
- You don't need a bucket policy for presigned URLs to work. Do not add Deny rules that restrict GetObject by VPC endpoint or IP, or Deepgram won't be able to fetch via the presigned link.
- We’ll filter S3 events to only trigger on audio types and the audio-incoming/ prefix so transcript writes don’t re-trigger Lambda.
Or Use Your CLI
# set your vars
BUCKET=serverless-audio-transcription
REGION=us-east-1 # change as needed; just ensure you use a consistent region throughout
# 1) create the bucket
if [ "$REGION" = "us-east-1" ]; then
aws s3api create-bucket --bucket "$BUCKET"
else
aws s3api create-bucket --bucket "$BUCKET" \
--create-bucket-configuration LocationConstraint="$REGION" \
--region "$REGION"
fi
# (optional but recommended) lock it down a bit
aws s3api put-public-access-block --bucket "$BUCKET" \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
aws s3api put-bucket-encryption --bucket "$BUCKET" \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
aws s3api put-bucket-versioning --bucket "$BUCKET" \
--versioning-configuration Status=Enabled
# 2) create the “folders” (prefixes)
aws s3api put-object --bucket "$BUCKET" --key "audio-incoming/"
aws s3api put-object --bucket "$BUCKET" --key "transcripts/"
>> 💻 Here’s the code for this technical guide in this repository.
Stage 2: Create and Use Amazon SQS as an S3 Buffer and Add DLQ
Step 1: Create two queues
Head over to Amazon SQS to create two queues:
- Dead-letter queue: audio-dlq (Standard is fine). Failed messages end up here for inspection and replay.
- Main queue: audio-incoming-queue (Standard queue for max throughput)
  - Visibility timeout: set > your Lambda timeout (e.g., Lambda 60 s ⇒ visibility 120–180 s)
  - Redrive policy (DLQ): send to audio-dlq with MaxReceiveCount (start with 5)
📝 Note: Queue visibility timeout > your Lambda timeout because if your function runs near its timeout or you add retries/backoff inside, SQS needs enough time to keep the message hidden from other pollers. Too short and the same message can get delivered to another Lambda while the first is still working.
Step 2: Allow S3 to send messages to the main queue
S3 can only publish to SQS if the queue access policy allows it. In the SQS console → your main queue → Access policy → add a statement like:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowS3SendMessage",
"Effect": "Allow",
"Principal": { "Service": "s3.amazonaws.com" },
"Action": "sqs:SendMessage",
"Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Condition": {
"ArnEquals": { "aws:SourceArn": "arn:aws:s3:::YOUR_BUCKET" }
}
}
]
}
> Replace REGION, ACCOUNT_ID, YOUR_BUCKET with the actual values.
Step 3: Point S3 Event Notifications to the SQS queue
- S3 bucket → Properties → Event notifications → Create event
- Event types: All object create events (s3:ObjectCreated:*), or just PUT and CompleteMultipartUpload.
- Prefix: audio-incoming/
- Suffix: .mp3, .wav, .m4a (you can make one per suffix or one that’s broad)
- Destination: SQS queue → select audio-incoming-queue
This step sends “S3 event JSON” into your SQS queue as each audio file lands.
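Each SQS message body is the standard S3 event notification JSON. Here's a minimal sketch of pulling the bucket and key back out (the Stage 5 handler does this unwrapping for you):
import json
from urllib.parse import unquote_plus

def bucket_and_key(sqs_record):
    """Unwrap an SQS record whose body is an S3 event notification."""
    s3_event = json.loads(sqs_record["body"])              # body is a JSON string
    s3_part = s3_event["Records"][0]["s3"]
    return s3_part["bucket"]["name"], unquote_plus(s3_part["object"]["key"])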
CLI Option
1) Create queues
aws sqs create-queue --queue-name audio-dlq
aws sqs create-queue --queue-name audio-incoming-queue \
  --attributes '{"VisibilityTimeout":"180","RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:REGION:ACCOUNT_ID:audio-dlq\",\"maxReceiveCount\":\"5\"}"}'
2) Allow S3 to send to SQS
MAIN_Q_URL=$(aws sqs get-queue-url --queue-name audio-incoming-queue --query 'QueueUrl' --output text)
cat > /tmp/sqs-access.json <<'JSON'
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"AllowS3SendMessage",
"Effect":"Allow",
"Principal":{"Service":"s3.amazonaws.com"},
"Action":"sqs:SendMessage",
"Resource":"arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Condition":{"ArnEquals":{"aws:SourceArn":"arn:aws:s3:::YOUR_BUCKET"}}
}
]
}
JSON
# Wrap the policy document as the value of the "Policy" attribute (it must be a JSON-encoded string)
python3 -c "import json; print(json.dumps({'Policy': open('/tmp/sqs-access.json').read()}))" > /tmp/sqs-attributes.json
aws sqs set-queue-attributes --queue-url "$MAIN_Q_URL" --attributes file:///tmp/sqs-attributes.json
3) Wire S3 → SQS
cat > /tmp/s3-to-sqs.json <<'JSON'
{
"QueueConfigurations": [
{
"Id": "audio-incoming",
"QueueArn": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
"Events": ["s3:ObjectCreated:*"],
"Filter": { "Key": { "FilterRules": [
{ "Name": "prefix", "Value": "audio-incoming/" }
]}}
}
]
}
JSON
aws s3api put-bucket-notification-configuration \
--bucket YOUR_BUCKET \
--notification-configuration file:///tmp/s3-to-sqs.json
Stage 3: Set Up AWS Lambda for Serverless Transcription
Create the AWS Lambda function that orchestrates audio transcription requests to Deepgram.
Step 1: Log in and Navigate to Lambda
- Log in to your AWS Console.
- Search for "Lambda" in the top search bar and open the Lambda service page.
Step 2: Create a New Function
- Click "Create function."
- Choose "Author from scratch."
- Set the following parameters:
- Function Name: audio-transcriber
- Runtime: Choose Python 3.10 (or later)
- Architecture: Choose arm64 (typically ~20% cheaper per GB-second) or x86_64.
- Under Execution Role, select "Create a new role with basic Lambda permissions." (You'll attach the S3 and SQS permissions in Stage 4.)
- Click "Create function."
Step 3: Configure Function Settings
Once created, adjust these important settings under the Configuration tab:
| Setting | Recommended Value | Why? |
| --- | --- | --- |
| Memory | 1024–1536 MB | More memory also means more CPU, keeping presigning and JSON handling fast |
| Timeout | 60 seconds | Headroom for the Deepgram call plus retries |
| Ephemeral Storage | 512 MB (default) | Sufficient for temporary runtime storage |
| Reserved Concurrency | Optional (leave blank initially) | Set a cap later to control costs during spikes |
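If you'd rather script these settings than click through the console, a boto3 sketch (assuming the audio-transcriber function name from Step 2) would look like:
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Mirror the recommended console settings; more memory also means more CPU per invocation.
lambda_client.update_function_configuration(
    FunctionName="audio-transcriber",
    MemorySize=1024,                    # MB
    Timeout=60,                         # seconds
    EphemeralStorage={"Size": 512},     # MB (default)
)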
Step 4: Add Environment Variables (Lambda)
Under Configuration → Environment variables, add:
| Name | Required | Example | Purpose |
| --- | --- | --- | --- |
| INPUT_PREFIX | Yes | audio-incoming/ | S3 folder for source audio uploads |
| TRANSCRIPTS_PREFIX | Yes | transcripts/ | S3 folder to save output |
| DEEPGRAM_API_KEY | Yes | (from Secrets/Env) | Auth for Deepgram REST API |
| DG_MODEL | Optional | nova-3 | Deepgram model to use |
| DG_LANGUAGE | Optional | en | Language code (if needed) |
| DG_SMART_FORMAT | Optional | true | Improved readability of transcriptions (punctuation, casing, numbers, etc.) |
📝 Note: We'll set sensible defaults in the Lambda code if these aren't provided.
Step 5: Connect the main SQS queue to your Lambda Function (event source mapping)
- Lambda → Add trigger → SQS → choose audio-incoming-queue
- Batch size: start with 5
- Batch window: 0–1s
- Maximum concurrency (per trigger): set a cap (e.g., 10) to throttle burst spend
- Visibility timeout (on the queue) should be greater than Lambda timeout × expected retries. Ensure it’s 2–3× Lambda timeout.
That’s it. Now S3 drops notifications in SQS; Lambda polls SQS at your chosen rate, and failures get retried up to MaxReceiveCount then land in audio-dlq for inspection.
Or use the CLI to connect the main SQS queue to your Lambda func:
aws lambda create-event-source-mapping \
--function-name ADD_YOUR_FUNCTION_NAME \
--event-source-arn arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue \
--batch-size 5 \
--maximum-batching-window-in-seconds 1 \
--maximum-concurrency 10
Stage 4: Add Required Permissions to IAM Execution role for Lambda
Step 1: Get your inline policies ready
Below are the inline policies you'll attach to the Lambda execution role, scoped to least privilege for this guide:
CloudWatch Logs (basic Lambda logging):
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow", "Action": ["logs:CreateLogGroup"], "Resource": "arn:aws:logs:*:*:*" },
{ "Effect": "Allow", "Action": ["logs:CreateLogStream","logs:PutLogEvents"], "Resource": "arn:aws:logs:*:*:log-group:/aws/lambda/*" }
]
}
S3 input (read audio for presigned auth and optional bytes path):
{
"Version": "2012-10-17",
"Statement": [
{ "Sid": "ReadInputAudio",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:GetObjectTagging"],
"Resource": "arn:aws:s3:::YOUR_BUCKET/audio-incoming/*"
}
]
}
S3 outputs (read for idempotency + write transcripts):
{
"Version": "2012-10-17",
"Statement": [
{ "Sid": "ReadWriteTranscripts",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:GetObjectTagging", "s3:PutObject","s3:PutObjectTagging"],
"Resource": "arn:aws:s3:::YOUR_BUCKET/transcripts/*"
}
]
}
SQS (buffer or DLQ):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:YOUR_QUEUE_NAME"
    }
  ]
}
👉 Shortcut: you can also attach AWS’s managed policy AWSLambdaSQSQueueExecutionRole instead of writing this inline.
(Optional) Secrets Manager
If you store the Deepgram API key in Secrets Manager:
{
"Version": "2012-10-17",
"Statement": [
{ "Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:YOUR_SECRET_NAME*"
}
]
}
Step 2: Go to the AWS Console → Lambda → open your function.
Step 3: Configuration → Permissions → under Execution role, click the blue role link (e.g., audio-transcriber-role-abc123).
Step 4: You're now on the IAM Role page. Click Add permissions → Create inline policy.
Step 5: Click the JSON tab.
Step 6: Paste the policy JSON you need (e.g., the "CloudWatch Logs" block or one of the S3 blocks) and Save.
- If you already have an inline policy with a "Statement": [ ... ] array, you can add another statement to that array instead of creating a separate policy.
Step 7: Wait ~30–60 seconds for IAM to propagate, then re-test your Lambda.
💡 Common issues to watch out for:
- Wrong role: Always click the role link from the Lambda page to edit the correct one.
- Wrong ARN: S3 ARNs are arn:aws:s3:::BUCKET/PREFIX/* (no region in S3 ARNs).
- Policy shape: JSON must have a single "Version" and a "Statement" array; don’t paste multiple top-level objects.
- Inline policy limit: Keep each inline policy under the IAM size limits; create multiple policies if needed.
- Bucket policy Deny beats Allow: If there’s an S3 bucket policy with a Deny, it will override your role’s Allow (not needed for this guide; avoid Deny rules that restrict GetObject on input).
- Propagation delay: IAM changes can take ~30–60 seconds to take effect. Refresh and retry.
Stage 5: Add Handler Code to Lambda Function
This handler code (see repo):
- Accepts S3 events directly or SQS→S3 wrapped events.
- Generates a presigned URL and calls Deepgram /v1/listen.
- Writes JSON and TXT to transcripts/.
- Is idempotent (checks existing outputs).
- Includes useful logs and retry on 429/5xx.
Paste the code in the Lambda Code editor and click Deploy.
"""
Lambda: transcribe S3 audio via Deepgram
Runtime: Python 3.12
Memory: 1024–1536 MB
Timeout: 60 s (actual runtime ~0.6 s for async submit)
"""
import os, json, time, logging
import boto3, botocore
import requests
from urllib.parse import unquote_plus
# ---------- Logging ----------
log = logging.getLogger()
log.setLevel(logging.INFO)
# ---------- AWS clients ----------
s3 = boto3.client('s3')
secrets = boto3.client('secretsmanager')
# ---------- Config (via env vars) ----------
DG_URL = "https://api.deepgram.com/v1/listen"
TRANSCRIPTS_PREFIX = os.environ.get("TRANSCRIPTS_PREFIX", "transcripts/")
INPUT_PREFIX = os.environ.get("INPUT_PREFIX", "audio-incoming/") # changeable without code edits
DG_MODEL = os.environ.get("DG_MODEL") # e.g., "nova-3"
DG_LANGUAGE = os.environ.get("DG_LANGUAGE") # e.g., "en"
DG_SMART_FORMAT = os.environ.get("DG_SMART_FORMAT", "true").lower() in ("1", "true", "yes")
# Optional: bypass HEAD check while fixing IAM/bucket policy
SKIP_HEAD = os.environ.get("SKIP_HEAD_CHECK", "false").lower() in ("1", "true", "yes")
# ---------- Helpers ----------
def _get_api_key():
"""
Resolve Deepgram API key from Secrets Manager (preferred) or env var.
Supports a secret that is either a raw string or a small JSON doc with a key.
"""
secret_name = os.environ.get("DEEPGRAM_SECRET_NAME")
if secret_name:
resp = secrets.get_secret_value(SecretId=secret_name)
val = resp.get("SecretString") or resp.get("SecretBinary")
try:
j = json.loads(val)
return j.get("DEEPGRAM_API_KEY") or j.get("deepgram_api_key")
except Exception:
return val
key = os.environ.get("DEEPGRAM_API_KEY")
if not key:
raise RuntimeError("Deepgram API key not configured. Set DEEPGRAM_API_KEY or DEEPGRAM_SECRET_NAME.")
return key
def _presigned_url(bucket, key, expires=300):
"""Create a short-lived URL so Deepgram can fetch the private S3 object directly."""
return s3.generate_presigned_url(
"get_object",
Params={"Bucket": bucket, "Key": key},
ExpiresIn=expires
)
def _transcript_keys(input_key):
"""
Derive output object keys from the input key.
hello.mp3 -> transcripts/hello.json + transcripts/hello.txt
(Keeps it flat; swap for a date-partitioned prefix if you prefer.)
"""
base = input_key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
return (f"{TRANSCRIPTS_PREFIX}{base}.json",
f"{TRANSCRIPTS_PREFIX}{base}.txt")
def _exists(bucket, key):
"""Check if an S3 object exists (for idempotency)."""
try:
s3.head_object(Bucket=bucket, Key=key)
return True
except botocore.exceptions.ClientError as e:
code = e.response.get("Error", {}).get("Code")
if code in ("404", "NoSuchKey"):
return False
if code in ("403", "AccessDenied"):
# Treat as not present so we attempt to write; PutObject will fail if still denied.
log.warning(f"HEAD access denied for s3://{bucket}/{key}; assuming not exists")
return False
raise
def _post_with_retries(url, headers=None, json_body=None, params=None, max_retries=3, backoff=0.5):
"""
Call Deepgram with basic exponential backoff on 429/5xx.
Keep Lambda under a small timeout; for multi-minute audio use Deepgram async+webhook.
"""
last_err = None
for i in range(max_retries + 1):
try:
            resp = requests.post(url, headers=headers, params=params, json=json_body, timeout=30)
# Retry on throttling or server errors
if resp.status_code >= 500 or resp.status_code == 429:
raise requests.RequestException(f"Retryable status {resp.status_code}: {resp.text[:256]}")
if resp.status_code >= 400:
# Log a small slice so we can see Deepgram's diagnostic
log.error(f"DG {resp.status_code} body={resp.text[:500]}")
resp.raise_for_status()
return resp
except requests.RequestException as e:
last_err = e
if i == max_retries:
break
time.sleep(backoff * (2 ** i))
raise last_err or RuntimeError("Retries exhausted")
def _iter_s3_records(event):
"""
Yield (bucket, key) tuples regardless of trigger type:
- Direct S3 -> Lambda: event['Records'][*]['s3']...
- SQS -> Lambda: event['Records'][*]['body'] contains the S3 event JSON
- (Defensive) If SNS wrapped inside SQS, unwrap the 'Message' JSON too.
"""
records = event.get("Records", [])
for rec in records:
# Case 1: Direct S3 event
if "s3" in rec:
s3rec = rec["s3"]
yield (s3rec["bucket"]["name"], s3rec["object"]["key"])
continue
# Case 2: SQS message containing S3 event JSON
body = rec.get("body")
if body:
try:
inner = json.loads(body)
except json.JSONDecodeError:
log.error("SQS body was not valid JSON; skipping")
continue
# Some pipelines wrap the S3 event JSON as an SNS 'Message'
if isinstance(inner, dict) and "Message" in inner:
try:
inner = json.loads(inner["Message"])
except Exception:
log.error("SNS Message not valid JSON; skipping")
continue
for inner_rec in inner.get("Records", []):
if "s3" in inner_rec:
s3rec = inner_rec["s3"]
yield (s3rec["bucket"]["name"], s3rec["object"]["key"])
continue
log.warning("Record had neither 's3' nor 'body'; skipping")
# ---------- Handler ----------
def lambda_handler(event, context):
api_key = _get_api_key()
# Helpful startup log (don’t log secrets/presigned URLs)
log.info(json.dumps({
"stage": "start",
"has_records": "Records" in event,
"records_count": len(event.get("Records", [])) if isinstance(event.get("Records", []), list) else None
}))
found_any = False
for bucket, key in _iter_s3_records(event):
found_any = True
key = unquote_plus(key)
# Defense-in-depth: only process objects under the configured input prefix
if not key.startswith(INPUT_PREFIX):
log.info(f"Skipping non-input key: {key}")
continue
json_key, txt_key = _transcript_keys(key)
# Idempotency: skip if transcript already exists (unless SKIP_HEAD_CHECK=true)
if not SKIP_HEAD and _exists(bucket, json_key):
log.info(f"Transcript exists, skipping: s3://{bucket}/{json_key}")
continue
# Generate presigned URL and call Deepgram
url = _presigned_url(bucket, key)
headers = {"Authorization": f"Token {api_key}",
"Content-Type": "application/json"}
        payload = {"url": url}
        # Deepgram reads feature options from the query string, not the JSON body
        params = {}
        if DG_MODEL:
            params["model"] = DG_MODEL
        if DG_LANGUAGE:
            params["language"] = DG_LANGUAGE
        if DG_SMART_FORMAT:
            params["smart_format"] = "true"
        t0 = time.time()
        try:
            resp = _post_with_retries(DG_URL, headers=headers, json_body=payload, params=params)
dg = resp.json()
except Exception as e:
# If we got a response, log a small slice for debugging (safe)
if 'resp' in locals():
log.error(f"Deepgram error status={resp.status_code} body={resp.text[:300]}")
log.error(f"Deepgram request failed for key={key}: {e}")
raise
# Extract best transcript text defensively
alt = (dg.get("results", {})
.get("channels", [{}])[0]
.get("alternatives", [{}])[0])
transcript_text = alt.get("transcript", "")
# Persist outputs (same bucket). Add SSE-KMS if you require it.
s3.put_object(Bucket=bucket, Key=json_key,
Body=json.dumps(dg).encode("utf-8"),
ContentType="application/json")
s3.put_object(Bucket=bucket, Key=txt_key,
Body=transcript_text.encode("utf-8"),
ContentType="text/plain; charset=utf-8")
log.info(json.dumps({
"stage": "done",
"audio_key": key,
"json_key": json_key,
"txt_key": txt_key,
"dg_request_ms": int((time.time() - t0) * 1000)
}))
if not found_any:
# You invoked with an event that didn’t contain S3 records (e.g., empty Test event).
log.info(json.dumps({"note": "no S3 records found in event", "event_keys": list(event.keys())}))
return {"statusCode": 200}
💡 Bytes fallback (optional): If your org enforces very strict bucket policies that block presigned URLs, add a temporary env var UPLOAD_BYTES=true and send bytes instead. Replace the Deepgram call in the handler code to post raw S3 bytes (Content-Type from the file extension) instead of sending a {"url": ...} JSON payload.
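Here's a minimal sketch of that variant, reusing the s3 client, requests import, and DG_URL from the handler above (transcribe_bytes is a hypothetical helper, not part of the repo code):
import mimetypes

def transcribe_bytes(bucket, key, api_key, params=None):
    """Fallback: pull the object into Lambda memory and POST raw bytes to Deepgram."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    audio = obj["Body"].read()                              # keep clips small; this uses Lambda memory
    content_type = mimetypes.guess_type(key)[0] or "audio/mpeg"
    return requests.post(
        DG_URL,
        params=params,                                      # model, language, smart_format, etc.
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
        data=audio,
        timeout=30,
    )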
Stage 7: Monitor and Alert (5 minutes)
What to watch
- Lambda: Invocations, Errors, Throttles, Duration (avg/p95), IteratorAge (if SQS)
- SQS: Visible messages, AgeOfOldestMessage
- DLQ: Visible messages
Alarms (typical thresholds)
- Lambda Errors ≥ 1 for 2×5-min periods
- Lambda p95 Duration > 5s for 2×15-min periods
- SQS AgeOfOldestMessage > 60s for 2×5-min
- DLQ Messages ≥ 1 (alert immediately)
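As an example, the last of these (the DLQ alarm) can be created with a few lines of boto3; the SNS topic ARN is a placeholder for wherever you route alerts:
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Page someone as soon as anything lands in the dead-letter queue.
cloudwatch.put_metric_alarm(
    AlarmName="audio-dlq-has-messages",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "audio-dlq"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:ACCOUNT_ID:ops-alerts"],   # placeholder SNS topic
)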
Useful Logs Insights query (p50/p95 over time)
fields @timestamp, @message
| parse @message /"dg_request_ms":\s*(?<dg_request_ms>\d+)/
| filter ispresent(dg_request_ms)
| stats count() as requests, avg(dg_request_ms) as avg_ms, pct(dg_request_ms, 50) as p50_ms, pct(dg_request_ms, 95) as p95_ms by bin(5m)
| sort @timestamp desc
💡 Tips for hardening the app for production
- Idempotency: already implemented via the HEAD check; keep it enabled (fix IAM so no warnings appear).
- Security: use Secrets Manager for the API key; keep the bucket private.
- KMS: use SSE-KMS for transcripts if required (add kms:Encrypt/GenerateDataKey to the role).
- Lifecycle: transition old transcripts to IA; expire when appropriate.
- Separate buckets for input vs. transcripts if you prefer stricter access boundaries.
- Async transcription for multi-minute files: submit the job and receive results via webhook to keep Lambda sub-second (see the sketch after this list).
- Reserved concurrency: set a cap to protect spend during unexpected spikes.
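Here's a minimal sketch of that async submit, assuming api_key and a presigned URL as in the handler above (the callback URL is a placeholder for a webhook receiver, such as API Gateway + Lambda, which this guide doesn't build):
import requests

resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={
        "model": "nova-3",
        "smart_format": "true",
        "callback": "https://example.com/deepgram-webhook",   # Deepgram POSTs results here when done
    },
    headers={"Authorization": f"Token {api_key}"},
    json={"url": presigned_url},
    timeout=10,
)
request_id = resp.json().get("request_id")   # correlate the eventual callback with the source object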
Troubleshooting Tips for the Serverless Transcription App
- No Lambda logs after upload? S3 event miswired; check bucket Properties → Event notifications and Lambda Resource-based policy.
- KeyError 's3': You’re using SQS; unwrap record.body (the handler in the repo does this).
- ModuleNotFoundError: requests: add a Lambda layer that bundles requests, or vendor the dependency into your deployment package.
- 403 on HeadObject (idempotency): add s3:GetObject on transcripts/*.
- Deepgram REMOTE_CONTENT_ERROR 403: Lambda role lacks s3:GetObject on audio-incoming/* so presigned URL isn’t authorized; fix IAM.
- Timeouts/no network: keep the presigned pattern, make sure the function isn't in a VPC without a NAT gateway, increase memory (faster CPU), or switch to the SQS trigger and raise max concurrency.
- Trigger creation error (SQS visibility < Lambda timeout): Set visibility to ≥ 2× Lambda timeout (e.g., 120–180s).
>> 💻 Here’s the code for this technical guide in this repository.
Conclusion: Serverless Audio Transcription with AWS Lambda and Deepgram STT
You just stood up a production-ready, pay-per-use transcription pipeline: S3 catches audio, Lambda orchestrates, Deepgram transcribes, and results land back in S3; no servers to babysit, no idle spend. The architecture scales from a trickle to a flood, and every moving part is observable, permissioned with least privilege, and easy to extend.
A few takeaways worth underlining:
- Cost tracks usage. Lambda stays hot for ~sub-second submit work while Deepgram minutes do the heavy lifting. You’re not paying for idle CPUs or overprovisioned nodes.
- Operationally boring (in a good way). S3 events or SQS buffering handle bursts; DLQs and CloudWatch alarms give you fast feedback loops; idempotency prevents dupes.
- Security stays first-class. Private buckets + presigned URLs keep audio in your account; IAM scopes exactly what the function can read/write. (If you tighten bucket policies later, bytes-upload fallback is a safe escape hatch.)
- Developer-friendly. The code is small, explicit, and testable with sample events. Swapping models or languages is an env-var edit, not a rewrite.
Where to go next
- Long files/live traffic: Switch to Deepgram’s async + webhook flow to keep Lambda < 1s end-to-end for multi-minute content or batch jobs.
- Post-processing: Enrich transcripts with timestamps/diarization, push summaries/keywords to DynamoDB or OpenSearch, or run Comprehend for entities and sentiment.
- Lifecycle & analytics: Partition transcripts by date, add S3 lifecycle rules, and query with Athena for reporting.
- Hardening: Add WORM/versioning on transcripts, KMS for outputs, and formalize everything with Terraform/SAM and a CI pipeline.
- Observability: Pin the Log Insights query + p95 duration on a dashboard; keep DLQ alarms on at all times.
If you’re integrating this into an existing product, start by pointing your current upload path at audio-incoming/, enable the SQS buffer, and let the pipeline shoulder the load. From there, it’s a few env vars to tune accuracy, language, and formatting—and you’ve got accurate, fast, and massively scalable speech-to-text without the ops tax.
Ready to ship it? Sign up for Deepgram and start building with a developer-focused STT API—and you’ll be transcribing reliably at serverless scale in minutes.