Technology Partner

Stream

Stream provides developer-friendly APIs and SDKs for real-time chat, video, audio, feeds, and AI-powered moderation, powering in-app communication for 1B+ end users. Vision Agents is Stream's open-source framework for adding real-time vision and voice AI to live communication. Deepgram plugs in natively as the speech-to-text (STT) provider, delivering fast, accurate real-time transcription and diarization inside Vision Agents workflows.

For product managers and feature teams, adding live captions, voice search, and AI-powered conversation summaries to a video-call or in-app messaging product becomes a configuration change rather than a six-week build. Vision Agents handles the orchestration; Deepgram provides speech recognition and speech synthesis designed for real-time production use.
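To make "a configuration change" concrete, here is a minimal sketch of the general pattern: the speech provider is selected by a config value and the caption path does not change. Everything in it is hypothetical and for illustration only (`AgentConfig`, `STT_PROVIDERS`, `caption`, and the stand-in transcriber functions); it is not the Vision Agents or Deepgram API, and it makes no network calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a transcriber takes raw audio bytes and returns text.
Transcriber = Callable[[bytes], str]


def fake_deepgram_stt(audio: bytes) -> str:
    # Stand-in for a real speech-to-text call (assumption: no network here).
    return f"[deepgram transcript of {len(audio)} bytes]"


def fake_other_stt(audio: bytes) -> str:
    # A second stand-in provider, to show that swapping is config-only.
    return f"[other transcript of {len(audio)} bytes]"


# Hypothetical registry mapping a provider name to its transcriber.
STT_PROVIDERS: Dict[str, Transcriber] = {
    "deepgram": fake_deepgram_stt,
    "other": fake_other_stt,
}


@dataclass
class AgentConfig:
    # Illustrative settings a feature team might toggle.
    stt_provider: str = "deepgram"
    captions_enabled: bool = True


def caption(config: AgentConfig, audio: bytes) -> str:
    """Return a caption for one audio chunk, or "" if captions are off."""
    if not config.captions_enabled:
        return ""
    transcribe = STT_PROVIDERS[config.stt_provider]
    return transcribe(audio)


if __name__ == "__main__":
    # Default config uses the "deepgram" stand-in.
    print(caption(AgentConfig(), b"\x00" * 320))
    # Switching providers is a one-field change; the call site is untouched.
    print(caption(AgentConfig(stt_provider="other"), b"\x00" * 320))
```

In a real deployment the orchestration framework would own the registry and the audio transport; the point of the sketch is only that the feature code depends on the config, not on a specific provider.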

Vision Agents v0.2 supports a broad set of models out of the box, including integrations with Deepgram, OpenAI Realtime, and Gemini, and continues to improve latency, audio handling, and video handling. Recent launches in the Stream + Deepgram ecosystem include real-time AI character chat experiences (Lemon Slice Live) that combine streaming transcription with TTS-driven character voice.

If you are building communication features on Stream and want voice intelligence without standing up a separate transcription pipeline, Vision Agents ships with Deepgram already wired in as the speech provider. The framework is open source and has public developer documentation.

Technology: Media Transcription, Contact Centers, Conversational AI
