Introducing Speech Summarization Powered by Domain-Specific Language Models
tl;dr:
We’re announcing the public release of our first domain-specific language model (DSLM) for speech summarization of call center interactions
Fine-tuned using more than 200K domain-specific conversations
No token length or audio duration limits
See our new summarization model in our API Playground or contact us to learn more
Since our inception, Deepgram has been a foundational AI company on a mission to create the essential building blocks for Language AI that will power the future of human-computer interactions. Sure, we’re mostly known for our industry-leading speech-to-text models and API, but that was always just the first phase of our journey:
Phase 0: Develop an end-to-end infrastructure and operational pipeline to curate data, train deep learning models, adapt general models into custom-trained ones, and deploy/operate these models for customers.
Phase 1: Apply this process to voice data to produce transcripts with near-human accuracy across multiple languages, domains, and use cases.
Phase 2: Make AI-generated transcripts more legible to both humans and machines with enhanced formatting options and speaker diarization.
Phase 3 (current phase): Give users the most comprehensive understanding of what was said, how it was said, and who said it using domain-specific language models (DSLMs) for high-level natural language understanding tasks such as summarization, sentiment analysis, and topic detection.
Unlike OpenAI, Anthropic, and Google, who are building massively scaled-up, general-purpose large language models (LLMs) like ChatGPT, Claude, and Bard with hundreds of billions of parameters, we are taking a different approach. While these models are undoubtedly powerful, they are too large, slow, and expensive to serve specific use cases efficiently and accurately at scale, to say nothing of the safety, privacy, and security concerns that will inhibit widespread enterprise adoption.
In contrast, we are building domain-specific language models (DSLMs) that are trained on use-case-level data, with support for training on unique user-level data. These models will provide several important benefits over their general-purpose counterparts:
Personalization
Superior accuracy on specialized topics
Low inference costs
Speed
Today we are proud to announce the public release of our first such model, built for speech summarization of contact center and sales enablement interactions.
Deepgram Speech Summarization
The contact center industry is embracing AI-powered solutions to drive operational efficiency, cost reduction, and enhanced customer satisfaction. With the ever-increasing volume of customer interactions, contact centers are actively seeking innovative approaches to efficiently manage and analyze these interactions. Contact center agents spend an average of six minutes in wrap-up time per call, which involves manually updating notes from customer calls, documenting resolutions, and outlining next steps. Unfortunately, this manual process leads to longer average handling times, overlooked details, and a worse overall customer experience.
To address these challenges head-on, we have developed a state-of-the-art DSLM-powered Summarization Model specifically tailored for contact center and sales enablement use cases. This model is now publicly available for pre-recorded English audio. By leveraging this model, agents and supervisors can reduce average handling times, increase first-call resolution rates, and elevate the overall customer experience. Our Summarization Model automates the process of summarizing customer interactions and extracting pertinent information at scale. It also gives sales representatives highly accurate summaries of customer conversations, allowing them to spend less time on administrative tasks and more time building meaningful connections with customers and prospects.
Our model has undergone meticulous customization to cater specifically to the unique requirements of the contact center segment and offers a number of differentiated benefits:
Through extensive fine-tuning using more than 200K domain-specific conversations, our model surpasses the accuracy of alternative summarization methods for this segment.
No token length or audio duration limits, which is especially important in the call center space, where long calls often produce transcripts that exceed the fixed context windows of current summarization solutions.
Blazing fast speed to support workflow automation.
Low cost per summary.
Unlike extractive approaches that tend to generate inaccurate summaries, our DSLM-powered model adopts an abstractive approach. This enables it to capture the essence of conversations with remarkable precision, delivering summaries that effectively convey the primary aspects of the conversation, including the reason for calling, agent responses, and identified follow-ups. An example of the difference between approaches can be seen below.
Transcript:
Alternative (extractive) summarization results:
Deepgram’s DSLM-powered Summarization Model:
As is clear from this example, our DSLM-powered Summarization Model delivers superior results that more accurately capture the essence of the interaction and include important details the other model’s output lacks. In contrast to LLM-based solutions that have long latencies and much greater expense per query, the price and speed at which our model operates enable reliable, efficient summarization in support of streamlined workflows.
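For readers less familiar with the terminology, the sketch below illustrates what "extractive" means in practice. It is a deliberately simple toy, not the alternative model used in the comparison above and not Deepgram's method: it scores each sentence by word frequency and copies the top-scoring sentences verbatim. The transcript, the stopword list, and the function names are all invented for illustration.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "i", "i'm", "you", "to", "and", "of", "is",
             "it", "my", "that", "for", "in", "on", "we", "your", "has"}

def split_sentences(text):
    # Naive splitter: break after ., ?, or ! when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def content_words(text):
    # Lowercased words with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, max_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    # Score each sentence by the summed frequency of its content words.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the chosen sentences in their original order, copied verbatim.
    return " ".join(sentences[i] for i in sorted(top[:max_sentences]))

# Invented example transcript (not from a real call).
transcript = (
    "Hi, I'm calling because my internet has been down since Monday. "
    "I already restarted the router twice. "
    "The agent scheduled a technician visit for Thursday morning. "
    "Thanks for your help."
)
summary = extractive_summary(transcript, max_sentences=2)
print(summary)
```

Because an extractive method can only reuse sentences that already exist in the transcript, it cannot merge a caller's problem and the agent's resolution into a single statement. An abstractive model generates new sentences and can do so.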
By harnessing the power of automated summarization, contact centers and sales enablement platforms can uncover critical insights that enable leaders to efficiently navigate through thousands of conversations. This allows for quick identification of calls that require in-depth review and follow-up, ultimately saving time and effort while providing targeted coaching to agents.
To learn more, please visit our changelog or try out our new summarization model in our API Playground.
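For those who prefer code to the Playground, here is a minimal sketch of what a summarization request might look like. This post does not specify the endpoint, the `summarize=v2` query parameter, the `Token` authorization scheme, or the `results.summary.short` response field; these are assumptions drawn from our understanding of Deepgram's public API documentation, so please verify them against the current API reference. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed pre-recorded audio endpoint; confirm in the API reference.
API_URL = "https://api.deepgram.com/v1/listen"

def build_summary_request(api_key, audio_url):
    """Build (but do not send) a POST request for a transcript plus summary."""
    # Assumed query parameter that enables summarization.
    query = urllib.parse.urlencode({"summarize": "v2"})
    # The audio is referenced by URL in a JSON body.
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_summary(response):
    """Return the summary text from a parsed JSON response, or None if absent."""
    # Assumed response shape: {"results": {"summary": {"short": "..."}}}
    return response.get("results", {}).get("summary", {}).get("short")
```

To send the request, pass the result of `build_summary_request(...)` to `urllib.request.urlopen`, parse the body with `json.load`, and hand the parsed dictionary to `extract_summary`.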
If you have any feedback about this post, or anything else regarding Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions or contact us to talk to one of our product experts for more information today.