AI Minds #053 | Michael Louis, CEO and Founder at Cerebrium


About this episode
Michael Louis is CEO and Founder at Cerebrium. Cerebrium is a platform to deploy machine learning models to serverless GPUs with sub-5-second cold-start times. Customers typically see 40% cost savings compared to traditional cloud providers and can scale models to more than 10K requests per minute with minimal engineering overhead. Simply write your code in Python and Cerebrium takes care of all infrastructure and scaling. Cerebrium is used by companies and engineers from Twilio, Rudderstack, Matterport and many more.
Listen to the episode on Spotify, Apple Podcasts, Podcast Addict, or Castbox. You can also watch this episode on YouTube.
In this episode of the AIMinds Podcast, Michael Louis, CEO and Co-Founder of Cerebrium, discusses the evolution of his AI venture and entrepreneurial journey.
Michael reflects on his success with OneCard during the COVID-19 pandemic, which scaled significantly after an endorsement from the South African president and was later acquired by Walmart. This success led him to start Cerebrium, driven by the challenges he faced implementing machine learning in his previous venture.
He explains how Cerebrium simplifies the process of building, deploying, and scaling AI applications by reducing infrastructure complexities, making machine learning more accessible to start-ups and smaller companies.
Michael also highlights technical innovations at Cerebrium, such as optimizing machine learning operations and reducing computation costs, contributing to the platform's fast processing capabilities.
The discussion also touches on the economic impact of AI, emphasizing the rise of AI-driven solutions across industries and how it will shape the future of businesses. Michael envisions a future where AI-powered agents replace traditional roles, transforming functions like financial planning and resource allocation.
This episode provides valuable insights into AI’s role in driving innovation, enhancing efficiency, and creating new opportunities across sectors.
Fun Fact: Michael Louis's company, OneCard, experienced explosive growth during the COVID-19 pandemic, skyrocketing from $150,000 a month to over $200,000 a day in revenue within a few months. This growth was spurred by a presidential endorsement in South Africa, where the president recommended using their app for grocery deliveries.
Show Notes:
00:00 AI Minds Podcast: Featuring Cerebrium CEO
05:18 Machine Learning: Startup Struggles and Solutions
08:38 Iterative Testing for Success
10:16 Avoid High Cognitive Load in Apps
15:32 LLMs Revolutionizing Software Engineering
17:12 AI Cost Dynamics and Justification
Transcript:
Demetrios:
Welcome back to the AI Minds Podcast. This is a podcast where we explore the companies of tomorrow being built AI-first. I am your host, Demetrios. And this episode, like every other episode, is brought to you by Deepgram, the number one text-to-speech and speech-to-text API on the Internet today, trusted by the world's top enterprises, conversational AI leaders and startups, some of which you may have heard of, like Spotify, Twilio, NASA and Citibank. It is my pleasure to get to introduce our guest today, because this one is pretty special for me. Michael, the CEO and co-founder of Cerebrium. How you doing, dude? Welcome to the show.
Michael Louis:
Yeah, awesome, man, thanks for having me.
Demetrios:
Well, I was telling you before we hit record that I've been geeking out on Cerebrium, and I love what you all are doing, but I want to dive into your story because you have been an entrepreneur since the day you were born. It sounds like you started a few companies while at university and then just during COVID you had quite the success story. Can you walk us through what happened there?
Michael Louis:
Yeah, sure. So that company was called OneCard. I was based in South Africa at that time; that's where I'm from. And it was similar to Instacart, so we delivered groceries to your door in under an hour. We were growing pretty steadily before COVID. And then when COVID happened, our president actually did an address to the nation saying, use this app and order your groceries. And I guess from there we kind of took off.
Demetrios:
I mean, talk about an endorsement.
Michael Louis:
So glad to know the president knew about us. And essentially from there, I mean, we scaled rapidly. We went from $150,000 a month to over $200,000 a day in a couple of months. Hired like 400, 500 people in a couple of weeks. And eventually we got acquired by Walmart a few years later. So pretty good outcome. Definitely have a lot of other war stories from that time.
Demetrios:
Yeah, maybe. As you think back through the war stories and growing and just all of the trials and tribulations that you faced, is there one that you tend to remember more than others?
Michael Louis:
The one thing that we had to deal with as a founding team: obviously people were paying for groceries on their credit cards, and the bank only settles that cash about two days later. But because we were growing so quickly, each day our amounts were just getting larger and larger. Us, the founders, had to float the company. We're talking millions of rand.
Demetrios:
Wow.
Michael Louis:
As like founders. And so we constantly kept risking it. And looking back now, it was actually pretty stupid to do that, but luckily it paid off in the long run. Eventually we got some partnerships with banks to float that for us. But I mean, things like that, that's just the start of it. But yeah, some crazy things were happening.
Demetrios:
It reminds me of Shoe Dog, the book about the Nike founder, where he was constantly running into cash flow issues with his first company before starting Nike, because he would buy all the shoes and then sell them. But before you actually get that money to go and buy more shoes, you need to pony up. So he was going to the banks trying to negotiate deals. You have to know a few bankers, and you recognize they can make your life a lot easier.
Michael Louis:
No, look, e-commerce is a very tough business. I mean, margins are tight, volumes are high, and there's logistics and inventory issues. No, it was a lot of problems. The list goes on.
Demetrios:
Yeah. So then did you just vest and rest at Walmart?
Michael Louis:
No. So what happened is we had like golden handcuffs, and so they wanted us to stay for a couple of years. They were going to give us a war chest to continue growing the product. I was 25, I think, at the time, maybe 26. And I had all these ideas; now we had all this cash that we could use to grow this company even bigger. And I could just see that they were not going to do anything that I wanted to do. We were just going to coast and integrate with Walmart's local retailer called Makro. And so I went to one of my mentors and I was like, am I really going to ride out these three years, the prime time of my life, and just sit here? And he was like, okay, well, the money that I would walk away from (because obviously not all of it vests straight away), would I be able to make it back sooner if I did something else, like start another company? And I said 100%.
Michael Louis:
And he was like, cool, I'll fund it. And so basically, like a week later, I quit. And then he funded it, gave an angel investment to an idea that I didn't even have. I had to go figure out an idea and find something. So that's kind of what happened.
Demetrios:
So you don't know what you're doing next. You just know that you have the support of a mentor who has almost written you a blank check. And how did you land on a pretty difficult field and idea to go and execute on? Right. You did not play the game on easy mode.
Michael Louis:
Yeah, I mean, I studied machine learning and stats and computer science at university, and I was always interested in it. And at OneCard we tried to implement machine learning, and it was honestly the worst six months of my life. The tooling was fragmented, it was expensive. Even as a rapidly scaling startup where we had lots of people, to invest six months into creating this product and getting it out, you had no idea if it was going to work. And then, obviously, the classic story of 80% of machine learning projects don't make it, which was true in this case. A lot of things didn't work out, but a couple of use cases that did work completely changed the way the company operated. And so I was kind of thinking, and I actually hired my lead VP of engineering from OneCard, and I was like, dude, remember how much this sucked? We should make this more accessible to startups, where they can rapidly iterate on machine learning concepts, get to production a lot quicker, test if it's going to work or not, and not have to make a significant investment. And that's why, we found, AI was always more accessible to enterprises: they could work on long time horizons and invest a lot of money.
Michael Louis:
But for startups, Seed to Series A, it just wasn't accessible. So that's kind of the market that we wanted to serve.
Demetrios:
Yeah, it really does have to almost be built in your DNA. And if you're a startup, it is such a crapshoot. I really like that you put it that way. Like when it works, it works and you see incredible success and it changes the way the business actually operates. But you're not guaranteed that and you may spend six months, 12 months and never get there. So you're taking a significant risk. And that brings me to my next question. Like there's the inspiration that you just laid out, but what exactly did you do to help bridge that gap?
Michael Louis:
So, I mean, the company's called Cerebrium, and what we do is we're a serverless infrastructure platform, so we make it easy for engineering teams to build, deploy and scale AI applications. And the way we thought about it was, when we were building machine learning applications, we were focusing more on the infrastructure side: what compute should we use, how does it scale, how do we ingest data, how do we store it, how do we run batch jobs, instead of actually focusing on the customer and the use case at hand. And so with Cerebrium, we were like, okay, what if users could just focus on the product and not worry about the infrastructure? What would that unlock for teams? And so that's kind of how Cerebrium was born, and that's kind of what we help customers do.
Demetrios:
And I want to get to the speed in a minute, because that's really what got you all on my radar. I think I saw some benchmarks and it was just like, either somebody is smoking crack or this is really fast. But before we get to that, what I am trying to think about is there's always trade-offs that you as a developer are going to have to make when it comes to what you choose. And I imagine that you've been talking to a lot of folks about what the trade-offs are if they want to use you, versus if they want to go and just build it themselves, versus if they want to outsource all of that to maybe one of these LLM providers. How do you look at that and phrase it?
Michael Louis:
The way that I look at it is the same way that I kind of run my team: I never try to go from a 0-to-100 solution. I always want to test something very quickly to confirm this is actually going to work, rather than optimize upfront. So typically when customers come to us and they tell us they want to fine-tune their own LLM and then implement RAG and do all these things, I'm just like, okay, well, what have you done today? They're like, no, we're just going to start. I'm like, maybe you should go use OpenAI first and implement RAG, see if your customers actually want a product like that or if it's going to serve your use case. And then, let's say you have like a 50% success rate and the successes are such good value, I'm like, okay, well, then bring the data that you have, fine-tune it on us and then deploy it on us, like a custom fine-tuned model, whatever the case may be. So I like to approach it in a piecewise fashion: how can I get you results quickly, and then keep going.
Michael Louis:
I mean, there are obviously use cases where, coming to us straight away, they already know that it's the right decision because their use case is very specific. But more often than not we kind of send them away, and then they come back when they're in kind of the scale phase.
Demetrios:
You did touch on something there that I don't think a lot of engineers realize when they're building their solutions, which is you can make a very technically viable AI product that still fails horribly because someone doesn't want to type into a chat box.
Michael Louis:
Yeah, I mean, we've seen that so often with customers, where they were providing chat with your database, or let's say chat with our product. And it was a cool feature, but no one was willing to pay $2 extra for it, for example. So it's cool and it works, but no one is willing to pay for it. I guess that's where the unit economics question of businesses really comes into play, from what we've seen.
Demetrios:
Yeah. Some of the best ways that I've heard it talked about is in terms of cognitive load. The cognitive load required when you're looking at an empty chat box is quite high, versus the cognitive load if you log on to an app and just start clicking photos, or swiping, or scrolling; those involve much less cognitive load and are easier for a user to interact with. So if you're a B2C-type company and you have this amazing idea for an AI solution which increases the cognitive load of that end user, it's going to fail. And you don't need to spend all the time, energy and effort putting out all this work for it to fail. I can tell you right now it's not going to work.
Michael Louis:
I saw someone who summarized it pretty succinctly: Suhail, one of the co-founders of Mixpanel. He says for every use case, you're not going to want to type into a box. You want a workflow app that understands the workflows, that guides the user and then gets you to the end result. And I thought that was a pretty good way to put it. So the app layer is always going to be there.
Demetrios:
Exactly. And with chat, the amount of work that I have to do to think about what it is that I want as my end state is a lot. And then I have to prompt it, and I'm not guaranteed that it's going to be successful. And what I've seen a lot with users is they don't have the patience to continue prompting. You try your first time, and if it doesn't work, you're like, this is not a good feature. Right? Yeah. So as you're going about creating Cerebrium, how'd you make it so fast, man? That's what I really want to know. Like, what is going on there?
Michael Louis:
I mean, I would say we found a specialization in low-latency applications. And the thing that we found was that people were going from running things on CPUs, which were, what, a cent an hour, to these GPUs that are now dollars an hour. And so the fundamental unit economics of businesses have changed. So we were like, well, to get compute costs down, we need things running for the least amount of time possible. And so we've done like two years of work just doing image caching, being really efficient on storage, our container runtime, everything along those lines, to get our cold starts low. So what we can basically do is just start up your Python environment. We're now getting into GPU checkpointing, where we can load model weights 60% faster as well.
Michael Louis:
So basically, all these kinds of improvements over time let you run it as a serverless workload. We can just spin it up, you run it, you spin it down, and typically our customers save about 40% on costs. So that's where we've got a lot of traction, as well as just our networking stack. We've been really optimizing how we can get latencies to our users extremely low. And obviously, on the Deepgram side, we've been working with you guys on low-latency voice applications. I think we released this about a year ago with Deepgram and Daily, where we got it down to like 500-millisecond end-to-end voice-to-voice responses, and that went pretty viral. People still today are implementing that, and you know, that's still the fastest that we've kind of seen. So.
Demetrios:
And just to be clear, I'm not sure if I mentioned it before, but the inference side of things is where I was really surprised when I saw what you're doing. Are you doing some kind of tricks with the CPU and GPU and how they play together? Or is it just, like you're saying, you're spinning up GPUs? And maybe I'm getting lost with this question, so we might need to edit it out. I'm seeing some confused looks on your face.
Michael Louis:
Well, the way that we're doing it is, I wouldn't say anything special, but there are definitely some tricks. So, for example, loading model weights. Typically it goes from disk to CPU to VRAM. And one thing you can do is just skip the CPU: you can go straight from disk to GPU. So that's one thing. And again, that's not something special; there are open-source frameworks that you can use to do that.
Michael Louis:
But then the other thing is you can optimize further: you can capture the end state of what your VRAM looks like after it's done all the CUDA processing, things like that, when it loads models in. So we've now done some work to actually store that. So that's one use case. I'd say we've done a couple of special things, but that's the secret sauce.
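(Editor's note: the "skip the intermediate copy" idea Michael describes is available in open-source tooling; safetensors, for example, can materialize weights directly on a target device via `load_file(path, device="cuda")`. Below is a rough, standard-library-only sketch of the same principle, not Cerebrium's implementation: a naive loader makes a full extra copy of a weights file in process memory, while a memory-mapped loader hands the consumer a view backed directly by the file. The dummy 1 KB file is hypothetical, purely for illustration.)

```python
# Sketch of avoiding an intermediate in-memory copy when loading weights.
import mmap
import tempfile

def naive_load(path):
    """Read the whole file into a bytes object: a full intermediate copy."""
    with open(path, "rb") as f:
        return f.read()

def mapped_load(path):
    """Return a file-backed view with no intermediate copy in Python."""
    f = open(path, "rb")  # mmap keeps its own reference to the descriptor
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Demo with a small hypothetical "weights" file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    path = tmp.name

copied = naive_load(path)   # extra copy lives in process memory
view = mapped_load(path)    # zero-copy, page-cache-backed view
print(len(copied), len(view))  # 1024 1024
```

Real frameworks apply the same idea one step further, mapping the file and copying pages straight to GPU memory rather than staging a full tensor on the CPU first.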
Demetrios:
Yeah. And that's cool. Keep that for yourself. And if anybody else wants to know more about it, they can come talk to you.
Michael Louis:
Awesome. Happy to help.
Demetrios:
Yeah. So now, as you look forward, and as you've been working with customers and doing all kinds of stuff in the space, where do you get excited? Maybe just in the next six months, or dare I say the next year, because who knows how long that actually is in AI years, I would say.
Michael Louis:
There are kind of two trends that I'm very interested in. What's always been pretty cool for me, even as a young person back in the day, is just being able to be an engineer and create whatever I can think of; that has always been pretty phenomenal. So a core idea for Cerebrium has always been that the engineer is the lifeblood of innovation. At the moment, there are 30 million software engineers in the world, and we've always wanted to support them. However, it's pretty interesting to see how LLMs are now making software development more accessible to users and engineers. And so we think in the next two years, maybe there's an extra 30 million. And while they might not be extremely technical, they can still create some sort of machine learning product. So I think just seeing this massive entrance of engineers into the market using LLMs is something fascinating.
Michael Louis:
I think the applications that people are going to create are going to be phenomenal. The second thing that I'm interested in is, you know, everyone talks about agents. I think people are doing them more internally now because they're very experimental. But what I do find is that a lot of spend is going to move from employee salaries to these agent use cases and paying for the compute. And so I think it's going to be interesting to see how companies' expenses change: instead of salary being the main cost base, I think compute's honestly going to be it. Those are two interesting things, and I'm interested to see if it actually turns out like that or if I'm completely wrong. But I would say, even across industries, voice, healthcare, there have been phenomenal use cases that we've seen.
Michael Louis:
So it's pretty exciting to see what people are working on and I guess we've got a front row seat. I mean people come to us when they're starting. So yeah, it's pretty cool.
Demetrios:
It is interesting that you talk about where the money goes with agents and how you can justify them. A good friend of mine was talking about how, with prices basically plummeting for LLM calls, you see that they're trending towards zero. But at the same time, we are creating more complexity in our systems and how we architect the agents, or just using more LLM calls every time somebody wants a query answered or wants something done with their AI product. And then on top of that, if someone is having success with their AI product, people are going to start using it more. So if you think about it that way: the cost of the one LLM call is going down, but the complexity means we have more LLM calls, and the success cases mean people are using it more. You start to ask, does this make sense economically? But I always have in the back of my head the side where someone says, if you compare it to what a human would cost, it's always going to make sense; it's always going to be cheaper in that regard. And so I find that kind of framing and thought experiment fascinating.
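(Editor's note: the thought experiment above fits a tiny back-of-envelope model. All numbers below are hypothetical, purely for illustration: per-call price drops 10x, but an agentic pipeline makes 5x the calls per query and success triples query volume, so total monthly spend still rises.)

```python
# Toy cost model: total LLM spend = price per call x calls per query x query volume.
def monthly_llm_cost(price_per_call, calls_per_query, queries_per_month):
    return price_per_call * calls_per_query * queries_per_month

# "Before": a simple product with pricey calls.
before = monthly_llm_cost(price_per_call=0.01, calls_per_query=2,
                          queries_per_month=100_000)

# "After": calls are 10x cheaper, but the agent pipeline makes 10 calls
# per query and success has tripled usage.
after = monthly_llm_cost(price_per_call=0.001, calls_per_query=10,
                         queries_per_month=300_000)

print(before, after)  # 2000.0 3000.0 — spend rises despite cheaper calls
```

The illustration is the point Demetrios makes: falling unit prices do not guarantee falling total spend once complexity and adoption grow faster than prices fall.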
Michael Louis:
What we've seen is, when companies move to our serverless runtime, we save them a bunch of costs. However, even though they go from spending, let's say, 100k a month to like 60k, and obviously we would want them to spend 100k, they typically just reinvest that into the product or into marketing, and slowly they work their way back up to being at 100k. But now they're serving, you know, let's say 30% more customers, for example.
Demetrios:
Yes.
Michael Louis:
And so we kind of see that pattern.