AIMinds #022 | Shaun Wei, CEO at RealChar
About this episode
Meet Shaun Wei, a driven tech professional who resides in Sunnyvale, California. Before founding RealChar, Shaun worked at the autonomous-vehicle startup Pony.ai, where he specialized in developing technology for self-driving cars. His role primarily focused on managing the immense flow of streaming data essential for the operation of those vehicles.
Shaun and his team tackled complex challenges, processing data from various sources such as cameras, lidars, and radars, as well as auditory signals like ambulance sirens, to enhance the safety and efficiency of their autonomous systems. This demanding task required handling vast amounts of information arriving within milliseconds, and it shaped the real-time focus Shaun now brings to AI voice assistants.
Listen to the episode on Spotify, Apple Podcasts, Podcast Addict, or Castbox. You can also watch this episode on YouTube.
In this episode of the AI Minds podcast, hosted by Demetrios, the discussion features Shaun Wei, CEO and co-founder of RealChar. The conversation offers an in-depth look into Wei’s transition from the autonomous vehicles sector to founding an AI project focused on real-time data processing. Wei shares how the complexities of real-time data processing for self-driving cars inspired him to build a voice assistant. That work became RealChar, an open-source project, and subsequently a refined, user-centered AI phone assistant named Revia.
Wei and Demetrios explore the critical need for responsiveness and real-time processing in both autonomous vehicles and AI assistants, drawing parallels between operational demands in both fields. They discuss the challenges inherent in creating AI that handles natural conversations and sophisticated tasks like managing phone calls, scheduling meetings, and interacting with various service systems. Revia, designed to ease personal communication burdens, integrates voice or text interaction, supports live monitoring of calls, and allows users to intervene and resume automated calls as needed.
The episode concludes with insights into the broader implications of AI in personal and professional realms, emphasizing the essential nature of trust, reliability, and human-like interaction in AI agents. Wei’s narrative from the frontline of AI development provides valuable perspectives on the current and future landscapes of AI applications in everyday life.
Fun Fact: Revia allows its users to interact with live call transcripts during a call, offering them the ability to jump in and take over the call if needed, then seamlessly hand it back to Revia, integrating human oversight with AI efficiency.
Show Notes
00:00 Distracting users while they wait for a response.
03:42 Critical perception, prediction, and planning for autonomous vehicles.
10:03 Google Duplex: AI making natural phone calls.
10:48 Google Duplex struggles, humans needed for calls.
15:12 Revia helps schedule and avoid phone hassles.
16:55 Revia: a solution for managing phone calls.
20:21 Creating reliable AI agents, tracing decision-making.
23:16 Questioning trust in an agent for travel.
26:20 Revia: a mobile app specializing in phone calls.
29:50 Transcripts reveal phone call delays, offering solutions.
Transcript
Demetrios:
Welcome back, everyone. I'm Demetrios, and this is another edition of the AI Minds podcast, a podcast where we explore the companies of tomorrow being built AI-first. This episode is brought to you by Deepgram, the number one speech-to-text and text-to-speech API on the Internet today. Trusted by the world's top conversational AI leaders, startups, and enterprises like Spotify, Twilio, NASA (that one that puts rockets into space), and Citibank. Today we're joined by none other than the CEO and co-founder of RealChar. Shaun, how you doing, man?
Shaun Wei:
I'm doing great. How are you doing?
Demetrios:
I'm excited to talk with you because you are building something for the consumer. And we just bonded before we hit record on both having little daughters who are around one year old. So I can relate to lots of sleepless nights and kids getting into drawers they shouldn't be getting into.
Shaun Wei:
Yes, they are. My daughter is, like, really naughty right now.
Demetrios:
Exactly. It comes with the territory, I think. So, dude, I know you've got a cool story. You were working in autonomous vehicles for about four years. I want to hear a little bit about that and what you learned around streaming and real time when it comes to AI.
Shaun Wei:
Yeah. So, hello, everyone. I'm Shaun. I worked at a startup called Pony.ai. We were based in Fremont, California, and we were building self-driving cars. My team primarily worked on a lot of streaming data. Think about it: right now, large models are really slow at processing data, but self-driving cars consume a lot of data from the cameras, lidars, and radars, and also audio, because you have to listen for the sirens of an ambulance to know when to stop. That data comes in within milliseconds.
Shaun Wei:
The self-driving car needs to process that data and make decisions, also in milliseconds. Those are the unique capabilities of all the self-driving cars on the road. But when you think about all the large-model-based applications, you find they're really slow. They're still missing the real-timeness of that software. So I thought, why not build my own voice assistant? That got me really inspired. I think I need to bring the best of the real-time processing from the self-driving car into the large model application world.
Demetrios:
Yeah, that makes a lot of sense. And I know that there's some people that I've heard talk about how there's the actual objective time that it takes to generate a response, and then there's the subjective time. And for that end user, sometimes it doesn't need to be as fast as milliseconds. If you can preoccupy them with something else while they are waiting for the answers. So maybe it's like trying to solve a puzzle or give them a little like tidbit, like, hey, did you know this little fact? When it comes to autonomous vehicles, though, you can't do any of that. That doesn't work at all because there's no like, hey, let's preoccupy the end user while we get into a car accident, right?
Shaun Wei:
Yes, yes. So that's why, in self-driving cars, there are a few things that are really critical, right? The first is perception: how it sees the world. The second part is prediction: how do you predict what the people around you will do, when they want to make a left turn or a right turn, or when they want to stop suddenly? Then you have something called planning, which is planning ahead a little bit, just like how humans think ahead. Before you even talk, you think about the key points you want to touch on. Those are the key elements for an autonomous vehicle.
Shaun Wei:
We think those are the best things we should bring into the new large model world.
Demetrios:
Yeah. Then you started a little open source project that got famous on GitHub. It blew up, I would say. Was that while you were still working with the autonomous vehicles, or had you already left?
Shaun Wei:
Oh, I had already left by then. Once I got a taste of how large models work and how GPT-3 works, I found, okay, that's the one thing I should do immediately. So I quit my job.
Demetrios:
Wow, so you were like, autonomous vehicles are cool, but this is the future.
Shaun Wei:
Yes. Not as cool as all the new AI and large models.
Demetrios:
Yeah. Well tell me about what you created, this personal voice assistant, and it's all open source, right?
Shaun Wei:
So RealChar is the open source project. That's where we started. You can still search for the same name, RealChar, on GitHub; it's a fully open source project. We use a lot of speech-to-text and text-to-speech, and Deepgram is one part of the solution in that open source project. You can give it a try. After that, we moved on to something we are currently building.
Shaun Wei:
It's more of a voice assistant for managing your personal phone calls. We can talk about it more if you like.
Demetrios:
Yeah, yeah. I really want to get into it. I am curious to hear the evolution of going from an open source project to what you are doing now, and why you had the open source project. Was that just to test the waters and see if people wanted something like an end-user voice assistant?
Shaun Wei:
I really believe in the AI companion, the AI assistant. I think in the future everyone will have their own, like Samantha or Jarvis, right?
Demetrios:
Yeah.
Shaun Wei:
So I really want to build that future. That's why we started with the open source project called RealChar. The idea was to test how the voice interaction works, plus a character, right? Last year was all about characters, right? Character.AI, there's a bunch of different characters. So that's what we tried. We said, okay, it's not too hard to build. We spent about three weeks building RealChar. It's a character you can talk to in real time. We already supported interruptions at that time.
Shaun Wei:
So that's why it blew up; people were super excited to see that project. After that, we thought, okay, how do we make the AI do things for us, right? Not just talking to you, being a comfort or something. I want it to help me do things. That's why I came up with the idea called Revia. That's where I started working on this voice AI assistant to manage my daily phone calls.
Demetrios:
Okay, well, tell me a little bit more about what you're planning on doing now. And how did you go about validating what you were doing? And it feels to me like you had a lot of traction with RealChar.
Shaun Wei:
Yes.
Demetrios:
And you could have tried to build a business off of that, right? Maybe it's like that companion, and you charge people per month to have a companion or whatever. I don't want to say enterprise features, because it's not B2B, it's more B2C. But you have special features that are not in the open source version, and you could charge for that. But you felt like, wow, let's actually take this a little bit deeper. How did you validate that people wanted it to do things?
Shaun Wei:
Yes. So that's a very interesting question, because when we built RealChar, we were immediately thinking, what's the next step? For AI companions, we think there are two directions. One is going more toward the content or media type. You create more characters, make them more engaging, make them more animated. You get a richer experience from the conversation. That is one direction. The other direction is making it more of an AI assistant, more like the Jarvis part: I want to give it my calendars, my emails.
Shaun Wei:
It should be helping me make phone calls, book and schedule meetings. We looked at ourselves, and we think my team is really specialized in the latter. So that's why we decided, okay, we want to focus on this instead of making it more animated.
Demetrios:
Yeah, yeah, exactly. I do like that. And it's very easy to think about when you tie these two cultural references to it. Like Samantha or Jarvis. Yes. Like, okay, Jarvis. Yeah. Is your companion to help you do things and can get you out of a bind.
Shaun Wei:
Yes.
Demetrios:
And Samantha is more like someone to talk to that will help you and help your mental health maybe.
Shaun Wei:
Yes.
Demetrios:
So the thing that I, I know that we talked about before we hit record was when in like 2015, Google gave us that demo of an AI assistant booking a haircut for you. And I was very excited when I saw that. It feels like that was a bit like Santa Claus. Like, hey, this is great, but it doesn't actually exist. You feel like you can do that now.
Shaun Wei:
Yes. So I was on the Google Assistant team when Sundar Pichai announced the project called Google Duplex. The idea was that when you ask Google Assistant to make a restaurant reservation, it would call that restaurant for you, and it was supposed to be able to handle all the natural conversation and help you book that restaurant. The promise was great, but put into action, very funny things happened. I actually interviewed someone who was on the other end and received Google Duplex phone calls. At first they usually said, okay, this sounds very fake.
Shaun Wei:
But then later on they found, oh, there are still real needs there, and the calls are bringing more business to the restaurant. But later on they also found that Google Duplex was pretty stupid. It didn't understand any nuance, like, "Where do you want to sit?" or "Do you want to sit outdoors?" or "There's no indoor seating available anymore." So later on, Google had to ask actual humans to make those phone calls. That's really odd. I think they had an issue with scaling up the operation, and it was definitely overpromised at the time.
Demetrios:
Yeah, it reminds me of the latest news that came out of Amazon with their checkout, the Amazon Go. It's like, oh, yeah, that super advanced AI, which is basically just outsourced workers looking at a screen and putting items in your cart. And this wasn't so different, because you had people calling in to restaurants since the AI wasn't there yet. But granted, that was 2015. A lot has changed since then.
Shaun Wei:
Yes, I think the difference between the new AI and the previous generation of AI is that the newer AI, with large models, can really understand the nuance when the human starts to talk, right? That just wasn't there a couple of years ago. So Google was way ahead of its time.
Demetrios:
Yeah.
Shaun Wei:
Now we think we have a shot. We finally have the opportunity to represent anyone and start making phone calls to businesses. What also inspired me is that I have a one-year-old. Last year, when we tried to call the California Paid Family Leave program, it took me like three days straight just trying to call. And the worst part of the experience is they have this verification code. If you have ever called the California Paid Family Leave program, you will have noticed it. When you listen to the audio, they give you a random four-digit number.
Shaun Wei:
You have to pay attention to those four numbers. You have to put in those four numbers. Then they let you get into the next waiting line. Then they tell you, okay, the line is all full; you have to call back. So for those three days, I got nothing done. I was really pissed off. Yeah.
Shaun Wei:
So that's why I said, okay, I need to do something. There should be something better than this. So I thought, oh, Google Duplex was something like this, making phone calls. I have the same technology now. I know how to make it real time and make the assistant actually do actions for you. So that's when we started working on this project called Revia. Yeah.
Demetrios:
So what can you show off that you can do now? Have you set it loose with the California state system?
Shaun Wei:
Yes. So for Revia, for anyone interested, it's Revia tech. R e v i a tech. We have a landing page there. The idea for Revia is that it will represent you to make any phone calls, so you don't have to call those businesses anymore. It manages all your tedious calls. For that California Paid Family Leave program phone call, we made Revia understand, in real time, what the verification code is, through Deepgram.
Shaun Wei:
Right. So we get those numbers, then we tell the large model: those are the things you need to press, and you have to press them now. Then it knows, okay, I need to press buttons, and it actually presses the buttons for you. And when it detects that, okay, the queue is already full and you cannot wait anymore,
Shaun Wei:
it will hang up the phone call automatically and call that number again, until it gets hold of a human.
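Editor's note: the flow Shaun describes here (pick the four-digit verification code out of the live transcript, press it, and redial when the queue is full) could be sketched roughly as below. This is an illustrative sketch under stated assumptions, not Revia's implementation: `extract_code`, `call_until_human`, and the `place_call` callback with its `"human"` / `"queue_full"` outcomes are hypothetical names, and the transcript is assumed to arrive as plain text from whatever speech-to-text service is in use.

```python
import re
from typing import Callable, Optional

# Spoken digit words mapped to keypad digits (English only, for illustration).
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def extract_code(transcript: str, length: int = 4) -> Optional[str]:
    """Return the first run of `length` consecutive digits (spoken or written)."""
    tokens = re.findall(r"[a-z]+|\d", transcript.lower())
    run = []
    for tok in tokens:
        digit = tok if tok.isdigit() else WORD_TO_DIGIT.get(tok)
        if digit is None:
            run = []          # any non-digit word breaks the run
            continue
        run.append(digit)
        if len(run) == length:
            return "".join(run)
    return None

def call_until_human(place_call: Callable[[], str], max_attempts: int = 5) -> Optional[int]:
    """Redial until `place_call` (hypothetical) reports a human; return the attempt number."""
    for attempt in range(1, max_attempts + 1):
        if place_call() == "human":
            return attempt
    return None               # gave up: the queue was full on every attempt
```

Note that the word "four" in "four digit code" does not confuse the parser, because the following non-digit word resets the run. The retry cap (`max_attempts`) is an added assumption so the loop cannot dial forever.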
Demetrios:
Incredible. Incredible. So you scratched your own itch on that one?
Shaun Wei:
Yes.
Demetrios:
Wow, that is so cool. And what else have you done?
Shaun Wei:
For Revia, we are thinking about the really painful experiences people usually suffer through when making phone calls. Think about the last time you tried to call United Airlines or call a bank. They have really complex menus, then they put you in a waiting line for so many hours, then they tell you, oh, the office is already closed; you have to call back tomorrow morning. So what we made is that you can actually schedule calls for Revia. You can say, oh, Revia, schedule a call tomorrow morning at 8:00 a.m.
Shaun Wei:
Call this number, this is the objective, and this is the basic information you need. So when you wake up at 9:00 a.m., Revia will give you a summary of what happened during the call. You will be able to see the full transcript and listen to the audio. And if it sometimes does fail and cannot achieve your objective, you can have it call back with the same context and give it a little more instruction. And that's it.
Shaun Wei:
So your day will be just: instruct it, and it keeps calling. You don't need to pay attention to that call at all.
Demetrios:
Oh my God. And I can see so many different use cases this would be valuable for, because we've all had those moments where we call. The banks and the government are the easy ones, but I'm thinking maybe it's your local plumber who is overworked and can't get to it.
Shaun Wei:
So that's exactly the thing. We built Revia for myself. I'm a dad, I own a house, and I'm a founder. I have a million important things to do. So I built this to manage all the phone calls. When I think about managing my house, there are plumbers, gardeners. And the crazy thing Revia can do is, if there's something wrong with your sewer or your shower drain or something, you can tell Revia: just call the nearest ten plumbers, get their quotes, get their times, and then give me a summary of who is available tomorrow. Then call that person back and say, you can come to my house, and here's my address. That's exactly what we built, for all the plumbers.
Demetrios:
Wow. And a lot of people are talking about AI agents right now. And I think when you hear people talking about AI agents, they instinctively think about on your computer, inside your computer, an agent that will break down tasks and then go do things inside of your browser or inside of your operating system. This feels like you're taking the agent outside of the hardware and you're bringing it into the real world.
Shaun Wei:
Yes. So that's the same feeling I had. Because when we look at all the autonomous agents, I find that for a lot of work, you don't want to be replaced by AI. But when I talk to people, you do want to be replaced by AI when making phone calls. Those are really the worst experiences you have. And also, when you think about bringing AI into the real world, it's a totally different challenge, because for browsers and computers, it's all digital.
Shaun Wei:
But when you bring the AI into the real world, the people on the other end, when they receive a phone call, sometimes don't expect an AI calling them. So how do you play a role that's similar to a human, while they still feel okay? Like, okay, I know it's maybe an AI, but I get my information, so I'm okay with an AI calling me. That's a little bit tricky when you bring this to the real world. It's similar to how self-driving cars work. When you bring those onto the road, you have the same issue.
Demetrios:
Are you specifically calling out that when someone answers the phone, you say, "Hi, I'm Shaun's AI assistant"?
Shaun Wei:
So when we call out, we already tell them that we are representing a client, right? But, you know, sometimes the voice will give it away. It's similar to when the bank is calling you and they say, oh, this call is being recorded for training. It's a similar case for Revia.
Demetrios:
How do you deal with the idea of AI agents being a little bit flaky and not always doing what you tell them to do?
Shaun Wei:
Yes. So there is always this joke that you can create a demo of an AI agent using probably only 10% of your resources, but 90% of your resources go into how to simulate, evaluate, and validate that the agent is reliable for that scenario. That's exactly what we did. When we bring the AI agent into the real world, we trace how the AI agent is making decisions down to the millisecond. So we know exactly, at that point in time, why it is saying that or why it is pressing this button. We have the full replay history we can trace. The second thing is we built our own in-house simulations, with AI talking to AI all the time. Even though we specialize in outbound calling, we also have AI that receives inbound calls.
Shaun Wei:
So they constantly play against each other. That's how we feel all the AIs should be learning from each other, right? Making it more natural. So that's what we have: a simulation environment where they keep talking to each other. And if something fails in the real world, we can bring that into the virtual world, let them role-play up to that point, and see how they react.
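Editor's note: the kind of millisecond-level decision trace and replay history Shaun mentions might look something like the sketch below. It is a minimal illustration under stated assumptions, not RealChar's tooling: `TraceEvent`, `CallTrace`, and the event kinds (`"heard"`, `"pressed"`) are hypothetical names, and the clock is injectable only so a recorded call can be replayed deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TraceEvent:
    t_ms: int     # milliseconds since the call started
    kind: str     # hypothetical labels, e.g. "heard", "said", "pressed"
    detail: str   # what was heard, said, or pressed
    reason: str   # why the agent took this step

class CallTrace:
    """Append-only log of agent decisions, timestamped in milliseconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start = clock()
        self.events: List[TraceEvent] = []

    def record(self, kind: str, detail: str, reason: str = "") -> TraceEvent:
        t_ms = round((self._clock() - self._start) * 1000)
        event = TraceEvent(t_ms, kind, detail, reason)
        self.events.append(event)
        return event

    def replay_until(self, t_ms: int) -> List[TraceEvent]:
        """Everything the agent had seen and done up to a given moment."""
        return [e for e in self.events if e.t_ms <= t_ms]
```

Replaying the events up to the moment of a real-world failure is one plausible way to set up the "role-play up to that point" simulation described above.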
Demetrios:
And so have you encountered the situation where an agent calls a bunch of gardeners to fix your lawn, and it tells you, okay, we got a gardener that's going to come over tomorrow at 4:00, and at 9:00 a.m. the gardener shows up, and it's like, "The agent told me four. Why are you here now?" So you lose that trust, in a way.
Shaun Wei:
So this is very interesting. We have a slightly different theory, because we know large models hallucinate, right? So for critical information, you do have to have code in control. One of the first things we noticed where you have to have code in control is scheduling, right? The large model gives you some numbers, but you have to make sure those are validated across the board.
Shaun Wei:
So you have validation for scheduling, and also for payment, right? When you try to order food, you have to have those really accurate. There are a bunch of things where you have to have code in control. You cannot just let a large model be in control. Another thing we find very useful is that we have those special test cases. Whenever we find those cases, we add them to the test suite, so we are able to simulate them continuously.
Shaun Wei:
Yes. Initially we had similar issues. It hallucinated a lot: random dates, random numbers. So that's how we solved that.
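Editor's note: "code in control" for scheduling could be as simple as refusing to accept a model-proposed time until deterministic checks pass. The sketch below is one hedged illustration, not Revia's validator: `validate_appointment` is a hypothetical helper, the allowed-hours window is an assumed rule, and both datetimes are assumed to be naive local times.

```python
from datetime import datetime, time
from typing import List

def validate_appointment(
    proposed_iso: str,
    now: datetime,
    earliest: time = time(8, 0),    # assumed allowed window, for illustration
    latest: time = time(18, 0),
) -> List[str]:
    """Return a list of problems with a model-proposed time; empty means it passes."""
    try:
        when = datetime.fromisoformat(proposed_iso)
    except ValueError:
        # Catches hallucinated dates such as February 30th.
        return ["not a valid ISO 8601 datetime"]
    problems = []
    if when <= now:
        problems.append("time is not in the future")
    if not (earliest <= when.time() <= latest):
        problems.append("time is outside allowed hours")
    return problems
```

Each failure found in production could then become a fixed input in a regression suite, in the spirit of the "special test cases" mentioned above.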
Demetrios:
Yeah, I always find that funny. And I've done a lot of thought experiments on whether I'd trust an agent to go and plan me a trip to France. I guess if I see the confirmation ticket for the train (because I'm in Europe) in my inbox, and I see the hotel confirmation, that's one thing, and that's very grounded in reality. But if the agent tells me, okay, tomorrow your train leaves at this time, and then I show up at that time and there's no train, or I don't have any of the details, that could be difficult. And that could be an easy way for me to be like, well, I don't think that this agent actually works.
Shaun Wei:
Yeah, I think this is the shared concern among people who are using an agent. There are a few insights I can share with you. First, for any AI agent, you have to build trust. It's just like how you interact with a human. So when we created Revia, we allowed you to talk to the AI agent first, to get a sense of how the AI is going to interact with you and how the AI is going to interact with other humans. So you start to build trust. Like, oh, I can make it book a restaurant for me. Oh, it's reliable.
Shaun Wei:
It did that. Oh, then I can use it on plumbers. Then I can use it with my bank. So there are levels of trust for those tasks. That's how we want you to understand how the AI agent is going to be used to power your daily life. Then the second part is, if you are building an agent, you might be doomed if you are using GPT-4. We have tracing on all the GPT-4 outputs.
Shaun Wei:
It's not stable. We have tried every way to make the output stable. If you cannot stabilize the output, your AI agent is out of control. So we have to use other models to get more stable control. That's how we simulate, and GPT-4 is constantly failing us in the simulations. That means there are certain behaviors that are unexpected.
Shaun Wei:
You cannot rule that out, right? You either add a lot of code to fix that, or you use something different. So what I find is, okay, if your agent needs to be creative, fine. But if your agent needs to be really stable and really accurate, if you use GPT-4, it might be really hard.
Demetrios:
Yeah. So how do I interact with Revia? Is it talking on the phone or talking to the agent? Or do you have an interface on? Is there also a way to interface with it? And give, like, here's all the things that I like. Here's the things that I don't like, or here's the ways that I like things to be done. And here's my calendar. Can you sync with that? Et cetera, et cetera?
Shaun Wei:
Yeah. So Revia is still an AI assistant, but we specialize in phone calls. For Revia, we have a mobile app. We already passed Apple's review, and we are starting beta testing. Once you log in, you can use text, or you can have a natural voice conversation. We specialize in voice, so you can have a voice conversation and say, oh, Revia, something is wrong with my air conditioner, so I need to find someone to fix that. Then Revia will work from that context and from where you live. It also knows, when you're trying to reach people to fix something, what basic information it needs, right?
Shaun Wei:
It asks, oh, can I share your address with the person, or share your name, or share your phone number? It asks you for that basic information. Then it will search the Internet, or around your location, for the nearest people who can fix this. Then it says, oh, are you ready to allow me to make phone calls? You say yes, and that's it. It will make all the phone calls, and you will start to receive all the summaries once it finishes. But Revia also gives you the capability to jump into any phone call.
Shaun Wei:
You can see the live transcript and hear the live audio. If you find, oh, Revia is giving the wrong instructions, the crazy thing is you can take over that phone call at that moment; the other side will hear your voice, and Revia will stop talking. And even crazier, you can hand the call back to Revia. Revia will be able to continue the conversation. It never loses the context of what happened.
Demetrios:
That sounds like you were inspired from the autonomous driving days. Yes. You want to take over driving when there's a hairy turn or something. And then you say, okay, now back to you when I'm on the highway.
Shaun Wei:
Yeah, exactly. When we built this AI assistant, we were thinking, do we want to go down the copilot path or the autopilot path? We found that people like the autopilot path, right? I don't want to be in the call. If I want, I can. But mostly I don't want to be in the call. So you have to handle everything for me.
Shaun Wei:
But I still want the capability to take the steering wheel, right? So I will be able to jump into the call and take over.
Demetrios:
Have you thought about offering this to the businesses that you are calling? Because it seems like, if the agent is calling the business, the business is probably happy to get more business. But after a while, if all of a sudden 80 or 90% of the public uses this, the business is going to get fed up with talking to AI agents.
Shaun Wei:
So there are a few things we have noticed. If you look at all the people who are building voice agents, they all build for businesses. No one builds an AI assistant for me, the individual, right? So we really want to build the voice AI for you, for the individuals who are busy. And we tell businesses who are interested in our technology, okay, there is a business sales pipeline, right? Go through the pipeline. Once we have more resources, we might work with you.
Shaun Wei:
And another interesting thing is, since we are making all these phone calls, we have all the transcripts. We can tell the business, oh, we can help you. We know there's a huge delay when my customers make phone calls to you. So do you want us to help you? We already have a solution for them. We have the numbers from so many phone calls, and that's more convincing.
Demetrios:
Wow. Yeah, I could see that. Well, Shaun, this has been great, man. Where can I get it? At Revia tech. That's r e v i a tech. I'm going to go sign up right now, because I have to make a few government calls to get my daughter's passport renewed. And I'm going to take it for a little test drive.
Shaun Wei:
Yes, yes. Also, it's still on a waitlist, right? So if you share the referral link with your friends and they join the waitlist, you get priority access sooner. And we are thinking about releasing that priority access as early as next month.
Demetrios:
Perfect. Pretty soon. That sounds great. Well, I'm excited for everything you're doing. And I hope to never have to talk to my phone service provider ever again. That is what I am hoping for by using Revia.
Shaun Wei:
Yes. Yes. We will make that come true in real life.