How can developers and software designers easily build voice and video functionality into their apps? Twilio says it has the answer, and I’m talking with its head of voice and video on this episode of Dynamic Developer. The following is a transcript of this interview, edited for readability.
Listen to the podcast version of this Dynamic Developer episode on SoundCloud
Bill Detwiler: I’m your host Bill Detwiler, and I’m joined by Hakim Mehmood, VP and GM of voice and video for Twilio. Hakim, thanks for joining us.
Hakim Mehmood: Thank you for having me, Bill.
Bill Detwiler: Hakim, for folks who aren’t familiar with Twilio, give us a run down on the company and your role.
Hakim Mehmood: Bill, I’ll start a little bit with myself. That’s not because I’m full of myself but in general. I spent close to 19 years at a company called Cisco, where we, to an extent, pioneered voice over internet or voice over IP and video over IP, creating solutions like Telepresence, WebEx and what have you. From the sidelines, I’ve always been following Twilio. From its inception, Twilio was really about demystifying this complex telephony stuff and making it super, super easy for developers, creating an API ecosystem around these complex telephony servers. You had the likes of Cisco, the Nortels, the Avayas back in the day. Jeff Lawson, our co-founder and CEO, started looking at the complexity. He started thinking about, how do we simplify it for everyday developers? That was the inception of Twilio, and since then, the company has evolved. We’re a large public company now.
If you were to think about us today, Bill, we are a customer-engagement company, and allow me to define what that means. Today you get an SMS from your dentist that your cleaning appointment is tomorrow or you get a voice call from them. Or during this pandemic, you do a TeleVisit with your family physician. These are all channels of engagement and touchpoints for consumers, and businesses were already on that path. This digitization has been going on for a while, and it has accelerated massively in the last 15 to 18 unfortunate months.
Twilio is a platform at the center of this digitization where businesses can engage with their customers with data, with context, with the channel of choice, all the way from messaging to voice, to video, to email, what have you. And every flavor of messaging, whether it’s in Asia, people using WhatsApp or Facebook or pure SMS. That’s what we do. We literally demystify all these modalities for the developers so that they can build these modalities in their workflow. Just in your example, your dentist sending you an SMS, perhaps. Or you getting an OTP from your bank, perhaps a developer wrote a few lines of code that allowed that workflow to become real. That’s what we do. We are a customer-engagement company. We literally host trillions and trillions of transactions on our platform every year.
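As a rough illustration of the "few lines of code" behind an appointment reminder, the sketch below builds an HTTP request for the Messages resource of Twilio's REST API using only the Python standard library. The account SID, auth token, phone numbers and the `build_sms_request` helper are placeholders invented for this example, and the request is constructed but never sent.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder credentials (assumptions): substitute real values before sending.
ACCOUNT_SID = "ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
AUTH_TOKEN = "your_auth_token"


def build_sms_request(to, from_, body):
    """Build (but do not send) a POST to the Messages REST resource."""
    url = (
        "https://api.twilio.com/2010-04-01/Accounts/"
        f"{ACCOUNT_SID}/Messages.json"
    )
    # The API expects form-encoded To, From and Body fields.
    form = urllib.parse.urlencode(
        {"To": to, "From": from_, "Body": body}
    ).encode()
    # HTTP Basic auth: account SID as the user name, auth token as the password.
    credentials = base64.b64encode(
        f"{ACCOUNT_SID}:{AUTH_TOKEN}".encode()
    ).decode()
    request = urllib.request.Request(url, data=form, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request


# Fictional phone numbers, used purely for illustration.
reminder = build_sms_request(
    to="+15555550123",
    from_="+15555550100",
    body="Reminder: your cleaning appointment is tomorrow.",
)
# To actually send it, pass the request to urllib.request.urlopen(reminder).
```

In practice most teams would likely use an official helper library instead of raw HTTP; the point here is only how small the developer-facing surface is.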
Bill Detwiler: I think that’s a great segue into what we really want to talk about. You mentioned the interaction with the dentist and also a telemedicine visit with a family physician. What I’m hoping you can help our audience with is understanding and demystifying the complexity of voice and video communications, because customers today want to be engaged on the platform of their choice, as you said, and increasingly that is voice and video. That might seem a hard thing, if you’re a developer, to build into your app. So, let’s start with your impression. What are some of the challenges that developers and teams face when trying to build voice and video engagement into the applications and systems that they’re developing?
Hakim Mehmood: I’ll go back into my history. I started as an engineer, as a programmer in a couple of startup companies before working for Cisco. Back in the day, we had chipsets like DSPs where you would write special signal-processing code, where you would get bits and bytes of audio, packetize them, send them to the other side, demultiplex them and reconstruct the stream. The whole process is super complex. The internet, from its inception, was not built for real time. Over time, it has evolved. Voice, to an extent, and video more so, is real time. As you and I are talking through this interaction, if my conversation or my joke had a 30-second lag, it would be ineffective.
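To make the packetize-and-reconstruct idea concrete, here is a minimal sketch in Python. It is not how any production media stack works: the 6-byte header layout, the 20 ms frame duration and the function names are all assumptions chosen for illustration, and it ignores packet loss, jitter buffering and sequence-number wraparound.

```python
import random
import struct

# Assumed header: 16-bit sequence number, 32-bit timestamp in milliseconds.
HEADER = struct.Struct("!HI")
FRAME_MS = 20  # assumed duration of audio carried by each packet


def packetize(audio: bytes, frame_bytes: int) -> list[bytes]:
    """Split raw audio into fixed-size payloads, each with a small header."""
    packets = []
    for seq, start in enumerate(range(0, len(audio), frame_bytes)):
        payload = audio[start:start + frame_bytes]
        packets.append(HEADER.pack(seq, seq * FRAME_MS) + payload)
    return packets


def reassemble(packets: list[bytes]) -> bytes:
    """Reorder packets by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda p: HEADER.unpack_from(p)[0])
    return b"".join(p[HEADER.size:] for p in ordered)


audio = bytes(range(256)) * 4        # stand-in for 1,024 bytes of raw audio
packets = packetize(audio, frame_bytes=160)
random.shuffle(packets)              # simulate out-of-order arrival
restored = reassemble(packets)       # identical to the original audio
```

Real-time protocols carry similar sequence numbers and timestamps so the receiver can reorder packets and play them out at the right moment, which is where much of the difficulty described here comes from.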
So, the hardest part in voice and video is to preserve the fidelity while being real time. It’s not that developers can’t do it; we did it back in our day. But it’s really, really hard work to set up the infrastructure, to set up the routing infrastructure, to traverse firewalls, to have the right codecs, to have the right bits flowing into the pipe. It’s super complex. It’s a lot of work. What we do at Twilio is we make those building blocks a part of our infrastructure, and to an extent hide all that complexity from a developer so that they can just call an API, like make a call to Bill, and it just works.
To give you an example, especially in video, video has multiple aspects. When you and I are talking, it’s important for lip sync to happen. All those small bits are super important. We have sample app code, which is open source, that, Bill, I promise you can go download and use to write your first video application in five minutes. That is the promise of Twilio, while preserving pristine quality on the infrastructure we have built over time.
So the net-net would be we do not want the developers to spend a lot of their valuable time in creating the plumbing, in thinking about the codecs, in thinking about the bits. We want them to add value at a higher level. Like you said, your telemedicine visit should push you into a lobby where a medical representative or assistant interacts with you. Then all that interaction should be transposed into the room you are in with your doctor. That’s the level we expect businesses to work at as video becomes more and more ingrained in everything we do every day.
How can developers include video in their applications?
Bill Detwiler: It’s more video-as-a-service, right? I mean you’re providing voice and video as a plugin option for development teams, so they can focus on the apps and the systems for their company that they’re building. They don’t have to sort of go off, like you said, on the plumbing. I think that’s really a great analogy.
What recommendations do you have for development teams who are looking to incorporate voice and video into their applications? I’m thinking now less from a technical perspective because truly, as you’re describing it, using the API, the technical process has gotten a lot easier since you started. I’m thinking here more about the right way to design, to incorporate video into an application, where it makes sense and when it makes sense, how to do it in the right way.
Do you have advice for teams and developers that are considering that, even if it’s just as simple as this is where a window should go, or this is a situation where it works well, or this is a situation where it doesn’t work well, or if you’re going to do it, here’s a best practice? What are some tips that you would give to listeners and viewers?
Hakim Mehmood: I think, Bill, one of the key things for developers, and I’ll reiterate one of the points, is that we have made it super simple to plug in video, whether you have a web browser, a mobile client or any other modality. That part is simple. What I would ask developers to do is focus on the business use case and the experience. To your point, whether a patient should show up on the top pane or the lower pane in a telemedicine appointment, that’s more important. We have data on what has worked well for our customers, so we can advise them on how to do it and how not to do it. We also have a lot of advice for our customers, based on our experience, on how to set up their SDKs as they build out video and voice applications in their mobile or desktop clients, and how to parameterize them based on the network conditions they are operating in.
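One way to picture parameterizing a client for different network conditions is a simple lookup from measured bandwidth to a video profile. The sketch below is hypothetical throughout: the `VideoProfile` type, the `choose_profile` function and the bandwidth thresholds are illustrative assumptions, not values from Twilio's SDKs or guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoProfile:
    name: str
    width: int
    height: int
    fps: int
    min_kbps: int  # assumed minimum bandwidth needed for this profile


# Illustrative thresholds only, ordered from most to least demanding.
PROFILES = [
    VideoProfile("hd", 1280, 720, 30, 1500),
    VideoProfile("standard", 640, 480, 24, 500),
    VideoProfile("low", 320, 240, 15, 150),
]


def choose_profile(measured_kbps: float) -> Optional[VideoProfile]:
    """Return the best profile the link can sustain, or None for audio-only."""
    for profile in PROFILES:
        if measured_kbps >= profile.min_kbps:
            return profile
    return None


hotel_wifi = choose_profile(600)    # picks the "standard" profile
bad_link = choose_profile(100)      # None: fall back to audio-only
```

A real client would re-measure continuously and change settings gradually instead of switching on a single reading, but the basic shape of the decision is the same.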
Look, Bill, you and I have seen this. You go into a hotel: You have crappy Wi-Fi, and you have to hit this important conference call or an appointment or a workflow. I’ll give you another example. We have a really good customer that has basically taken the notary workflow, where you and I used to go to a notary and get in line, and used video to create a complete, end-to-end workflow. My advice to developers who are thinking about extending their brand, thinking about acquiring more business through virtual means: look, the pandemic is going to end, but video has become a natural extension. Focus on user experience. Focus on the data that they require to make that user experience and business flow simple. Leave the bits of plumbing and leave the bits of the transport to us. We know how to do it better, and we’ll advise you every step of the way.
We have a lot of customers on the payer side, the medical insurance side, who are trying to build experiences on top of us from a healthcare platform standpoint. How we advise them is: hey look, here is our experience, here is what has worked from a user experience standpoint for other customers. We roughly do a billion minutes of video. I’m not talking about conferencing. Look, we are not a conferencing platform; between Zoom and Microsoft and Cisco and Google, that modality has been spoken for. But we are here to help the rest of the business cases. One of them, and I know we’re going to talk about it in a little while, is Twilio Live, the product we just put into beta, which I’m super excited about. Basically, what is happening is there are these mega trends.
We always have broadcast video. You do webinars and you have thousands and tens of thousands of users listening on demand. But with the rise of the Clubhouse and TikToks of the world, there is this new prefix that has gone into streaming. It’s called interactive, and it’s super critical. Why is it super critical? Because it has near real-time characteristics associated with it. Imagine you and I on a panel, and we have an audience listening, and we are trying to solicit responses from them on a poll or something mid-conversation. So, the characteristics have to be near real time, and there is no point in every enterprise inventing that experience. We have already invested a whole lot in the infrastructure, in the APIs, in the guidelines, in the documentation dos and don’ts of that experience. That’s where we can really, really help developers accelerate these modalities into their business flows or whatever the use case might be.
Bill Detwiler: Yeah. Let’s talk about the user experience. I think that’s really important. You talked about, because I love the example you gave of where to put the patient’s box in the video chat because that’s something that I think maybe can get overlooked sometimes if you’re not making a conscious effort during the dev process to do that. What are some of those lessons that you’ve learned and seen in the data around user experience for voice and video? Are there things that turn people off when it comes to voice and video that are relevant for developers? Are there things that maybe customers really like or have told you that yes, this is the way that I want to interact with voice or interact through voice and video with a company?
Hakim Mehmood: One of the things we have learned is we have a spectrum of customers. I gave you an example of a notary. I can give you an example of people who unfortunately are incarcerated and TeleVisits associated with them. We have education. We have a very, very diverse set of customers. We have banking. So there’s no one-size-fits-all, but that’s where it becomes super powerful being a platform company, Bill. What I mean by that is we give you an SDK on the client side. You can orchestrate and play with the experience that works better for you from an engagement-score standpoint. So, in my opinion, the best experiences are built. They’re not bought off the shelf.
What you saw during the pandemic was that people gravitated towards the likes of Zoom, the likes of WebEx really quickly because this was the only way to create business continuity. What we have seen since then is that customers have started building curated experiences. I’ll give you examples. In companies, you have these employee resource groups that are supporting a particular cause, and they want to decorate their rooms in a certain way. The off-the-shelf platforms don’t allow them to do so, other than with basic virtual backgrounds.
So, what we are seeing is that, due to the flexibility our SDK provides on the client side, people can curate these experiences quite a bit. I’ll give you an example of a fitness company that has standardized on Twilio. They tried a bunch of experiences. Do I put square grids? Do I put round faces? Do I do what? Eventually they gravitated towards having a round face of the trainer in the middle and everybody else fitting on the sides because that’s the experience that was most amenable to their users. That’s only possible when you have a platform that allows that flexibility, very simply. It cannot be that your developers have to go back to the drawing board and write tens of thousands of lines of code. It has to be that you can simply tweak a few parameters and test experiences really, really quickly. That’s what we have observed.
It is really, really important, Bill. We’re a customer engagement company. I get tired of saying that again and again. It is so important to get that experience piece right, whether it is when we send a message to you for your dentist appointment or when you are in a video experience that is back-ended on our platform.
Communication Platform-as-a-Service
Bill Detwiler: Yeah, that makes a lot of sense to say that it does come down to the uniqueness of the company and the customers and the experience. It isn’t a one-size-fits-all. So, in the final sort of minutes here that I have you for, I’d love to get a little more technical and draw a little more on that engineering background, the technical background you have. Which is if you were talking to developers, do you have any recommendations or any feedback? Does it matter what IDE they’re using or does it matter what languages they’re proficient in? Does it matter? Is there any information you would share with them to help them develop their skills at using voice and video and incorporating that in the apps and systems that they’re building? Or because of what you’re saying, you’ve got the API it’s so simple. It really doesn’t matter. Your platform is kind of environment agnostic.
Hakim Mehmood: To an extent we are environment agnostic. But for every developer today, my recommendation would be don’t reinvent the wheel. We have open-sourced our sample apps. Go download those apps, look at our source code and make that the starting point of your experience. I was just talking to the founder of a very unique dating application. The founder and one more developer were writing the code. They had written a lot, and they ran into a couple of issues. So we got on a call with them. In the end what we ended up doing is pointing them back to our open source code so that they did not have to solve those problems themselves. Over the last so many years, we have already figured out some of the blocks that they were encountering. That would be my real starting point for them.
Then the second point would be read our documentation. Actually, back at my previous employer, we used to envy Twilio’s APIs and documentation, because Twilio invented this space called CPaaS, communication platform-as-a-service. Our documentation is very, very good. Read that, download our sample code, make that a starting point and use our excellent, excellent support services to help you navigate. Look, we are a developer company. We have millions of developers on our platform. We take a lot of pride in making the developer a first-class citizen on our platform. That’s how we sell. If you listen to our CEO’s talks, he’ll say, look, my first customer spent a few cents and I saw them grow over time to millions of dollars in ARR. That is because we are a usage-based platform. It is because we care about and cater to developers and their experience every single day. They are our family customers. So that would be the starting point.
Bill, I’m going to pivot back a little, if you don’t mind, to this Twilio Live thing. I think I spoke about the rise of these newer paradigms of customer engagement. Video has been a customer engagement channel. If you and I were talking about something else, we would use video, whether we were colleagues or customers or friends. But there is this new paradigm, if you think about what TikTok has done and what Clubhouse is doing through audio. We provide solutions for it, and we are modality agnostic on our platform. You can do audio, you can do video, you can do content and you can have millions of viewers behind our platform in a very interactive way where they send emojis, where they send responses. It’s a really, really hard problem to solve.
I have a lot of enterprise customers coming to us and saying, “Hey, we are looking for greater employee engagement. How do we build those experiences? Is it a heavy lift?” Many of them tried on their own and it’s a massive heavy lift. They now come to us and are starting to use this platform. We just announced the beta for this particular product and I have close to 500 customer signups on it already, and we’re trying to get them through the funnel as fast as possible. So, think about this new engagement channel in group shopping, where you have big audiences, if you watch these TikTok videos. Or think about it in fitness, where you have very large classes. Or think about it in talk-show-style video broadcasts.
These new experiences to engage your customers and then get data about them are the latest modality in customer engagement. Our aim, and we already are a leader in this space, is to be the platform for developers, whether in enterprises or at technology providers, who build these experiences for customers. I am super, super excited about this launch. I cannot tell you enough how much inbound interest we have at this point.
Bill Detwiler: So, when is Twilio Live launching? The beta is out, you said. Do you have an official launch date yet in case people want to go check it out, or how do they find out more about the beta?
Hakim Mehmood: Twilio.com/live, a very simple URL. We are going to be generally available around the October timeframe when it’s our… Signal is our biggest show of the year and that’s where we are going to be generally available. It’s an extension of our video platform. It doesn’t only do video. It does audio. Look, if you think about the events, if you think about the big conferences, Signal used to be an in-person event. What has happened during these last 15 to 18 months is the conferences went virtual. You saw the rise of platforms like Hopin, et cetera. But the way we think about the future is it’s all going to be hybrid. You’ll have people coming physically to conferences. People by nature are gregarious. We want to shake hands. We want to see each other. But in order to reach greater audiences, you’re going to have platforms built on a Twilio Live-like platform that extend these experiences virtually to a great number of customers, thus expanding reach and creating a lot more engagement. Very, very excited about this, Bill.