Tunity CEO Yaniv Davidson explains how machine learning can help beam television content to your mobile device.
Tunity CEO Yaniv Davidson and Tunity Head of Research and Analytics Paul Lindstrom spoke with TechRepublic's Dan Patterson about how television can be transformed through machine learning, as it beams TV content to mobile devices. The following is an edited transcript of the interview.
Dan Patterson: The television has been relegated to the second screen. Not long ago, that was your phone. Yet, connecting the two remains somewhat of a holy grail for the business and technology communities. Tunity does something that is very novel and fascinating. With the application, you can visually scan a television screen and have the audio beamed to your phone. Tell me a little bit about how the technology works.
Yaniv Davidson: We use computer-vision and deep learning-based technology, that is, neural networks that we train, to detect where the TV is and which channel it is showing, basically comparing whatever the user is viewing with the roughly 140 live channels that we support. Once we determine what channel the user is viewing, we actually also determine the exact timing of what the user is... So, if I'm watching a game in New York, and you're watching it in LA, and there could be a 30-40 second difference between us, we determine that too. So we can actually also synchronize the audio. And we use our own audio protocol to basically stream the live audio directly to every user. If I walk into a bar and there are 20 TVs there, all I have to do is point my phone at the TV, scan it for a second, and Tunity will automatically detect the channel and the timing, and I'll start hearing the game that I want.
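To make the "which channel, and at what moment" step concrete, here is a minimal sketch. It is not Tunity's actual method, which the interview does not detail. It assumes each video frame has already been reduced to a small integer fingerprint, and that the service keeps a short buffer of recent fingerprints per live channel; the names `hamming`, `locate`, and `feeds`, and all the data values, are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")


def locate(query: int,
           feeds: Dict[str, List[Tuple[float, int]]]) -> Tuple[str, float, int]:
    """Return (channel, broadcast_time, distance) for the closest stored frame."""
    best = None
    for channel, frames in feeds.items():
        for broadcast_time, fingerprint in frames:
            distance = hamming(query, fingerprint)
            if best is None or distance < best[2]:
                best = (channel, broadcast_time, distance)
    if best is None:
        raise ValueError("no reference frames available")
    return best


# Hypothetical reference data: for each channel, a list of
# (broadcast timestamp in seconds, 16-bit frame fingerprint).
feeds = {
    "sports": [(100.0, 0b1010101010101010), (101.0, 0b1111000011110000)],
    "news": [(100.0, 0b0000111100001111), (101.0, 0b0011001100110011)],
}

# A frame captured by the phone; it differs by one bit from the
# "sports" frame broadcast at t = 101.0.
query = 0b1111000011110001

channel, broadcast_time, distance = locate(query, feeds)
print(channel, broadcast_time, distance)
```

The matched timestamp is what would let the audio stream start at the moment the viewer's screen is actually showing, whatever the broadcast delay at that location.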
Dan Patterson: Yes. And how does the technology work? In what way do you use machine learning? And tell me a little bit about the back end.
Yaniv Davidson: Sure. So, basically, the app itself only takes a few frames of video, basically seeing what the user is seeing. And then there are a bunch of machine learning elements that help us, let's say, guess better which channels you're more likely to watch, based on who you are, where you are, what other people around you are watching, what's on in the game right now. Then we basically identify different elements or different features of the picture, and try to match them with channels or content that we know is live right now. And then basically we do the actual final frame-to-frame matching in order to determine the exact time. Does that make sense?
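One way to picture the "guess better" step described above is to weight each channel's visual similarity by a context-based prior. The sketch below is an illustration only: the interview does not say how Tunity combines these signals, and the function `rank_channels`, the prior values, and the similarity scores are all assumptions.

```python
from typing import Dict, List, Tuple


def rank_channels(priors: Dict[str, float],
                  visual_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine a context prior with a visual similarity score per channel.

    Each channel's weight is prior * visual score, normalized so that the
    weights sum to 1. Channels are returned from most to least likely.
    """
    unnormalized = {
        channel: prior * visual_scores.get(channel, 0.0)
        for channel, prior in priors.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no channel has both a prior and a visual score")
    posterior = {channel: w / total for channel, w in unnormalized.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Hypothetical context prior: in a bar during a game, sports is most likely.
priors = {"sports": 0.6, "news": 0.3, "movies": 0.1}

# Hypothetical visual similarity between the captured frame and each channel.
visual_scores = {"sports": 0.5, "news": 0.4, "movies": 0.9}

ranked = rank_channels(priors, visual_scores)
for channel, probability in ranked:
    print(f"{channel}: {probability:.3f}")
```

In this made-up example the movie channel looks most similar on the picture alone, but the context prior still ranks sports first, so the slower frame-to-frame comparison would begin with the likeliest candidate.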
Dan Patterson: Yeah. What role do big data and analytics play?
Yaniv Davidson: Big data is a very big word, and a lot of people use it. That's actually more around Paul's area of expertise. I just want to kind of make the clarification around, you know, there's data, which is great, and there's information, which is actually the part that's useful within the data. But Paul, this will be better if you discuss it.
Paul Lindstrom: What's most important to keep in mind with what's going on here is that there's a lot of measurement that's attempting to go on to determine what people are watching within out-of-home locations: bars, gyms, restaurants, doctors' offices, you name it. But almost all of that is being done using audio-recognition technologies. And fundamentally, that means that they're not able to recognize viewing that's occurring where the sound is muted. So if there's no sound, and you're using audio recognition, you can't do it. And there's an inverse relationship between your ability to identify a program or what's being tuned and the amount of ambient noise. So, if you're in a bar with a lot of people in it, it's much harder to recognize that, in fact, tuning is occurring.
And so, at this point in time, and I won't go through all the details in terms of how we get there, but the vast bulk of viewing that's being looked at as out-of-home tuning is actually guest viewing in other people's homes. By utilizing video recognition, which is what Yaniv was just describing, we can actually identify the tuning that's occurring in these locations, where there's ambient noise, and where the sets are muted. And it's providing insights that have never been available to anyone before.
Dan Patterson: And how do you, or do you, work with the content providers? Are there rights issues when it comes to streaming the information to a device? How do you negotiate the B2B deals?
Paul Lindstrom: The way that it's being picked up is that the audio is being provided when somebody is within a short distance of the screen, within the distance that you could end up seeing it. So, in effect, it's not much more than wearing a set of headphones, and would fall under fair use. It's not a case where you could leave the bar and continue to listen to an event on ESPN as if you were doing so on the radio. So it falls within those boundaries. We're working with the program suppliers to be able to show them an idea of this audience that's currently unmeasured and unreported.
Dan Patterson: And in the next, say, 18 to 36 months, where do you see the role of data and machine learning in applications like Tunity?
Yaniv Davidson: When we started Tunity, three or four years ago, and started developing the technology, it was mostly based on computer vision and classic machine learning. Which was great, but we then started really getting into deep learning, and everything you can do with computer vision plus machine learning, you can do better with deep learning. The reason is, and please stop me if I get too technical, that with classic machine learning there's a limit to how much each of the samples that you have will actually improve the performance of your algorithm. So, if I have 1,000 samples, I'll do maybe twice as well as if I have 100 samples. With deep learning, that limitation is almost gone, so you can almost see a linear curve in the performance, based on the number of samples. So if I have 1,000,000 samples, I'm going to be 1,000 times better than if I have 1,000 samples, to simplify it a little bit.
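The arithmetic behind that simplification can be written out. The sketch below assumes performance follows a power law in the number of samples, which is an assumption made here for illustration and not something stated in the interview: an exponent of log10(2) reproduces the "10 times the data, twice as good" figure for classic machine learning, and an exponent of 1 reproduces the linear curve described for deep learning. The function name `relative_gain` is hypothetical.

```python
import math


def relative_gain(n_new: float, n_old: float, exponent: float) -> float:
    """Performance ratio when performance is assumed proportional to n ** exponent."""
    return (n_new / n_old) ** exponent


# Exponent chosen so that 10x more samples gives 2x the performance.
CLASSIC_EXPONENT = math.log10(2)
# Exponent for the idealized linear curve.
LINEAR_EXPONENT = 1.0

# Going from 100 to 1,000 samples under the classic assumption: about 2x.
print(relative_gain(1_000, 100, CLASSIC_EXPONENT))

# Going from 1,000 to 1,000,000 samples under the classic assumption: about 8x.
print(relative_gain(1_000_000, 1_000, CLASSIC_EXPONENT))

# The same jump under the linear assumption: 1,000x.
print(relative_gain(1_000_000, 1_000, LINEAR_EXPONENT))
```

Under these assumptions, the same thousandfold increase in data yields roughly an 8x improvement in the classic case against 1,000x in the linear case, which is the gap the speaker is pointing at.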
So, today Tunity gets tens of thousands of users actually using it every day. That means that on a weekly basis, we can improve the performance of our system, without even affecting our users. Nobody has to download a new version of an app or anything. Everything runs in the cloud. And all we do is train new, better neural networks, that actually perform better because, you know, if you ask a two-year-old kid, is this a car or not? Then, you know, he'll have to guess until he sees enough cars. When you see thousands or tens of thousands of TVs a day, and different channels, then with deep learning, we can train the neural networks to know that it's a TV, even if we're at a weird angle, or understand if this is a soccer game or a basketball game, which opens this up to a lot more applications. Is this a Starbucks ad or a Budweiser ad?
So basically, the way I see deep learning taking Tunity's technology is, right now we're very focused. We want to let people hear any muted TV. We want to recognize the channel correctly. We want to do it quickly. And we want to give the exact timing, so that audio and video are synchronized. The next stage is, obviously, not only supporting 200 or 500 live channels, but creating a huge library of over-the-top content, and you can think about different services that we can serve with that. And the next phase is actually understanding what's on TV right now, or where I am. Am I in a bar? Am I in a gym? So that's the way we see it on our end.