TechRepublic's Dan Patterson asks Evernote CTO Anirban Kundu to explain how machine learning in the Evernote platform can lead to a world of content. The following is an edited transcript of the interview.
Dan Patterson: Digital transformation impacts almost every business, but it's often hard to bridge the analog and the digital world. Anirban, your company has pioneered transforming the tactile, the items that we have, and that we use, and that we compose on in the digital world, or in the tangible world to the digital world, and I know that your OCR capabilities have been pretty strong for a long time. But help us understand the technology behind how OCR works.
Anirban Kundu: Okay. So before I get rolling, actually a little bit of context, a little bit of history would probably apply in this particular question. Evernote actually was founded by Stepan Pachikov, a brilliant scientist from Russia, and he essentially sold, or licensed his technology to the Apple Newton product 25 years ago... somewhere in that time frame, and continued working on a product using a handwriting recognition engine called CalliGrapher, which became the foundations of a company called ParaGraph.
So, the whole idea of bridging the gap between the physical world, and things that are taken in handwritten forms, or even in textual forms, in a printed form, and then making that available in some form we can remember, and be able to search and index... that has always been the foundation of the company. It initially started off with various levels of neural networks that essentially allowed you to be able to detect low-quality or distorted text and signs, things of that sort, and then moved further into both OCR and even intelligent character recognition, or ICR, technologies.
But then we went further down that path, where we realized that we can't limit it only to Roman languages. We ended up spending a tremendous amount of time also being able to recognize CJK (Chinese, Japanese, Korean) languages, even in the handwritten form. And the amount of work involved in getting those languages to work, which includes not being able to truly recognize what the word boundaries are, or even in the handwritten form, understanding symbols rather than actual characters that are well spaced out, was pretty much the foundation of the company for about the first 10 years of its existence.
So, Evernote has done a tremendous amount of work and has a tremendous number of patents related to the idea of being able to go from this real world in handwritten form, and then taking it all the way into the digital form. It's a combination of machine learning, where it's neural networks to be able to distinguish particular characters, then piping that into various levels of SVMs, or support vector machines, which is another form of machine learning, to be able to do it at a very rapid rate. And then also being able to probabilistically say that, "Oh, I think this is the word or this is the sentence that you're trying to work on."
One of the things that OCRs always work on, is this idea of being able to identify words. Well, in Evernote's CR technology, it's not just words, we can actually just identify the context of the sentence, which helps us figure out better how and what is it the user was saying. So those have all been placed into the foundations of how Evernote essentially does character recognition across multiple languages.
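To illustrate the idea of choosing a word probabilistically rather than character by character, here is a minimal Python sketch. It is not Evernote's implementation: the candidate characters, the probabilities, and the `WORD_PRIOR` table standing in for sentence context are all made-up assumptions for the example.

```python
import math

# Hypothetical recognizer output: for each character position in a scanned
# word, the candidate characters and their probabilities.
CHAR_CANDIDATES = [
    {"c": 0.6, "e": 0.4},
    {"a": 0.5, "o": 0.5},  # the recognizer cannot tell "a" from "o"
    {"t": 0.7, "l": 0.3},
]

# Hypothetical context prior: how likely each word is given the sentence.
WORD_PRIOR = {"cat": 0.5, "cot": 0.3, "eat": 0.2}


def score_word(word, char_candidates, word_prior):
    """Return the log-probability of a word: character evidence plus context."""
    if len(word) != len(char_candidates):
        return float("-inf")
    log_p = math.log(word_prior[word])
    for ch, candidates in zip(word, char_candidates):
        p = candidates.get(ch, 0.0)
        if p == 0.0:
            return float("-inf")  # the recognizer never proposed this character
        log_p += math.log(p)
    return log_p


def best_word(char_candidates, word_prior):
    """Pick the word with the highest combined score."""
    return max(word_prior, key=lambda w: score_word(w, char_candidates, word_prior))
```

With these numbers the character evidence alone cannot separate "cat" from "cot"; the context prior is what tips the choice toward "cat".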
Dan Patterson: So, you said the magic word there a few times, which is machine learning. Obviously, this is a technology that's powering much of digital transformation across industries. If you were a user like me, and you've dumped many components of your life, not just notes, into Evernote, search and the discovery functions within the product are almost as essential as the composition functions. How has machine learning helped the product evolve, and how does it help the end-user experience?
Anirban Kundu: I would say it's helped, machine learning has helped in a bunch of different ways. If you really think about Evernote, the core context of what we're trying to do starts off with collecting. Can I help you remember a particular moment, a point in time, right? And sometimes it's visually based, sometimes it is something that you've written, things of that sort. But that's all great. If you can't retrieve it at a later point in time, then it's kind of useless, kind of fruitless in essentially collecting that information.
For us, when we apply machine learning it applies in three different contexts. The first one is obviously, especially if it's with images or handwritten notes or things of that sort, we end up applying our own internal OCR technology, and ICR in certain contexts, depending on the language and things of that sort. To be able to identify what's the probability or the probabilistic score of what is it that you're trying to remember? So it's not just one thing, it's a probability given the context of a statement. And so we use that to help you then find that document at a later point in time, very easily using the search technology.
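One way to picture searching over probabilistic recognition results is an index that keeps every candidate word with its score instead of a single best guess. The Python sketch below is only an assumption-laden illustration: the note IDs, candidate words, scores, and the `min_score` cutoff are hypothetical, not taken from Evernote.

```python
from collections import defaultdict


def build_index(notes):
    """Map each candidate word to the notes it may appear in, with a score."""
    index = defaultdict(dict)
    for note_id, regions in notes.items():
        for candidates in regions:
            for word, prob in candidates.items():
                # Keep the best score if a word shows up in several regions.
                previous = index[word].get(note_id, 0.0)
                index[word][note_id] = max(previous, prob)
    return index


def search(index, query, min_score=0.2):
    """Return (note_id, score) pairs for a query word, best match first."""
    hits = index.get(query.lower(), {})
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return [(note_id, score) for note_id, score in ranked if score >= min_score]


# Hypothetical recognition output: each note is a list of text regions, and
# each region lists the candidate words with their probabilities.
NOTES = {
    "whiteboard-photo": [{"budget": 0.55, "budgie": 0.30}, {"review": 0.9}],
    "receipt-scan": [{"budget": 0.25, "bucket": 0.6}],
}
INDEX = build_index(NOTES)
```

Calling `search(INDEX, "budget")` returns both notes, with the whiteboard photo ranked first because its candidate score is higher. A low-probability reading can still surface a note, just further down the list.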
The second one that we apply, and quite a bit, is recommendations in helping you auto tag your content, or auto organize your content, so that even though you may not have said something explicitly, we can semantically try to understand the meaning of what that is, and then help you find it with that so that you don't have to actively think about, "Oh, I have to tag up this content," or, "I have to create all of these essential building blocks of how I'm going to recall this content while I'm creating the content." We work really hard on trying to figure out how we can limit the amount of cognitive overload or cognitive changes or context switching that you have to do while you're creating versus while you're retrieving the content. So that's the second category.
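As a rough illustration of suggesting tags without asking the user to think about them, the Python sketch below scores existing tags by how much a new note's words overlap with previously tagged notes. Plain word overlap (Jaccard similarity) is a deliberately simple stand-in for the semantic understanding described above, and the notes and tags are invented for the example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into a set of words."""
    return {word.strip(".,!?").lower() for word in text.split()} - {""}


def suggest_tags(new_note, tagged_notes, top_n=2):
    """Suggest tags from the most similar previously tagged notes."""
    new_words = tokenize(new_note)
    scores = Counter()
    for text, tags in tagged_notes:
        words = tokenize(text)
        union = new_words | words
        # Jaccard similarity: shared words divided by all distinct words.
        overlap = len(new_words & words) / len(union) if union else 0.0
        for tag in tags:
            scores[tag] = max(scores[tag], overlap)
    return [tag for tag, score in scores.most_common(top_n) if score > 0]


# Hypothetical notes the user has already tagged.
TAGGED_NOTES = [
    ("Flight to Tokyo booked for March", ["travel"]),
    ("Quarterly budget review with finance", ["finance", "meetings"]),
]
```

For a new note such as "Hotel and flight options for Tokyo trip", only the "travel" tag is suggested, because the note shares no words with the finance note.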
And then there's a third category of it, which is, especially in the context of teams, because we've come to recognize that a lot of the content that Evernote users end up creating is in the context of businesses, and in the context of working with other people. And so, there's a lot of work that we end up doing thinking about who we should be nominating this content to be shared with, to be connected with, and then when that person is connected with it how do we make it really fast for them to be able to also index that content? Things of that sort.
So there are three categories of things where machine learning plays a tremendous part for us. We say there are other things related to use and content management that we also deal with, but those are not things that the users directly see right away.
Dan Patterson: Were there challenges when implementing... I know this is a massive question, but when implementing machine learning technology for those three core functionalities, were there any challenges you've experienced that other businesses or CISOs might get a lot out of learning from? I know it's hard to talk about challenges or problems without giving away some of the secret sauce, and I don't want you to do that, but I think machine learning is really having an impact in business across the board right now, and I wonder if you could help us learn from some of the challenges you had implementing these.
Anirban Kundu: The first thing I would say... There are basically two points that I want to make for this. The first point is, it truly starts off by taking a user's perspective of what is it that they want to achieve? What is the net feature or functionality that the user ends up getting and how do we, as a result, then expose that functionality to them? So that's the first thing you have to have, this view of the user-first, not from the corporation or the company, or the product perspective first. So that's the first thing I would say that has been very enlightening to me, in terms of how we do machine learning at Evernote.
The second one, which is in terms of... there's obviously policies that have to be set up, in terms of what can or cannot be machine-learned. There are things of that sort, but then there's also other things such as what kind of algorithm should we be using in what context? So, there are various different algorithms that we end up using in machine learning. It's all the way from neural networks to support vector machines, as I referred to before, and even random forests. It depends on the amount of data, the amount of content that we have for a particular type of algorithm we're running into, and in terms of the features that we can extract out.
So, for instance, for neural networks and deep learning, we apply those more to digital imaging based technologies. They're not used as much on auto tagging, or being able to search for a particular piece of content, because the diversity of content and the amount of content that we can verify against is obviously limited in that case versus in the digital image side of things, where, especially with things like ImageNet giving us access to a tremendous amount of data, it is a lot simpler. So, the algorithm depends on the amount of data we have and the type of application that we're trying to apply it to.
And then there's the last bit, which is what is an acceptable level of goodness? It goes all the way from perfection to a certain level of uncertainty, but blending it with the multiple levels of multiple different algorithms, kind of like in a random forest model, is also something that we spend a lot of time thinking about. So it's not about, "Oh, we have to get it to nine nines if possible even," you never get to that level, but that's the model of thinking. What is good enough to determine that something is ready to be used by the user? So those are things that ...
Dan Patterson: When you talk about good, how do you define the value sets or the success that defines good?
Anirban Kundu: It depends on the application, right? So for example, in image recognition, at the base level of understanding a token or a word, that can be a little bit less accurate, but when you then put it in the context of a sentence, when you combine them all together, then you expect to achieve that higher level of probability of this is good as a sentence, as a context, as a whole, for example. So it depends on the context.
In terms of recommendations for, "Hey, I think this ought to be put into this notebook but it needs to be tagged in this form," then the level of goodness can, again, drop a little bit because we're letting the user interact with the system: "Oh yeah, I do agree with your assessment," or, "I don't agree with your assessment," and then make a change, and then that ends up feeding back into the system, making it better. So, accuracy in being able to retrieve is something that we spend a tremendous amount of time on, to be as good as possible. But then when it's a recommendation where the user has a chance to view these various levels of the recommendation, then it can be a little bit lower. So it depends on the context, I'd say.
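The idea of blending several models and then applying a different "good enough" bar depending on the context can be sketched in a few lines of Python. Everything here is assumed for illustration: the threshold values, the simple averaging used to blend scores, and the fixed-step feedback rule are not Evernote's actual numbers or methods.

```python
from statistics import mean

# Assumed acceptance thresholds: strict for retrieval, looser for suggestions
# that the user can confirm or reject.
THRESHOLDS = {"retrieval": 0.9, "recommendation": 0.6}


def blend(scores):
    """Average the confidence scores from several models (a simple ensemble)."""
    return mean(scores)


def is_good_enough(scores, context):
    """Decide whether a blended score clears the bar for the given context."""
    return blend(scores) >= THRESHOLDS[context]


def apply_feedback(tag_weights, tag, accepted, step=0.1):
    """Nudge a tag's weight up or down after the user accepts or rejects it."""
    current = tag_weights.get(tag, 0.5)
    updated = current + step if accepted else current - step
    tag_weights[tag] = min(1.0, max(0.0, updated))  # clamp to [0, 1]
    return tag_weights[tag]
```

With model scores of 0.8, 0.7, and 0.75, the blended score is good enough to show as a recommendation but not good enough to rely on for retrieval. `apply_feedback` then shows how a user's agreement or disagreement could feed back into later suggestions.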
Dan Patterson: I wish we had hours to continue a conversation about machine learning, OCR, and the other technologies that are the core of Evernote. But I wonder if you could leave us with a forecast, maybe looking ahead at 18 to 36 months in terms of the capabilities of machine learning in business, and the enterprise. And I ask about the near term because these are technologies that companies now have practical knowledge of, and are experiencing the implementation of. So it's really important to get this right, right now.
Anirban Kundu: Let's do this, let's stay in the context of collect, 'cause clearly this is a very, very large topic and there's multiple different avenues we can go down.
Dan Patterson: Indeed.
Anirban Kundu: But in the collect context, I would say Evernote is thinking about fundamentally three different things. One is not just, "Hey, can we recognize this handwritten note?" But even more than that, "Can we map that handwritten note into some kind of a semantic meaning that can then feed into, even potentially, a third-party service?" So for example, can we take a handwritten note that a sales rep has written as to his connections with his potential client, and then automatically feed that into something like Salesforce? So that's the moving of the content into some action that is driven by the content, so that's the first set of things.
The second one is an understanding that people don't just deal with letters and characters and words, they also deal with images and entities and relationships that exist between those particular images, boxes, and circles or whatever there may be. And so one of the things that we're trying to do is understand and be able to find relationships that might exist across these diagrams of boxes, or images, these entities. And then taking it a step further which is to say, "Oh, you know what, I think I recognize this as being an org chart of some sort," so when the user then ends up doing a search off of something like, "I remember I drew an org chart, but I didn't put any words in there that I could remember or search on," they could just say, "Oh, I remember I drew an org chart, can you find me that particular note that had the org chart in it?" Or, "I had two boxes and a circle, I think that's kinda what it looked like, can you find that?" And we can start helping you find those particular pieces of information, and that's all driven by the wonders of machine learning, actually.
And then there's the third category of collect that we think a lot about, which is how do you not just work in the context of the visual side of things, but can you apply that to audio and video as well? And the audio context is, how can we make transcriptions from audio even significantly better by understanding and mapping it and marrying it to the content that you already have? Because then, for example, an acronym that you used that is only unique to your corpus of data or your business' corpus of data, that acronym itself can be transcribed from the audio that you speak of in a particular context. Those are things that we think a lot about and those are things that we are working on right now, and all driven by the foundations of machine learning as the platform.
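The acronym example can be illustrated with a small Python sketch that re-ranks transcription candidates using vocabulary found in the user's own notes. This is a simplified assumption about how such biasing might work, not a description of Evernote's system; the candidate transcripts, their scores, the vocabulary, and the `boost` value are all hypothetical.

```python
def rescore(candidates, user_vocabulary, boost=0.2):
    """Pick the best transcript after boosting words known from the user's notes."""
    def adjusted(item):
        text, score = item
        # Count words that match a term already present in the user's corpus.
        hits = sum(1 for word in text.split() if word.upper() in user_vocabulary)
        return score + boost * hits

    return max(candidates, key=adjusted)[0]


# Hypothetical speech recognizer output: (transcript, confidence) pairs.
CANDIDATES = [
    ("send the cue bar report", 0.55),
    ("send the QBR report", 0.45),
]

# Hypothetical acronyms extracted from the user's existing notes.
USER_VOCABULARY = {"QBR", "OKR"}
```

Without the user's vocabulary, the generic reading "cue bar" wins on raw confidence. Once "QBR" is known from the user's notes, the second transcript scores higher and is chosen instead.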
Dan Patterson has nothing to disclose. He does not hold investments in the technology companies he covers.
Dan is a Senior Writer for TechRepublic. He covers cybersecurity and the intersection of technology, politics and government.