Basic text search today is both simple and solved. Open a site in your browser, press Ctrl-F, and type a word. The browser finds the text, or fragment of text, wherever it appears. A search for “cat” will highlight those three letters in words like “cat,” “catalog,” “placate,” and “Octocat.” More powerful techniques let you find text that matches a complex pattern.
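The two kinds of text search described above, plain substring matching and pattern matching, can be sketched in a few lines of Python (the sample string below just reuses the article's examples):

```python
import re

text = "cat, catalog, placate, and Octocat"

# Plain substring search, like Ctrl-F: counts "cat" anywhere it appears,
# including inside "catalog", "placate", and "Octocat".
print(text.count("cat"))  # → 4

# A more powerful pattern: only whole words that start with "cat".
print(re.findall(r"\bcat\w*", text))  # → ['cat', 'catalog']
```

The regular expression is what the article means by "a complex pattern": `\b` anchors the match to a word boundary, so "placate" and "Octocat" no longer qualify.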
Text search serves as a core capability built into modern word processors, spreadsheets, databases, and, of course, search engines.
Basic image search works, too. An image search for “cat” fills your screen with photos of felines. The search works, in part, because of the millions of images on the web labeled as cats.
Image recognition, however, remains much more difficult. Given a photo, an image recognition system can answer the question, “Does this photo contain a cat?” Humans have little difficulty with this sort of simple recognition problem. But, this type of system has historically been difficult to code.
In late 2015, Google released two systems to help developers create powerful new applications that recognize images and learn patterns.
Recognize images: Cloud Vision API
The Cloud Vision API gives programmers tools to identify, classify, and group images. Some elements of the tools may seem familiar. Google Photos, for example, already allows you to search your photos for images of specific people, common objects, and notable landmarks. Character recognition identifies text from many languages in an image. Cloud Vision offers these capabilities to coders via the API.
Cloud Vision adds a layer of image analysis, as well. The system can indicate if an image likely shows violence or contains explicit adult content. It also offers sentiment analysis, which classifies people’s faces based on displayed emotions (e.g., smiling, crying).
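The capabilities above map onto feature types in the Cloud Vision v1 REST API: labeling, character recognition, explicit-content screening, and face analysis. Here is a minimal sketch of the JSON request body a developer would POST to the `images:annotate` endpoint; the image bytes and API key are placeholders, and the exact fields should be checked against the current API reference:

```python
import base64
import json

def build_annotate_request(image_bytes):
    """Build the JSON body for a Cloud Vision v1 images:annotate call.

    Each feature type corresponds to a capability described above:
    LABEL_DETECTION (what's in the image), TEXT_DETECTION (character
    recognition), SAFE_SEARCH_DETECTION (violent/adult content), and
    FACE_DETECTION (faces and their apparent emotions).
    """
    return {
        "requests": [
            {
                # Images are sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": 5},
                    {"type": "TEXT_DETECTION"},
                    {"type": "SAFE_SEARCH_DETECTION"},
                    {"type": "FACE_DETECTION"},
                ],
            }
        ]
    }

# The body would be POSTed to
# https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
body = build_annotate_request(b"placeholder image bytes")
print(json.dumps(body)[:60])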
Learn patterns: TensorFlow
Google also released TensorFlow, which provides anyone an open source library for machine learning. (On release, the Python API for TensorFlow was “the most complete and easiest to use,” although Google also provided a C++ API.)
A machine learning system functions a bit differently than a conventional application. In a conventional application, the program's code defines the results; in a machine learning system, the results emerge from the data.

For example, a conventional program might offer many ways to display letters in a variety of fonts, sizes, colors, and styles. The system defines each of these characteristics. You adjust the settings to show the letter in whatever combination you choose.
A machine learning system flips the process. You gather examples of letters, then feed them into the system. The data might include images of text from print publications, web pages, posters, signs, books, magazines, cereal boxes, receipts, and so on. The system “learns” to distinguish letters based on the data provided.
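The letter-recognition idea above can be sketched with a toy classifier. Instead of hand-coding rules for each letter's shape, we store a few labeled example bitmaps and classify a new image by finding the nearest stored example. The tiny 3x5 bitmaps are invented for illustration; a real system would learn from thousands of scanned samples:

```python
# Labeled examples: 3x5 letter bitmaps flattened to 15 pixels
# (1 = ink, 0 = blank). These stand in for real training data.
EXAMPLES = [
    ("T", [1,1,1,
           0,1,0,
           0,1,0,
           0,1,0,
           0,1,0]),
    ("L", [1,0,0,
           1,0,0,
           1,0,0,
           1,0,0,
           1,1,1]),
    ("O", [1,1,1,
           1,0,1,
           1,0,1,
           1,0,1,
           1,1,1]),
]

def classify(pixels):
    """Return the label of the stored example with the fewest differing pixels."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    label, _ = min(((lbl, distance(pixels, ex)) for lbl, ex in EXAMPLES),
                   key=lambda pair: pair[1])
    return label

# A slightly smudged "L" (one stray pixel in the third row) is still
# closer to the stored "L" than to "T" or "O".
smudged_l = [1,0,0,
             1,0,0,
             1,1,0,
             1,0,0,
             1,1,1]
print(classify(smudged_l))  # → L
```

The key point matches the article's: nothing in the code describes what an "L" looks like. Add more labeled examples and the same code distinguishes more letters.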
Machine learning already helps Google identify spam, translate language, recognize images, and return search results. TensorFlow gives developers the core tools needed to build systems that learn from data.
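At the core of the tools TensorFlow provides is an optimization loop: adjust a model's parameters, step by step, until its predictions fit the data. The loop can be sketched in plain Python without TensorFlow itself; the underlying pattern in the data (y = 2x + 1), the learning rate, and the step count are all assumptions chosen for the demo:

```python
# Training data whose hidden pattern is y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]

w, b = 0.0, 0.0          # model: prediction = w * x + b
learning_rate = 0.01

for step in range(2000):
    # Gradients of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # Nudge each parameter downhill.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # converges close to 2 and 1
```

Again, the program never states the pattern; the parameters settle near it because the data pulls them there. A library like TensorFlow automates the gradient computation and scales this loop to millions of parameters.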
Cloud Vision and TensorFlow open significant new worlds for developers.
Cloud Vision points to a future when image search tools will be as common as text search tools are today. Image search can help us understand how people use products. Image analysis may help journalists, political pundits, or brand managers analyze attitudes and emotions. And, content detection can help protect young students from violent or offensive images in educational apps. With Cloud Vision, Google brings the benefits of image recognition as a service to developers.
Machine learning and TensorFlow will likely have an even greater impact. Machine learning will help as we develop autonomous vehicles, design smarter things, and build connected cities. Much like the smartphone increased the utility of a phone, machine learning will boost the power of a connected cloud platform. Machine learning may be the feature that sets the Google Cloud Platform apart from competitors.
Maybe one day a machine learning system will be so good that it will recognize not only that a photo contains a cat, but also that it is your cat. However, I’m not sure if we’ll ever devise a system to tell you what your cat is thinking.
What do you think?
Have you explored what image search and recognition tools might mean for your organization? Or built a machine learning system and put it to work? What uses of image recognition and machine learning have you seen?