Natural language processing: A cheat sheet

Learn the basics about natural language processing, a cross-discipline approach to making computers hear, process, understand, and duplicate human speech.

It wasn't too long ago that talking to a computer and having it not only understand but speak back was confined to the realm of science fiction, like the shipboard computers of Star Trek. The technology of the 24th century's Starship Enterprise is reality in the 21st century thanks to natural language processing (NLP), a machine learning-driven discipline that gives computers the ability to understand, process, and respond to spoken words and written text.

Make no mistake: NLP is a complicated field that one can spend years studying. This guide contains the basics about NLP, details how it can benefit businesses, and explains where to get started with its implementation.

What is natural language processing?

Natural language processing (NLP) is a cross-discipline approach to making computers hear, process, understand, and duplicate human language. Fields including linguistics, computer science, and machine learning are all a part of the process of NLP, the results of which can be seen in things like digital assistants, chatbots, real-time translation apps, and other language-using software.

The concept of computers learning to understand and use language isn't a new one. It can arguably be traced all the way back to Alan Turing's paper "Computing Machinery and Intelligence," published in 1950, which is where the idea of the Turing Test comes from.

In brief, Turing attempted to determine whether machines could behave in a way indistinguishable from a human, which fundamentally requires the ability to process language and respond in a sensible way. 

Since Turing wrote his paper, a number of approaches to natural language processing have emerged. First came rules-based systems, like ELIZA, which were limited to a fixed set of instructions. Systems like ELIZA were easy to distinguish from a human because their formulaic, non-specific responses quickly became repetitive and felt unnatural: They lacked understanding, which is a fundamental part of modern NLP.
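
To make the limits of a rules-based system concrete, here is a minimal Python sketch of the general pattern-and-response idea. The rules and the respond function are invented for illustration and are not ELIZA's actual script; the point is only that every reply comes from a fixed template.

```python
import re

# Hypothetical rules for illustration only; this is not ELIZA's original script.
# Each rule pairs a pattern with a canned reply template.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply for the first rule that matches, else a stock phrase."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # The captured words are pasted back verbatim; nothing is understood.
            return template.format(match.group(1))
    return FALLBACK


print(respond("I feel tired."))
print(respond("I am worried about my job"))
print(respond("The weather is nice"))
```

The second reply echoes "my job" back without changing it to "your job," and anything outside the rules gets the same stock phrase, which is the kind of formulaic behavior that gives a rules-based system away.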

With the advent of machine learning, which allows computers to algorithmically develop their own rules based on sample data, natural language processing exploded in ways Turing never could have predicted. 

Natural language processing has reached a state where software can now recognize and transcribe human speech more accurately than real humans can. Even this impressive milestone still falls short of truly complete NLP, though, because the machine performing the work was simply transcribing language, not being asked to comprehend it.

Modern NLP platforms are also capable of processing language visually. Facebook's Rosetta, for example, is able to "extract text in different languages from more than a billion images and video frames in real time," TechRepublic sister site CNET said.

What are the challenges of natural language processing?

Computers don't need to understand human speech to speak a language: Machines operate on a kind of linguistic structure of their own that allows them to accept input, process data, and respond to commands.

Programming languages like Swift, Python, JavaScript, and others all have something in common that natural language lacks: Precision.

Human speech isn't precise by any stretch of the imagination: It's contextual, metaphorical, ambiguous, and spoken imperfectly all the time, and understanding language requires a lot of background and interpretive ability that computers lack.

Computational linguist Ekaterina Kochmar, in a talk about natural language processing, explained that words exist in a sort of imaginary semantic space. In our minds, Kochmar said, we have representations of words, and words with related or similar meanings live close together in a web of semantic understanding.

Thinking of language in that manner allows machine learning tools to be built that let computers algorithmically create their own semantic space, which lets them infer relations between words and better understand natural speech.
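
The idea of a semantic space can be sketched in a few lines of Python. The three-dimensional vectors below are made up for illustration (real systems learn vectors with hundreds of dimensions from large amounts of text), and the function names are hypothetical. Cosine similarity is one common way to measure how close two word vectors sit to each other.

```python
import math

# Toy vectors invented for illustration; they are not from any trained model.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word in VECTORS whose vector is closest to `word`."""
    others = (w for w in VECTORS if w != word)
    return max(others, key=lambda w: cosine_similarity(VECTORS[word], VECTORS[w]))


print(nearest("king"))
```

With these toy values, "king" and "queen" point in nearly the same direction while "apple" points elsewhere, so related words end up close together in the space.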

That doesn't mean the challenges have been overcome, though. Going from understanding simple, precise statements like those given to digital assistants to producing sensible speech on their own is still difficult for NLP programs. Candy hearts produced by artificial intelligence (AI) taught to understand romantic language are predictably absurd, and 1 the Road, a novel written entirely by an artificial neural network, is generally nonsensical with only the most occasional glimpse of semantic understanding, which could be entirely chalked up to chance.

As advanced as natural language processing is in its ability to analyze speech, turn it into data, understand it, and use an algorithm to generate an appropriate response, it still generally lacks the ability to speak on its own or grasp the ambiguity and metaphor that are fundamental to natural language.

We've mastered the first part: Understanding. It's the second part, generating natural speech or human language, that we're still a bit stuck on. And we might be stuck there for a while, if pioneering mathematician and computer scientist Ada Lovelace is correct: She posited that computers were only able to do what we told them to, and were incapable of originality. Her argument, known as Lady Lovelace's Objection, has become a common part of criticism of the Turing Test and thus a criticism of natural language processing: If machines can't have original thoughts, then is there any way to teach them to use language that isn't ultimately repetitive?

How is natural language processing used?

Natural language processing has many practical applications in business.

Google Duplex is perhaps the most remarkable example of natural language processing in use today. The digital assistant, introduced in 2018, is not only able to understand complex statements, but it also speaks on the phone in a way that's practically indistinguishable from a human, vocal tics and all. Duplex's goal is to carry out real-world tasks over the phone, saving Google users time spent making appointments, booking services, placing orders, and more.

Ninety-eight percent of Fortune 500 companies are now using natural language processing software to filter candidates for job searches with products known as applicant tracking systems. These products pick through resumes to look for appropriate keywords and other linguistic elements.
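
As a rough illustration of keyword screening, the Python sketch below scores a resume by the share of target keywords it contains. It is a deliberately simplified assumption about how such filtering might work, not a description of any real applicant tracking product, and the keyword_score function and sample keywords are hypothetical.

```python
import re


def keyword_score(resume_text, keywords):
    """Return (fraction of keywords found, set of matched keywords)."""
    if not keywords:
        return 0.0, set()
    # Lowercase the text and split it into simple word tokens.
    words = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    matched = {k for k in keywords if k.lower() in words}
    return len(matched) / len(keywords), matched


score, matched = keyword_score(
    "Experienced Python developer with SQL skills.",
    ["python", "sql", "kubernetes"],
)
print(round(score, 2), sorted(matched))
```

Exact word matching like this would miss synonyms and multi-word skills, which is one reason real products are described as looking at other linguistic elements too.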

Chatbots are quickly becoming the first line of online customer service, with 68% of consumers saying they had a positive experience speaking with one. These bots use natural language processing to address basic requests and problems, while also being able to escalate requests to humans as needed.

Uses of NLP in healthcare settings are numerous: Physician dictation, processing hand-written records, compiling unstructured healthcare data into usable formats, and connecting natural language to complicated medical billing codes are all potential uses. NLP has also been used recently to screen COVID-19 patients.

NLP can be used to gauge customer attitudes in call center environments, perform "sentiment analysis" on social media posts, support business intelligence analysis, and supplement predictive analytics.
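
Sentiment analysis can be sketched in its simplest form as counting positive and negative words. The word lists and the sentiment function below are invented for illustration; production tools generally rely on trained models rather than a hand-written lexicon like this one.

```python
import re

# Tiny hand-picked word lists, assumed purely for illustration.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "hate", "rude"}


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Support was fast and helpful!"))
print(sentiment("The app is slow and broken."))
print(sentiment("This is not great"))
```

The last example is labeled positive because the sketch ignores the word "not," a small reminder of how context and ambiguity trip up simple approaches.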

Natural language processing has a potentially endless variety of applications: Anything involving language can, with the right approach, be a use case for NLP, especially if it involves dealing with a large volume of data that would take a human too long to work with. 

How can developers learn about natural language processing?

NLP is a complicated topic that a computer scientist could easily spend years learning the ins and outs of. If your objective is being at the cutting edge of NLP research, it's probably best to think about attending a university known for having a good computational linguistics program.

Developers who want to learn to make use of current NLP technology don't need to dive that far into the deep end. Text analytics firm MonkeyLearn has an excellent rundown of resources and steps to get started with natural language processing.

MonkeyLearn's guide also has a variety of links in it to articles, research, and journals that any budding NLP developer should be aware of. 

What is the best way for businesses to get started with natural language processing?

Every business uses language, so there's a good chance you can come up with at least one or two uses for natural language processing in your organization—but how do you go from thinking about what NLP could do for you to actually doing it? There are a lot of steps to consider.

For starters, you need to know what your objectives are for NLP in your business. Do you want to use it to aggregate data as an analytics tool, or do you want to build a chatbot that can interact with customers via text on your support portal? Maybe you want to use NLP as the backbone of an e-mail filter, understand customer sentiment, or use it for real-time translation. 

No matter what you want NLP to do for your business, you need to know your goal before you even start thinking about how to achieve it.

Once you know what you want to do with natural language processing, it's time to find the right talent to build the system you want. You may already have developers in-house who are familiar with Python and common NLP frameworks. If that's the case, get them involved in the planning stages from the very beginning.

If you don't have anyone in-house who can develop natural language processing software, you're faced with a choice: Hire new people or bring in a third party that specializes in NLP solutions.

If you choose to go about your NLP objectives in-house, you'll need to find the right software solutions or providers for hosting your NLP platform, and there are plenty of recognizable names to choose from. 

IBM Watson, AWS (with Amazon Comprehend and other services), Microsoft Azure, and Google Cloud all offer NLP services. Choosing the proper platform will require input from your developers because they're the ones who will be working with the software every day, and your NLP initiative's success may hinge on how well they can use the platform.
