Deep learning is a technology with a lot of promise: helping computers “see” the world, understand speech, and make sense of language.

But away from the headlines about computers challenging humans at everything from spotting faces in a crowd to transcribing speech, real-world performance has been more mixed.

One deep-learning technology whose real-world results have often disappointed is facial recognition.

In the UK, police in Cardiff and London used facial-recognition systems on multiple occasions in 2017 to flag persons of interest captured on video at major events. Unfortunately, more than 90% of people picked out by these systems were false matches.

The shortcomings of publicly available facial-recognition systems were further highlighted in summer this year, when the American Civil Liberties Union (ACLU) tested the AWS Rekognition service. The test found that 28 members of the US Congress were falsely matched with mugshots from a database of publicly available arrest photos.

Professor Chris Bishop, director of Microsoft Research Cambridge, said it was inevitable there would be complications as machine-learning technologies were deployed in new real-world settings for the first time.

“When you apply something in the real world, the statistical distribution of the data probably isn’t quite the same as you had in the laboratory,” he said.

“When you take data in the real world, point a camera down the street and so on, the lighting may be different, the environment may be different, so the performance can degrade for that reason.

“When you’re applying [these technologies] in the real world all these other things start to matter.”

Training for the real world

Deep learning relies on deep neural networks, mathematical models loosely inspired by the structure of the brain, which are trained to make accurate predictions, typically by feeding them huge amounts of labelled data.
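As a minimal sketch of that idea, the toy below trains a single artificial neuron (not a real face-recognition network, and far simpler than a deep model) on labelled data by gradient descent; all data and parameters here are made up for illustration:

```python
import math
import random

random.seed(0)

# Toy labelled data: points near (0, 0) are class 0, points near (2, 2) are class 1.
data = [((random.gauss(0, 0.3), random.gauss(0, 0.3)), 0) for _ in range(50)] + \
       [((random.gauss(2, 0.3), random.gauss(2, 0.3)), 1) for _ in range(50)]

w = [0.0, 0.0]   # weights, adjusted during training
b = 0.0          # bias
lr = 0.5         # learning rate

def predict(x):
    # Sigmoid of a weighted sum: the predicted probability of class 1.
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Training loop: nudge the parameters to reduce error on each labelled example.
for epoch in range(100):
    for x, y in data:
        err = predict(x) - y          # gradient of the log-loss wrt the weighted sum
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

accuracy = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
print(round(accuracy, 2))
```

On this cleanly separated toy data the neuron learns a perfect boundary; the article's point is that real-world data is rarely this tidy.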


In facial-recognition systems, accuracy can suffer when the images the system has been trained on aren’t sufficiently varied — in terms of factors like the individuals’ pose, lighting, shadows, obstructions, glasses, facial hair, and the resolution of the image.
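One common way to broaden that variability, assumed here as an illustration rather than described in the article, is data augmentation: generating extra training examples by transforming the ones you have, for instance mirroring an image (pose) or brightening it (lighting). A toy sketch on a 2x2 grayscale "image":

```python
# Data-augmentation sketch: expand a training set by varying pose (mirror)
# and lighting (brightness) so a model sees more real-world variability.

def mirror(img):
    # Flip each row left-to-right, simulating a mirrored pose.
    return [row[::-1] for row in img]

def brighten(img, delta):
    # Add a brightness offset, clamped to the 8-bit pixel range.
    return [[min(255, px + delta) for px in row] for row in img]

image = [[10, 20], [30, 40]]   # a toy 2x2 grayscale "face"
augmented = [image, mirror(image), brighten(image, 50), brighten(mirror(image), 50)]
print(len(augmented))          # four training examples derived from one original
```

Real systems apply many more transformations (rotation, occlusion, noise), but the principle is the same: represent the variability you expect at deployment time in the training material.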

“The learning process allows the machine to be robust to the variability that is well represented in the training material, but not to the variability that is not represented,” said Alessandro Vinciarelli, professor in the school of computing science at the University of Glasgow.

The need to cope with the extreme variability and messiness of the real world makes training facial-recognition systems for use in public far more demanding, said Professor Mark Nixon, president of the IEEE Biometrics Council and professor of computer vision at the University of Southampton.

“There are a lot of variables which conflate the recognition problem, so the current machine learning approaches would need a database of impractical size,” he said.

Given the difficulty of capturing real-world variability in training data, the University of Glasgow's Vinciarelli said the most realistic way to improve the performance of public facial-recognition systems would likely be to better control conditions such as lighting, and to position cameras to get a clear view of the front of the face, as is already the case for facial-recognition systems at e-passport gates.

Alongside this increased complexity, real-world deployments of machine-learning systems have to be able to resist attempts to trick them. For facial-recognition systems, examples of such attacks include printing a pattern onto glasses that disrupts the system's ability to recognise faces; in one instance this caused a facial-recognition system to fail 80% of the time.
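A heavily simplified, assumed illustration of how such adversarial attacks work: a linear "recogniser" scores an input, and a small change crafted against the model's own weights flips its decision, analogous to a printed glasses pattern designed to fool the network. The weights and features below are made up:

```python
weights = [0.8, -0.2, 0.5]   # made-up model weights
face = [1.0, 1.0, 1.0]       # made-up input features

def score(x):
    # A positive score means "recognised" in this toy model.
    return sum(w * v for w, v in zip(weights, x))

original = score(face)       # positive: the face is recognised

# Perturb each feature in the direction that most lowers the score
# (the sign of each weight), a crude version of gradient-based attacks.
epsilon = 1.5
attacked = [v - epsilon * (1 if w > 0 else -1) for w, v in zip(weights, face)]

print(original > 0, score(attacked) > 0)
```

Real attacks must also keep the perturbation inconspicuous to humans, which is what makes the glasses-pattern attack notable.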

“There are a lot of bad actors in the world and you have to be bulletproof against adversaries,” said Bishop.

Another problem stemming from training data not being sufficiently varied is bias. One study found that facial-recognition systems were more likely to misidentify certain ethnic groups if those groups were underrepresented in the training data. And in the ACLU’s AWS Rekognition test the group found, “nearly 40 percent of Rekognition’s false matches in our test were of people of color, even though they make up only 20 percent of Congress”.

Machine-learning systems can also codify stereotypes and prejudicial beliefs present in their training data. For example, one system came to associate the words "woman" and "homemaker" after being trained on Google News articles.

“Natural data arising from people has biases because we have biases as humans, and this technology detects biases and amplifies them if you apply it naively,” said Bishop.

However, Bishop doesn't see these issues as intractable, but rather as obstacles for the machine-learning community to overcome, for example as researchers develop methods to counter bias in training data and to train systems that better cope with real-world variability.

“There’s a very natural and understandable tendency to say ‘Oh, this thing works, great let’s rush out and start deploying it’, and then you have a very steep learning curve,” he said.

“As a community there have been a few bumps in the road, and we’ve been going over some of that learning curve, and now we recognize the importance of addressing all of those other issues.”

Accepting uncertainty

By their nature, machine-learning systems will also never deliver results with absolute certainty, says Bishop. Instead, they will say there's a 90% chance that a face is a match, or a 95% chance that the word someone just spoke was 'hello'.

Bishop says it’s important not to discount such systems because their answers will always have a degree of uncertainty, pointing to the useful work they can still do.

“This is part of this revolution that’s happening in software, we’re shifting from computation and binary, where every transistor is on or off and everything is about logic, to this world of data, to the real world where everything is shades of grey, where it’s probabilities, where it’s uncertainty,” he said.

“None of these systems are going to produce certainty as output. It’ll never say ‘You have cancer’ or ‘You don’t have cancer’, it’ll look at a blotch on your skin and say ‘There is a 73.5% chance that this is malignant’.”

The threshold for acting on those predictions depends on the context, he says: while you might ignore an email that has only a 5% chance of being legitimate, a doctor is far more likely to order further tests for a mole with a 5% chance of being cancerous.
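That context-dependence can be sketched as a single decision rule with different thresholds; the probabilities echo the article's examples, but the threshold values themselves are hypothetical, chosen purely for illustration:

```python
def act(probability, threshold):
    # Act on a probabilistic prediction only when it clears the
    # context-specific threshold.
    return probability >= threshold

# Spam filtering: a mistake is cheap, so demand high confidence to keep mail.
keep_email = act(probability=0.05, threshold=0.50)   # 5% chance it's legitimate

# Medical triage: a missed cancer is costly, so a small probability already
# justifies further tests.
order_tests = act(probability=0.05, threshold=0.02)  # 5% chance it's malignant

print(keep_email, order_tests)
```

The same 5% probability produces opposite decisions, which is the point: the model supplies the probability, and the deployment context supplies the threshold.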

These probabilistic systems can be helpful for advising humans, he said. For example, a computer-vision system that allows a doctor to discount 90% of smear tests and focus on the remaining 10% still saves that clinician a lot of time, even if the system can't replace the doctor entirely. In response to the ACLU report, AWS made a similar point about its Rekognition facial-recognition system, saying it was designed to "narrow" the choices available to a human, rather than making definitive judgement calls.
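A minimal sketch of that "narrowing" workflow, with case names, scores, and the review threshold all invented for illustration: the model scores each case, and only those above the threshold reach the human reviewer.

```python
# Made-up model scores: probability that each case needs attention.
cases = {"case_a": 0.02, "case_b": 0.91, "case_c": 0.07, "case_d": 0.45}

REVIEW_THRESHOLD = 0.10   # assumed cut-off, chosen for illustration

# Triage: route only high-scoring cases to the human expert.
for_review = {name: p for name, p in cases.items() if p >= REVIEW_THRESHOLD}
discounted = len(cases) - len(for_review)

print(sorted(for_review), discounted)
```

Here the human reviews two cases instead of four; the model narrows the workload while the final judgement stays with the person.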

Bishop stresses that in an age of machine learning, we will have to accept a level of uncertainty in the answers our computers give us and the way they operate.

“If you demand an absolute rigorous mathematical proof that an autonomous vehicle will never kill anybody, you’ll never have an autonomous vehicle,” he said.

“If you’ve got a vehicle that’s an order of magnitude less likely to kill somebody than human-driven vehicles, perhaps it would be unethical not to deploy those.

“It’s getting our heads around the fact that we’re now very much in the world of uncertainty, not the world of logic.”
