Yesterday, I spent two hours trying to act like a person.
Should be pretty easy – after all, I’ve had 34 years’ practice. But it turns out that acting like an authentic human being is far trickier than it sounds.
I was one of four people chosen to be human testers in this year’s Loebner Prize, the annual event where artificial intelligence software tries to pass the Turing Test, invented by the brilliant British mathematician Alan Turing in 1950 to identify a machine that can think. The 2012 event took place at Bletchley Park in the UK, where Turing worked as one of the codebreakers who helped the Allies win World War II.
During the event, each judge holds simultaneous instant-message conversations with a remote person and with software designed to chat like a human – and then has to decide which is which. The contest has been run for 22 years, and in that time the chat software hasn’t managed to fool a third of the judges, which is the threshold that Turing set for identifying an intelligent machine.
My only way of convincing the judges that I was a fully paid-up member of the human race was through what I typed into a chat window. If I failed to make my answers relevant or my prose distinctive, I risked being written off as a chatbot, mechanically spewing out canned responses with little regard to the question.
The Loebner Prize, and to some extent the Turing Test itself, has been criticised by AI academics for lacking rigour and structure, and is considered by some to be more of a sideshow than a serious proving ground.
But even if the prize’s intellectual credentials are in doubt, the event poses a fascinating question: how do we distinguish between human and artificial communication? It’s a distinction that I found far trickier to make than I first thought.
As soon as the judge’s “Hello” or “Hi” popped up on my screen, I was faced with a dilemma: do I go with a stock greeting, or is that too predictable and exactly what some faceless bot would choose?
Because every word you type is broadcast online and people are milling around the human test room reading your messages over your shoulder, I found that the tone of my communications was more guarded and less natural.
Throughout the conversations I kept questioning my own responses. I repeatedly asked myself whether I was being spontaneous enough and whether I should signal my humanity by dropping in a colourful fact or quirky turn of phrase.
But often the easiest response to the judges’ barrage of questions, such as “What was India like?” or “How did you find the journey here?”, was the sort of generic blah that could emerge from a machine gluing together relevant subjects, verbs and objects.
As the contest continued, it struck me that, contrary to my preconceptions, much of what people say to each other isn’t a pure expression of human individuality, but a sprinkling of fresh thoughts on a bed of reheated phrases and sentiment: a style of discourse that many people would describe as robotic.
While my conversations with other humans sometimes felt laboured, it was the bots that provided the truly crazy flights of fancy. What generally gave the chatbots away wasn’t predictable comebacks or stilted tone, but their off-topic and outlandish replies.
One bot insisted that it was a cat, while another offered a judge condolences on the death of his pet dragon. At times the bots’ whacked-out patter read like a bad parody of drug-addled conversation.
Typically within 15 minutes of starting the half-hour discussions, the judge would despair of the nonsense from my digital counterpart, telling me, “The other guy is terrible” or “You’re clearly the human”.
In short bursts the bots were fine – and could answer certain questions plausibly, without lapsing into nonsense.
One of the judges told me that if you edited together snippets of the conversations with the bots, they would look perfectly plausible but that, unedited, the whole artifice came crashing down.
Admittedly, the bots are at a disadvantage. I could immediately tip off the judges to my authenticity by mentioning the tour of the venue that preceded the contest or the garish colours of organiser Hugh Loebner’s Hawaiian shirt. Also, it takes only one slip-up by a bot to ruin an otherwise perfect dialogue and reveal its true identity.
Perhaps, as suggested by Brian Christian, who wrote a book inspired by his experience at an earlier Loebner Prize, a fairer test would be to have the humans log in remotely over the internet.
Being a human tester at the event opened my eyes to how predictable much human dialogue is, but also to how a truly human-sounding chatbot must overcome far greater challenges than simply avoiding sounding robotic.
My idea of how bots communicate has been shaped by bad science fiction, shouting film names at automated cinema booking services and, more recently, Apple’s pocket assistant Siri.
But the reality of the chatbot, unfettered and free to discuss any topic, is an entity struggling to even understand the question put to it, sinking in a sea of associated meanings and grasping at whatever floats by.
I was worried about sounding like a robot when it seems that not even the bots can manage that.