Microsoft and Facebook have teamed up with US university researchers to train a computer to simulate that same human curiosity and ask similar questions when presented with photos.
Their results varied, with some systems better at generating human-like questions than others. At its best, one system asked, “Was anybody hurt in this accident?” when shown a car crumpled by a collision.
At its worst, another posed the nonsensical “What caused the fall?” when shown the aftermath of a hurricane. You can see other examples of the machine-generated questions below, labelled GRNN and KNN.
The bigger question is: why simulate these reactions at all? The researchers say it’s about creating a machine that can pose and respond to more complex questions than virtual assistants such as Apple’s Siri can today.
“A system that asks relevant and to-the-point questions can be used as an integral component in any conversational agent, either to simply engage the user or to elicit task-specific information,” they write in the paper Generating Natural Questions About an Image.
“Furthermore, deciding what to ask about demonstrates understanding; some educational methods assess student understanding by their ability to ask relevant questions.”
The team also want to move beyond the more typical challenge posed to machine-learning systems of writing a caption describing what’s in an image.
As well as being more challenging, posing interesting questions is an ability that sets humans apart from their animal cousins, according to the report.
“Generating questions is an important task in NLP [natural language processing] and is more than a syntactic transformation on a declarative sentence.
“Interestingly, asking a good question appears to be a cognitive ability unique to humans among other primates.”
Another Microsoft research project into human-AI interaction proved less successful recently, with Redmond forced to take the Tay chatbot offline after it began spouting inflammatory and racist opinions fed to it by the public.
Teaching a machine to ask questions
The goal of this latest research was for a computer to generate a question that could potentially engage a human in starting a conversation, based on the image it was shown.
So, for instance, the machine should be able to ask about something more complex than just the number of horses shown in a picture.
The researchers created three datasets comprising a total of 15,000 images. These pictures were then shared out among contract workers, who wrote five questions for each image, generating a database of 75,000 questions.
Several neural networks were then fed photos together with the attached questions from the datasets, to train the systems in how to generate questions about images.
Once trained on these datasets, the neural networks were shown images they hadn’t seen before and were asked to create questions.
Human judges assessed how human-like the machine-generated questions were, and machine-translation evaluation algorithms, such as BLEU, measured how similar the synthetic questions generated for each image were to the human-written ones.
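To give a feel for how an automatic metric of this kind scores a generated question against several human-written ones, here is a minimal Python sketch of a simplified BLEU-style score. It is not the researchers' evaluation code: the function names (`simple_bleu`, `modified_precision`, `ngrams`), the whitespace tokenisation, the choice of unigrams and bigrams only, and the example questions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the most times it occurs in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A candidate n-gram is only credited up to that maximum.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n=2):
    """Simplified BLEU: geometric mean of clipped precisions times a brevity penalty."""
    cand = candidate.lower().split()  # naive tokenisation (an assumption)
    refs = [r.lower().split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate's.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return penalty * geo_mean


# Hypothetical human-written questions for one image (invented for this example).
human_questions = [
    "was anybody hurt in this accident?",
    "what caused the accident?",
]
exact = simple_bleu("what caused the accident?", human_questions)
partial = simple_bleu("was anyone hurt in the accident?", human_questions)
unrelated = simple_bleu("how many horses are there", human_questions)
```

A question that matches a human one word for word gets the top score, one that shares some wording lands in between, and one with no overlapping words scores zero. This also shows why the researchers paired such metrics with human judges: word overlap says nothing about whether a question is sensible or engaging.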
The best performing neural network was a Gated Recurrent Neural Network, based on a state-of-the-art multimodal recurrent neural network used for image captioning. A variant of this system outperformed other machine models in two-thirds of runs.
The researchers, who also include academics from Carnegie Mellon University and the University of Rochester, plan to release the image and question datasets to allow further work into developing a system that can generate human-like questions.
They expect further progress will depend on the development of machine models that are able to draw on general knowledge about concepts not seen in the image.