Gesture recognition: Wave if you think it's the future

Today's user interfaces are dominated by touch. But things are changing fast, and technology is pointing us in new directions.

The history of interfaces has involved presses, clicks and swipes but the future will be gesture and natural language. Photo: MIT

Written in a Hong Kong coffee shop and despatched to TechRepublic over a free 100Mbps wi-fi service.

I have just watched a three-year-old walk up to a flat-screen TV and swipe. Nothing happened of course - so she then tapped it with a finger. When that too failed to produce any effect, she looked puzzled for a moment and then toddled off to find a controller.

Well, she would, wouldn't she? As a part of the iPad generation, she has been born into a multi-interface world where everything seems to be changing.

A few days ago I was in a friend's car where all the settings are controlled by a mouse. The driving position was delightfully clean and devoid of any obvious clutter and complexity - but that was because the controls were hidden in layers of drop-down menus.

In contrast, my car has a facsimile of a drive-mode selector - a stick connected to electronics and not a mechanical linkage. It also has real buttons and knobs and a touchscreen menu system that is only two layers deep and not six.

Some of our household appliances and IT devices have touchscreens and some don't. Usually I can remember which is which, but from time to time I find myself tapping and swiping a screen that isn't touch-sensitive.

And because I travel a lot, I also run into even more interface confusion in hotels, elevators and aircraft.

Overall, it seems that hotel and aircraft entertainment systems provide the biggest interface challenges for me and the rest of humanity.

No other sector has so many variants. And how frustrating when you can't even switch the lights on and off, navigate to the broadcast TV channels or find an on-demand movie.

But what really bugs me is that they mostly provide an online video tutorial on how to drive their unique interface. If ever there was an admission of bad design, it is the operating manual, instruction book and video tutorial.

A few years ago someone presented me with a rather splendid multi-function wristwatch. It obviously cost a tidy penny, but it came with a 350-page handbook. I'm afraid it didn't last long. I don't read handbooks and the interface was in no way intuitive - so I just packed it all up and gave it to someone else.

I suspect that wristwatch is now continually circumnavigating the planet looking for an owner who likes the challenge of handbooks. I just hope it never catches up with me again.

The history of interfaces has involved the turn, press, click and swipe. What happens next? How about wave or gesture, facial recognition, body language and natural language?

The problems of speech recognition

Of all these natural modes, speech turns out to be the most difficult to engineer. While it's easy to realise in quiet environments, any form of background noise quickly degrades performance.

Try any form of voice recognition against the din of a busy city thoroughfare - Oxford Street in London, say.

But to be fair, even holding a conversation with another human in those locations is tricky - and we have the advantage of a priori knowledge, context, cognition, and visual cues including subliminal lip reading to help us overcome acoustic masking and distractions. Our machines do not.

Will they get these additional inputs? Certainly, but not yet. I reckon we'll have to wait for near-100 per cent cloud penetration to gain that facility. So it looks as if gesture space may well be the next big interface advance.

But there is a newcomer on the block that may well give you and me new capabilities. HTML5 is going to be a game-changer when it works in league with the cloud and artificial intelligence. That triumvirate of technology will enable you and me to design interfaces to our own apps and documents.

But best of all, it may just let us adjust interfaces to suit our individual preferences, as opposed to those of some anonymous designer or engineering team.

Personally, I'm a bit of a Trekkie and I have always lusted after that Captain Kirk/Jean-Luc Picard interface - I just want to talk to machines. And at the current rate of progress in AI, it is conceivable that dream might become a reality before I expire.

In the meantime I suspect that people texting, speaking into free space, wearing headphones and Bluetooth earpieces while sitting, walking, running and driving will be replaced by people waving and pulling funny faces in front of invisible cameras.


Peter Cochrane is an engineer, scientist, entrepreneur, futurist and consultant. He is the former CTO and head of research at BT, with a career in telecoms and IT spanning more than 40 years.
