Kinect you say? Sounds like some kind of social networking service. You’ll be asking me to touch base next.
Never fear, I’m not talking about LinkedIn et al – or indeed any kind of physical touching. Quite the opposite. The Kinect sensor is a Microsoft Xbox 360 peripheral that translates a gamer’s gestures into input commands – turning the human body into the controller of the console, and relegating joypads and other remote controls to the cupboard under the stairs.

Unlike Nintendo's Wii games console, which uses controllers containing accelerometers and optical sensors to interpret gamers' movements, Kinect doesn't require users to hold a piece of plastic to participate. The system is what's known as a natural user interface (NUI).

So what exactly can Kinect do?
Kinect is capable of motion sensing and speech recognition, so Xbox 360 owners can use hand or full-body gestures or voice commands to play games and otherwise interact with their console, as well as for media playback. Kinect also includes facial recognition software so individual users can be automatically logged in to their Xbox accounts and the system can recognise different players when they stand in front of the sensor.

Kinect translates a gamer's gestures into input commands, turning the human body into the console's controller. Photo: Microsoft

Is Kinect popular?
As of March this year, Microsoft reported it had sold 10 million units after four months on the shelves. For a bit of context, Apple sold about 15 million iPads during the first nine months the tablet gadget was on sale.

Microsoft had predicted it would be able to ship three million Kinects by the end of 2010 – in the event it shipped eight million. So yes, Kinect has proved considerably more popular than Redmond expected.

Pretty impressive. So how does Kinect work then?
Kinect draws on a computer science and engineering research field known as computer vision, which explores ways to enable machines to extract data from images. Whether it’s object recognition or motion tracking, computer vision aims to get machines to see the world in a similar manner to the way the human eye and brain work together to see and recognise things, movements, situations and so on.

Having an eye is imperative to being able to see, so a camera is essential for any computer-vision sensor. The technology inside Kinect includes both a standard RGB camera and a depth sensor: an infrared projector that casts an invisible pattern of dots over the room, paired with an infrared camera that reads how that pattern falls on whatever is in front of it – letting the software gauge depth and identify and map where its human controllers are in 3D space. Using infrared also means Kinect is able to function in the average gamer's favourite habitat: a darkened room.
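
That depth map is what makes the 3D mapping possible. As a rough illustration of the principle – not Microsoft's implementation – here's how a depth image plus a simple pinhole-camera model turns each pixel into a point in 3D space. The focal length and principal point below are assumed stand-ins, not Kinect's calibrated intrinsics.

```python
import numpy as np

# Illustrative pinhole-camera parameters for a 640x480 depth image --
# assumed stand-ins, not Kinect's calibrated intrinsics.
FX = FY = 580.0          # focal length in pixels (assumed)
CX, CY = 320.0, 240.0    # principal point (assumed: image centre)

def depth_to_points(depth_m):
    """Back-project a depth image (in metres) into 3D camera-space points."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth_m
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.dstack([x, y, z])  # h x w x 3 array of XYZ positions

# Example: a synthetic flat 'wall' two metres from the sensor.
points = depth_to_points(np.full((480, 640), 2.0))
print(points[240, 320])  # centre pixel: roughly [0, 0, 2]
```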

In addition to cameras, the Kinect sensor contains a multi-array microphone – acting as an ear so it can hear voice commands – and a custom processor running proprietary computer-vision software. The device itself also contains a motor so it is able to tilt to better track its human masters as they dance around in front of it. For a detailed look at the guts of Kinect I recommend checking out this excellent Kinect teardown blog.

But if the hardware is clever, consider the software. Microsoft's R&D team divided the human body into 31 colour-coded segments, designing a body-part recognition algorithm to act as Kinect's brains and accurately predict which pixels are which body parts – training the system with millions of images of body positions derived from human motion-capture footage. Having an accurate and powerful algorithm to identify body parts is key to enabling Kinect's hardware to process the myriad actions and poses gamers can generate quickly enough for a game to be playable.
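
Microsoft hasn't published that production code, but its researchers have described the approach as per-pixel classification using randomised decision forests over simple depth-comparison features. The toy sketch below captures the shape of the idea with synthetic data, and with scikit-learn's off-the-shelf forest standing in for the bespoke one – every number here is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_PARTS = 31        # the 31 body-part labels mentioned above
N_FEATURES = 20     # depth-comparison features per pixel (toy figure)

# Stand-in training set: in the real system each row would hold the
# depth-comparison features for one pixel of a training image derived
# from motion-capture footage, labelled with its ground-truth body part.
X_train = rng.normal(size=(5000, N_FEATURES))
y_train = rng.integers(0, N_PARTS, size=5000)

# A randomised decision forest standing in for Kinect's bespoke one.
forest = RandomForestClassifier(n_estimators=3, max_depth=20, random_state=0)
forest.fit(X_train, y_train)

# Classify 'pixels' from a new frame: one predicted body part per pixel.
X_frame = rng.normal(size=(10, N_FEATURES))
print(forest.predict(X_frame))
```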

As one Cambridge doctorate holder who worked on Kinect, Dr Jamie Shotton, noted earlier this year, the software systems underpinning the peripheral “had been a dream of science fiction for many years”.

A lot of people have worked on Kinect both inside and outside Microsoft – including various Cambridge University PhD students who joined the machine learning and perception group at Microsoft’s Cambridge research labs, along with 3D sensing company PrimeSense, whose tech has been licensed by Microsoft for Kinect.

Tech pundits may also recall Kinect’s pre-commercial codename – the enigmatic Project Natal.

I’ll admit I’m intrigued, but it sounds like an awful lot of effort and fancy tech for a gaming peripheral…
Indeed, and if Kinect was just a gaming peripheral I’d end the Cheat Sheet here. But…

…it's the potential for Kinect outside gaming that's really exciting. The interest the sensor has generated is evident in the many Kinect hacks that have sprung up since its launch in November 2010.

Whether it’s tapping Kinect to perform mock surgery or control robots or build 3D models of interiors or even to project a skeleton on to a moving human body for an impromptu anatomy lesson, there’s no shortage of weird and wonderful things being done with this particular slice of kit. Things that may be fun, yet are also concerned with much more than just gaming.
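
Most of those hacks start the same way: pulling raw frames off the sensor with open-source drivers. Here's a minimal sketch, assuming the OpenKinect project's libfreenect Python bindings are installed and a Kinect is plugged in.

```python
import freenect   # Python bindings from the open-source OpenKinect project
import numpy as np

# Grab one depth frame and one RGB frame from an attached Kinect.
depth, _ = freenect.sync_get_depth()   # 480x640 array of raw 11-bit depths
rgb, _ = freenect.sync_get_video()     # 480x640x3 RGB image

# Crude starting point for many hacks: find the nearest thing in the room
# (lower raw values mean closer; 2047 means 'no reading').
nearest = np.unravel_index(np.argmin(depth), depth.shape)
print(f"RGB frame {rgb.shape}, nearest object at pixel {nearest}")
```

From raw frames like these, hobbyists go on to build point clouds, gesture triggers and everything in between.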

And enthusiast hackers are just the half of it. Despite an early wobble, Microsoft now actively encourages people to get creative with Kinect. In June the company released the Kinect for Windows SDK – enabling developers to create PC software that can tap the computer-vision goodness.

Next up, what about a Kinect app store for Windows? The thought has surely crossed Microsoft’s mind.

The company has also released a Kinect SDK for its robotics development toolkit, Robotics Developer Studio. Microsoft hopes Kinect will be able to play a role in bringing robotics to the mass market.

“Gaming is just the beginning, and I foresee this technology fuelling rapid advances in augmented reality and telepresence, internet and personalised shopping, and healthcare, to name just a few,” predicted Cambridge’s Shotton. “We are even looking at how touch-free interaction could find its way into the operating theatre so the surgeon can navigate the patient’s data much more quickly and without risk of contamination from a mouse or keyboard.”

Wowzers. So what could Kinect-style NUIs do for PCs? What sort of use-cases does Microsoft see coming down the line for gesture-based computing?
Back in 2009, Microsoft chairman Bill Gates talked up the potential for Kinect to be used in offices – for communication, collaboration and interacting in meetings.

Since then, Microsoft has been kicking the tyres of what Kinect-style NUIs can do for productivity, showing off various natural user interface R&D projects – such as one that builds a NUI into a touchscreen so the screen can detect what is touching it, and how exactly that object or finger is oriented, and respond accordingly.

Another R&D project aims to improve remote collaboration by enabling objects to be rendered digitally and projected in 3D for glasses-wearing participants. Kinect tech is also being used to breathe photo-realistic life into avatars – which Microsoft believes could be used to improve the experience of remote working, email and other business-focused comms in future.

Redmond has recently released Avatar Kinect – a feature allowing Xbox 360 users to chat in groups via their avatars, with facial expressions and body language captured using Kinect. Smile and your avatar smiles for you – albeit rather uncannily since these are not yet photo-realistic digital doubles.

While the initial software has been designed for gamers to socialise, there’s no reason the tech couldn’t be used for other types of collaboration in future. Business meetings where you can slap your boss in the face and not get fired for it? Technically, at least, it may be possible.

Microsoft’s chief research and strategy officer, Craig Mundie, has talked about the possibility of miniature Kinect cameras being embedded into laptops and even mobile phones to enable remote videoconferencing for business meetings. “I could dream about a day when anywhere you have a camera, the back of your cellphone, or the bezel of your laptop, there