Kinect: Cheat Sheet

Updated: Microsoft's controller-less controller...

Kinect you say? Sounds like some kind of social networking service. You'll be asking me to touch base next.
Never fear, I'm not talking about LinkedIn et al - or indeed any kind of physical touching. Quite the opposite. The Kinect sensor is a Microsoft Xbox 360 peripheral that translates a gamer's gestures into input commands - turning the human body into the controller of the console, and relegating joypads and other remote controls to the cupboard under the stairs.

Unlike Nintendo's Wii games console, which uses controllers containing accelerometers and optical sensors to interpret gamers' movements, Kinect users don't have to hold a piece of plastic to participate. The system is what's known as a natural user interface (NUI).

So what exactly can Kinect do?
Kinect is capable of motion-sensing and speech recognition, so Xbox 360 owners can use hand or full-body gestures or voice commands to play games, control media playback and otherwise interact with their console. Kinect also includes facial recognition software so individual users can be automatically logged in to their Xbox accounts and the system can recognise different players when they stand in front of the sensor.


Kinect translates a gamer's gestures into input commands, turning the human body into the console's controller. Photo: Microsoft

Is Kinect popular?
In March this year, Microsoft reported it had sold 10 million Kinect units after four months on the shelves. For a bit of context, Apple sold about 15 million iPads during the first nine months the tablet gadget was on sale.

Microsoft had predicted it would be able to ship three million Kinects by the end of 2010 - in the event it shipped eight million. So yes, Kinect has proved considerably more popular than Redmond expected.

Pretty impressive. So how does Kinect work then?
Kinect draws on a computer science and engineering research field known as computer vision, which explores ways to enable machines to extract data from images. Whether it's object recognition or motion tracking, computer vision aims to get machines to see the world in a similar manner to the way the human eye and brain work together to see and recognise things, movements, situations and so on.

Having an eye is imperative to being able to see, so a camera is essential for any computer-vision sensor. The technology inside Kinect includes both a standard RGB camera and a depth sensor, made up of an infrared projector that casts an invisible pattern of dots over the room and an infrared camera that reads it, to gauge depth and help the software identify and map where its human controllers are in 3D space. Using infrared also means Kinect is able to function in the average gamer's favourite habitat: a darkened room.
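
For the curious, here is a rough idea of how a projected infrared pattern can be turned into a depth reading. This Python sketch uses simple triangulation: the more a projected dot appears shifted when seen by the camera, the closer the surface it landed on. It is a simplified illustration rather than Kinect's actual processing, the function name is made up, and the focal length and projector-to-camera spacing are assumed placeholder values, not published specifications.

```python
def depth_from_disparity(disparity_px, focal_length_px=580.0, baseline_m=0.075):
    """Estimate the distance in metres to a surface from a dot's apparent shift.

    focal_length_px and baseline_m are assumed values for illustration only.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Triangulation: depth is inversely proportional to the observed shift.
    return focal_length_px * baseline_m / disparity_px


# A bigger shift in the pattern means a nearer surface.
for shift in (10.0, 20.0, 40.0):
    print(f"shift {shift:4.0f} px -> about {depth_from_disparity(shift):.2f} m")
```

Repeat that calculation for every dot in the pattern and you end up with a depth map of the room, which is the raw material the body-tracking software works from.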

In addition to cameras, the Kinect sensor contains a multi-array microphone - acting as an ear so it can hear voice commands - and a custom processor running proprietary computer-vision software. The device itself also contains a motor so it is able to tilt to better track its human masters as they dance around in front of it. For a detailed look at the guts of Kinect I recommend checking out this excellent Kinect teardown blog.

But if the hardware is clever, consider the software. Microsoft's R&D team divided the human body into 31 colour-coded segments, designing a body-part recognition algorithm to act as Kinect's brains and accurately predict which pixels are which body parts - training the system with millions of images of body positions derived from human motion-capture footage. Having an accurate and powerful algorithm to identify body parts is key to enabling Kinect's hardware to process all the myriad actions and poses gamers can generate fast enough for a game to be playable.
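
To make the idea of labelling pixels by body part a little more concrete, here is a toy Python sketch. It is not Microsoft's algorithm: it swaps in a far simpler stand-in technique, a nearest-centroid classifier, trained on six invented examples rather than millions of images. The features (how far down the frame a pixel sits, and its depth in metres), the labels and all function names are assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Invented training data: ((relative height in frame, depth in metres), label).
# Relative height runs from 0.0 (top of frame) to 1.0 (bottom).
TRAINING = [
    ((0.10, 2.0), "head"),
    ((0.15, 2.1), "head"),
    ((0.50, 2.0), "torso"),
    ((0.55, 2.1), "torso"),
    ((0.90, 2.0), "feet"),
    ((0.95, 2.1), "feet"),
]


def train_centroids(samples):
    """Average the feature vectors seen for each label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (height, depth), label in samples:
        entry = sums[label]
        entry[0] += height
        entry[1] += depth
        entry[2] += 1
    return {label: (h / n, d / n) for label, (h, d, n) in sums.items()}


def classify_pixel(features, centroids):
    """Return the label whose average feature vector is closest to this pixel."""
    def squared_distance(centroid):
        return (features[0] - centroid[0]) ** 2 + (features[1] - centroid[1]) ** 2
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def label_column(depths, centroids):
    """Label every pixel in one vertical strip of a depth image."""
    last_row = max(len(depths) - 1, 1)  # avoid dividing by zero for one pixel
    return [classify_pixel((row / last_row, depth), centroids)
            for row, depth in enumerate(depths)]


centroids = train_centroids(TRAINING)
print(label_column([2.0, 2.0, 2.1, 2.0, 2.0], centroids))
```

The point of the sketch is the shape of the problem: learn from labelled examples, then make a quick guess for every pixel in every frame. A production system needs a much richer model and far more training data to cope with real bodies in real living rooms.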

As one Cambridge doctorate holder who worked on Kinect, Dr Jamie Shotton, noted earlier this year, the software systems underpinning the peripheral "had been a dream of science fiction for many years".

A lot of people have worked on Kinect both inside and outside Microsoft - including various Cambridge University PhD students who joined the machine learning and perception group at Microsoft's Cambridge research labs, along with 3D sensing company PrimeSense, whose tech has been licensed by Microsoft for Kinect.

Tech pundits may also recall Kinect's pre-commercial codename - the enigmatic Project Natal.

I'll admit I'm intrigued, but it sounds like an awful lot of effort and fancy tech for a gaming peripheral…
Indeed, and if Kinect was just a gaming peripheral I'd end the Cheat Sheet here. But...