Google DeepMind has revealed new improvements to its machine learning system, seen here in 2014 playing classic arcade games from the 1980s.
First computers trounced humans at chess and now they’re beating us at video games.
Google DeepMind’s AI made headlines last year when it was shown acing the classic arcade game Pong. Since then, Google has been honing the algorithm’s joystick skills to the point where it can beat expert human players in even more games on the Atari 2600, the console that launched in 1977 and remained popular through the 1980s.
Yesterday DeepMind researchers revealed that refinements to the system’s reinforcement learning software have improved the AI’s performance to the point where it can best people in 31 games. In the same set of tests, an earlier version of the DeepMind system beat people in only 23 games.
The updates have brought the system close to the performance of a human expert in various titles – including Asterix, Bank Heist, Q-Bert, Up and Down and Zaxxon.
This contrasts with the performance of earlier systems in Asterix, Double Dunk and Zaxxon, where the software scored a fraction of the total achieved by human players. In Double Dunk the upgrades allowed the system to go from struggling to play the game to roundly beating human scores.
Even with the improvements, certain games remain beyond the abilities of the DeepMind system – with the software still failing to rack up a noteworthy score on Asteroids, Gravitar and Ms. Pac-Man.
How the old Google DeepMind DQN system and the new Double DQN system performed relative to humans.
The DeepMind system hasn’t been coached on how to win at these games – instead it spends a week playing each of the 49 Atari games, learning how to improve its score and gradually getting better over time.
The system uses a deep neural network – groups of computer nodes organised in connected layers that Google describes as a “rough mathematical cartoon of how a biological neural network works in the brain”. Each layer passes information on to the next until it reaches the top-level neurons, which make the final call on what the system needs to decide: for example, which animal is in a picture in the case of an image recognition system or, for automated transcription, which word someone just uttered.
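To make the idea of layers passing information along more concrete, here is a minimal sketch of a two-layer network in plain Python. It is an illustration only: the weights, the two input features and the "cat"/"dog" labels are invented for the example and have nothing to do with the network DeepMind actually trained.

```python
def relu(x):
    """Pass positive signals through and silence negative ones."""
    return max(0.0, x)


def identity(x):
    return x


def layer(inputs, weights, biases, activation):
    """One layer: each node sums its weighted inputs, adds a bias, then activates."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked, hypothetical weights; a real network learns these from data.
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, 0.0], [0.0, 1.0]]
OUTPUT_B = [0.0, 0.0]
LABELS = ["cat", "dog"]


def classify(features):
    """Feed the features through both layers and let the top layer make the call."""
    hidden = layer(features, HIDDEN_W, HIDDEN_B, relu)
    scores = layer(hidden, OUTPUT_W, OUTPUT_B, identity)
    return LABELS[scores.index(max(scores))]


print(classify([2.0, 0.5]))
print(classify([0.5, 2.0]))
```

The final decision is simply the label whose top-level node produced the highest score.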
When it comes to playing video games, Google DeepMind’s Deep Q-network is fed pixels from each game and uses its reasoning power to work out different factors, such as the distance between objects on screen.
By also looking at the score achieved in each game the system builds a model of which action will lead to the best outcome.
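The idea of learning which action leads to the best outcome can be sketched with the classic Q-learning update. This is a simplification under stated assumptions: it uses a lookup table where DeepMind uses a deep neural network, and the state names, action names, learning rate (`alpha`) and discount factor (`gamma`) are all made up for the illustration.

```python
from collections import defaultdict

ACTIONS = ["left", "right", "fire"]  # hypothetical joystick actions


def q_learning_update(q, state, action, reward, next_state,
                      actions=ACTIONS, alpha=0.1, gamma=0.99):
    """Nudge the value of (state, action) toward reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions=ACTIONS):
    """Pick the action the table currently rates highest in this state."""
    return max(actions, key=lambda a: q[(state, a)])


q_table = defaultdict(float)  # every unseen (state, action) pair starts at 0.0

# Hypothetical experience: firing when an enemy was ahead raised the score by 10.
q_learning_update(q_table, "enemy_ahead", "fire", 10.0, "screen_clear")

print(greedy_action(q_table, "enemy_ahead"))
```

The reward here stands in for the change in game score, so actions that tend to raise the score end up with higher estimated values.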
The new DeepMind system – which uses the Double Q-learning technique – reduces mistakes the earlier software made when playing the games by lowering the chance of it overestimating a positive outcome from a particular action.
“The resulting algorithm not only reduces the observed overestimations” but also “leads to much better performance on several games,” say the DeepMind researchers in the paper.
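The difference between the two approaches comes down to how the learning target is computed. The sketch below follows Double Q-learning as it is generally described: one set of value estimates chooses the best next action and a second set scores it, instead of a single set doing both. The function names and the numbers are invented for the example, not taken from DeepMind's code.

```python
def dqn_target(reward, next_q_target, gamma=0.99):
    """Earlier approach: the same estimates both pick and score the best next action."""
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99):
    """Double Q-learning: one set of estimates picks the action, the other scores it."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical value estimates for three actions in the next state.
online_estimates = [1.0, 2.0, 1.5]
target_estimates = [1.2, 1.1, 3.0]  # the 3.0 is assumed to be a noisy overestimate

single = dqn_target(1.0, target_estimates, gamma=0.5)
double = double_dqn_target(1.0, online_estimates, target_estimates, gamma=0.5)

print(single, double)
```

Because the second set of estimates only scores an action chosen elsewhere, a single inflated value is less likely to feed straight into the target, and the double target can never exceed the single one.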
However, the system’s continued poor performance in Ms. Pac-Man exposes a weakness that DeepMind discussed earlier this year. The limitation stems from the DeepMind system only looking at the last four frames of gameplay, about one fifteenth of a second of the game, to learn which actions secure the best results. This lack of long-term vision prevents the system from easily navigating mazes in games like Pac-Man.
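A rough sketch of that short memory, assuming the game runs at 60 frames per second (the rate implied by four frames lasting about one fifteenth of a second). The `FrameHistory` class and the frame labels are hypothetical stand-ins for the real screen images.

```python
from collections import deque
from fractions import Fraction

FRAMES_KEPT = 4
FRAMES_PER_SECOND = 60  # assumption implied by the figures in the article


class FrameHistory:
    """Keeps only the most recent frames; older ones are silently dropped."""

    def __init__(self, size=FRAMES_KEPT):
        self.frames = deque(maxlen=size)

    def push(self, frame):
        self.frames.append(frame)

    def observation(self):
        """Everything the learner gets to see when choosing its next action."""
        return tuple(self.frames)


history = FrameHistory()
for frame_id in range(10):
    history.push(f"frame-{frame_id}")

window_seconds = Fraction(FRAMES_KEPT, FRAMES_PER_SECOND)

print(history.observation())
print(window_seconds)
```

Anything that happened before the oldest stored frame, such as which corridors of a maze have already been cleared, is invisible to the learner unless it is still on screen.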
In a few games the earlier algorithm used by the DeepMind network performed better than the new system using Double Q-learning. However, in instances where the earlier system performed noticeably better than the new, both systems’ scores remained above those of human players.
The uses that Google has in mind for DeepMind’s self-learning algorithms are unknown, but DeepMind’s co-founder Demis Hassabis has said he sees a role for DeepMind’s software in helping robots deal with unpredictable elements of the real world. Google could well have a need for such software, having bought many different robotics firms in recent years, including Boston Dynamics, one of the world’s best-known robot designers.