DeepMind sets AI loose on Diplomacy board game, and collaboration is key

Artificial intelligence systems have become increasingly well-adapted to a host of basic board games. Now, DeepMind is hoping to teach agents the art of collaboration using Diplomacy.

From Turochamp to Deep Blue, human-vs.-computer competition has captivated audiences for decades, fueling plenty of hyperbole along the way. In recent years, artificial intelligence (AI) systems have claimed supremacy across a variety of classic games. The AI research and development company DeepMind has been behind many of these systems at the bleeding edge of innovation.

In March 2016, one such bout of bytes vs. brains pitted DeepMind's AI system, AlphaGo, against Go legend and 18-time world titleholder Lee Sedol. With millions tuning in around the globe, the unthinkable slowly unfolded as AlphaGo picked apart, with surgical precision, arguably the best player of the past decade in the abstract strategy board game. The stunning victory earned AlphaGo a 9 dan ranking, the highest such certification.

Now the company has set its sights on training an AI agent on another of mankind's mysterious board games, this time trying its hand at Diplomacy. After all, it was only a matter of time before we taught AI the skillful art of negotiation en route to global domination.

Building on lessons learned

Unlike more rudimentary games, Diplomacy involves a complex level of strategy and scheming. In a game like checkers, for example, a player has a rather limited set of choices about where to move an individual piece at any given time. The nuances and complexities, of course, increase with chess, as a player must assign value to pieces and orchestrate a cohesive series of moves for success. In the esoteric world of board games, Diplomacy presents its own set of challenges for AI.

"Diplomacy has seven players and focuses on building alliances, negotiation, and teamwork in the face of uncertainty about other agents. As a result, agents have to constantly reason about who to cooperate with and how to coordinate actions," said Tom Eccles, a research engineer at DeepMind.

Shifting from zero-sum to collaboration

AI systems have proved to be far superior to even the best human beings at zero-sum games like chess and Go. In this type of gameplay, there can only be one winner and one loser. By contrast, Diplomacy requires agents to build alliances and foster collaboration.

"On the one hand, it is difficult to make progress in the game without the support of other players, but on the other hand, only one player can eventually win. This means it is more difficult to achieve cooperation in this environment. The tension between cooperation and competition in Diplomacy makes building trustworthy agents in this game an interesting research challenge," said Tom Anthony, a research scientist at DeepMind.

The ability to expeditiously vanquish a human player in a zero-sum game is certainly impressive; however, a richer layering of skills opens up another world of AI potential. Our day-to-day lives involve an intricate patchwork of balanced synergies, with our individual needs often packaged within a larger group effort. As such, this research could enhance agents' ability to collaborate with us and one another, leading to a vast spectrum of real-world applications.

"In real-life, we often work in teams and have to both compete and cooperate. From simple decisions such as scheduling a meeting or deciding where to eat out with friends, to complex decisions such as negotiating with suppliers or clients or assigning tasks in a joint project, we constantly reason about how to best work with others. It seems likely that as AI systems become more complex, we'd need to provide them with better tools for effectively cooperating with others," said Yoram Bachrach, a research scientist at DeepMind.

The agents of digital transformation

Organizational workflows typically hinge on collaboration and teamwork. As digital transformation takes hold across industries, organizations are increasingly using a host of autonomous systems to increase efficiency and streamline operations. Enhancing agents with artificial soft skills related to teamwork and cooperation may be key moving forward.

"Artificial Intelligence is increasingly being applied to more complex tasks. This could mean that a number of different autonomous systems must work together, or at least in the same environment, in order to solve a task. As such, understanding how autonomous systems learn, act, and adapt to each other, is a growing area of research," Eccles said.

The importance of the sandbox

It's important to note that this research focused on understanding the interactions in a "many-agent setting," and used a limited No-Press version of gameplay, which does not allow communication. Further research and development will allow future agents to participate in full Diplomacy gameplay, leveraging communication to build alliances and negotiate with other players.

In the full version, "communication is used to broker deals and form alliances, but also to misrepresent situations and intentions," according to the paper. Teaching an agent to use other players as collaborative pawns to ensure victory does raise a series of concerns.

In one such scenario, the authors of the report explain that "agents may learn to establish trust, but might also exploit that trust to mislead their co-players and gain the upper hand." The researchers reiterate the importance of testing these agents in an isolated environment to better understand developments and pinpoint detrimental behaviors if they arise.

"We start from the premise that all AI applications should remain under meaningful human control, and be used for socially beneficial purposes. Our teams working on technical safety and ethics aim to ensure that we are constantly anticipating short- and long-term risks, exploring ways to prevent these risks from happening, and finding ways to address them if they do," Anthony said.
