In the latest sign that artificial intelligence (AI) is on course to dominate the workplace, a Canadian deep learning startup turned Microsoft Corp. division has created an AI-based system that achieved the maximum possible score on Ms. Pac-Man.
That might not sound like the most complicated task in the world – especially since the edition in question was the Atari 2600 version and not the arcade original – but as Microsoft senior writer Allison Linn explains in a recent blog post, the challenge facing researchers at Montreal-based Maluuba was more daunting than you might think.
“A lot of companies working on AI use games to build intelligent algorithms because there’s a lot of human-like intelligence capabilities that you need to beat the games,” Maluuba program manager Rahul Mehrotra explains in the story, noting that the variety of situations you can encounter while playing the games makes them a good testing ground.
In other words, the techniques used to develop the AI-driven Ms. Pac-Man master (or is that mistress?) could serve as a foundation for developing other AI agents capable of making decisions of their own in the future – including the types of decisions required to complete more complex work.
Like many of its ilk, Ms. Pac-Man was intentionally designed to be easy to learn yet nearly impossible to master, so that players would keep dropping in quarters. Co-creator Steve Golson has noted that Ms. Pac-Man in particular was programmed to be more random than the original Pac-Man, making it harder for players to finish.
How it works
To conquer Ms. Pac-Man and reach its maximum possible score of 999,990, Maluuba’s researchers combined reinforcement learning with a method they call “Hybrid Reward Architecture.” They programmed more than 150 agents to work in parallel, with a top agent mediating between them. Some agents, for example, were “rewarded” for finding one specific pellet, while others were tasked with avoiding ghosts. The top agent then weighed the agents’ suggestions before deciding where to move Ms. Pac-Man.
“The top agent took into account how many agents advocated for going in a certain direction, but it also looked at the intensity with which they wanted to make that move,” Linn writes. “For example, if 100 agents wanted to go right because that was the best path to their pellet, but three wanted to go left because there was a deadly ghost to the right, it would give more weight to the ones who had noticed the ghost and go left.”
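Linn’s description suggests one simple way to combine the agents’ input: add up how strongly each agent favours each direction, so that both the number of agents and the intensity of their preferences count toward the decision. The Python sketch below is a hypothetical illustration of that idea, not Maluuba’s code; the `choose_move` function, the plain-sum rule, the scores and the two-direction setup are all assumptions chosen to reproduce Linn’s 100-versus-three example.

```python
from collections import defaultdict
from typing import Dict, List


def choose_move(preferences: List[Dict[str, float]]) -> str:
    """Hypothetical top agent: sum every sub-agent's score for each
    direction and return the direction with the largest total."""
    totals: Dict[str, float] = defaultdict(float)
    for prefs in preferences:  # one dict of direction -> score per sub-agent
        for direction, score in prefs.items():
            totals[direction] += score
    return max(totals, key=totals.get)


# Illustrative values only: 100 pellet-seeking agents mildly prefer "right"...
pellet_agents = [{"right": 1.0, "left": 0.0} for _ in range(100)]
# ...while 3 ghost-avoiding agents strongly oppose moving toward the ghost.
ghost_agents = [{"right": -50.0, "left": 0.0} for _ in range(3)]

print(choose_move(pellet_agents + ghost_agents))  # left
```

Because the ghost-avoiding agents’ scores are much larger in magnitude, their three votes outweigh the 100 pellet-seekers (a total of -50 for right versus 0 for left), so the top agent goes left. A simple head count would have gone right.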
That tension between the top agent’s egalitarian programming and each agent’s single-minded drive to reach its own goal, such as collecting its specific pellet regardless of the obstacles or ghosts in the way, proved to be the algorithm’s secret sauce.
“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem,” Maluuba research manager Harm Van Seijen says in the story. “It benefits the whole.”
Reinforcement learning, a process in which an agent receives positive or negative responses for each action it tries and then learns through trial and error to maximize the positive ones, also played a role, Linn writes.
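To make the trial-and-error idea concrete, here is a minimal tabular Q-learning sketch in Python. Q-learning is a standard textbook reinforcement learning method, used here as a stand-in; this is not the Hybrid Reward Architecture or Maluuba’s actual Ms. Pac-Man setup. The five-cell corridor, the `step`, `greedy` and `train` functions, and the learning parameters are all hypothetical choices made for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, reward waits in cell 4
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative learning parameters


def step(state, action):
    """Toy environment: +1 (a positive response) for reaching the last cell, 0 otherwise."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Pick the best-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, max_steps=100, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Trial and error: usually exploit what was learned, sometimes try a random move.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


q = train()
policy = {s: greedy(q, s) for s in range(N_STATES - 1)}
print(policy)  # the learned policy should move right (+1) from every cell
```

At first the agent wanders at random; once it stumbles onto the reward, the update rule carries that positive response back to earlier cells, and the agent ends up heading straight for the reward.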
More common in AI development is supervised learning, an approach in which agents improve their performance by being fed examples of good behaviour.
In the story, program manager Mehrotra surmises that the method used to develop a Ms. Pac-Man mistress (or master) could one day help a company’s sales department, with an agent basing its decisions on factors such as which clients’ contracts are due for renewal, which contracts are most valuable to the company, and whether a potential customer is available.
Such an agent would then leave sales teams free to focus on what they do best – driving sales – while eliminating the grunt work of identifying receptive clients.
“It really enables us to make further progress in solving these really complex problems,” research manager Van Seijen says.