As highlighted in the video above, researchers applied the technique to obtain the highest score possible in Ms. Pac-Man, an early '80s arcade game that’s notoriously difficult to crack. Maluuba’s AI reached about four times the best score ever achieved by a human.
Rather than attacking Ms. Pac-Man by brute force, the Maluuba team used a method called Hybrid Reward Architecture. It relied on 150 agents that worked in parallel, each mastering a specific aspect of the game. The data each agent collected was then fed to a top agent – kind of like a senior manager – that processed everything and made the final decision on where to move the game’s character.
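For readers who want a feel for the "many agents, one manager" idea, here is a toy Python sketch. It is not Maluuba's implementation: there is no learning involved, the scoring functions are hand-written stand-ins for trained agents, and every name (`make_target_agent`, `top_agent`, the grid coordinates, the weights) is a hypothetical chosen for illustration. It assumes each agent scores the four possible moves and the top agent simply adds those scores up.

```python
from typing import Callable, Dict, List, Tuple

ACTIONS = ["up", "down", "left", "right"]
Agent = Callable[[dict], Dict[str, float]]


def make_target_agent(target: Tuple[int, int], weight: float) -> Agent:
    """Hypothetical agent that cares about one object on the board.

    A positive weight pulls the player toward the target (a pellet);
    a negative weight pushes the player away from it (a ghost).
    """
    def agent(state: dict) -> Dict[str, float]:
        px, py = state["player"]
        # Where each move would put the player (y grows downward).
        moves = {
            "up": (px, py - 1),
            "down": (px, py + 1),
            "left": (px - 1, py),
            "right": (px + 1, py),
        }
        scores = {}
        for action, (x, y) in moves.items():
            distance = abs(x - target[0]) + abs(y - target[1])
            scores[action] = weight / (1 + distance)
        return scores
    return agent


def top_agent(agents: List[Agent], state: dict) -> str:
    """Sum every agent's scores per move and pick the best total."""
    totals = {action: 0.0 for action in ACTIONS}
    for agent in agents:
        for action, score in agent(state).items():
            totals[action] += score
    return max(ACTIONS, key=lambda action: totals[action])


# Illustrative board: a pellet three squares above the player,
# and a ghost sitting directly between them.
state = {"player": (5, 5)}
pellet = make_target_agent(target=(5, 2), weight=1.0)
ghost = make_target_agent(target=(5, 4), weight=-10.0)

print(top_agent([pellet], state))         # pellet agent alone
print(top_agent([pellet, ghost], state))  # pellet and ghost agents combined
```

With only the pellet agent, the top agent heads straight up toward the pellet. Once the ghost agent's scores are added in, moving up becomes the worst option and the top agent picks a different direction, which is the kind of trade-off the senior-manager analogy describes.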
Doina Precup, an associate professor of computer science at McGill University in Montreal, said the idea of having the agents work on different pieces of the puzzle to achieve a common goal is incredibly interesting. In fact, she said the technique is similar to some theories on how the human brain works and could have broad implications for teaching AIs to perform complex tasks with limited information.