Google's chess experiments show how to boost the power of AI


His group decided to find out. They created a new, diverse version of AlphaZero, which consists of multiple AI systems that train independently on different situations. The algorithm controlling the overall system acts as a kind of virtual matchmaker: it's designed to identify which agent has the best chance of succeeding when it's time to make a move, Zahavi said. He and his colleagues also coded in a "diversity bonus": a reward for the system whenever it picked strategies from a large selection of options.
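To make the two mechanisms concrete, here is a minimal sketch in Python, assuming a toy setup that is not DeepMind's implementation. The `Agent` class, its per-position `win_rate` estimates, the `matchmaker` rule (pick the agent with the highest estimated win rate), and the use of Shannon entropy as the "diversity bonus" are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # Hypothetical: estimated win rate for each type of position,
    # assumed to have been learned during independent training.
    win_rate: dict = field(default_factory=dict)


def matchmaker(agents, position_type):
    """Pick the agent with the highest estimated chance of success here."""
    return max(agents, key=lambda a: a.win_rate.get(position_type, 0.0))


def diversity_bonus(strategy_counts):
    """Assumed bonus: entropy of how often each strategy was chosen.

    It is 0 when one strategy is always used and grows as choices
    spread across a larger selection of options.
    """
    total = sum(strategy_counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in strategy_counts.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)


def shaped_reward(game_result, strategy_counts, beta=0.1):
    """Game outcome plus a weighted diversity bonus (beta is an assumed weight)."""
    return game_result + beta * diversity_bonus(strategy_counts)


agents = [
    Agent("a", {"opening": 0.6, "endgame": 0.4}),
    Agent("b", {"opening": 0.5, "endgame": 0.7}),
]
print(matchmaker(agents, "endgame").name)               # b
print(round(diversity_bonus({"e4": 5, "d4": 5}), 3))    # 0.693
```

Entropy is only one plausible way to reward "picking from a large selection of options"; any measure that increases as choices become more varied would serve the same illustrative role.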


When the new system was set loose to play its own games, the team saw a lot of diversity. The diverse AI players experimented with new, effective openings and made novel but solid decisions about specific strategies, such as when and where to castle. In most matches, it defeated the original AlphaZero. The team also found that the diverse version could solve twice as many challenge puzzles as the original and could solve more than half of the total catalog of Penrose puzzles.

"The idea is that rather than finding a single solution, or a single policy, that will defeat any player, here [it uses] the idea of creative diversity," Cully said.

With access to a wider variety of games to play, the diverse AlphaZero had more options for difficult situations, Zahavi said. "If you can control the type of games it sees, you basically control how it will generalize," he said. Those unusual intrinsic rewards (and the moves associated with them) can become drivers of diverse behaviors. The system can then learn to assess and value different approaches and see when they are most successful. "We found that this group of agents can actually come to an agreement on these positions."

And, importantly, its implications extend beyond chess.

Real-Life Creativity

Cully said a diverse approach can help any AI system, not just those based on reinforcement learning. He has long used diversity to train physical systems, including a six-legged robot that was allowed to explore a variety of movements before he deliberately "injured" it. This allowed it to keep moving forward using some of the techniques it had developed earlier. "We were just trying to find solutions that were different from all the previous solutions that we had found so far." Recently, he has also been collaborating with researchers to use diversity to identify potential new drug candidates and develop effective stock-trading strategies.

"The goal is to create a large collection of potentially thousands of different solutions, where every solution is very different from the next," Cully said. So, just as the diverse chess players learned to do, the overall system can choose the best possible solution for every type of problem. Zahavi's AI system clearly shows how "exploring different strategies helps to think outside the box and find solutions," he said.
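The idea of an archive of many distinct solutions, from which the best one is picked for each type of problem, is the core of what researchers call quality-diversity search. The sketch below uses MAP-Elites, one well-known algorithm of that family, swapped in here purely as an illustration; it is not the method described in the article. The toy problem is hypothetical: a "solution" is a number in [0, 10), its behavior is which of ten unit-wide cells it lands in, and its fitness is how close it sits to the center of that cell.

```python
import random


def clip(x):
    """Keep toy solutions inside [0, 10)."""
    return min(max(x, 0.0), 9.999)


def describe(x):
    """Hypothetical behavior descriptor: which of 10 cells x falls into."""
    return int(x)


def evaluate(x):
    """Hypothetical fitness: 0.0 at the center of each cell, lower elsewhere."""
    return -abs(x - (int(x) + 0.5))


def mutate(x, rng):
    """Small random change to an existing solution."""
    return clip(x + rng.gauss(0.0, 0.5))


def random_solution(rng):
    return clip(rng.uniform(0.0, 10.0))


def map_elites(iterations=1000, seed=0):
    """Keep the best-scoring solution found so far in every behavior cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mostly vary an existing elite...
            _, parent = rng.choice(list(archive.values()))
            child = mutate(parent, rng)
        else:
            # ...and occasionally try a fresh random solution.
            child = random_solution(rng)
        cell, fitness = describe(child), evaluate(child)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive


def best_for(archive, cell):
    """Pick the stored solution for a problem that calls for a given behavior."""
    return archive[cell][1]


archive = map_elites()
for cell in sorted(archive):
    print(cell, round(best_for(archive, cell), 2))
```

The archive keeps one elite per behavior cell rather than a single overall winner, which is what lets the system later choose a different, well-suited solution for each kind of problem.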

Zahavi suspects that to get AI systems to think creatively, researchers simply need to push them to consider more options. This hypothesis suggests a curious parallel between humans and machines: perhaps intelligence is simply a matter of computational power. For an AI system, perhaps creativity boils down to the ability to consider and select from a large set of options. As the system receives rewards for selecting a variety of optimal strategies, this type of creative problem-solving is reinforced and strengthened. Ultimately, in theory, it could simulate any type of problem-solving strategy recognized as creative in humans. Creativity would become a computational problem.

Leemhecharat said that a diverse AI system is unlikely to completely solve the broad generalization problem in machine learning, but it is a step in the right direction. "It's mitigating one of the shortcomings," she said.

More practically, Zahavi's results echo recent work showing how cooperation among humans can lead to better performance on difficult tasks. Most hits on the Billboard Hot 100, for example, were written by teams of songwriters, not by individuals. And there is still room for improvement. The diverse approach is currently computationally expensive, because it must consider many more possibilities than a conventional system. Zahavi is also not convinced that even the diverse AlphaZero captures the full spectrum of possibilities.

"I still [think] there is room to find different solutions," he said. "It's not clear to me that, given all the data in the world, there is [only] one answer to every question."

Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.