AlphaZero: The AI from Google which mastered Chess in 4 hours

University of Toronto Machine Intelligence Team

Nov 12, 2020

AlphaGo was a machine learning program made to play the two-player game of Go, a strategy board game originating from China that is considered to be more complex than chess. Development of the program began in 2014 at DeepMind, Google’s UK-based artificial intelligence subsidiary. The premise the project eventually arrived at was to begin with only the basic rules of Go, making arbitrarily chosen moves, and then to use reinforcement learning and games of self-play to determine its own strategy. (The earliest versions of AlphaGo were additionally bootstrapped from records of human expert games.)

At first glance, making a program to play a game seems largely inconsequential. But the achievements of the deep learning algorithms of AlphaGo and its successor AlphaZero brought the power and capabilities of machine learning into the limelight, and are commonly credited as a catalyst for the machine learning craze.

*Figure 1: Different iterations of DeepMind’s machine learning game engines*

In 2015, a year after development began, AlphaGo won against European Go champion Fan Hui, becoming the first Go program to beat a professional player without handicaps. Then, in 2016, AlphaGo beat world champion Lee Sedol.

Finally, in 2017, AlphaZero was introduced as a successor to AlphaGo Zero, itself an improved version of AlphaGo. This time the program was designed not solely to play Go but to learn other two-player board games as well. Starting from only the basic rules of chess, AlphaZero mastered the game after just 4 hours of self-play and outperformed the reigning computer chess champion, Stockfish 8. AlphaZero also learned Go and shogi, surpassing its predecessor AlphaGo after 30 hours of training and the top shogi engine, Elmo, after only 2 hours.

AlphaZero and its predecessors are not the first programs to beat humans at our own game. AlphaZero is commonly compared to IBM’s Deep Blue, which famously won a game against the reigning world chess champion Garry Kasparov in 1996, becoming the first computer system to do so, and went on to defeat him in a full match in 1997. The two programs share technical similarities in their game engines. Both use a decision tree (often called a game tree) to hold the possible future board states that could result from the current board state, with the current board state as the root node of the tree.

*Figure 2: An illustration of what a chess player’s decision tree would look like*

A tree search algorithm then traverses the decision tree and passes each board state into an evaluation function, which assigns a points value to the board state indicating how attractive it is to each player. These points values are used to decide which move to take at a given point in the game.
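The idea of searching a tree and scoring board states can be sketched with plain minimax, a classic tree search. This is only an illustration under stated assumptions: the tiny tree `TREE`, the scores in `LEAF_SCORES`, and the function names are all made up for the example, and neither Deep Blue nor AlphaZero is claimed to run exactly this code.

```python
from typing import Callable, Dict, List

# Hypothetical toy game tree: each key is a board state and each value lists
# the board states reachable in one move. States not in the dict are leaves.
TREE: Dict[str, List[str]] = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Made-up points values for the leaf states, from the first player's view.
LEAF_SCORES: Dict[str, float] = {"a1": 3.0, "a2": 5.0, "b1": 2.0, "b2": 9.0}


def evaluate(state: str) -> float:
    """Stand-in evaluation function: look up the points value of a leaf."""
    return LEAF_SCORES[state]


def minimax(state: str, maximizing: bool,
            evaluate_fn: Callable[[str], float]) -> float:
    """Return the value of `state` assuming both players play their best."""
    children = TREE.get(state, [])
    if not children:  # leaf: ask the evaluation function
        return evaluate_fn(state)
    scores = [minimax(child, not maximizing, evaluate_fn) for child in children]
    return max(scores) if maximizing else min(scores)


def best_move(state: str, evaluate_fn: Callable[[str], float]) -> str:
    """Pick the child whose value is highest once the opponent has replied."""
    return max(TREE[state], key=lambda child: minimax(child, False, evaluate_fn))


if __name__ == "__main__":
    print(best_move("root", evaluate))
```

In this toy tree the first player prefers move `a`: the opponent can hold it to 3 points there, but to only 2 points after move `b`, even though `b` contains the single most attractive leaf.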

Where AlphaZero differs from Deep Blue and other traditional chess engines is in the evaluation function. Traditional game engines rely on evaluation functions built from handcrafted heuristics designed with the help of an expert in the game.

*Figure 3: IBM’s Chess Grandmaster consultant Joel Benjamin playing against Deep Blue*
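To make "handcrafted heuristic" concrete, here is a minimal sketch of one of the simplest such heuristics, a material count. It assumes the board is given as the piece-placement field of a FEN string and uses the conventional textbook piece values; the function name `material_score` is hypothetical, and real engines combine many more hand-tuned terms than this.

```python
# Conventional textbook piece values (an assumption for this sketch; real
# engines tune these and add many positional terms on top).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_score(placement: str) -> int:
    """Return white material minus black material.

    `placement` is the piece-placement field of a FEN string, where
    uppercase letters are white pieces and lowercase letters are black.
    Digits (empty squares) and '/' (rank separators) are ignored.
    """
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # not a piece: an empty-square count or a rank separator
        score += value if ch.isupper() else -value
    return score


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
    print(material_score(start))
```

A positive score favors white and a negative score favors black, so the balanced starting position scores zero.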

AlphaZero instead uses a deep neural network (p, v) = f(s), which takes the board state s as its input and has parameters θ. The network outputs a vector p of move probabilities, one for each action a, and a scalar v that estimates the outcome z of the game from board position s.
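The shape of such a network, one input and two outputs, can be sketched in a few lines. This is a deliberately tiny, fully connected stand-in written with the standard library only; the real AlphaZero network is a much deeper convolutional architecture, and the layer sizes, the names `init_params` and `matvec`, and the flat encoding of the board as a list of numbers are all assumptions of the example.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_params(n_inputs: int, n_hidden: int, n_actions: int,
                seed: int = 0) -> Dict[str, Matrix]:
    """Create small random weights; together these play the role of theta."""
    rng = random.Random(seed)

    def matrix(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "hidden": matrix(n_hidden, n_inputs),   # shared body
        "policy": matrix(n_actions, n_hidden),  # policy head
        "value": matrix(1, n_hidden),           # value head
    }


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def f(s: Vector, theta: Dict[str, Matrix]) -> Tuple[Vector, float]:
    """Map an encoded board state s to (p, v)."""
    h = [max(0.0, a) for a in matvec(theta["hidden"], s)]  # ReLU layer
    logits = matvec(theta["policy"], h)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    p = [e / total for e in exps]  # softmax: move probabilities
    v = math.tanh(matvec(theta["value"], h)[0])  # scalar in [-1, 1]
    return p, v


if __name__ == "__main__":
    theta = init_params(n_inputs=9, n_hidden=16, n_actions=9)
    p, v = f([1.0, 0.0, -1.0] * 3, theta)
    print(p, v)
```

The softmax makes p a valid probability distribution over actions, and the tanh keeps v in the same range as the game outcome, between a loss at -1 and a win at +1.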

To decide on a move, the decision tree is traversed with a Monte Carlo tree search (MCTS) algorithm (see reference 6), with the root node holding the current board state sᵣₒₒₜ. In each state s, the traversal chooses moves that have a high move probability, a low visit count, and a high value, and continues until it reaches a leaf node, which the network then evaluates. The search returns a vector π that represents a probability distribution over the possible moves. Finally, a move is chosen according to π, which favors moves that the search found to have both high value and high probability.

At the end of the game, the outcome is quantized into a variable z, with a win being +1, a draw 0, and a loss -1. The value of z is compared against the scalars vₜ predicted for the positions reached during the game and placed into a loss function. As given in the AlphaZero paper (reference 2), the loss is l = (z − v)² − πᵀ log p + c‖θ‖², where π is the move distribution returned by the search and p is the network’s move probability vector.

The parameters θ are then adjusted by gradient descent in order to minimize the loss function. In the loss function, the coefficient c is a hyperparameter that controls the strength of the regularization term, which penalizes large parameter values.
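One way to combine the three signals named above (move probability, visit count, and value) is the PUCT rule described for AlphaGo Zero, sketched below for a single node. Treat this as an assumption-laden illustration: the class `EdgeStats`, the function names, and the constant `c_puct=1.5` are hypothetical choices for the example, and a full search would also expand leaves and back up values.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    """Statistics the search keeps for one move out of a state."""
    prior: float               # move probability from the network
    visits: int = 0            # how many simulations chose this move
    total_value: float = 0.0   # sum of the values backed up through it

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def puct_score(edge: EdgeStats, parent_visits: int, c_puct: float = 1.5) -> float:
    """High value, high prior and low visit count all raise the score."""
    exploration = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value + exploration


def select_move(edges: Dict[str, EdgeStats], c_puct: float = 1.5) -> str:
    """Choose which move to explore next during one simulation."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda a: puct_score(edges[a], parent_visits, c_puct))


def search_policy(edges: Dict[str, EdgeStats]) -> Dict[str, float]:
    """Turn root visit counts into a probability distribution over moves."""
    total = sum(e.visits for e in edges.values())
    if total == 0:  # no simulations yet: fall back to a uniform distribution
        return {a: 1.0 / len(edges) for a in edges}
    return {a: e.visits / total for a, e in edges.items()}


if __name__ == "__main__":
    root = {
        "e4": EdgeStats(prior=0.6, visits=10, total_value=5.0),
        "d4": EdgeStats(prior=0.3),
        "a3": EdgeStats(prior=0.1),
    }
    print(select_move(root), search_policy(root))
```

In the usage example the search picks `d4` next: `e4` has the best value so far, but it has already been visited ten times, so the unvisited move with the larger prior gets explored first.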
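As a rough sketch of how such a loss could be computed for a single position, assume it takes the form given in the AlphaZero paper: a squared error between the game outcome and the predicted value, a cross-entropy term between the search's move distribution and the network's move probabilities, and c times the squared L2 norm of the parameters. The function name, the flat list used for the parameters, and the default value of `c` are assumptions made for illustration.

```python
import math
from typing import Sequence


def alphazero_style_loss(z: float, v: float,
                         pi: Sequence[float], p: Sequence[float],
                         theta: Sequence[float], c: float = 1e-4) -> float:
    """Loss for one position: value error + policy cross-entropy + L2 penalty.

    z     -- game outcome: +1 win, 0 draw, -1 loss
    v     -- value predicted by the network for this position
    pi    -- move distribution returned by the search
    p     -- move probabilities predicted by the network
    theta -- network parameters, flattened into one list (an assumption here)
    c     -- regularization strength (illustrative default)
    """
    value_loss = (z - v) ** 2
    # Moves with zero search probability contribute nothing to the sum.
    policy_loss = -sum(pi_a * math.log(p_a)
                       for pi_a, p_a in zip(pi, p) if pi_a > 0)
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


if __name__ == "__main__":
    print(alphazero_style_loss(z=1.0, v=0.0, pi=[0.5, 0.5], p=[0.5, 0.5],
                               theta=[1.0, 2.0], c=0.1))
```

Raising c makes the penalty on large parameter values dominate, while c = 0 removes the regularization entirely, which is why it is treated as a tunable hyperparameter.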

Being self-taught, the Alpha programs are not constrained by the conventional wisdom of human players. The advantage of this trait was demonstrated in the match against Go champion Lee Sedol.

*Figure 4: Go world champion Lee Sedol’s reaction to AlphaGo’s move 37*

On move 37 in game two of the five-game match, AlphaGo made a play that was described as “unimaginable” to human intuition. The level of unconventionality of this play in Go would be similar to a chess player sacrificing their queen in order to capture a knight. But despite how unorthodox the move was, it gave AlphaGo the upper hand, which allowed it to eventually take game two.

By being self-taught, AlphaGo was able to make a move that was unthinkable for humans, which shows that AI is capable of finding unique solutions to problems that humans would not think to try.

And with the release of AlphaZero, whose general-purpose algorithm mastered three different games rather than just one, one wonders what else machine learning and AI can be applied to.

References

[1] https://deepmind.com/blog/article/alphazero-shedding-new-light-grand-games-chess-shogi-and-go

[2] https://science.sciencemag.org/content/362/6419/1140.full?ijkey=XGd77kI6W4rSc&keytype=ref&siteid=sci

[3] https://hub.packtpub.com/deepmind-alphago-zero-game-changer-for-ai-research/ (timeline)

[4] https://www.businessinsider.com/why-google-ai-game-go-is-harder-than-chess-2016-3

[5] https://www.sciencedirect.com/science/article/pii/S0004370201001291

[6] https://yourbasic.org/algorithms/las-vegas/#

[7] https://www.youtube.com/watch?v=HT-UZkiOLv8

[8] https://www.computerhistory.org/chess/stl-431614f67f976/

[9] https://www.sites.google.com/site/qgchess/chess-algorithms
