
Chapter 2 Background and Related Work

2.4 State-of-the-Art Go-Playing Programs

2.4.7 Other Programs

Here we briefly and selectively introduce some other state-of-the-art and specific programs that are worth mentioning. PACHI by Petr Baudis and Jean-loup Gailly (Baudis and Gailly, 2010) is now the strongest open source program. VALKYRIA by Magnus Persson (Persson, 2010) features heavy and rich knowledge representation in the playout and is specifically competitive on the 9×9 board. LIBEGO by Łukasz Lew (Lew, 2010) is the fastest implementation of MCTS. OREGO by Peter Drake (Drake, 2011) is one of the popular test beds among computer Go researchers.

Chapter 3 ERICA

In this chapter, we introduce our Go-playing program ERICA. Section 3.1 briefly reviews the development history of ERICA, as well as its standings in some of the tournaments in which it participated up to July 2011. Section 3.2 investigates the implementation of MCTS in ERICA, along with some of our own ideas. Finally, Section 3.3 gives several examples picked from the games that ERICA played against human players to indicate its strength.

3.1 Development History

3.1.1 First Version Created in May 2008

The first version of ERICA was created in May 2008, based on implementing MOGO’s famous “UCT paper” (Gelly et al., 2006). The work was motivated by the impressive performance of CRAZY STONE and MOGO in the 9×9 competition at the 2007 Computer Olympiad.

This earliest version of ERICA was written in pure C. The speed was about 20,000 uniform random simulations per second on a single-core 2.26 GHz CPU, on the 9×9 board. A board is represented by a single array that keeps the related information of a position, such as each string’s color, liberties, owner⁸ and size.

8 The owner of a string is the representative stone of it.

MOGO-type, fixed-sequence simulation and RAVE form the basic MCTS framework of the program. In this period, personal communications with Yizao Wang, one of the creators of MOGO, considerably helped the author to understand the “UCT paper” and to make ERICA stronger.

In the Computational Intelligence Forum & World 9×9 Computer Go Championship held on September 25-27, 2008, in Taiwan, ERICA finished in 4th place. Table 3.1 shows the result of this competition. In this tournament, ERICA won a game against GO INTELLECT but lost to JIMMY, the strongest Taiwanese Go-playing program at that time.

Position Program Wins Country

1 MOGO 10 France

Table 3.1: The result of the Computational Intelligence Forum & World 9×9 Computer Go Championship held on September 25-27, 2008, in Tainan, Taiwan.

In the 9×9 Go tournament at the 2008 Computer Olympiad held on September 28 to October 5, 2008, in Beijing, China, ERICA finished in 11th place among the 18 participants. Figure 3.1 shows the game between ERICA (White) and AYA (Black) in round 2. In this game, thanks to the correct handling of seki in the playout, ERICA turned a bad position around and won.

Figure 3.1: The final position of the match in round 2 in the 9×9 Go tournament at the 2008 Computer Olympiad: ERICA (White) vs. AYA (Black). The stone marked by ∆ is the last move. The groups marked by ○ form a seki. White won by resignation.

At the 2009 Computer Olympiad held on May 10-18, 2009, in Pamplona, Spain, ERICA participated in the 9×9 Go tournament with the same version that had played in the previous year. ERICA finished in 6th place among the 9 participants.

3.1.2 Second Version Created in June 2009

In June 2009, a new version of ERICA was created, under the supervision of Rémi Coulom. The main advancement in this new version consists in the implementation of the Boltzmann softmax playout policy that was successful in Coulom’s CRAZY STONE. In addition to RAVE, prior information was taken into account in the formulation of progressive bias (Chaslot et al., 2007). The supervised learning algorithm Minorization-Maximization (MM) (Coulom, 2007) was used to train the pattern weights.

At the 2009 TAAI Computer Go Tournament held on October 30-31, 2009, in Taiwan, ERICA took 3rd place in the 9×9 Go tournament (Table 3.2) and 2nd place in the 19×19 Go tournament (Table 3.3).

Position Program Country

1 ZEN Japan

2 MOGO France

3 ERICA Taiwan

Table 3.2: The result of the 9×9 Go tournament at the 2009 TAAI Go Tournament.

Position Program Country

1 ZEN Japan

2 ERICA Taiwan

3 DRAGON Taiwan

Table 3.3: The result of the 19×19 Go tournament at the 2009 TAAI Go Tournament.

In the following month, ERICA participated in the 3rd UEC Cup, held on November 28-29, 2009, in Japan, and finished in 6th place. In round 2 of this tournament, ERICA defeated the well-known strong 19×19 Go-playing program AYA for the first time, as shown in Figure 3.2. The last move is marked by ∆. In this game, ERICA (Black) killed several of White’s groups, marked by ×, and won a decisive victory.

Figure 3.2: The final position of the match in round 2 of the 3rd UEC Cup: AYA (White) vs. ERICA (Black). Black won by resignation.

3.1.3 Third Version Created in February 2010

In February 2010, Łukasz Lew, the author of LIBEGO, was a great help in the speed optimization of ERICA. The author rewrote many primary data structures and created a new version of ERICA, under the supervision of Coulom. For instance, macros, a feature of the C preprocessor, were used extensively for loop unrolling.

The speed of the simulation was accelerated by a factor of 2 compared to the previous version. In this period, we concentrated on 19×19 Go, trying hard to make use of larger patterns in the tree and improve the quality of the playout.

At the 2010 Computer Olympiad held on September 24 to October 2, 2010, in Kanazawa, Japan, ERICA won the gold and silver medals in the 19×19 Go (Fotland, 2010) and 9×9 Go tournaments respectively. In the 19×19 Go tournament, after the final round was finished, three programs, ZEN, THE MANY FACES OF GO and ERICA, were tied. The final positions were decided in the second playoff, when both ZEN and ERICA defeated THE MANY FACES OF GO and ERICA defeated ZEN. This indicates that the three programs were close in playing strength. Figure 3.3 shows the final match between ZEN (White) and ERICA (Black). This game was decided by a large-scale semeai in the opening stage. ZEN misread the semeai, so ERICA killed White’s big group (marked by ×) and kept the lead until the end.

Figure 3.3: A position of the final match in the playoff of the 19×19 Go tournament at the 2010 Computer Olympiad: ZEN (White) vs. ERICA (Black). The previous move is marked by ∆. Black won by resignation.

In the 4th UEC Cup, held on November 27-28, 2010, in Japan, ERICA took 3rd place⁹. Table 3.4 shows the result of this interesting tournament. In this tournament, the games between the strong programs clearly indicated that handling semeai correctly is particularly crucial. Figure 3.4 shows a position of the match between THE MANY FACES OF GO (White) and ERICA (Black) in round 4 of the preliminaries on the first day. This game was decided by the large-scale semeai in the middle game. Black’s group, marked by ○, has 4 liberties A, B, C and D, while White’s group, marked by ×, has only 3 liberties E, F and G. In the end, ERICA played correctly to win this capturing race and defeated its tough rival THE MANY FACES OF GO.

9 Special thanks to Professor Tsan-Sheng Hsu, Research Fellow of Academia Sinica, Taiwan, who kindly provided us the hardware resources, an 8-core server with 64GB memory, for this tournament.

Position Program Country

1 FUEGO Canada

2 ZEN Japan

3 ERICA Taiwan

4 AYA Japan

5 THE MANY FACES OF GO America

6 COLDMILK Taiwan

7 CAREN Japan

8 PERSTONE Japan

Table 3.4: The result of the 4th UEC Cup, 2010.

Figure 3.4: A position of the match in the 4th UEC Cup: THE MANY FACES OF GO (White) vs. ERICA (Black). The previous move is marked by ∆. Black won by resignation.

3.2 MCTS in ERICA

This section investigates the implementation of MCTS in ERICA, along with some of our own ideas. Note that these ideas might be re-inventions: there are plenty of open source Go-playing programs whose code we might have overlooked, not to mention the programs whose source code is unavailable.

3.2.1 Selection

The selection formula of ERICA combines the strategies of UCT, RAVE and progressive bias. The child that maximizes formula (3.1) is selected:

… + progressive_bias    (3.1)

where Coefficient is the weight of RAVE computed by Silver’s formula (Silver, 2009); the exploration term of RAVE is removed, as Silver suggested. For the exploration term of UCT, C_uct is set to 0.6 for all board sizes.

The term progressive_bias is computed by the formula (3.2):

where C_PB is a constant that has to be tuned empirically, n_uct, initialized to 1, is the visit count of this node, and v_prior is the prior value in [0,1]. After the search ends, the most visited candidate move at the root node is selected to be played. For ERICA, a good value of C_PB on the 19×19 board is around 50. Note that good values of C_PB can vary across board sizes.
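The selection rule described in this section can be sketched as follows. This is only an illustration under stated assumptions: the way the three terms are summed, the form progressive_bias = C_PB · v_prior / n_uct, the RAVE weight schedule n_rave / (n_uct + n_rave + 4 · n_uct · n_rave · b²) with a hypothetical bias constant b, and all names (Child, rave_weight, selection_value, select_child, most_visited) are assumptions, not ERICA's actual code.

```python
import math
from dataclasses import dataclass

C_UCT = 0.6   # exploration constant given in the text
C_PB = 50.0   # progressive-bias constant reported for the 19x19 board


@dataclass
class Child:
    move: str
    n_uct: int = 1         # visit count, initialized to 1 as in the text
    wins_uct: float = 0.0  # sum of simulation outcomes through this child
    n_rave: int = 0
    wins_rave: float = 0.0
    v_prior: float = 0.5   # prior value in [0, 1]


def rave_weight(n_uct, n_rave, b=0.1):
    """Assumed weight of RAVE; b is a hypothetical RAVE bias constant."""
    if n_rave == 0:
        return 0.0
    return n_rave / (n_uct + n_rave + 4.0 * n_uct * n_rave * b * b)


def selection_value(child, parent_visits):
    """Assumed combination of UCT, RAVE (no RAVE exploration) and progressive bias."""
    v_uct = child.wins_uct / child.n_uct
    v_rave = child.wins_rave / child.n_rave if child.n_rave else 0.0
    coefficient = rave_weight(child.n_uct, child.n_rave)
    exploration = C_UCT * math.sqrt(math.log(parent_visits) / child.n_uct)
    progressive_bias = C_PB * child.v_prior / child.n_uct
    return (coefficient * v_rave + (1.0 - coefficient) * v_uct
            + exploration + progressive_bias)


def select_child(children, parent_visits):
    """During search: descend to the child with the largest selection value."""
    return max(children, key=lambda c: selection_value(c, parent_visits))


def most_visited(children):
    """After search: play the most visited candidate move at the root."""
    return max(children, key=lambda c: c.n_uct)
```

With this form, the bias term dominates while n_uct is small and fades as the node is visited more often, which is the usual motivation for progressive bias.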

3.2.2 Expansion

ERICA uses delayed node creation (a node is expanded on its nth visit, n > 1) to reduce the memory overhead caused by RAVE, as explained in Section 2.2.2. For ERICA, a good value of n on the 19×19 board is around 5. At node creation, the prior computation takes into account various features, partly listed in (Coulom, 2007), according to the pattern weights given by MM.
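A minimal sketch of delayed node creation, assuming a simple tree node that allocates its children only on its nth visit. The class Node, the method visit and the parameter expand_at are hypothetical names used for illustration only.

```python
class Node:
    """Tree node whose children are allocated only on the n-th visit."""

    def __init__(self):
        self.visits = 0
        self.children = None  # not allocated until the node is expanded

    def visit(self, legal_moves, expand_at=5):
        """Count one visit; expand on the expand_at-th visit.

        Returns True if the node has children after this visit.
        """
        self.visits += 1
        if self.children is None and self.visits >= expand_at:
            # Priors would be computed here, once, at node creation.
            self.children = {move: Node() for move in legal_moves}
        return self.children is not None
```

Nodes that are reached only a few times never allocate per-child statistics, which is where the memory saving comes from.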

3.2.2.1 Larger Patterns

For ERICA, the first and foremost feature in prior computation on the 19×19 board is larger diamond-shaped patterns (Stern et al., 2006). First, larger patterns of up to size 9 (by the definition in (Stern et al., 2006)) are harvested from game records according to their frequencies of appearance. Then, these patterns are trained by MM together with the other features that participate in prior computation. In ERICA, larger patterns are only used in progressive bias, not in the playout. The improvement from larger patterns is measured to be over 100 Elo.

3.2.2.2 Other Features

Other useful features for the 19×19 board are, for instance, ladder, distance features (distance to the previous move, distance to the move before the previous move and Common Fate Graph (CFG) distance (Graepel et al., 2001), etc) and various tactical features of semeai and life-and-death, such as “save a string by capturing”.

3.2.3 Simulation

3.2.3.1 Boltzmann Softmax Playout Policy

In the simulation stage, ERICA uses the Boltzmann softmax playout policy (usually called softmax policy or Gibbs sampling (Geman and Geman, 1984)). The softmax policy was first applied to a Monte Carlo Go program in (Bouzy and Chaslot, 2006), where it was described as pseudo-random moves generated by a domain-dependent approach that uses a non-uniform probability. In the experiments of Bouzy and Chaslot, only 3×3 patterns along with one-liberty urgency served as the features. This scheme of pseudo-random, non-uniform probabilistic distribution was further improved and extended by Coulom to multiple features (Coulom, 2007).

The softmax policy _ is defined by the probability of choosing action a in

where Φ(s, a) is a vector of binary features, and θ is a vector of feature weights.

To explain the softmax policy, Figure 3.5 gives an example of a position in the playout, Black to move. The previous move is marked by ∆. For Black, now the only legal moves are A, B, C and D¹⁰.

Figure 3.5: An example of a position in the playout. The previous move is marked by ∆. Black to move.

Suppose there are two binary features (for simplicity, e^{θ_i} is denoted by γ_i):

1. Contiguous to the previous move. A candidate move that is directly neighboring the previous move has this feature. The weight of this feature is γ₁. Points A, B and C have this feature.

2. Save the string, put in atari by the previous move, by extending. The weight of this feature is γ₂. Point A has this feature.

Then, the weight of each move is:

A: weight = γ₁γ₂

B: weight = γ₁

C: weight = γ₁

D: weight = 1, with no corresponding feature.

10 In ERICA, an empty point that fills a real eye, such as E, is also regarded as an illegal move, though it is legal according to the rules of Go. Forbidding “filling a real eye” in the playout is common in current Monte Carlo Go-playing programs.

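Consequently, the probability of choosing each move is given by

$$P(A) = \frac{\gamma_1\gamma_2}{\gamma_1\gamma_2 + 2\gamma_1 + 1}, \quad P(B) = P(C) = \frac{\gamma_1}{\gamma_1\gamma_2 + 2\gamma_1 + 1}, \quad P(D) = \frac{1}{\gamma_1\gamma_2 + 2\gamma_1 + 1}$$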
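The example can be checked numerically with a short sketch. The gamma values below (γ₁ = 2, γ₂ = 3) and the function names are hypothetical, chosen only for illustration; a move's weight is the product of the gammas of its features, and its probability is that weight divided by the sum of all weights.

```python
import random


def move_probabilities(weights):
    """Normalize per-move weights (products of gammas) into probabilities."""
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def choose_move(weights, rng):
    """Sample one move with probability proportional to its weight."""
    r = rng.random() * sum(weights.values())
    chosen = None
    for move, w in weights.items():
        if w <= 0:
            continue  # zero-weight moves can never be chosen
        chosen = move
        r -= w
        if r < 0:
            break
    return chosen


gamma1, gamma2 = 2.0, 3.0        # hypothetical feature weights
weights = {
    "A": gamma1 * gamma2,        # contiguous + saves the string
    "B": gamma1,                 # contiguous
    "C": gamma1,                 # contiguous
    "D": 1.0,                    # no feature
}
probabilities = move_probabilities(weights)
sampled = choose_move(weights, random.Random(0))
```

With these values the total weight is 11, so A is chosen with probability 6/11, B and C with 2/11 each, and D with 1/11.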

The move generator in the playout of ERICA is depicted by the pseudocode shown in Table 3.5. The details are explained as follows.

Table 3.5: Pseudocode of the move generator in the playout of ERICA.

ComputeLocalFeatures deals with the local features (the features related to the previous move or the move before the previous move, etc.) and updates the gammas of the local moves which have the local features. 3×3 patterns and some of the local features of ERICA will be introduced in Section 4.3.2.

The move to be generated is decided in the “for loop”. First, if TotalGamma, the sum of the gammas of all the moves in this position, is equal to 0, Move is set to pass and returned immediately, since no move has a nonzero probability. Otherwise, ChooseMoveByProbability chooses a move by the softmax policy, as described in the previous section, and assigns it to Move.

After Move is chosen, ForbiddenMove checks whether Move is forbidden, which means it has a feature of zero weight. If Move is found to be forbidden, SetZeroGamma subtracts its gamma from TotalGamma and resets the gamma to zero.

The mechanism of ForbiddenMove is a compromise for the features which are too costly to incrementally update. Note that it is also possible to check the legality of Move in ForbiddenMove. The next section will give an example of ForbiddenMove.

After the examination by ForbiddenMove, Move is passed (by reference) to ReplaceMove for further inspection. ReplaceMove is an extended version of ForbiddenMove in the sense that it not only checks whether Move is forbidden, but also replaces it with a better move if it is. Section 3.2.3.4 will give an example of ReplaceMove.

Outside the “for loop”, when Move is ready to be returned, RecoverMoves restores the gammas of the moves reset by ForbiddenMove.
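The control flow described above can be sketched as follows. This is a reconstruction from the prose only, not ERICA's code: the loop bound max_tries, the callback signatures, recomputing the gamma sum instead of maintaining TotalGamma incrementally, and the assumption that local features have already been applied to gammas are all choices made for this illustration.

```python
import random

PASS = "pass"


def choose_move_by_probability(gammas, total, rng):
    """Sample a move with probability gamma / total."""
    r = rng.random() * total
    chosen = None
    for move, gamma in gammas.items():
        if gamma <= 0:
            continue
        chosen = move
        r -= gamma
        if r < 0:
            break
    return chosen


def move_generator(gammas, forbidden_move, replace_move, rng, max_tries=10):
    """gammas: dict move -> gamma, assumed to include local features already."""
    zeroed = {}   # gammas reset by the forbidden-move check
    move = PASS
    for _ in range(max_tries):
        # ERICA keeps TotalGamma incrementally; it is recomputed here for clarity.
        total_gamma = sum(gammas.values())
        if total_gamma == 0:
            move = PASS           # no move has a nonzero probability
            break
        move = choose_move_by_probability(gammas, total_gamma, rng)
        if forbidden_move(move):
            zeroed[move] = gammas[move]
            gammas[move] = 0.0    # SetZeroGamma
            move = PASS
            continue
        move = replace_move(move)  # may return a better move, or the same one
        break
    gammas.update(zeroed)          # RecoverMoves
    return move
```

For example, with gammas {"A": 5.0, "B": 1.0} and A forbidden, the generator returns B whichever move is sampled first, and the gamma of A is restored before returning.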

3.2.3.3 ForbiddenMove

Figure 3.6 gives an example of ForbiddenMove in ERICA’s playout move generator, Black to move. In this example, point A is forbidden because it is a self-atari of 9 stones, which is clearly a bad move. In ERICA, a self-atari move is not forbidden if it forms a nakade shape.

Figure 3.6: An example of ForbiddenMove, Black to move.

3.2.3.4 ReplaceMove

Figure 3.7 gives an example of ReplaceMove in ERICA’s playout move generator, Black to move. In this example, point A is forbidden and replaced with B by the rule “when filling a false eye, if there is a capturable group at one of the diagonal points, then capture the group instead of filling the false eye”.

Figure 3.7: An example of ReplaceMove, Black to move.

3.2.4 Backpropagation

In this section, we present two useful heuristics of RAVE to improve its performance.

Section 3.2.4.1 presents the first heuristic, to bias RAVE updates by move distance.

Section 3.2.4.2 presents the second heuristic, to fix RAVE updates for ko threats.

3.2.4.1 Bias RAVE Updates by Move Distance

When updating the RAVE values in a node, the heuristic “Bias RAVE Updates by Move Distance” biases the simulation outcome according to how far from this node the updated move was played. The number of moves between this node and the updated move is defined as the distance of this move, denoted by d. The weight used to bias the simulation outcome is defined as the distance weight, denoted by w. If the simulation outcome is 1, then the updated outcome is 1−d*w; if the simulation outcome is 0, then the updated outcome is 0+d*w. Figure 3.8 gives an example.
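The update rule can be written as a small function. The name biased_outcome is hypothetical; the rule itself follows the definition above, with outcomes restricted to 0 and 1.

```python
def biased_outcome(outcome, d, w):
    """Bias a 0/1 simulation outcome by move distance d and distance weight w."""
    if outcome == 1:
        return 1 - d * w
    return 0 + d * w


# Moves at distance 0, 1 and 2 from the node, with outcome 1 and w = 0.001.
updated = [biased_outcome(1, d, 0.001) for d in range(3)]
```

A move played further from the node therefore receives an outcome slightly closer to 0.5, so moves played early after the node count a little more than moves played late.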

Search path: Node 1→move A→Node 2→move B→Node 3→move C…

Suppose simulation outcome=1 and w=0.001.

When updating the RAVE values of Node 1,

For move A, d=0 and the updated outcome of A is 1 − 0*0.001 = 1.

For move B, d=1 and the updated outcome of B is 1 − 1*0.001 = 0.999.

For move C, d=2 and the updated outcome of C is 1 − 2*0.001 = 0.998.

Figure 3.8: An example of “Bias RAVE Updates by Move Distance”.

As far as we know, FUEGO was the first Go-playing program that proposed and used this idea¹¹. This heuristic brings move-sequence information into RAVE. It is worth about 50 Elo in our experiments.

11 The details of FUEGO’s approach to “Bias RAVE Updates by Move Distance” can be found in the FUEGO documentation on the official web site, http://fuego.sourceforge.net/.

3.2.4.2 Fix RAVE Updates for Ko Threats

Figure 3.9 illustrates a situation in which this heuristic is applicable. This position is taken from a game played on the KGS Go Server (KGS) between ajahuang [6d] (White) and Zen19D [5d] (Black). The previous move (marked by ∆) played by ZEN is clearly meaningless, although it is a sente move, or ko threat, that forces White to respond. Apparently, the correct move at this moment is A, namely to capture the ko. But why did ZEN play a ko threat before capturing the ko?

Figure 3.9: An example to show the occasion of the heuristic “Fix RAVE Updates for Ko Threats”: ajahuang [6d] (White) vs. Zen19D [5d] (Black). White won by resignation.

The problem (probably) stems from RAVE. It is an intrinsic problem of RAVE that, in the root node, the RAVE values of the ko threats (such as the previous move marked by ∆), which were searched in the lower levels of the tree, are also updated. But a ko threat is supposed to be played after a ko capture. So, in this example, the RAVE values of Black’s ko threats (such as the previous move marked by ∆) should not be updated in the root node. This is the main idea of the heuristic “Fix RAVE Updates for Ko Threats”. Figure 3.10 gives an example of this heuristic to show how it works in practice in the tree. In this example, the RAVE value of move E in Node 1 is not updated because it is detected as a ko threat move of Node 5. This heuristic is worth about 30 Elo in our experiments.
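The filter can be sketched on the search path of the Figure 3.10 example. The detection rule used here (a move played deeper in the path, immediately after the opponent captured a ko, is treated as a ko threat and skipped) is an assumption made for illustration, as are the names PathMove and rave_moves_to_update. The usual RAVE condition that a move must be the first play at its point is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class PathMove:
    name: str
    color: str                  # "B" or "W"
    is_ko_capture: bool = False


def rave_moves_to_update(path, node_index):
    """Names of the moves whose RAVE value is updated in the node at node_index.

    path[i] is the move played from node i (0-based), so path[0] leaves Node 1.
    """
    color_to_move = path[node_index].color
    updated = []
    for j in range(node_index, len(path)):
        move = path[j]
        if move.color != color_to_move:
            continue
        # Assumed rule: a deeper move that directly follows a ko capture is a
        # ko threat of that deeper node, so it is not credited to this node.
        if j > node_index and path[j - 1].is_ko_capture:
            continue
        updated.append(move.name)
    return updated


path = [
    PathMove("A", "B", is_ko_capture=True),  # Black captures a ko
    PathMove("B", "W"),                      # White plays a ko threat
    PathMove("C", "B"),                      # Black answers the threat
    PathMove("D", "W", is_ko_capture=True),  # White re-captures the ko
    PathMove("E", "B"),                      # Black plays a ko threat
]
```

Under this rule, Node 1 (index 0) updates A and C but not E, while Node 5 (index 4) still updates E, since E is the move actually played from that node.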

Search path: Node 1→move A→Node 2→move B→Node 3→move C→Node 4→move D→Node 5→move E→Node 6.

Suppose:

A: Black captures a ko.

B: White plays a ko threat.

C: Black responds to the ko threat played by B.

D: White re-captures the ko.

E: Black plays a ko threat.

Then do not update the RAVE value of move E in Node 1.

Figure 3.10: An example of “Fix RAVE Updates for Ko Threats”.

3.3 KGS Games of ERICA

Starting December 13, 2010, ERICA played on the KGS Go Server (KGS) using the account EricaBot, running on a 4-core 3.07 GHz CPU. With the short time setting of 10×00:15 (10 byo-yomi periods of 15 seconds each), it was rated 3-dan in the beginning and about 3.75-dan in June 2011, as shown in Figure 3.11.

Figure 3.11: The KGS Rank Graph for EricaBot.

Figure 3.12 shows a 19×19 game between EricaBot (White) and a 2-dan human player BOThater36. In this game, ERICA captured the center group by ladder-atari (move 120) and won. This game shows that ERICA is a solid 3-dan player and features moderate opening play on the 19×19 board.

Figure 3.12: A 19×19 ranked game on KGS: EricaBot 3-dan (White) vs. BOThater36 2-dan (Black). White won by resignation.

Figure 3.13 shows a 9×9 game between Erica9 (White) and a 5-dan human player, guxxan. In this game, ERICA used a classical killing technique (moves 32 and 34) to kill the Black group in the top-left corner. This game shows that ERICA is already a solid high-dan player on the 9×9 board.

Figure 3.13: A 9×9 game on KGS: Erica9 (White) vs. guxxan 5-dan (Black). White won by resignation.

Chapter 4 Monte Carlo Simulation Balancing Applied to 9×9 Go

4.1 Introduction

Monte Carlo evaluation of a position depends on the choice of a probability distribution over legal moves. A uniform distribution is the simplest choice, but produces poor evaluations. It is often better to play good moves with a higher probability, and bad moves with a lower probability. Playout policy has a large influence on the playing strength. Several methods have been proposed to optimize it.

The simplest approach to policy optimization is trial and error. Some knowledge is implemented in playouts, and its effect on the playing strength is estimated by measuring the winning rate against other programs (Bouzy, 2005; Gelly et al., 2006; Chen and Chang, 2008; Chaslot et al., 2009). This approach is often slow and costly, because measuring the winning rate by playing games takes a large amount of time,
