
Chapter 4 Monte Carlo Simulation Balancing Applied to 9×9 Go

4.3 Experiments

Experiments were run with the Go-playing program ERICA. The SB algorithm was applied repeatedly with different parameter values, in order to measure their effects.

Playing strength was estimated with matches against FUEGO. The result of applying SB is compared to MM, both in terms of playing strength and feature weights.

4.3.1 ERICA

ERICA was developed by the author as part of his Ph.D. research. More details about ERICA can be found in Chapter 3.

4.3.2 Playout Features

This subsection and the remainder of this chapter use Go jargon that may not be familiar to some readers. Explanations for all items of the Go-related vocabulary can be found on the Sensei’s Library web site (http://senseis.xmp.net/). Still, it should be possible to understand the main ideas of this chapter without knowing that vocabulary.

The playouts of ERICA are based on 3×3 stone patterns, augmented by the atari status of the four directly connected points. These patterns are centred on the move to be played. Taking rotations, symmetries, and move legality into consideration, there are 2,051 such patterns in total. In addition to stone patterns, ERICA uses 7 features related to the previous move (examples are given in Figure 4.1).
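The pattern representation described above can be illustrated with a minimal sketch. Assuming a hypothetical encoding (each of the 8 surrounding points is empty, black, white, or off-board, and each of the 4 directly connected points carries an atari flag), the code packs a pattern into an integer and keeps the smallest key over the 8 rotations and reflections of the square, so that symmetric patterns share one entry. The names `encode`, `canonical_key`, and `transforms` are illustrative only; the sketch applies neither legality filtering nor any colour symmetry ERICA may use, so it is not expected to reproduce the count of 2,051.

```python
from itertools import product

# Hypothetical point states for the 8 points around the candidate move.
EMPTY, BLACK, WHITE, EDGE = 0, 1, 2, 3

# Offsets (row, col) of the 8 neighbours, in a fixed order around the centre.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
# The 4 directly connected points, which also carry an atari flag.
ORTHOGONAL = {(-1, 0), (0, -1), (0, 1), (1, 0)}


def transforms():
    """Yield the 8 symmetries of the square as maps on (dr, dc) offsets."""
    for swap, flip_r, flip_c in product((False, True), repeat=3):
        def t(d, swap=swap, flip_r=flip_r, flip_c=flip_c):
            r, c = (d[1], d[0]) if swap else d
            return (-r if flip_r else r, -c if flip_c else c)
        yield t


def encode(cells, atari):
    """Pack 8 point states (2 bits each) and 4 atari flags (1 bit each)."""
    key = 0
    for d in NEIGHBOURS:
        key = key * 4 + cells[d]
    for d in NEIGHBOURS:
        if d in ORTHOGONAL:
            key = key * 2 + int(atari[d])
    return key


def canonical_key(cells, atari):
    """Smallest encoding over all rotations and reflections of the pattern."""
    best = None
    for t in transforms():
        # Move the content of each offset d to its image t(d).
        moved_cells = {t(d): v for d, v in cells.items()}
        moved_atari = {t(d): v for d, v in atari.items()}
        k = encode(moved_cells, moved_atari)
        best = k if best is None else min(best, k)
    return best
```

A pattern and any of its rotations map to the same key, while swapping a black stone for a white one gives a different key, because this sketch treats colours asymmetrically.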

1. Contiguous to the previous move. Active if the candidate move is among the 8 neighbouring points of the previous move. Also active whenever any of Features 2–7 is active.

2. Save the string in new atari, by capturing. A candidate move that saves, by capturing, the string put in atari by the previous move has this feature.

3. Same as Feature 2, but also a self-atari. If the candidate move has Feature 2 but is also a self-atari, then it has Feature 3 instead.

4. Save the string in new atari, by extending. A candidate move that saves, by extending, the string put in atari by the previous move has this feature.

5. Same as Feature 4, but also a self-atari.

6. Solve a new ko by capturing. If there is a new ko, then the candidate move that is able to solve the ko by capturing any one of the neighbouring strings has this feature.

7. 2-point semeai. If the previous move reduces the liberties of a string to only two, then a candidate move that gives atari to a neighbouring string that has no way to escape has this feature. This feature deals with the most basic type of semeai.

Figure 4.1: Examples of Features 2, 3, 4, 5, 6, and 7. The previous move is marked with a dot.
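The rules above combine in a simple way: Feature 1 fires for any point next to the previous move and whenever another previous-move feature fires, while the self-atari variants replace Features 2 and 4 rather than adding to them (for Feature 5 this is assumed by analogy with Feature 3). The sketch below encodes only that combination logic. The `MoveFacts` record and its Boolean fields are hypothetical; they stand for facts that ERICA's board analysis would have to compute.

```python
from dataclasses import dataclass


@dataclass
class MoveFacts:
    """Hypothetical precomputed facts about one candidate move."""
    neighbours_previous: bool   # among the 8 points around the previous move
    saves_by_capture: bool      # saves the string in new atari by capturing
    saves_by_extension: bool    # saves the string in new atari by extending
    is_self_atari: bool
    captures_for_new_ko: bool   # solves a new ko by capturing a neighbouring string
    two_point_semeai: bool      # ataris an inescapable neighbour of a 2-liberty string


def previous_move_features(f: MoveFacts) -> set:
    """Return the set of active previous-move features (numbers 1 to 7)."""
    active = set()
    if f.saves_by_capture:
        # Feature 3 replaces Feature 2 when the move is also a self-atari.
        active.add(3 if f.is_self_atari else 2)
    if f.saves_by_extension:
        # Assumed: Feature 5 replaces Feature 4 in the same way.
        active.add(5 if f.is_self_atari else 4)
    if f.captures_for_new_ko:
        active.add(6)
    if f.two_point_semeai:
        active.add(7)
    # Feature 1: contiguous to the previous move, or any of Features 2 to 7 active.
    if f.neighbours_previous or active:
        active.add(1)
    return active
```

For example, a capturing move that saves the string but is itself a self-atari gets Features 1 and 3, not 2.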

4.3.3 Experimental Setting

The performance of MM and SB was measured by the winning rate of ERICA against FUEGO 0.4 with 3,000 playouts per move for both programs. In the empty position, ERICA ran 6,200 playouts per second, whereas FUEGO ran 7,200 playouts per second. For reference, the performance of the uniform random playout policy and of the MM policy is shown in Table 4.1.
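Because both programs were limited to the same number of playouts rather than the same thinking time, the speeds quoted above translate into slightly different times per move. A back-of-the-envelope check, assuming (as an approximation only) that the empty-position speed holds for the whole move:

```python
def seconds_per_move(playouts_per_move: int, playouts_per_second: float) -> float:
    """Approximate thinking time per move at a constant playout speed."""
    return playouts_per_move / playouts_per_second


erica = seconds_per_move(3000, 6200)  # about 0.48 s per move
fuego = seconds_per_move(3000, 7200)  # about 0.42 s per move
```

Fixing the number of playouts therefore equalizes search effort, not time, so the slower ERICA is not penalized for its lower playout rate in these matches.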

Table 4.1: Reference results against FUEGO 0.4, 1,000 games, 9×9, 3k playouts/move

For fairness, MM and SB were both trained with the same features described above. The training of MM, performed on 1,400,000 positions chosen from 150,000 19×19 game records of strong players, was completed within a day. The games were KGS games collected from the Kombilo web site (Goertz and Shubert, 2007), combined with professional games collected from the web2go web site (Lin, 2009).

Both the production of the training data and the training of SB were carried out by ERICA itself, without any external program. The training positions were randomly selected from games self-played by ERICA with 3,000 playouts per move. ERICA, with playout parameters determined by MM, was then used directly to evaluate these positions. Producing and evaluating the training positions alone took over three days. In this respect, SB training is much more time-consuming than MM.
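The data-production step can be sketched as follows. This is a minimal illustration, assuming that each sampled position is labelled with the mean outcome of a fixed number of playouts from a reference policy (here, MM-tuned playouts); ERICA may instead have used a full search, and the `playout` callable and `build_training_set` function are hypothetical stand-ins for engine code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_training_set(
    games: Sequence[Sequence[object]],
    n_positions: int,
    playouts_per_position: int,
    playout: Callable[[object, random.Random], int],
    seed: int = 0,
) -> List[Tuple[object, float]]:
    """Sample positions from self-play games and label each with a target value V*.

    `playout(position, rng)` is a hypothetical engine hook returning 1 for a
    win and 0 for a loss, using the reference (MM-tuned) playout policy.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_positions):
        game = rng.choice(games)      # pick a self-played game at random
        position = rng.choice(game)   # then a random position within it
        wins = sum(playout(position, rng) for _ in range(playouts_per_position))
        data.append((position, wins / playouts_per_position))
    return data
```

The cost is dominated by `n_positions * playouts_per_position` playouts, which is why this step, rather than the gradient updates, accounts for most of the time mentioned above.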

The 9×9 positions were also used to measure the performance of MM in a setting equivalent to that of SB: MM was trained on the same 5k positions that served as the training set of SB. The strength of the resulting patterns is shown in Table 4.1 as 9×9 MM.

4.3.4 Results and Influence of Meta-Parameters

SB has a few meta-parameters that need tuning. For the gradient-descent part, it is necessary to choose M, N, and α (the numbers of simulations used to estimate the value and the gradient, and the step size). Two other parameters define how the training set was built: the number of positions, and the number of playouts used to evaluate each position.
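To make the roles of M, N, and the step size α concrete, the sketch below follows the policy-gradient simulation balancing update as we understand it: for each training position, M simulations estimate the current value V, N further simulations estimate its gradient g, and the parameters move by α(V* − V)g. The "game" is a deliberately tiny, hypothetical toy in which a simulation is a single move: each position is a list of candidate moves with feature vectors and a fixed win probability per move. Because simulations have length 1, no averaging over time steps is needed. This illustrates the update rule only; it is not ERICA's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax_policy(theta: List[float], moves: Sequence[Sequence[float]]) -> List[float]:
    """Softmax policy: pi(a) is proportional to exp(theta . phi(a))."""
    scores = [sum(t * x for t, x in zip(theta, phi)) for phi in moves]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(theta, position, rng) -> Tuple[int, List[float]]:
    """Toy one-move 'playout'. Returns outcome z (1/0) and psi = grad log pi(a)."""
    moves, win_prob = position
    pi = softmax_policy(theta, moves)
    a = rng.choices(range(len(moves)), weights=pi)[0]
    mean_phi = [sum(p * phi[k] for p, phi in zip(pi, moves)) for k in range(len(theta))]
    psi = [moves[a][k] - mean_phi[k] for k in range(len(theta))]
    z = 1 if rng.random() < win_prob[a] else 0
    return z, psi


def expected_value(theta, position) -> float:
    """Exact expected outcome of the toy position under the current policy."""
    moves, win_prob = position
    return sum(p * w for p, w in zip(softmax_policy(theta, moves), win_prob))


def simulation_balancing(training_set, theta, M, N, alpha, rng):
    """One pass over the training set, updating theta in place."""
    for position, v_star in training_set:
        # M simulations estimate the current value V.
        v = sum(simulate(theta, position, rng)[0] for _ in range(M)) / M
        # N independent simulations estimate the gradient of V.
        g = [0.0] * len(theta)
        for _ in range(N):
            z, psi = simulate(theta, position, rng)
            for k in range(len(theta)):
                g[k] += z * psi[k] / N
        # Move the simulation value towards the target V*.
        for k in range(len(theta)):
            theta[k] += alpha * (v_star - v) * g[k]
    return theta
```

For a toy position with two moves winning 90% and 10% of the time and a target V* = 0.7, repeated passes move the policy from the uniform value 0.5 towards 0.7, that is, towards balanced rather than strongest play.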

Table 4.2 summarizes the experimental results with these parameters.

Table 4.2: Experimental results. The winning rate was measured over 1,000 games against FUEGO 0.4, with 3,000 playouts per move. The 95% confidence interval is ±3.1 when the winning rate is close to 50%, and ±2.5 when it is close to 80%.
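The quoted margins agree with the normal approximation to the binomial distribution, 1.96·sqrt(p(1 − p)/n) with n = 1,000 games. Assuming that is how they were derived, the arithmetic can be checked as follows (the function name is illustrative):

```python
import math


def winning_rate_ci95(p: float, games: int) -> float:
    """Half-width of the 95% normal-approximation interval, in percentage points."""
    return 100 * 1.96 * math.sqrt(p * (1 - p) / games)


half_width_50 = winning_rate_ci95(0.5, 1000)  # about 3.1
half_width_80 = winning_rate_ci95(0.8, 1000)  # about 2.5
```

The interval narrows as the winning rate moves away from 50% because the variance p(1 − p) is largest at p = 0.5.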

Since the algorithm is stochastic, it would have been better to replicate each experiment more than once, in order to measure the effect of randomness. Unlike MM, SB has no guarantee of finding the global optimum and may get stuck in a bad local optimum. Because of limited computer resources, we preferred trying many parameter values rather than replicating experiments with the same parameters.

In the original algorithm, simulations with outcome 0 are ignored when N simulations are performed to accumulate the gradient, because each simulation contributes its gradient term multiplied by z, which vanishes when z = 0. The algorithm can safely be modified to use outcomes -1/1 and to replace z by (z - b), where b is the average reward, which makes the 0/1 and -1/1 cases equivalent (Silver, 2009). The results in the 1st and 4th columns of Table 4.2 show that learning with outcome -1/1 was much faster than with 0/1: the winning rate with outcome -1/1 at Iteration 20 (69.2%) was even higher than that with outcome 0/1 at Iteration 100 (63.9%). This indicates that -1/1 might be better than 0/1, but more replications would be necessary to draw a general conclusion.
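The equivalence mentioned above can be checked numerically. The hypothetical helper below averages (z − b)ψ over N simulations, where ψ stands for the (already summed) gradient of the log-policy of one simulation's moves and b is assumed to be the mean outcome of those N simulations. Mapping z ∈ {0, 1} to 2z − 1 also maps b to 2b − 1, so every (z − b) factor is exactly doubled; with the baseline, the two encodings give the same gradient up to a factor of 2, which the step size can absorb.

```python
from typing import List, Sequence


def gradient_with_baseline(outcomes: Sequence[float], psis: Sequence[Sequence[float]]) -> List[float]:
    """Average of (z - b) * psi over N simulations, with b the mean outcome."""
    n = len(outcomes)
    b = sum(outcomes) / n
    dim = len(psis[0])
    return [sum((z - b) * psi[k] for z, psi in zip(outcomes, psis)) / n for k in range(dim)]


z01 = [1, 0, 1, 1]                      # 0/1 outcomes of four simulations
zpm = [2 * z - 1 for z in z01]          # the same results encoded as -1/1
psis = [[0.5, -0.5], [-0.5, 0.5], [0.2, 0.1], [0.0, -0.3]]
g01 = gradient_with_baseline(z01, psis)
gpm = gradient_with_baseline(zpm, psis)  # equals 2 * g01, component by component
```

Unlike the plain 0/1 gradient, the baselined version lets losing simulations push the parameters too, since their (z − b) term is negative rather than zero.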

The SB algorithm was designed to reduce the mean squared error (MSE) over the whole training set by stochastic gradient descent. As a result, the MSE should gradually decrease if training is performed repeatedly on the same training set.

One pass of the SB algorithm through the whole training set is defined as an Iteration. Figure 4.2 shows that the measured MSE does indeed decrease.
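The quantity tracked across Iterations can be sketched as follows. `value_estimate` is a hypothetical callable returning the playout policy's current value estimate for a position (for Figure 4.2, an average over 1,000 playouts), and `sb_pass` is a hypothetical function performing one SB pass over the training set; both are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def training_mse(training_set: Sequence[Tuple[P, float]],
                 value_estimate: Callable[[P], float]) -> float:
    """Mean squared error between targets V* and current estimates V."""
    errors = [(v_star - value_estimate(pos)) ** 2 for pos, v_star in training_set]
    return sum(errors) / len(errors)


def mse_per_iteration(training_set: Sequence[Tuple[P, float]],
                      sb_pass: Callable[[Sequence[Tuple[P, float]]], None],
                      value_estimate: Callable[[P], float],
                      iterations: int) -> List[float]:
    """Record the MSE before training and after each Iteration (one full pass)."""
    history = [training_mse(training_set, value_estimate)]
    for _ in range(iterations):
        sb_pass(training_set)
        history.append(training_mse(training_set, value_estimate))
    return history
```

Because `value_estimate` is itself a Monte Carlo estimate, the recorded curve is noisy; using many more playouts for measurement than for the training targets, as in Figure 4.2, reduces that noise.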

Figure 4.2: Mean squared error as a function of the iteration number. M = N = 500, α = 10; the training set has 5k positions, each evaluated by 100 playouts. The error was measured with 1,000 playouts for every position of the training set.

4.4 Comparison between MM and SB Feature

In the document: New Heuristics for Monte Carlo Tree Search Applied to Computer Go (pp. 71-76)