
New Heuristics for Monte Carlo Tree Search Applied to the Game of Go




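(1) National Taiwan Normal University, Graduate Institute of Computer Science and Information Engineering. Doctoral dissertation. Advisors: Dr. Shun-Shii Lin and Dr. Rémi Coulom. New Heuristics for Monte Carlo Tree Search Applied to the Game of Go. Author: Shih-Chieh Huang. July 2011.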

(2) New Heuristics for Monte Carlo Tree Search Applied to the Game of Go. A dissertation presented by Shih-Chieh Huang to the Department of Computer Science and Information Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Computer Science. National Taiwan Normal University, Taipei, Taiwan, R.O.C., 2011.

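(3) Acknowledgement (translated from the Chinese)

I thank my official adviser, Professor Shun-Shii Lin. He began mentoring me when I was a master's student and funded my participation in the Computer Olympiad many times, which gave me a great deal of valuable competition experience.

This research was supervised by Professor Rémi Coulom, so he deserves my most sincere gratitude. In June 2009 I reached a low point in my doctoral studies, lost and without a research direction, so I wrote to ask him some questions about his papers. He answered very patiently and encouraged me to move forward. From then on we gradually formed an extremely productive collaboration. We discussed by email and video conference, and Rémi's diligence and many innovative ideas were an enormous help to me.

Regarding the development of our Go program ERICA, besides Rémi, I especially thank Yizao Wang for contributing many interesting ideas, Łukasz Lew for his substantial help with speed optimization, and Hideki Kato for generously sharing his experience.

I thank Professor Tsan-Sheng Hsu, Research Fellow at Academia Sinica, for providing hardware for the 2010 UEC Cup, which helped us win third place in this tough competition.

The results of this research and the writing of this dissertation benefitted from the help of many people. For the research on Simulation Balancing, I thank David Silver for his corrections and encouragement, and Lin Chung-Hsiung for kindly providing the large collection of game records from the web2go web site. I thank David Fotland and his wife for correcting many English errors chapter by chapter. I thank Professor Martin Müller of the University of Alberta in Canada and Professor Ingo Althöfer of Friedrich-Schiller University in Germany for their many insightful comments on the content. I thank the members of my dissertation committee, Professor Shun-Shii Lin, Professor Shun-Chin Hsu, Professor I-Chen Wu, Professor Tsan-Sheng Hsu and Professor Shi-Jim Yen, whose criticism and guidance (especially from Professor I-Chen Wu) helped to improve this dissertation.

I thank my family, especially my mother and my wife, whose support allowed me to complete my doctoral degree without worries. As a Christian, I also thank God for his hidden and unceasing guidance and help, just as the Bible says: "he that believes on him shall not be ashamed".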

(4) Acknowledgement

Thanks to my official adviser, Professor Shun-Shii Lin, who has mentored me since the start of my master's project. He funded my participation in the Computer Olympiads many times, which gave me a great deal of valuable experience.

This research was supervised by Professor Rémi Coulom, so he deserves my most earnest gratitude. In June 2009, I was adrift in my Ph.D. studies, without any research direction, and turned to him with some questions about his paper. He answered very patiently and encouraged me to proceed. Since then we have gradually formed an extremely productive cooperation. We discussed through emails and video conferences. Rémi's diligence and innovative ideas have always been an enormous help to me.

For the development of our Go-playing program ERICA, besides Rémi, thanks to Yizao Wang for providing many interesting ideas, to Łukasz Lew for the speed optimization, and to Hideki Kato for generously sharing his experience.

Thanks to Professor Tsan-Sheng Hsu, Research Fellow of Academia Sinica in Taiwan, who kindly provided us with the hardware resources for the 2010 UEC Cup so that we could win 3rd place in this tough competition.

The results of this research and the writing of this dissertation benefitted from the people listed below. For the research on Simulation Balancing, thanks to David Silver for his comments and encouragement. Thanks to Lin Chung-Hsiung for kindly providing access to the game database of the web2go web site. Thanks to David and Wendy Fotland for correcting the linguistic errors chapter by chapter. Thanks to Professor Martin Müller from the University of Alberta in Canada and Professor Ingo Althöfer from the Friedrich-Schiller University in Germany for proposing plenty of penetrating ideas about the content. Thanks to the committee of my dissertation

(5) defense, including Professor Shun-Shii Lin, Professor Shun-Chin Hsu, Professor I-Chen Wu, Professor Tsan-Sheng Hsu and Professor Shi-Jim Yen. Their criticism and instructions, particularly those from Professor Wu, helped to improve this dissertation.

Thanks to my family, especially my mother and my wife. Their support allowed me to complete my Ph.D. without any burden. As a Christian, thanks to God for his secret and unstoppable guidance and arrangements, just as we read in the Bible: "he that believes on him shall not be ashamed".

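(6) Abstract (translated from the Chinese)

Research on computer Go began in 1970, but Go programs were never regarded as strong until 2006, when Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT) appeared and changed the situation completely. The revolution brought by MCTS and UCT was so powerful that people even began to believe that Go programs would be able to defeat top human players in 10 or 20 years.

In this research, we propose some new heuristics for MCTS, with two main contributions. The first is the successful application of Simulation Balancing (SB) to 9×9 Go. SB is an algorithm for training the parameters of the simulation. When Silver and Tesauro proposed the method in 2009, they experimented only on smaller boards; our experimental results are the first to demonstrate the effectiveness of SB in 9×9 Go, specifically by showing that SB surpasses the well-known supervised algorithm Minorization-Maximization (MM) by about 90 Elo. The second contribution is a systematic experimental study of various time management methods for 19×19 Go. The results clearly indicate that a clever time management scheme can greatly improve playing strength. All experiments were run on our Go program ERICA, which benefitted from these heuristics and experimental results to win the gold medal in 19×19 Go at the 2010 Computer Olympiad.

Keywords: artificial intelligence, Go, computer Go, Monte Carlo Tree Search, Upper Confidence bounds applied to Trees, Simulation Balancing, time management, Erica.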

(7) Abstract

Research into computer Go started around 1970, but Go-playing programs were never, in a real sense, considered to be strong until 2006, when the new search scheme Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT) appeared on the scene. The revolution of MCTS and UCT advanced computer Go to such a degree that people began to believe that after ten or twenty years, Go-playing programs will be able to defeat the top human players.

In this research, we propose some new heuristics for MCTS, with two main contributions. The first contribution is the successful application of Simulation Balancing (SB), an algorithm for training the parameters of the simulation, to 9×9 Go. SB was proposed by Silver and Tesauro in 2009, but it had only been applied to small board sizes. Our experiments are the first to demonstrate its effectiveness in 9×9 Go by showing that SB surpasses the well-known supervised learning algorithm Minorization-Maximization (MM) by about 90 Elo. The second contribution is a set of systematic experiments on various time management schemes for 19×19 Go. The results indicate that clever time management algorithms can considerably improve playing strength. All the experiments were performed on our Go-playing program ERICA, which benefitted from these heuristics and the experimental results to win the gold medal in the 19×19 Go tournament at the 2010 Computer Olympiad.

Keywords: Artificial Intelligence, Go, computer Go, Monte Carlo Tree Search (MCTS), Upper Confidence bounds applied to Trees (UCT), Simulation Balancing, Time Management, Erica.

(8) Contents

Acknowledgement (Chinese)
Acknowledgement
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables

Chapter 1 Introduction
  1.1 Computer Games
  1.2 The Game of Go
    1.2.1 History
    1.2.2 Rules
  1.3 Computer Go
  1.4 Summary of the Contributions
  1.5 Organization of the Dissertation

Chapter 2 Background and Related Work
  2.1 Monte Carlo Go
  2.2 Monte Carlo Tree Search (MCTS)
    2.2.1 Selection
    2.2.2 Expansion
    2.2.3 Simulation
    2.2.4 Backpropagation
  2.3 Upper Confidence Bound Applied to Trees (UCT)
  2.4 State-of-the-Art Go-Playing Programs
    2.4.1 Crazy Stone
    2.4.2 MOGO
    2.4.3 GNU GO
    2.4.4 FUEGO
    2.4.5 The Many Faces of Go
    2.4.6 ZEN
    2.4.7 Other Programs

Chapter 3 ERICA
  3.1 Development History
    3.1.1 First Version Created on May 2008
    3.1.2 Second Version Created on June 2009
    3.1.3 Third Version Created on February 2010
  3.2 MCTS in ERICA
    3.2.1 Selection
    3.2.2 Expansion
      3.2.2.1 Larger Patterns
      3.2.2.2 Other Features
    3.2.3 Simulation
      3.2.3.1 Boltzmann Softmax Playout Policy
      3.2.3.2 Move Generator
      3.2.3.3 ForbiddenMove
      3.2.3.4 ReplaceMove
    3.2.4 Backpropagation
      3.2.4.1 Bias RAVE Updates by Move Distance
      3.2.4.2 Fix RAVE Updates for Ko Threats
  3.3 KGS Games of ERICA

Chapter 4 Monte Carlo Simulation Balancing Applied to 9×9 Go
  4.1 Introduction
  4.2 Description of Algorithms
    4.2.1 Softmax Policy
    4.2.2 Supervised Learning with MM
    4.2.3 Policy-Gradient Simulation Balancing (SB)
  4.3 Experiments
    4.3.1 ERICA
    4.3.2 Playout Features
    4.3.3 Experimental Setting
    4.3.4 Results and Influence of Meta-Parameters
  4.4 Comparison between MM and SB Feature Weights
  4.5 Against GNU Go on the 9×9 Board
  4.6 Playing Strength on the 19×19 Board
  4.7 Conclusions

Chapter 5 Time Management for Monte Carlo Tree Search Applied to the Game of Go
  5.1 Introduction
  5.2 Monte Carlo Tree Search in ERICA and Experiment Setting
  5.3 Basic Formula
  5.4 Enhanced Formula Depending on Move Number
  5.5 Some Heuristics
    5.5.1 UCT Formula in ERICA
    5.5.2 Unstable-Evaluation Heuristic
    5.5.3 Think Longer When Behind
  5.6 Using Opponent's Time
    5.6.1 Standard Pondering
    5.6.2 Focused Pondering
    5.6.3 Reducing ThinkingTime According to the Simulation Percentage
  5.7 Conclusions

Chapter 6 Conclusions and Proposals for Future Work
  6.1 Simulation Balancing (SB)
  6.2 Time Management
  6.3 Other Prospects

References
Appendix A. Publication List

(12) List of Figures

Figure 1.1: A Go board of 19×19 grid of lines, with some played stones.
Figure 1.2: An example of "Removing a string without liberty"
Figure 1.3: An example of "Prohibiting suicide"
Figure 1.4: An example of "Prohibiting repeating positions"
Figure 1.5: An example of "Winning by more territory"
Figure 2.1: The scheme of MCTS
Figure 2.2: The first stage of MCTS: selection
Figure 2.3: The second stage of MCTS: expansion
Figure 2.4: The third stage of MCTS: simulation
Figure 2.5: The fourth stage of MCTS: backpropagation
Figure 2.6: The exhibition game at the 2010 Computer Olympiad
Figure 2.7: The final position of the exhibition match: Kaori Aoba 4p (White) vs. CRAZY STONE (Black), with 7 handicap stones
Figure 2.8: An example of sequence-like simulation proposed by MOGO team
Figure 2.9: An example of "save a string by capturing" and "save a string by extending"
Figure 2.10: The final position of the match: Chun-Hsun Chou 9p (White) vs. MOGOTW (Black).
Figure 2.11: Round 1 at the 2003 Computer Olympiad: JIMMY (White) vs. GNU GO (Black)
Figure 2.12: A position of the final match in 4th UEC Cup: ZEN (White) vs. FUEGO (Black)
Figure 2.13: The final position of the match in round 7 in the 19×19 Go tournament at the 2010 Computer Olympiad: ZEN (White) vs. THE MANY FACES OF GO (Black)
Figure 2.14: KGS Rank Graph for Zen19D
Figure 2.15: The final position of the exhibition match in the 4th UEC Cup: Kaori Aoba 4p (White) vs. ZEN (Black)
Figure 2.16: The final position of the match in the Computer Go Competition at the 2011 IEEE International Conference on Fuzzy Systems: Chun-Hsun Chou 9p (White) vs. ZEN (Black), with 6 handicap stones
Figure 3.1: The final position of the match in round 2 in the 9×9 Go tournament at the 2008 Computer Olympiad: ERICA (White) vs. AYA (Black)
Figure 3.2: The final position of the match in round 2 of the 3rd UEC Cup: ERICA (White) vs. AYA (Black)
Figure 3.3: A position of the final match in the playoff of the 19×19 Go tournament at the 2010 Computer Olympiad: ZEN (White) vs. ERICA (Black)
Figure 3.4: A position of the match in the 4th UEC Cup: THE MANY FACES OF GO (White) vs. ERICA (Black)
Figure 3.5: An example of a position in the playout
Figure 3.6: An example of ForbiddenMove
Figure 3.7: An example of ForbiddenMove
Figure 3.8: An example of "Bias RAVE Updates by Move Distance"
Figure 3.9: An example to show the need of "Fix RAVE Updates for Ko Threats": ajahuang [6d] (White) vs. Zen19D [5d] (Black)
Figure 3.10: An example of "Fix RAVE Updates for Ko Threats"
Figure 3.12: The KGS Rank Graph for EricaBot
Figure 3.13: A 19×19 ranked game on KGS: EricaBot 3-dan (White) vs. BOThater36 2-dan (Black)
Figure 3.14: A 9×9 game on KGS: Erica9 (White) vs. guxxan 5-dan (Black)
Figure 4.1: Examples of Features 2, 3, 4, 5, 6 and 7
Figure 4.2: Mean square error as a function of iteration number
Figure 5.1: Thinking time per move, for different kinds of time-allocation strategies

(14) List of Tables

Table 1.1: Complexities of some well-known games
Table 3.1: The result of the Computational Intelligence Forum & World 9×9 Computer Go Championship held on September 25-27, 2008, in Tainan, Taiwan
Table 3.2: The result of the 9×9 Go tournament at the 2009 TAAI Go Tournament
Table 3.3: The result of the 19×19 Go tournament at the 2009 TAAI Go Tournament
Table 3.4: The result of the 4th UEC Cup, 2010
Table 3.5: Pseudocode of the move generator in the playout of ERICA
Table 4.1: Reference results against Fuego 0.4, 1,000 games, 9×9, 3k playouts/move
Table 4.2: Experimental results
Table 4.3: Comparison of local features, between MM and SB
Table 4.4: 3×3 patterns
Table 4.5: Results against Gnu Go 3.8 Level 10, 1,000 games, 9×9, 300 playouts/move
Table 4.6: Results against Gnu Go 3.8 Level 0, 500 games, 19×19, 1,000 playouts/move
Table 5.1: Fixed playouts per move against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.2: Basic formula against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.3: Enhanced formula (C=80) against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.4: Enhanced formula (C=80) with Unstable-Evaluation heuristic against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.5: Enhanced formula (C=80, MaxPly=160) with Unstable-Evaluation heuristic and Think Longer When Behind (T=0.4) against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.6: Standard Pondering against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.7: Focused Pondering (N=10) against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.8: Focused Pondering (N=5) against GNU GO 3.8, Level 2, 500 games, 19×19
Table 5.9: Self-play: Focused Pondering against Standard Pondering, both with Enhanced Formula (C=180, MaxPly=160), 500 games, 19×19

(16) Chapter 1 Introduction

The game of Go is a grand challenge of artificial intelligence. In this dissertation, we investigate some new heuristics for Monte Carlo Tree Search (MCTS) applied to the game of Go.

1.1 Computer Games

Artificial intelligence in games (Herik et al., 2002) has made tremendous progress in the past decades. The theoretical foundation of computer games was laid in 1950, when Claude Shannon published his groundbreaking paper "Programming a Computer for Playing Chess" (Shannon, 1950). This paper proposed the well-known minimax search procedure, combined with an evaluation function for evaluating the terminal positions. The minimax procedure and its enhancements (Schaeffer, 1989), such as alpha-beta pruning (Knuth and Moore, 1975) and the transposition table (Slate and Atkin, 1977), constitute the framework that is still dominant in the area of computer games, particularly computer chess. The rapid and constant development of computer games since 1950 reached a peak in 1997, when the chess-playing supercomputer DEEP BLUE (Campbell et al., 2002), built by IBM, defeated the world champion Garry Kasparov in a six-game match. This achievement has been regarded as a significant milestone of artificial intelligence. In

(17) spite of chess-playing programs having reached a super-human level, the game of Go remains a major open challenge (Burmeister and Wiles, 1995; Bouzy and Cazenave, 2001).

1.2 The Game of Go

1.2.1 History

Go (Chinese: 圍棋, Japanese: 囲碁, Korean: 바둑) is an ancient board game that originated in China. According to the generally accepted legend, Go was invented by the Chinese emperor Yao (2337-2258 B.C.) in order to instruct his son Danzhu. Throughout the long history of China, interest in and research on Go were never scarce. The game was developed into an art, deeply mingled with Chinese culture as one of the "Four Arts of the Chinese Scholar", namely Qín (Guqin), Qí (Go), Shū (Chinese calligraphy) and Huà (Chinese painting). Moreover, much of the terminology of Go was adopted from Chinese idioms to represent specific concepts. Two evident examples are Ladder (Chinese: 征子, suggesting "a long chase") and Ko (Chinese: 劫, implying "infinite misfortune"). Several classical writings on the playing skills of Go are still in circulation. For instance, "Mystery of Mysteries" (Chinese: 玄玄棋經), a well-known Tsumego compilation documented in 1349 A.D., is still popular and studied by Go players.

After spreading to Japan in the 7th century, Go was extensively studied and widely played by the general public. Thanks to the diligent research and continual practice of numerous top Japanese Go players, such as the celebrated Honinbo Shusaku (1829-1862 A.D.), the playing level was raised immensely. Go became "the national game of Japan" (Smith, 1908). It was in Japan that the first professional Go institution was built and a number of formal tournaments were held

(18) annually. This development not only popularized Go within Japan, but also spread it to other countries, including the Western world. Today, the four leading countries where Go prevails are Korea, China, Japan and Taiwan, and the number of Go players is also increasing in other regions such as America and Europe.

1.2.2 Rules

The description in this section is partly extracted from (Jasiek, 1997) and Sensei's Library (footnote 1). Briefly, Go is played by two players, Black (who makes the first move) and White, on a board, called a goban, with a 19×19 grid of lines (Figure 1.1). The players move in turn, and a move consists of placing one stone of one's own color on an empty intersection of the board. A player may pass at any time, and two consecutive passes end the game. Beginners usually play on a 9×9 board for the purpose of training. Another board size of public interest is 13×13. Most Go players find 13×13 Go more interesting than 9×9 Go, because the concepts of corner and edge are meaningful in 13×13 Go but not in 9×9 Go (Huang and Yen, 2010).

Footnote 1: Sensei's Library, http://senseis.xmp.net/.

(19) Figure 1.1: A Go board with a 19×19 grid of lines and some played stones.

The complete rules of Go can be summarized in the following four principles.

1. Removing a string without liberty. A liberty is an empty intersection directly adjacent to a stone. A string (or chain) is a single stone, or two or more directly adjacent stones of the same color. Any string without a liberty is captured by the opponent. Figure 1.2 gives an example of this principle.

Figure 1.2: An example of "Removing a string without liberty". Left: The string of three stones has 8 liberties (marked by ▲). Middle: The string has only 1 liberty. Right: White captures the string, which is left without a liberty, by playing at point A.

(20) 2. Prohibiting suicide. A move is illegal if it captures nothing and the string it belongs to has no liberty after it is played. Figure 1.3 gives an example of this principle.

Figure 1.3: An example of "Prohibiting suicide". Points A and B are both illegal moves for Black. Point C is a legal move for Black because it captures White's string marked by ○.

3. Prohibiting repeating positions. This principle deals with the repetition of a board position in the game of Go. The simplest case is Ko. The more general case, in which a longer cycle of moves separates the repeated positions, is called Superko, which is further divided into Situational Superko and Positional Superko. Figure 1.4 gives an example of this principle.

Figure 1.4: An example of "Prohibiting repeating positions". Left: The original position. Middle: Black captures a White stone. Right: This White move is illegal, because it recreates a former position (the one on the left).

4. Winning by more territory. A player's territory consists of all the board points he has either occupied or

(21) surrounded by his own color. There are two main types of scoring: Territory Scoring and Area Scoring. In Territory Scoring, used in Japanese and Korean rules, each player's score is the sum of that player's territory plus prisoners (all of the opponent's stones captured during the game). In Area Scoring, used in Chinese rules, each player's score is the sum of that player's territory plus the number of that player's stones on the board. To compensate for Black's advantage of moving first, White is given a certain number of points, called komi, in scoring. Figure 1.5 gives an example of this principle. Suppose komi = 7.5. By Territory Scoring, (B, W) = (25, 25 + 7.5) = (25, 32.5), so White wins by 7.5 points. By Area Scoring, (B, W) = (41, 40 + 7.5) = (41, 47.5), so White wins by 6.5 points.

Figure 1.5: An example of "Winning by more territory".

1.3 Computer Go

The rules of Go are rather simple, but the number of possible variations is enormous. The state-space complexity of Go is about 10^171 (Tromp and Farnebäck, 2006) and the game-tree complexity of Go is about 10^360 (Allis, 1994), as shown in Table 1.1. Go has also been proved to be PSPACE-hard (Lichtenstein and Sipser, 1978). Such high complexity makes Go a grand challenge for artificial intelligence.
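As a rough sanity check on the order of magnitude of the state-space figure quoted above, one can count every way of marking each of the 361 intersections as empty, black, or white. This naive count ignores legality (for example, it includes positions containing strings without liberties), so it is only an upper bound, not the cited estimate itself. The sketch below is illustrative; the function names are our own and do not come from the dissertation.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on the full-size board


def naive_state_bound_log10(points: int) -> float:
    """Return log10(3**points): every point is empty, black, or white."""
    return points * math.log10(3)


def decimal_digits(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))


if __name__ == "__main__":
    exponent = naive_state_bound_log10(BOARD_POINTS)
    print(f"3^{BOARD_POINTS} is roughly 10^{exponent:.1f}")
    print(f"exact digit count: {decimal_digits(3 ** BOARD_POINTS)}")
```

The bound comes out slightly above 10^172, so the cited state-space complexity sits just below the naive three-colorings count, as one would expect once illegal positions are excluded.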

(22)
Game            State-space complexity   Game-tree complexity   Status
Tic-tac-toe     10^3                     10^5                   Solved manually
Checkers        10^20                    10^31                  Solved in 2007
Chess           10^47                    10^123                 Programs > best humans
Chinese chess   10^48                    10^150                 Programs ≈ best humans
Shogi           10^71                    10^226                 Programs < best humans
Go              10^171                   10^360                 Programs << best humans

Table 1.1: Complexities of some well-known games.

Research into computer Go started around 1970 (Zobrist, 1970). Although the minimax search scheme has been a success story in most computer games, it does not work well for the game of Go. The main problems are the difficulty of designing a feasible evaluation function and a search space that is considerably larger than that of other games (Bouzy and Cazenave, 2001; Müller, 2002). Consequently, before 2006, the traditional approach to writing a Go-playing program was to build many handcrafted and independent modules, each realizing a concept of Go knowledge, such as group (Chen, 1989) and life-and-death (Chen and Chen, 1999), and then to integrate them into a single knowledge-based expert system. As a result, developing a competitive Go-playing program in those days took a great deal of time and required a great deal of Go knowledge. Thanks to the large cash prizes offered by tournaments such as the Ing Cup and FOST Cup (Fotland, 1996), computer Go has been popular since the 1980s. World championship competitions drove a steady increase in playing strength among the top programs, including THE MANY FACES OF GO by David Fotland, GO INTELLECT by Ken Chen, GO++ by Michael Reiss, GOLIATH by Mark Boon, and JIMMY by Shi-Jim Yen (Yen, 1999). Among these traditional Go programmers, the best known is Chen Zhixing, renowned for developing HANDTALK (afterwards known as GOEMATE), the generally

accepted strongest Go-playing program in the 1990s. In 1997, HANDTALK won an 11-stone handicap match against a 9-year-old, amateur 6-dan Go player. Eleven handicap stones away from amateur 6-dan is approximately amateur 5-kyu level, which is far weaker than the top human level. That is why Go-playing programs were never, in a real sense, considered to be strong until 2006, when the brand new search scheme Monte Carlo Tree Search (MCTS) (Coulom, 2006) and Upper Confidence bounds applied to Trees (UCT) (Kocsis and Szepesvári, 2006) appeared on the scene. The revolution of MCTS and UCT advanced computer Go to such a degree that people began to believe that in ten [2] or twenty [3] years, Go-playing programs will be able to defeat the top human players.

[2] The 10-year prediction is maintained by Professor Jaap van den Herik of the Tilburg Centre for Creative Computing (TiCC) at Tilburg University, Netherlands.

[3] One of the proponents of the 20-year prediction is David Fotland, the author of THE MANY FACES OF GO.

1.4. Summary of the Contributions

In this dissertation, we investigate several new heuristics for Monte Carlo Tree Search (MCTS) which have been tested in our Go-playing program ERICA. Setting aside the technical and engineering details, our work can be summarized as two contributions.

The first contribution is Monte Carlo Simulation Balancing (SB) applied to 9×9 Go. SB is an algorithm for training the parameters of the simulation. It was proposed in 2009, but had only been practiced on small board sizes. Our experiments are the first to demonstrate its effectiveness in 9×9 Go by showing that SB surpasses the well-known supervised learning algorithm Minorization-Maximization (MM) by about 90 Elo.

The second contribution is systematic experiments of various time management

schemes for 19×19 Go. The results indicate that clever time management algorithms can considerably improve playing strength.

All the experiments were performed on our Go-playing program ERICA, the winner of the 19×19 Go tournament at the 2010 Computer Olympiad (Fotland, 2010), which is a strong confirmation of the effectiveness of these new heuristics.

1.5. Organization of the Dissertation

The organization of this dissertation is as follows. Chapter 1 gives an introduction to computer games, the game of Go and computer Go, a summary of the contributions of this research, and the organization of the dissertation. Chapter 2 presents the background and related work of this research. It introduces Monte Carlo Go, explains Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT), and surveys some of the state-of-the-art Go-playing programs as well as their contributions. Chapter 3 introduces our Go-playing program ERICA. We narrate its development history and its standings in the tournaments in which we have participated, and introduce the framework of the program. Chapter 4 presents our first contribution: applying SB to 9×9 Go. Chapter 5 shows the second contribution: time management schemes utilized in 19×19 Go. Finally, conclusions and proposals for future work are given in Chapter 6.

Chapter 2 Background and Related Work

In this chapter, we introduce the background and related work of this research. Section 2.1 introduces the progress of Monte Carlo Go up to the development of Monte Carlo Tree Search (MCTS) and Upper Confidence bounds applied to Trees (UCT). Section 2.2 explains MCTS and its four stages along with the related work. Section 2.3 explains UCT, which was mainly proposed for the first stage (selection) of MCTS. Finally, Section 2.4 surveys a number of state-of-the-art Go-playing programs as well as their contributions.

2.1. Monte Carlo Go

The idea of Monte Carlo Go was first introduced by Brügmann (Brügmann, 1993). In his paper “Monte Carlo Go”, Brügmann proposed an algorithm which attempts to find the best move by simulated annealing, without including any Go knowledge except the rule “do-not-fill-eye” in the simulation. Based on Abramson’s expected-outcome model (Abramson, 1990), a position is evaluated by the average score of a certain number of simulations (random games) played from that position on. Remarkably, by this approach, Brügmann’s program GOBBLE achieved a playing strength of about 25-kyu on a 9×9 board. In 2003, on the basis of Brügmann’s work, Bouzy started to make some experiments on Monte Carlo Go (Bouzy, 2003;

Bouzy and Helmstetter, 2003) and accordingly built a new version of his program INDIGO. In the next few years, Bouzy and Chaslot brought forward a number of groundbreaking ideas, such as Bayesian generation of patterns for 19×19 Go (Bouzy and Chaslot, 2005), Progressive Pruning and its variants Miai Pruning (MP) and Set Pruning (SP) (Bouzy, 2005a), History Heuristic and Territory Heuristic (Bouzy, 2005b) and enhanced 3×3 patterns by reinforcement learning (Bouzy and Chaslot, 2006).

It was on the basis of these preliminary works on Monte Carlo Go that the significant breakthroughs of Monte Carlo Tree Search (MCTS) (Coulom, 2006) and Upper Confidence bounds applied to Trees (UCT) (Kocsis and Szepesvári, 2006) independently emerged in 2006.

2.2. Monte Carlo Tree Search (MCTS)

Monte Carlo Tree Search (MCTS) (Coulom, 2006) is a kind of best-first search that tries to find the best move while keeping the balance between exploration and exploitation of all moves. MCTS was first implemented in CRAZY STONE, the winner of the 9×9 Go tournament at the 2006 Computer Olympiad. Together with the emergence of UCT (Kocsis and Szepesvári, 2006), the huge success of MCTS stimulated profound interest among Go programmers. So far, many enhancements of MCTS have been proposed and developed to strengthen its effect, such as Rapid Action Value Estimation (RAVE) (Gelly and Silver, 2007; Gelly and Silver, 2011) and progressive bias (Chaslot et al., 2007). Many comprehensive studies have also focused on the policy and on improving the quality of the playout (Coulom, 2007; Chaslot et al., 2009; Hendrik, 2010).

MCTS is commonly classified into four stages (Chaslot et al., 2007): selection,

expansion, simulation and backpropagation, as shown in Figure 2.1. MCTS operates by performing these four stages repeatedly for as long as there is time left. The four stages of MCTS and the related work are described in the following subsections.

Figure 2.1: The scheme of MCTS (selection, expansion, simulation, backpropagation).

2.2.1 Selection

The first stage, selection, selects one of the children of a given node according to a selection function (or selection formula), and repeats this from the root node until it reaches the end of the tree. Figure 2.2 gives an example. The selection strategy UCT and the various selection functions adopted by different Go-playing programs will be investigated separately in Section 2.3.

Figure 2.2: The first stage of MCTS: selection. Node 1 (Root) selects Node 2 then Node 2 selects Node 6, which reaches the end of the tree.

2.2.2 Expansion

The second stage, expansion, is to create a new child node, corresponding to one of the legal moves of the parent node, and to store this new node in memory to “expand”

the tree. Figure 2.3 gives an example. The simplest scheme of expansion is to create a new node on the first visit of a leaf node (Coulom, 2006). However, for RAVE, it is necessary to create all the child nodes in preparation for updating the RAVE statistics in the fourth stage, backpropagation. To reduce this memory overhead, a popular solution is delayed node creation, namely to expand a node on its nth (n>1) visit. The NOMITAN team has reported some effective variants of delayed node creation (Yajima et al., 2010).

To raise the performance of RAVE, it is suggested to assign a prior value to each created node (Gelly and Silver, 2007). If many features are taken into account in computing a prior value, node creation can be costly and slow. To speed up node creation in a multithreaded environment, FUEGO uses an independent, thread-specific memory array for node creation (Enzenberger and Müller, 2009).

Figure 2.3: The second stage of MCTS: expansion. Node 10, the child node of the leaf node 6, is created and stored in memory to expand the tree.

2.2.3 Simulation

The third stage, simulation, is to perform a simulation (also called playout) from the

position represented by the newly created node. For delayed node creation, a simulation is simply performed from the leaf node. In MCTS, a simulation is carried out by Monte Carlo simulation composed of random or pseudo-random moves. This is the reason for the names “Monte Carlo Go” and “Monte Carlo Tree Search”. When the random game is completed, the final position is scored [4] to decide the winner. Then the associated outcome 0/1 is passed to the tree to indicate loss/win of this simulation. Figure 2.4 gives an example.

[4] For the game of Go, a simulation is usually scored by the Chinese rules.

Figure 2.4: The third stage of MCTS: simulation. After Node 10 was created, a Monte Carlo simulation is performed from the position represented by this node. Finally, the outcome 0/1 is returned to indicate loss/win of this simulation.

Simulation is the most crucial step of MCTS. In general, there are mainly two

types of Monte Carlo simulation among the current strong Go-playing programs. The first type, called MOGO-type, sequence-like or fixed-sequence simulation (Gelly et al., 2006), and also known as MOGO’s magic formula, is used by MOGO, PACHI, FUEGO, and many other strong Go-playing programs. MOGO-type, sequence-like simulation will be further investigated in Section 2.4.2. The second type, called CRAZY STONE-like or probabilistic simulation, and also known as CRAZY STONE’s update formula (Teytaud, 2011), allows more flexibility. It was proposed by Rémi Coulom (Coulom, 2007) and is used by CRAZY STONE, AYA and our Go-playing program ERICA. ZEN was reported to use a mixed type of simulation between MOGO-type and CRAZY STONE-like (Yamato, 2011). CRAZY STONE-like simulation will be further investigated in the next chapter.

Recent research on simulation centers on two directions. The first direction is to balance the simulations in the framework of a Boltzmann softmax playout policy with trained feature weights, which will be discussed in Chapter 4. The second direction is to improve the playout policy by letting the simulations learn from themselves, according to the results of the previous simulations (Drake, 2009; Hendrik, 2010; Baier and Drake, 2010) or the statistical data accumulated in the tree (Rimmel et al., 2010). Such a dynamic or adaptive scheme for the simulation is called adaptive playout.

2.2.4 Backpropagation

The fourth stage, backpropagation, is to propagate the simulation outcome 0/1 from the newly created node, along the path decided in the selection stage, to the root node. Each node on this path updates its own statistical data with the simulation outcome. Figure 2.5 gives an example.

In backpropagation, it is possible to update other statistical data with the information collected from the simulation, to obtain a faster estimation of the child

nodes. For instance, with RAVE (Gelly and Silver, 2007) or other kinds of AMAF (All-Moves-As-First) (Brügmann, 1993; Helmbold and Wood, 2009), a node updates all the moves that were played in the tree and in the simulation after the position represented by this node. Some researchers have also tried assigning heavier weights to the later simulation outcomes as the tree grows larger (Xie and Liu, 2009), under the assumption that the larger the sub-tree, the more promising the simulation outcome.

Figure 2.5: The fourth stage of MCTS: backpropagation. The simulation outcome 0/1 is propagated from Node 10 along the path of the selection stage (Node 6 and Node 2) to the root node (Node 1). Each node updates its own statistical data with the simulation outcome.

A recent topic in backpropagation that has attracted much attention is dynamic komi. Dynamic komi was proposed to cure the poor performance of MCTS in handicap games on the 19×19 board. The objective under the current structure of MCTS is to maximize the winning rate rather than the score. So, MCTS works best if the winning rate of the root node is close to 50%, because that is precisely when the

simulation outcomes can best distinguish good moves from bad ones. When the winning rate is close to 100% (the case of 0% can be deduced in the same way), MCTS becomes reluctant to explore (since a 0.5-point win and a 20.5-point win give the same outcome) and unable to discriminate between good and bad moves. After all, Monte Carlo simulation is more or less biased and far from perfect. This problem becomes particularly apparent in handicap games against strong human players, as a result of the huge and early advantage offered by the handicap stones.

Figure 2.6 gives a practical example. This position is taken from the exhibition game at the 2010 Computer Olympiad, Rina Fujisawa (White) vs. ERICA (Black), with 6 handicap stones. The stone marked by ∆ is the last move. Point A (extending) is a mandatory move for Black in this case, but ERICA played at B, a clearly bad move, while showing a winning rate of over 80%.

Figure 2.6: The exhibition game at the 2010 Computer Olympiad: Rina Fujisawa (White) vs. ERICA (Black), with 6 handicap stones. White won by resignation.
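The saturation of the 0/1 outcome described above can be illustrated with a small sketch. The score margins (Black's score minus White's, before komi) and the two candidate moves are made-up numbers, and the helper names `outcome` and `win_rate` are hypothetical; nothing here is taken from ERICA or any other program.

```python
def outcome(margin, komi):
    """Return the 0/1 simulation outcome for Black: 1 if Black wins after komi, else 0."""
    return 1 if margin - komi > 0 else 0


def win_rate(margins, komi):
    """Average 0/1 outcome over a list of simulated final margins."""
    return sum(outcome(m, komi) for m in margins) / len(margins)


# Hypothetical final margins of simulations following two candidate moves
# in a handicap game where Black starts far ahead.
good_move = [28, 35, 22, 31, 26]
bad_move = [9, 14, 8, 12, 10]

# With a small komi both moves win every simulation, so the 0/1 outcomes
# carry no information about which move is better.
print(win_rate(good_move, 0.5), win_rate(bad_move, 0.5))    # 1.0 1.0

# With a threshold nearer the typical margin, the outcomes separate the moves.
print(win_rate(good_move, 20.5), win_rate(bad_move, 20.5))  # 1.0 0.0
```

In this toy data the good move is about 18 points better on average, yet the winning rates are identical until the threshold moves closer to the typical margin.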

The main idea of dynamic komi is to adjust the komi value, using the average score derived from the last search, in order to shift the winning rate of the root node closer to 50%. ZEN, THE MANY FACES OF GO and PACHI [5] have been reported to benefit from dynamic komi, although each takes a different approach.

[5] The author of PACHI, Petr Baudiš, described his successful implementation of dynamic komi in the draft of his paper “Balancing MCTS by Dynamically Adjusting Komi Value”.

2.3. Upper Confidence Bounds Applied to Trees (UCT)

Upper Confidence bounds applied to Trees (UCT) (Kocsis and Szepesvári, 2006) is the extension of the UCB1 strategy (Auer et al., 2002) to minimax tree search. The deterministic UCB1 policy was designed to solve the Multi-Armed Bandit problem (Auer et al., 1995) and ensures that the optimal machine is played exponentially more often than any other machine, uniformly over time, when the rewards are in [0,1]. UCT mainly serves as the selection function in the first stage of MCTS and, in general, can be viewed as a special case of MCTS. Under the formulation of UCT, the selection in each node is similar to the Multi-Armed Bandit problem (Coquelin and Munos, 2007). It aims to find the best move while keeping the balance between exploration and exploitation of all moves. MOGO was the first Go-playing program that successfully applied UCT (Gelly et al., 2006).

The strategy of UCT is to choose a child node which maximizes the selection formula (2.1):

\[ \mathrm{UCT} = v_{uct} + C_{uct} \sqrt{\frac{\log N_{uct}}{n_{uct}}} \tag{2.1} \]

where $v_{uct}$ is the value of this node, $n_{uct}$ is the visit count of this node and $N_{uct}$ is the visit count of the parent node. $C_{uct}$ is a constant, which has to be tuned empirically.
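Formula (2.1) can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary layout of a child node and the choice to try unvisited children first (by giving them an infinite score) are assumptions made here, not details taken from any of the programs discussed.

```python
import math


def uct_value(v_uct, n_uct, parent_n_uct, c_uct):
    """Value of formula (2.1) for one child node."""
    if n_uct == 0:
        # Assumption: an unvisited child is always tried before visited ones.
        return math.inf
    return v_uct + c_uct * math.sqrt(math.log(parent_n_uct) / n_uct)


def select_child(children, parent_n_uct, c_uct=1.0):
    """Return the child that maximizes formula (2.1)."""
    return max(
        children,
        key=lambda ch: uct_value(ch["value"], ch["visits"], parent_n_uct, c_uct),
    )


# Hypothetical statistics for two children of a node visited 110 times.
children = [
    {"move": "A", "value": 0.60, "visits": 100},
    {"move": "B", "value": 0.50, "visits": 10},
]

# The exploration term favours the rarely visited move B.
print(select_child(children, 110, c_uct=1.0)["move"])
# With the exploration term switched off, the higher value of A decides.
print(select_child(children, 110, c_uct=0.0)["move"])
```

The second call shows the role of $C_{uct}$: with $C_{uct}=0$ the choice is purely greedy on $v_{uct}$, while larger values push the search towards less visited children.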

The latter part of the formula (2.1) is usually called the “exploration term”, as its purpose is to balance exploration and exploitation.

The strategy of UCT was very quickly found to be infeasible for the game of Go, because it requires that each child node be visited at least once. Even on a 9×9 board, the branching factor, 81 for the node representing the empty position, is still too large for such a complete search. To remedy this flaw, Rapid Action Value Estimation (RAVE) was proposed (Gelly and Silver, 2007; Gelly and Silver, 2011). RAVE is a variant of the AMAF (All-Moves-As-First) heuristic (Brügmann, 1993; Helmbold and Wood, 2009) that updates all the moves which were played in the tree and in the simulation after the position represented by this node. The strategy of RAVE is to choose a child node which maximizes the selection formula (2.2):

\[ \mathrm{RAVE} = v_{rave} + C_{rave} \sqrt{\frac{\log N_{rave}}{n_{rave}}} \tag{2.2} \]

where $v_{rave}$ is the RAVE value of this node, $n_{rave}$ is the RAVE visit count of this node and $N_{rave}$ is the RAVE visit count of the parent node. $C_{rave}$ is a constant, which has to be tuned empirically.

Blending UCT with RAVE, the strategy UCT-RAVE is to choose a node which maximizes the selection formula (2.3):

\[ \text{UCT-RAVE} = \mathit{Coefficient} \times \mathrm{RAVE} + (1 - \mathit{Coefficient}) \times \mathrm{UCT} \tag{2.3} \]

where Coefficient is the weight of RAVE (Gelly and Silver, 2007; Silver, 2009).

In the past few years, much effort has been devoted to improving the selection function based on the strategy UCT-RAVE. Some new ideas and the various selection functions adopted by different Go-playing programs are listed as follows.

1. Chaslot et al. proposed two progressive strategies for the selection stage and

measured a significant improvement from 25% to 58% (200 games) with their program MANGO against GNU GO 3.7.10 on the 13×13 board (Chaslot et al., 2007). The first strategy is progressive unpruning, also called progressive widening (Coulom, 2007), which gradually unprunes the child nodes according to their scores computed by the selection function. The other strategy is progressive bias, realized as an independent term appended to the selection formula, which aims to direct the search according to time-expensive heuristic knowledge.

2. Chaslot et al. presented a selection formula combining online learning (bandit module), transient learning (RAVE values), expert knowledge and offline pattern-information (Chaslot et al., 2009), which is used in their program MOGO.

3. Silver, in his Ph.D. dissertation, based on experiments on MOGO (Silver, 2009), suggested removing the exploration terms of both UCT and RAVE, namely setting $C_{uct}$ and $C_{rave}$ to 0.

4. Rosin proposed a new algorithm, PUCB, under the assumption that contextual side information is available at the start of the episode (Rosin, 2010).

5. Tesauro et al. proposed a Bayesian framework for MCTS that allows potentially much more accurate (Bayes-optimal) estimation of node values and node uncertainties from a limited number of simulation trials (Tesauro et al., 2010).

6. THE MANY FACES OF GO uses the formula (2.4) in combination with progressive widening (Fotland, 2011):

\[ (1 - \mathit{beta}) \times \left( v_{uct} + C_{uct} \sqrt{\frac{\log N_{uct}}{n_{uct}}} \right) + \mathit{beta} \times v_{rave} + \mathit{mfgo\_bias}, \qquad \mathit{beta} = \sqrt{\frac{500}{500 + 3 \times N_{uct}}} \tag{2.4} \]

where mfgo_bias is unchanging, per move, within a range of about ±2%, based on the quality of the move estimated by the move generator of THE MANY FACES OF GO.

7. AYA uses the formula (2.5) in combination with progressive widening (Yamashita, 2011):

\[ (1 - \mathit{beta}) \times \left( v_{uct} + C_{uct} \sqrt{\frac{\log N_{uct}}{n_{uct}}} \right) + \mathit{beta} \times \left( v_{rave} + C_{rave} \sqrt{\frac{\log N_{rave}}{n_{rave}}} \right), \qquad \mathit{beta} = \sqrt{\frac{100}{100 + 3 \times N_{uct}}} \tag{2.5} \]

8. PEBBLES uses the formula (2.6) (Sheppard, 2011):

\[ (1 - \mathit{beta}) \times q_{UCT} + \mathit{beta} \times q_{RAVE} \tag{2.6} \]

where beta is set according to Silver’s dissertation (Silver, 2009). Both $q_{UCT}$ and $q_{RAVE}$ incorporate exploration terms from the Beta Distribution (Stogin et al., 2010).

9. PACHI uses a formula similar to that of AYA, except that $C_{uct}$ and $C_{rave}$ are set to 0 (Baudis, 2011). The “Even game prior” is used to set $v_{uct}$ to 0.5 at n playouts, where n can be between 7 and 40. Another important prior is the “playout policy hinter”, which uses the same heuristics (and code) as the playout policy to pick good tree moves.

2.4. State-of-the-Art Go-Playing Programs

In this section, we survey some state-of-the-art Go-playing programs as well as their contributions.

2.4.1 Crazy Stone

CRAZY STONE was created by Rémi Coulom, the inventor of MCTS (Coulom, 2006),

which has been regarded as the most significant contribution to computer Go in recent years. At the 2006 Computer Olympiad, CRAZY STONE demonstrated the usefulness and effectiveness of MCTS by its overwhelming victory in the 9×9 Go tournament. In this tournament, CRAZY STONE defeated many senior Go-playing programs such as GNU GO, GOKING, JIMMY, etc., and tied with AYA and GO INTELLECT.

In the first UEC Cup in 2007, CRAZY STONE won the exhibition match against Kaori Aoba 4p with 7 handicap stones. This game was described as “very beautiful”. Figure 2.7 shows the final position of this game. CRAZY STONE finally killed White’s entire big group [6] in the center (marked by ×) and secured a solid win.

[6] A group consists of one or more loosely connected strings.

The second great contribution of Coulom is the supervised learning algorithm named Minorization-Maximization (MM) for computing the Elo ratings of move patterns (Coulom, 2007), which will be further investigated in Chapter 4. This learning algorithm is still used by some of the top-level Go-playing programs, such as ZEN and AYA.

Figure 2.7: The final position of the exhibition match: Kaori Aoba 4p (White) vs. CRAZY

STONE (Black), with 7 handicap stones, in the first UEC Cup, 2007. Black won by resignation.

Currently, Coulom is again working on CRAZY STONE after a suspension of about 2 years. Right now, CRAZY STONE is rated 4-dan on the KGS Go Server (KGS) on the 19×19 board, running on a 24-core machine (account CrazyStone, retrieved at 2011-07-14T12:12:42+08:00 [7]), and has reached a Bayes-Elo rating (Coulom, 2010) of 2914 on the Computer Go Server (CGOS) on the 9×9 board (account bonobot, retrieved at 2011-07-14T12:15:34+08:00).

[7] Presented in ISO 8601 date format.

2.4.2 MOGO

MOGO was originally created by Yizao Wang and Sylvain Gelly, supervised by Rémi Munos. Olivier Teytaud took the lead of the “MOGO team” after Yizao Wang and Sylvain Gelly left. There are several important contributions from the MOGO team.

The first and greatest contribution is applying UCT (Kocsis and Szepesvári, 2006), which was invented by Kocsis and Szepesvári independently and at the same time as Coulom’s MCTS, to computer Go (Gelly et al., 2006). It is widely held that the contributions of CRAZY STONE and MOGO together enabled Monte Carlo Go programs to become competitive with, and then stronger than, the strongest traditional Go-playing programs, such as HANDTALK, THE MANY FACES OF GO and GO INTELLECT.

The second contribution of the MOGO team lies in the Monte Carlo part. The earliest creators of MOGO, mainly Sylvain Gelly and Yizao Wang, designed a sequence-like simulation (Gelly et al., 2006) that still has a dominant influence on almost all the current strong Go-playing programs. This sequence-like simulation was further

improved by expert knowledge, such as nakade, and heuristics such as “Fill the board” (Chaslot et al., 2009). Figure 2.8 gives an example of sequence-like simulation.

Figure 2.8: An example of sequence-like simulation proposed by the MOGO team, cited from the paper “Modification of UCT with Patterns in Monte Carlo Go”.

The main principle of such sequence-like simulation is to choose the present move as a response to the previous move played by the opponent. Two important responses to the previous move are “save a string by capturing” and “save a string by extending”. “Save a string by capturing” means to save the string that was put in atari by the previous move by capturing a directly neighboring opponent string. “Save a string by extending” means to save the string that was put in atari by the previous move by extending its liberties. Figure 2.9 gives an example of these responses.

The most powerful part of the sequence-like simulation is considering the 3×3 patterns around the previous move. It is generally stated that the 3×3 patterns designed by Yizao Wang, together with RAVE, are the major factors that made MOGO clearly the strongest Go-playing program in the first half of 2007.

This handcrafted, sequence-like simulation policy was improved by offline reinforcement learning from games of self-play (Gelly and Silver, 2007). Gelly and Silver reported that this generated policy outperformed both the random policy and

the handcrafted policy by a margin of over 90%.

The third big idea of the MOGO team is RAVE (Gelly and Silver, 2007), which is a variant of the AMAF (All-Moves-As-First) heuristic (Brügmann, 1993; Helmbold and Wood, 2009). Presently, RAVE is reported to be used in almost every strong Go-playing program. Some authors have even reported that RAVE boosts the playing strength of their programs by over 200 Elo (Kato, 2008).

Other contributions of MOGO include the parallelization of MCTS (Chaslot et al., 2008; Gelly et al., 2009; Bourki et al., 2010), the never-ending learning algorithms for automatically designing an opening book by MCTS (Chaslot et al., 2009; Audouard et al., 2009; Gaudel et al., 2010) and so on.

MOGOTW is a joint project between the MOGO team and a Taiwanese team, led by Chang-Shin Lee and composed of several Taiwanese universities and organizations. In the Human vs. Computer Go Competition at WCCI 2010, MOGOTW won the first 9×9 game as Black against the top professional Go player Chun-Hsun Chou 9p, the winner of the international professional Go tournament LG Cup 2007. Figure 2.10 shows the final position of this game.

Figure 2.9: An example of “save a string by capturing” and “save a string by extending”. The previous move is marked by ∆. The string marked by × was put in atari by the previous move. Point A is “save a string by capturing” and point B is “save a string by extending”.

Figure 2.10: The final position of the match: Chun-Hsun Chou 9p (White) vs. MOGOTW (Black), with komi 7.5. The stone marked by ∆ is the last move. Black won by resignation.

2.4.3 GNU GO

GNU GO is an open-source Go-playing program authored by many people. The first version of GNU GO was released on March 13th, 1989. It is still the most popular Go-playing program among many internet Go servers and the most common

experimental test bed in the field of computer Go.

Before the rise of MCTS and UCT, GNU GO was among the strongest Go-playing programs. It won the gold medal in the 19×19 Go tournament at the 2003 Computer Olympiad and the silver medal in the 9×9 Go tournament at the 2004 Computer Olympiad.

On the Bayes-Elo rating system of the Computer Go Server (CGOS), the rating of GNUGO-3.7.10-A0 on the 19×19 board is 1800, while the top ranked program is ZENGG-4X4C-TST, rated 2839 (retrieved at 2011-07-17T17:22+08:00). This shows that in the past seven years, from 2004 to 2011, the strongest Go-playing program has improved by at least 1000 Elo.

Figure 2.11 gives a position in the opening stage of the match between JIMMY (White) and GNU GO (Black) in round 1 of the 19×19 tournament at the 2003 Computer Olympiad. This example shows that both GNU GO and JIMMY can play very good shapes in the opening stage.

Figure 2.11: Round 1 at the 2003 Computer Olympiad: JIMMY (White) vs. GNU GO (Black), with komi 6.5. The stone marked by ∆ is the previous move. Black won by resignation.
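To give a feeling for what the CGOS rating gap quoted in this section means, the following sketch converts a rating difference into an expected score. It assumes the standard logistic Elo model with a scale of 400 points; whether CGOS Bayes-Elo ratings map exactly onto this model is an assumption made here, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_gap):
    """Expected score of the stronger side under the standard logistic Elo model.

    Assumption: a 400-point scale, as in the usual Elo formula.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Ratings quoted in the text for the 19x19 board on CGOS.
gap = 2839 - 1800
print(gap)                  # 1039
print(expected_score(gap))  # well above 0.99 under this model
```

Under this model, a gap of roughly 1000 Elo means the weaker program would be expected to score less than 1% against the stronger one.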

2.4.4 FUEGO

FUEGO (Enzenberger et al., 2010) was created by Markus Enzenberger, Martin Müller and Broderick Arneson of the computer Go research group led by Martin Müller at the University of Alberta, Canada. FUEGO won the first 9×9 game as White against the top professional player Chun-Hsun Chou 9p in the Human vs. Computer Go Competition of the 2009 IEEE International Conference on Fuzzy Systems.

In 2010, FUEGO, running on a big shared memory machine at IBM with 112 threads (Segal, 2010), won the 4th UEC Cup. Figure 2.12 shows a position in the middle game of the final match between ZEN (White) and FUEGO (Black). FUEGO’s previous move, marked by ∆, was a severe and strong cut aiming to kill White’s group marked by ×. After winning this big semeai, FUEGO kept the lead until the end of the game and won the 4th UEC Cup.

Figure 2.12: A position of the final match in the 4th UEC Cup: ZEN (White) vs. FUEGO (Black). Black won by resignation.

Another contribution of FUEGO team is the release of the tool GoGui for

Go-playing program developers. GoGui allows direct communication with a Go engine through a command shell. It also provides a mechanism for automatic play between any two Go-playing programs through the Go Text Protocol (GTP). Another convenience supplied by GoGui is the visualization of numerous features for user-specific GTP commands.

2.4.5 The Many Faces of Go

THE MANY FACES OF GO (Fotland, 1993; Fotland, 2002) was developed by David Fotland, an American professional Go programmer who has worked on computer Go for over 25 years. THE MANY FACES OF GO is a commercial Go-playing program. It was among the strongest traditional Go-playing programs, and was then transformed into a Monte Carlo Go program mixed with the old engine.

THE MANY FACES OF GO has outstanding achievements in tournaments and has been a competitive program since the 1980s. The older version that did not use MCTS won the 21st Century Cup in 2003 and the 1998 Ing Cup World Championship. At the 2008 Computer Olympiad, THE MANY FACES OF GO won both the 9×9 and 19×19 Go tournaments. It also won the gold and bronze medals in the 13×13 Go and 19×19 Go tournaments at the 2010 Computer Olympiad. In the tournaments of the KGS Go Server (KGS), THE MANY FACES OF GO has always been a participant near the top of the list.

Figure 2.13 shows the game between ZEN (White) and THE MANY FACES OF GO (Black) in round 7 of the 19×19 Go tournament at the 2010 Computer Olympiad. The stone marked by ∆ is the last move, after which ZEN resigned. In this position, Black can play at either A or B to secure the center group marked by ○. It clearly shows the strong life-and-death and defense capabilities of THE MANY FACES OF GO (Fotland, 2002) under ZEN’s continuous, large-scale, fierce attack on the center group.

Figure 2.13: The final position of the match in round 7 of the 19×19 Go tournament at the 2010 Computer Olympiad: ZEN (White) vs. THE MANY FACES OF GO (Black). Black won by resignation.

2.4.6 ZEN

ZEN is a Japanese commercial Go-playing program created by Ojima Yoji (nickname Yamato), collaborating with Hideki Kato on cluster parallelization. ZEN was the winner of the 19×19 Go tournament at the 2009 Computer Olympiad. It is now doubtlessly the strongest Go-playing program (as of July 13th, 2011). On the KGS Go Server (KGS), ZEN is the only program that holds a stable 5-dan rank in blitz games (account Zen19D, as shown in Figure 2.14, retrieved at 2011-07-14T12:53:01+08:00) and 4-dan in longer games (account Zen19S, retrieved at 2011-07-14T12:53:28+08:00), running on a 26-core cluster. It also won all the KGS Go Server (KGS) tournaments in which it had participated up to July 2011.

Figure 2.14: KGS Rank Graph for Zen19D.

Fighting strength has long been the main feature of ZEN. The success story of ZEN clearly demonstrates the efficacy of heavy and informative playouts, RAVE and larger patterns in the tree (Coulom, 2007), which were reported to contribute to ZEN’s playing strength.

In the 4th UEC Cup in 2010, ZEN won the exhibition match against Kaori Aoba 4p with 6 handicap stones. Figure 2.15 shows the final position of this game. ZEN showed its strong attacking capabilities throughout the whole game. It finally killed White’s big group marked by × and came away with a great victory.

Figure 2.15: The final position of the exhibition match in the 4th UEC Cup: Kaori Aoba 4p (White) vs. ZEN (Black). Black won by resignation.

In the Computer Go Competition at the 2011 IEEE International Conference on Fuzzy Systems held on June 27-30, 2011, in Taiwan, ZEN defeated the top professional Go player Chun-Hsun Chou 9p with 6 handicap stones. Figure 2.16 shows the final position of this game. The stone marked by ∆ is the last move. In this game, Chun-Hsun Chou 9p used only 10 minutes in total, and he explained that “because I want to test the performance of ZEN in a fast game and most of ZEN’s moves were exactly what I would play”.

(48) Figure 2.16: The final position of the match in the Computer Go Competition at the 2011 IEEE International Conference on Fuzzy Systems: Chun-Hsun Chou 9p (White) vs. ZEN (Black), with 6 handicap stones. Black won by 14.5 points.

2.4.7 Other Programs

Here we briefly introduce a selection of other state-of-the-art or otherwise noteworthy programs. PACHI by Petr Baudis and Jean-loup Gailly (Baudis and Gailly, 2010) is now the strongest open source program. VALKYRIA by Magnus Persson (Persson, 2010) features heavy and rich knowledge representation in the playout and is especially competitive on the 9×9 board. LIBEGO by Łukasz Lew (Lew, 2010) is the fastest implementation of MCTS. OREGO by Peter Drake (Drake, 2011) is one of the popular test beds among computer Go researchers.

(49) Chapter 3 ERICA

In this chapter, we introduce our Go-playing program ERICA. Section 3.1 briefly reviews the development history of ERICA, as well as its standings in some of the tournaments in which it participated up to July 2011. Section 3.2 examines ERICA’s implementation of MCTS, along with some of our own ideas. Finally, Section 3.3 gives several examples, picked from the games that ERICA played against human players, to indicate its strength.

3.1 Development History

3.1.1 First Version Created in May 2008

The first version of ERICA was created in May 2008 as an implementation of MOGO’s famous “UCT paper” (Gelly et al., 2006). The work was motivated by the impressive performance of CRAZY STONE and MOGO in the 9×9 competition at the 2007 Computer Olympiad. This earliest version of ERICA was written in pure C. Its speed was about 20,000 uniform random simulations per second on a single-core 2.26 GHz CPU, on the 9×9 board. The board is implemented as a single array that holds the information describing a position, such as each string’s color, liberties, owner⁸ and size.

8. The owner of a string is its representative stone.
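The array-based board described above can be illustrated with a minimal sketch. It is written in Python for brevity, although ERICA itself was written in C, and it is not ERICA's actual data structure. The class name `Board`, its methods, and the use of parallel flat lists (standing in for a single array of per-point records) are all assumptions made for illustration. Captures and move legality are deliberately left out.

```python
EMPTY, BLACK, WHITE, BORDER = 0, 1, 2, 3


class Board:
    """Flat-array Go board that tracks strings (connected stones of one color)."""

    def __init__(self, size=9):
        self.size = size
        self.width = size + 2          # one border cell on every side
        n = self.width * self.width
        self.color = [BORDER] * n      # per point: EMPTY, BLACK, WHITE or BORDER
        self.owner = [-1] * n          # per stone: index of its string's representative
        self.string_size = [0] * n     # per representative: number of stones
        self.liberties = [None] * n    # per representative: set of empty neighbors
        for row in range(size):
            for col in range(size):
                self.color[self.index(row, col)] = EMPTY

    def index(self, row, col):
        """Map 0-based (row, col) to a position in the flat array."""
        return (row + 1) * self.width + (col + 1)

    def neighbors(self, p):
        """Indices of the four orthogonal neighbors (border cells included)."""
        return (p - 1, p + 1, p - self.width, p + self.width)

    def place(self, p, color):
        """Put a stone on empty point p and update string data (no captures)."""
        if self.color[p] != EMPTY:
            raise ValueError("point is not empty")
        self.color[p] = color
        self.owner[p] = p
        self.string_size[p] = 1
        self.liberties[p] = {q for q in self.neighbors(p)
                             if self.color[q] == EMPTY}
        for q in self.neighbors(p):
            if self.color[q] in (BLACK, WHITE):
                root = self.owner[q]
                self.liberties[root].discard(p)   # p is no longer a liberty
                if self.color[q] == color and root != p:
                    self._merge(p, root)

    def _merge(self, keep, drop):
        """Merge string `drop` into string `keep` by relabeling its stones."""
        for i, o in enumerate(self.owner):
            if o == drop:
                self.owner[i] = keep
        self.string_size[keep] += self.string_size[drop]
        self.liberties[keep] |= self.liberties[drop]
        self.string_size[drop] = 0
        self.liberties[drop] = None

    def string_info(self, p):
        """Return (color, liberty count, owner, size) for the string at p."""
        root = self.owner[p]
        return (self.color[p], len(self.liberties[root]),
                root, self.string_size[root])
```

The border cells let neighbor lookups skip bounds checks, which is a common choice in fast playout code. The linear relabeling in `_merge` keeps the sketch short. A speed-oriented implementation would more likely link the stones of a string together so that a merge only touches the stones involved.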

(50) MOGO-type, fixed-sequence simulation and RAVE form the basic MCTS framework of the program. In this period, personal communications with Yizao Wang, one of the creators of MOGO, considerably helped the author to understand the “UCT paper” and to make ERICA stronger. In the Computational Intelligence Forum & World 9×9 Computer Go Championship, held on September 25-27, 2008, in Tainan, Taiwan, ERICA finished in 4th place. Table 3.1 shows the result of this competition. In this tournament, ERICA won a game against GO INTELLECT but lost to JIMMY, the strongest Taiwanese Go-playing program at that time.

Position  Program       Wins  Country
1         MOGO          10    France
2         GO INTELLECT  6     America
3         JIMMY         6     Taiwan
4         ERICA         6     Taiwan
5         FUDO GO       6     Japan
6         CPS           6     Taiwan
7         GOSTAR        4     Taiwan
8         GOKING        4     Taiwan
9         HAPPYGO       2     Taiwan
10        CHANGJUAN1    0     Taiwan

Table 3.1: The result of the Computational Intelligence Forum & World 9×9 Computer Go Championship held on September 25-27, 2008, in Tainan, Taiwan.

In the 9×9 Go tournament at the 2008 Computer Olympiad, held from September 28 to October 5, 2008, in Beijing, China, ERICA finished in 11th place among the 18 participants. Figure 3.1 shows the game between ERICA (White) and AYA (Black) in round 2. In this game, thanks to the correct handling of seki in the playout, ERICA turned around an unfavorable position and won.

(51) Figure 3.1: The final position of the match in round 2 of the 9×9 Go tournament at the 2008 Computer Olympiad: ERICA (White) vs. AYA (Black). The stone marked by ∆ is the last move. The groups marked by ○ form a seki. White won by resignation.

At the 2009 Computer Olympiad, held on May 10-18, 2009, in Pamplona, Spain, ERICA participated in the 9×9 Go tournament with the same version that had played in the previous year. ERICA finished in 6th place among the 9 participants.

3.1.2 Second Version Created in June 2009

In June 2009, a new version of ERICA was created under the supervision of Rémi Coulom. The main advance in this version is the implementation of the Boltzmann softmax playout policy that was successful in Coulom’s CRAZY STONE. In addition to RAVE, prior information was taken into account in the form of progressive bias (Chaslot et al., 2007). The supervised learning algorithm Minorization-Maximization (MM) (Coulom, 2007) was used to train the pattern weights. At the 2009 TAAI Computer Go Tournament, held on October 30-31, 2009, in Taiwan, ERICA took 3rd and 2nd place in the 9×9 Go (Table 3.2) and 19×19 Go (Table 3.3) tournaments respectively.
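The Boltzmann softmax playout policy mentioned above can be sketched as follows. Each candidate move is scored by the sum of the weights of the features it matches, and a move is sampled with probability proportional to the exponential of its score. This is a minimal illustration in Python, not the code of ERICA or CRAZY STONE. The function names, the feature names and the weight values are hypothetical, and in practice the weights would come from a training procedure such as MM.

```python
import math
import random


def softmax_probabilities(move_features, weights):
    """Return {move: probability} with P(move) proportional to exp(score).

    move_features: dict mapping each legal move to the list of features it matches.
    weights: dict mapping a feature name to its learned weight (hypothetical values).
    """
    scores = {move: sum(weights.get(f, 0.0) for f in feats)
              for move, feats in move_features.items()}
    top = max(scores.values())                 # subtract the max for numerical stability
    exps = {move: math.exp(s - top) for move, s in scores.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}


def sample_move(move_features, weights, rng=random):
    """Draw one move at random according to the softmax distribution."""
    probs = softmax_probabilities(move_features, weights)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]
```

For example, with `weights = {"atari": math.log(3)}` and two legal moves, one matching `"atari"` and one matching no feature, the first move is chosen with probability 3/4. Compared with a fixed-sequence playout, every legal move keeps a non-zero probability, so the learned weights shape the playout without making it deterministic.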

(52)
Position  Program  Country
1         ZEN      Japan
2         MOGO     France
3         ERICA    Taiwan

Table 3.2: The result of the 9×9 Go tournament at the 2009 TAAI Go Tournament.

Position  Program  Country
1         ZEN      Japan
2         ERICA    Taiwan
3         DRAGON   Taiwan

Table 3.3: The result of the 19×19 Go tournament at the 2009 TAAI Go Tournament.

In the following month, ERICA participated in the 3rd UEC Cup, held on November 28-29, 2009, in Japan, and finished in 6th place. In round 2 of this tournament, ERICA defeated the well-known strong 19×19 Go-playing program AYA for the first time, as shown in Figure 3.2. The last move is marked by ∆. In this game, ERICA (Black) killed several of White’s groups, marked by ×, and claimed a decisive victory.

Figure 3.2: The final position of the match in round 2 of the 3rd UEC Cup: ERICA (Black) vs. AYA (White). Black won by resignation.

(53) 3.1.3 Third Version Created in February 2010

In February 2010, Łukasz Lew, the author of LIBEGO, was a great help in the speed optimization of ERICA. The author rewrote many primary data structures and created a new version of ERICA, under the supervision of Coulom. For instance, macros, a feature of the C preprocessor, were used extensively for loop unrolling. Simulation speed increased by a factor of 2 compared to the previous version. In this period, we concentrated on 19×19 Go, working to make use of larger patterns in the tree and to improve the quality of the playouts.

At the 2010 Computer Olympiad, held from September 24 to October 2, 2010, in Kanazawa, Japan, ERICA won the gold and silver medals in the 19×19 Go (Fotland, 2010) and 9×9 Go tournaments respectively. In the 19×19 Go tournament, after the final round was finished, three programs, ZEN, THE MANY FACES OF GO and ERICA, were tied. The final positions were decided in the second playoff, in which both ZEN and ERICA defeated THE MANY FACES OF GO and ERICA defeated ZEN. This indicates that the three programs were close in playing strength. Figure 3.3 shows the final match between ZEN (White) and ERICA (Black). This game was decided by a large-scale semeai in the opening stage. ZEN misread the semeai, so ERICA killed White’s big group (marked by ×) and held the lead until the end.
