Effect of Using Real Declarer - Application of Machine Learning to Contract Bridge Bidding

In the previous experiments, we assumed for simplicity that the declarer is the player who can win more rounds for the contract when generating the cost vector c. This differs from a real bridge game, where the declarer is the player on the bid-winning team who first called the trump suit. Table 4.4 shows the average cost per deal of the best proposed model and of the Wbridge5 software when the real declarer is used. The performance of the proposed model and of Wbridge5 decreases by about the same amount, which shows that the effect of the declarer is minor under our problem setting.

We think there are two reasons why the declarer matters less here. First, since the declarer only determines which player plays the first card, most deals in the data are declarer-independent; that is, the North and South players usually win the same number of rounds for most of the contracts in a deal. Second, whereas the information revealed to the opposing team varies with the declarer in a real bridge game, the double-dummy analysis is based on perfect information and is thus not affected.
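The two declarer conventions compared above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: it assumes an uncontested auction in which North bids first and the partners alternate, bids written as strings such as "1H" or "3NT", and a hypothetical dictionary of double-dummy round counts per declarer.

```python
from typing import Dict, List, Optional, Tuple

PLAYERS = ("N", "S")  # assumption: North bids first, then the partners alternate


def strain(bid: str) -> str:
    """Return the strain of a bid such as '1H' or '3NT' (the level is one digit)."""
    return bid[1:]


def real_declarer(bids: List[str]) -> Optional[Tuple[str, str]]:
    """Return (contract, declarer) under the real-game rule.

    bids[i] is made by PLAYERS[i % 2]; a 'PASS' ends the auction.
    The declarer is the first partner who named the strain of the final contract.
    """
    calls = [(PLAYERS[i % 2], b) for i, b in enumerate(bids) if b != "PASS"]
    if not calls:
        return None  # nobody bid, so there is no contract
    contract = calls[-1][1]
    for player, bid in calls:
        if strain(bid) == strain(contract):
            return contract, player
    return None  # not reached: the last call always matches its own strain


def simplified_declarer(rounds_by_declarer: Dict[str, int]) -> str:
    """Return the partner who wins more rounds (ties go to North)."""
    return max(PLAYERS, key=lambda p: rounds_by_declarer[p])
```

For the sequence 1H, 1S, 2H, 4H, PASS the real declarer is North, who named hearts first, even if South would win more rounds as declarer under the simplified rule.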

Chapter 5 Conclusions and Future Works

We formally defined the problem of bridge bidding without competition by learning, and proposed an innovative model for tackling this problem. The model predicts a bidding sequence with layers of classifier (bidding) nodes, and trains each classifier with the aid of UCB algorithms for contextual bandits. The UCB algorithms allow the machine to learn its own bidding system by balancing the exploration of less-considered bids against the exploitation of well-learned bids. Our experiments show that the proposed model can achieve performance similar to that of the champion-winning computer bridge program.
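The balance between exploration and exploitation can be illustrated with plain UCB1 [17]. The sketch below is context-free and is not the contextual-bandit variant used by the proposed model; the class name, the `run` helper, and the reward functions are illustrative assumptions.

```python
import math
from typing import Callable, List


class UCB1:
    """Context-free UCB1: pick the arm with the highest mean plus confidence bonus."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms    # times each arm was played
        self.means: List[float] = [0.0] * n_arms  # running mean reward per arm

    def select(self) -> int:
        # Play every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(t) / n)  # rarely played arms get a larger bonus
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental update of the running mean.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(bandit: UCB1, reward_fns: List[Callable[[], float]], rounds: int) -> None:
    """Play the bandit for a number of rounds; rewards are assumed to lie in [0, 1]."""
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, reward_fns[arm]())
```

In this sketch an arm stands for a candidate bid: bids that have been tried only a few times keep a large bonus and are still explored, while bids with a high observed reward are exploited more often.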

Our initiative supports the possibility that machine learning can do better than human-designed bidding systems on the bridge bidding problem.

As an initiative of bidding by learning, the proposed model has reached promising performance. One possible direction for improving the model is to use more data to train a deeper model, which will hopefully improve the performance of the model on valuable contracts such as the . The ultimate challenge is the other sub-problem: bidding with competition by learning. Such a challenge may call for a mixture of the proposed model (collaboration between teammates) and well-studied models for competition-based games such as Chess.

Appendix A

Table of Opening Bids

Table A.1 compares the opening bids of the best tree models with ℓ = 4 and ℓ = 6 with those of the SAYC bidding system [22], which is widely used by human players. The opening bids of the proposed model are generated by enumerating all combinations of features and predicting a bid for each. As the predictions of the proposed model are made by CSC classifiers, there is no explicit rule for each opening bid. Instead, an approximate rule is provided in the table.
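The enumeration step can be sketched as follows. This is only an illustration under stated assumptions: the features are taken to be the HCP and the four suit lengths, `predict` stands for any trained opening-bid classifier, and `toy_predict` is a made-up rule used to exercise the code, not a rule of the proposed model. The grid covers every feature combination, including some that no real hand can produce.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Tuple

# Assumed feature layout: (hcp, spades, hearts, diamonds, clubs)
Features = Tuple[int, int, int, int, int]


def all_features(max_hcp: int = 37) -> Iterator[Features]:
    """Yield every (HCP, suit lengths) combination whose lengths sum to 13."""
    for hcp in range(max_hcp + 1):
        for s, h, d in product(range(14), repeat=3):
            c = 13 - s - h - d
            if c >= 0:
                yield (hcp, s, h, d, c)


def hcp_ranges(predict: Callable[[Features], str]) -> Dict[str, Tuple[int, int]]:
    """Summarize, for each predicted bid, the smallest and largest HCP seen."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for features in all_features():
        bid = predict(features)
        hcp = features[0]
        lo, hi = ranges.get(bid, (hcp, hcp))
        ranges[bid] = (min(lo, hcp), max(hi, hcp))
    return ranges


def toy_predict(features: Features) -> str:
    """Hypothetical stand-in classifier, used only to make the sketch runnable."""
    hcp, spades, _, _, _ = features
    if hcp < 12:
        return "PASS"
    return "1S" if spades >= 5 else "1C"
```

Calling `hcp_ranges(toy_predict)` returns one HCP interval per bid; an approximate rule would combine such intervals with similar summaries of the suit lengths.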

Several observations can be made from Table A.1. First, the opening rules of the proposed model are very different from those of the SAYC bidding system. This shows that bidding methods learned by a computer may be dissimilar to human-designed ones. Second, whereas the terminal opening bids ({1NT, · · · }) of the two tree models are similar, the non-terminal opening bids ({, · · · , 1♠}) are completely different. This reflects a property of the proposed model. For terminal bids, a deterministic estimate of the reward can be generated from the cost vector c, so the corresponding CSC classifiers learned in each run are similar. For non-terminal bids, on the other hand, there is randomness in the learning process, so the CSC classifiers learned in each run can be very different. Third, the “Not used” bids show that the bidding process is not fully utilized in the proposed model. There is still room for improvement if we can further enhance the information-exchange process.

Table A.1: Table of Opening Bids

| Bid | Tree model, ℓ = 4 | Tree model, ℓ = 6 | SAYC |
|-----|-------------------|-------------------|------|
|     | 0-11 HCP | 0-12 HCP | 0-11 HCP |
| 1♣  | 10-19 HCP, not many ♡ | 9-19 HCP, 4-6 ♡ | 12+ HCP, 3+ ♣ |
| 1♢  | Not used | 8-18 HCP, short ♠ and 4-6 ♣ | 12+ HCP, 3+ ♢ |
| 1♡  | 9-19 HCP, 4-6 ♡ | 12-23 HCP, w/o long suit | 12+ HCP, 5+ ♡ |
| 1♠  | 16-23 HCP, near balanced | 10-19 HCP, 4-6 ♠ | 12+ HCP, 5+ ♠ |
| 1NT | Not used | Not used | 15-17 HCP, balanced |
| 2♣  | 0-17 HCP, long ♣ | 0-17 HCP, long ♣ | 22+ HCP |
| 2♢  | 0-17 HCP, long ♢ | 0-17 HCP, long ♢ | 5-11 HCP, 6+ ♢ |
| 2♡  | 0-13 HCP, long ♡ | 0-13 HCP, long ♡ | 5-11 HCP, 6+ ♡ |
| 2♠  | 0-13 HCP, long ♠ | 0-13 HCP, long ♠ | 5-11 HCP, 6+ ♠ |
| 2NT | Not used | Not used | 20-21 HCP, balanced |
| 3♣  | 14-19 HCP, long ♣ | 15-19 HCP, long ♣ | 5-11 HCP, 7+ ♣ |
| 3♢  | 14-19 HCP, long ♢ | 15-19 HCP, long ♢ | 5-11 HCP, 7+ ♢ |
| 3♡  | Not used | Not used | 5-11 HCP, 7+ ♡ |
| 3♠  | Not used | Not used | 5-11 HCP, 7+ ♠ |
| 3NT | 19-29 HCP, w/o a long suit | 19-29 HCP, w/o a long suit | 25-27 HCP, balanced |
| 4♣  | Not used | Not used | 5-11 HCP, 8+ ♣ |
| 4♢  | Not used | Not used | 5-11 HCP, 8+ ♢ |
| 4♡  | 10-29 HCP, long ♡ | 11-29 HCP, long ♡ | 8+ ♡ |
| 4♠  | 10-29 HCP, long ♠ | 11-29 HCP, long ♠ | 8+ ♠ |
| 4NT | 27-29 HCP, near balanced | 27-29 HCP, near balanced | Not used |
| 5♣  | 16-27 HCP, long ♣ | 16-27 HCP, long ♣ | very long ♣ |
| 5♢  | 17-25 HCP, long ♢ | 17-25 HCP, long ♢ | very long ♢ |
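The HCP ranges in Table A.1 use the standard high-card-point count (ace 4, king 3, queen 2, jack 1). As a minimal sketch, the SAYC 1NT row can be checked as below. The hand notation (four dot-separated holdings in spades.hearts.diamonds.clubs order), the function names, and the usual 4-3-3-3, 4-4-3-2, 5-3-3-2 definition of "balanced" are assumptions made for illustration.

```python
from typing import Tuple

HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}
# Assumed definition of a balanced hand: these three shapes only.
BALANCED_SHAPES = {(4, 3, 3, 3), (4, 4, 3, 2), (5, 3, 3, 2)}


def hcp(hand: str) -> int:
    """Sum the high-card points; spot cards and the '.' separators count 0."""
    return sum(HCP_VALUES.get(card, 0) for card in hand)


def shape(hand: str) -> Tuple[int, int, int, int]:
    """Return the suit lengths in spades, hearts, diamonds, clubs order."""
    s, h, d, c = (len(suit) for suit in hand.split("."))
    return (s, h, d, c)


def is_balanced(hand: str) -> bool:
    """Check the sorted suit lengths against the assumed balanced shapes."""
    return tuple(sorted(shape(hand), reverse=True)) in BALANCED_SHAPES


def sayc_opens_1nt(hand: str) -> bool:
    """Apply the SAYC row of Table A.1: 15-17 HCP and a balanced hand."""
    return 15 <= hcp(hand) <= 17 and is_balanced(hand)
```

For example, the hand AQ4.KJ52.K93.Q72 has 15 HCP and a 3-4-3-3 shape, so it satisfies the SAYC 1NT rule, whereas both tree models in the table never open 1NT.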

Bibliography

[1] Tuomas Sandholm. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4):13–32, 2010.

[2] Michael Bowling, Johannes Fürnkranz, Thore Graepel, and Ron Musick. Machine learning and games. Machine Learning, 63(3):211–215, 2006.

[3] Marc JV Ponsen, Jan Ramon, Tom Croonenborghs, Kurt Driessens, and Karl Tuyls. Bayes-relational learning of opponent models from incomplete information in no-limit poker. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1485–1486, 2008.

[4] Luís Filipe Teófilo, Nuno Passos, Luís Paulo Reis, and Henrique Lopes Cardoso. Adapting strategies to opponent models in incomplete information games: a reinforcement learning approach for poker. In Autonomous and Intelligent Systems, pages 220–227. 2012.

[5] Matthew L Ginsberg. Gib: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research, 14:303–358, 2001.

[6] Matthew L Ginsberg. Gib: Steps toward an expert-level bridge-playing program. In Proceedings of International Joint Conference on Artificial Intelligence, pages 584–593, 1999.

[7] Takahisa Ando and Takao Uehara. Reasoning by agents in computer bridge bidding. In Computers and Games, pages 346–364. 2001.

[8] Asaf Amit and Shaul Markovitch. Learning to bid in bridge. Machine Learning, 63(3):287–327, 2006.

[9] Lori L DeLooze and James Downey. Bridge bidding with imperfect information. In IEEE Symposium on Computational Intelligence and Games, pages 368–373. IEEE, 2007.

[10] Ming-Sheng Chang. Building a fast double-dummy bridge solver. 1996.

[11] Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford, and Bianca Zadrozny. Error limiting reductions between classification tasks. In Proceedings of the 22nd International Conference on Machine Learning, pages 49–56, 2005.

[12] Zhi-Hua Zhou and Xu-Ying Liu. On multi-class cost-sensitive learning. Computational Intelligence, 26(3):232–257, 2010.

[13] Han-Hsing Tu and Hsuan-Tien Lin. One-sided support vector regression for multiclass cost-sensitive classification. In Proceedings of the 27th International Conference on Machine Learning, pages 1095–1102, 2010.

[14] Wei Li, Xuerui Wang, Ruofei Zhang, Ying Cui, Jianchang Mao, and Rong Jin. Exploitation and exploration in a performance based contextual advertising system. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 27–36, 2010.

[15] Wei Chu, Lihong Li, Lev Reyzin, and Robert E Schapire. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.

[16] John Langford and Tong Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. Advances in Neural Information Processing Systems, 20:1096–1103, 2007.

[17] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.

[18] John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in Neural Information Processing Systems 20, pages 817–824, 2008.

[19] Yves Costel. Wbridge5 bridge software, 2014. URL: http://www.wbridge5.com/.

[20] Introduction to bridge scoring, 2005. URL: http://www.acbl.org/learn/scoreTeams.html.

[21] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[22] Standard American, 2014. URL: http://en.wikipedia.org/wiki/Standard_American.
