
Chapter 2 Depth-First Backtracking Algorithm with Branch-and-Bound Pruning

2.4 Chapter Conclusion

Previously, an exhaustive search was applied to find the optimal strategy for Mastermind. However, exhaustive search may not be applicable to harder problems or games because of its enormous search time. In this chapter, a more efficient depth-first backtracking algorithm with branch-and-bound pruning (DBB) for Mastermind in the expected case was introduced, and an alternative optimal strategy was obtained.

Moreover, an admissible heuristic, which can be applied to various deductive games, was presented as well. The experimental results show that expanding promising queries first during the search has a significant effect on the performance of DBB. DBB is also over 25 times faster than the traditional search algorithm. Designing a more precise admissible heuristic remains a critical issue. Furthermore, it would be interesting to compare our method with the other search algorithms and heuristics mentioned in previous studies, in terms of both solution quality and search time.

Chapter 3 Refined Branch-and-Bound Algorithm with Speed-up Techniques

Another famous deductive game is AB game, which is popular in Asia and England. However, no optimal expected-case strategy for AB game has appeared in the formal literature to date. Since the complexity of these deductive games grows exponentially with their dimensions, DBB cannot be directly applied to solve AB game efficiently in the expected case.

In this chapter, a refined branch-and-bound algorithm with speed-up techniques, abbreviated as RBB, is presented for AB game in the expected case. The algorithm is based on DBB and integrates three techniques: the incremental update of the lower bounds, the hashing technique, and the reduction of equivalent queries. With these techniques, RBB offers the prospect of attaining the optimal strategy of AB game in the expected case. Section 3.1 reviews the problem and compares the search spaces of Mastermind and AB game. Section 3.2 introduces the refined branch-and-bound algorithm together with the new techniques and their improvements. Section 3.3 gives experimental results and discussion. Section 3.4 summarizes the main results of this chapter.

3.1 Introduction

AB game, which is also called “Bulls and Cows” in England, is another deductive game that has been popular around the world for decades. Its dimension is 4×10, and two opponents are involved, called the codemaker and the codebreaker respectively. Ten symbols, 0, 1, 2, …, 9, can appear in the secret codes of AB game. Note that repeated symbols are not allowed in a single secret code. Thus, there are 10!/(10-4)! = 5040 valid secret codes in AB game.

Meanwhile, the 14 legal responses of AB game, which are [4, 0], [3, 0], [2, 2], [2, 1], [2, 0], [1, 3], [1, 2], [1, 1], [1, 0], [0, 4], [0, 3], [0, 2], [0, 1], and [0, 0], are the same as those of Mastermind. The precise definitions are given in Chapter 1 and are therefore omitted here.
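The code space and the responses described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `all_codes` and `response` are our own, and we assume that the first component of a response [x, y] counts digits in the correct position and the second counts digits that occur in the secret code at a different position.

```python
from itertools import permutations

DIGITS = "0123456789"

def all_codes(length=4):
    """Return every code of the given length with no repeated digit."""
    return ["".join(p) for p in permutations(DIGITS, length)]

def response(query, secret):
    """Return (x, y): x digits in the right position, y digits in a wrong position."""
    x = sum(q == s for q, s in zip(query, secret))
    common = len(set(query) & set(secret))  # valid because digits never repeat
    return x, common - x

codes = all_codes()
# All first queries are equivalent, so one query is enough to see every response.
legal = {response("0123", c) for c in codes}

print(len(codes))                   # 5040
print(sorted(legal, reverse=True))  # the 14 legal responses
```

Note that [3, 1] never appears: if three digits are in place, the fourth digit of the query cannot occur elsewhere in the secret code.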

The search space, which means all possible strategies the codemaker and the codebreaker can adopt, for 4×6 Mastermind and 4×10 AB game is compared in the following equation:

$$\frac{(5040 \times 14)^{7}}{(1296 \times 14)^{5}} > 10^{12}$$

Notice that the upper part of the equation is the search space for 4×10 AB game while the lower part is that for 4×6 Mastermind. Clearly, the search space for 4×10 AB game is far larger than that for 4×6 Mastermind. Moreover, because the expected number of queries is considered, the size of the search space reflects the time required to discover an optimal strategy for the codebreaker. Hence, solving AB game is much harder than solving Mastermind.
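As a rough check of this comparison, the two search-space sizes, (5040×14)^7 for AB game and (1296×14)^5 for Mastermind, can be estimated with base-10 logarithms. The rounded logarithm values below are our own arithmetic, not figures taken from the experiments.

$$
\begin{aligned}
\log_{10}(5040 \times 14)^{7} &= 7\log_{10} 70560 \approx 7 \times 4.8486 \approx 33.94,\\
\log_{10}(1296 \times 14)^{5} &= 5\log_{10} 18144 \approx 5 \times 4.2587 \approx 21.29,\\
\log_{10}\frac{(5040 \times 14)^{7}}{(1296 \times 14)^{5}} &\approx 33.94 - 21.29 = 12.65 > 12.
\end{aligned}
$$

The ratio is therefore somewhat above 10^12, and the AB game search space is close to 10^34.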

To the best of our knowledge, the optimal strategy of 4×10 AB game for the codebreaker has never been discovered, and its corresponding expected number of queries has not yet been determined because of this difficulty. In Chapter 2, an effective pruning framework, DBB, which relies on an admissible heuristic as in the A* search, was proposed to solve 4×6 Mastermind. However, it cannot solve 4×10 AB game directly, since the search space is much larger than that of 4×6 Mastermind. In this chapter, our goal is to find an optimal strategy of 4×10 AB game for the codebreaker that minimizes the expected number of queries.

3.2 A Refined Branch-and-Bound Algorithm with Speed-up Techniques

In principle, a full search is required for our problem in order to determine the optimal strategy in the expected case. Because DBB cannot solve the problem directly, a refined approach based on it is presented. The idea of DBB is also reviewed briefly to make this chapter self-contained.

3.2.1 The Fundamental Framework in Terms of Branch-and-Bound Pruning

Although DBB as proposed in Chapter 2 cannot explore the game tree within a reasonable time, it remains a vital basis for our approach. Therefore, a brief introduction to DBB is given here.

DBB and the A* search act in a similar way. The A* search is a tree (graph) search algorithm that looks for a lowest-cost path from an initial state to a goal state, and it terminates as soon as a best solution is obtained. However, a full search is necessary for our problem because we need to calculate the external path length of the game tree. Hence, DBB searches the whole game tree and prunes useless states by taking advantage of an admissible heuristic. Notice that, for our problem, a solution denotes a strategy for the codebreaker to identify a secret code.

Let h denote the cost from the root to the current state and h* be an estimated cost from the current state to a final state. Then h* is called admissible if it never overestimates the cost of reaching the final state. In other words, the actual cost is greater than or equal to h + h*, so h + h* can be viewed as a theoretical lower bound for the problem we cope with.

DBB first traverses the game tree in depth-first fashion until a final state is reached. It then gets an actual cost s which is initially assigned to be the current-best solution. Note that the actual cost s results from the query q1 in its traversed path.

Afterwards, it backtracks to the current state, picks one of the other queries, e.g., the query q2, and uses an admissible heuristic to estimate the cost h* of q2. The search continues if s is larger than h + h*. Otherwise, a cut happens because s is less than or equal to h + h*. This continues in a similar manner until the full game tree is searched. Figure 4 roughly shows the scenario and Figure 5 gives the algorithm.
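The search-and-cut procedure just described can be sketched on a toy deductive game. The following is a minimal sketch under stated assumptions, not the algorithm of Figure 5: the toy game (2-digit codes over the digits 0 to 3), the names `partition`, `lower_bound`, and `optimal_epl`, and the simple bound 2n - 1 (at most one of n codes can be identified by the next query, so every other code needs at least two) are all our own illustrative choices. Costs are measured from the current state, so the constant h is left out on both sides of the comparison.

```python
from itertools import permutations

def response(query, secret):
    """(x, y): x digits in the right position, y digits in a wrong position."""
    x = sum(q == s for q, s in zip(query, secret))
    return x, len(set(query) & set(secret)) - x

def partition(state, query):
    """Group the eligible codes in `state` by the response they give to `query`."""
    classes = {}
    for code in state:
        classes.setdefault(response(query, code), []).append(code)
    return classes

def lower_bound(n):
    """Admissible stand-in heuristic: EPL of a state with n codes is at least 2n - 1."""
    return 2 * n - 1 if n > 0 else 0

def optimal_epl(state, codes, prune=True):
    """Minimum external path length (total queries over all secrets) of `state`."""
    if len(state) == 1:
        return 1                      # query the only eligible code
    best = None                       # the current-best solution s
    for query in codes:
        win = (len(query), 0)
        classes = partition(state, query)
        if len(classes) == 1:
            continue                  # the query does not split the state
        rest = [cls for r, cls in classes.items() if r != win]
        estimate = len(state) + sum(lower_bound(len(cls)) for cls in rest)
        if prune and best is not None and best <= estimate:
            continue                  # cut: s is already <= the lower bound
        cost = len(state) + sum(optimal_epl(cls, codes, prune) for cls in rest)
        if best is None or cost < best:
            best = cost
    return best

toy_codes = ["".join(p) for p in permutations("0123", 2)]
print(optimal_epl(toy_codes, toy_codes, prune=True))
print(optimal_epl(toy_codes, toy_codes, prune=False))  # same value, more work
```

Because the bound never overestimates, the pruned and unpruned searches return the same optimal external path length; pruning only skips queries that cannot improve on the current best.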

The current state is what we consider presently. An admissible heuristic will be used to estimate its cost h* and thus, h + h* is compared with the actual cost s to determine whether it should be cut or not.

According to the analyses in Table 1, the search space for AB game is (5040×14)^7 ≈ 10^34. Figure 8 shows the game tree of AB game when DBB is applied directly. The circles in Figure 8 are the states, i.e. the sets of eligible secret codes, while the diamonds are the valid queries the codebreaker can choose (5040 queries in each ply). In the game tree, the 14 branches produced by the codemaker’s responses must be traversed completely, whereas the 5040 branches expanded by the codebreaker may be pruned by the admissible heuristic, since we are looking for an optimal strategy for the codebreaker. Consider the situation shown in Figure 8. The search of the subtrees of q1 (in bold) has just finished and q2 is now considered. An estimated value h* is obtained by using the admissible heuristic. The subtrees below q2 do not have to be expanded if the result of expanding q1 is no worse than h*.

Figure 8. The game tree of AB game by applying DBB directly

The admissible heuristic presented in Section 2.2.2.2 is reused, with the volume of each legal class modified slightly, to estimate lower bounds on the numbers of queries. As before, different queries in a given ply result in different distributions of the eligible codes over the 14 responses, and the volume of a response [x, y] is defined as the maximum number of eligible codes that can remain when the codemaker responds with [x, y]. The codebreaker has effectively only one choice for the first query because all queries are equivalent at that point. As a result, g = “0123” is selected as the representative first query. The numbers of eligible codes in each class after g is made form these volumes, which are listed in Table 6.

From the analyses in Section 2.2.2.2, the actual expected number of queries is thus larger than or equal to the estimated value. An example illustrating the calculation of the EPL of a class (state) is shown in Figure 9. Notice that the only difference between Figure 7 and Figure 9 is their volumes.

Given a state of size 17, as shown in Figure 9, we imagine that a theoretical optimal strategy distributes the 17 codes over the 14 responses as evenly as possible without exceeding the corresponding volumes, and that the optimal strategy does the same at each following level of the game tree. In each circle, the number in the lower half is the volume of the response and the number in the upper half is the number of secret codes assigned to it.

Since there is 1 leaf at level 1, 13 leaves at level 2, and 3 leaves at level 3, it is obvious that the external path length of the tree is 1×1 + 2×13 + 3×3 = 36. Thus, the actual external path length of a state with a size of 17 must be larger than or equal to 36. The heuristic is therefore admissible because it never overrates the expected number of queries.
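The calculation for the state of size 17 can be reproduced with a short sketch. It rests on our reading of the description above, which should be treated as an assumption: each query identifies one code through [4, 0], the remaining codes are spread as evenly as possible over the other 13 classes without exceeding their volumes, and the same volumes (from Table 6) are reused at every level. The names `spread_evenly` and `epl_lower_bound` are hypothetical.

```python
from functools import lru_cache

# Volumes of the 13 classes other than [4, 0], sorted in ascending order.
VOLUMES = (6, 8, 9, 24, 72, 180, 216, 264, 360, 480, 720, 1260, 1440)

def spread_evenly(n, caps):
    """Split n codes over the classes as evenly as possible, respecting each cap.

    `caps` must be sorted ascending and sum to at least n.
    """
    sizes, remaining = [], n
    for i, cap in enumerate(caps):
        share = min(cap, remaining // (len(caps) - i))
        sizes.append(share)
        remaining -= share
    return sizes

@lru_cache(maxsize=None)
def epl_lower_bound(n):
    """Estimated (never overestimated) EPL of a state with n eligible codes."""
    if n == 0:
        return 0
    # Every code in the state pays for this query; one code receives [4, 0].
    return n + sum(epl_lower_bound(m) for m in spread_evenly(n - 1, VOLUMES))

print(spread_evenly(16, VOLUMES))  # ten classes with 1 code, three with 2
print(epl_lower_bound(17))         # 36
```

For 17 codes this gives 17 + 10×1 + 3×3 = 36, which agrees with the count 1×1 + 2×13 + 3×3 obtained from the leaves level by level.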

Table 6. The volumes of 14 classes in AB game

class     volume
[4, 0]    1
[3, 0]    24
[2, 2]    6
[2, 1]    72
[2, 0]    180
[1, 3]    8
[1, 2]    216
[1, 1]    720
[1, 0]    480
[0, 4]    9
[0, 3]    264
[0, 2]    1260
[0, 1]    1440
[0, 0]    360
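The volumes in Table 6 can be reproduced by partitioning all 5040 codes according to their response to the representative first query “0123”. The sketch below is illustrative only; the `response` function is our own helper and assumes the [x, y] convention in which x counts digits in the correct position.

```python
from collections import Counter
from itertools import permutations

def response(query, secret):
    """(x, y): x digits in the right position, y digits in a wrong position."""
    x = sum(q == s for q, s in zip(query, secret))
    return x, len(set(query) & set(secret)) - x

codes = ["".join(p) for p in permutations("0123456789", 4)]
volumes = Counter(response("0123", code) for code in codes)

for cls in sorted(volumes, reverse=True):
    print(list(cls), volumes[cls])
```

The 14 class sizes add up to 5040, since every valid secret code falls into exactly one class.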

Figure 9. An example of the calculation of the admissible heuristic for AB game

3.2.2 The State-of-the-Art Techniques

The fundamental framework has been reviewed in Section 3.2.1, and it has proven highly suitable for addressing deductive games. However, it is not sufficient to handle AB game in the expected case. A careful examination of the properties of the game leads to three critical challenges, summarized as follows.

• How to increase the precision of the lower bound?

• How to avoid expanding the redundant states?

• How to prune the equivalent queries?

An optimal strategy can be discovered provided that these challenges are fully addressed. To this end, a refined branch-and-bound algorithm with speed-up techniques (RBB) is designed, and the three techniques contained in it are introduced and discussed in the following subsections.

3.2.2.1 Technique 1: Incremental Updates of the Lower Bounds

During the game, there are generally 5040 queries available to the codebreaker in each ply. When a new state is met, a current best solution s is acquired after DBB searches one of the 5040 branches. DBB then has to check the other queries, and two cases can occur. In the first case, the estimated lower bound of the query is less than s, and the search into it takes place. In the other case, the search is omitted by branch-and-bound pruning because s is no worse than this estimated lower bound. This mechanism naturally suggests a new idea: the percentage of cutoffs increases markedly if the estimated lower bounds are raised by calculating them more accurately. Concrete steps are given below.

Suppose that the current best solution s is provided by the query g. Let g* be another query that is analyzed now, and let s* be the lower bound initially estimated for it by the admissible heuristic H. Assume that s* is less than s. The subtree produced by g* then has to be explored according to the scheme above. However, the lower bound can be updated incrementally while g* is explored, so that the search stops as soon as s* becomes equal to or larger than s. In detail, g* divides the current state into 14 classes (responses), and H estimates the external path length (EPL) of each of the 14 classes, so s* is the sum of these 14 estimates. Whenever a class has been traversed completely, its real EPL becomes available, and s* is updated immediately by replacing the estimate of that class with its real cost. Thus s* grows gradually as the classes are explored one by one.

Figure 10. A situation that depicts the exploring process (s* grows gradually during the search of this subtree)

Whenever an update happens, s is compared with the up-to-date s*. The exploration of g* stops if s* is equal to or larger than s. Otherwise, it continues until the subtree formed by g* is searched entirely, and the follow-up actions are performed by DBB as usual. Figure 10 depicts this process: the bold lines and shaded areas highlight what has already been searched, and s* is the latest lower bound so far.
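The incremental update can be sketched as a small routine that replaces class estimates by real costs one at a time and gives up as soon as s* reaches s. This is a simplified sketch rather than the actual implementation: `explore_query` and `explore_class` are hypothetical names, the real search of a class is abstracted into a callback, and the numbers in the usage example are made up for illustration.

```python
def explore_query(estimates, best, explore_class):
    """Explore one query g*; stop as soon as its lower bound s* reaches `best` (s).

    estimates:      heuristic EPL estimates, one per response class of g*.
    explore_class:  callback returning the real EPL of class i (a full search
                    of that class in practice).
    Returns the real cost of g*, or None if g* was cut.
    """
    s_star = sum(estimates)              # initial lower bound from the heuristic
    for i, estimate in enumerate(estimates):
        if s_star >= best:
            return None                  # cut: g* cannot beat the current best
        real = explore_class(i)          # real EPL, never below the estimate
        s_star += real - estimate        # swap the estimate for the real cost
    return s_star if s_star < best else None

# Toy numbers, invented for illustration only.
estimates = [3, 5, 9]
real_costs = [3, 8, 12]
explored = []

def explore_class(i):
    explored.append(i)
    return real_costs[i]

result = explore_query(estimates, best=19, explore_class=explore_class)
print(result, explored)  # None [0, 1]
```

In this toy run the initial bound is 17, which is below s = 19, so the exploration starts; after the second class the bound rises to 20 and the third class is never searched.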

3.2.2.2 Technique 2: Earlier Terminations

It is trivial that the game is over once only one eligible code remains for the codebreaker and he has just guessed it. It is also clear that the search below a state can be terminated if the external path length (EPL) of that state is known exactly. Accordingly, the critical issue is how to obtain the exact EPL of a state. This is very difficult without conducting a search when the state is large, but it becomes possible when the state is small enough. To this end, two types of pruning methods are proposed to achieve earlier terminations when the size of a state is at most 12.

• Theoretical pruning

If the size of a state is 2, it is easy to notice that the game tree in Figure 11 is optimal and its EPL is therefore 3.

Figure 11. An optimal strategy for a state with a size of 2

Next, a state of size 3 is taken into account. Two situations can occur. In the first, the strategy chooses one of the three eligible codes as the next query, which results in a [4, 0] class appearing in its game tree. The left portion of Figure 12, i.e. (a) and (b), shows the two kinds of possible trees for this situation. The second situation is shown in the right portion of Figure 12, i.e. (c), (d), and (e), which covers three further possibilities: here the codebreaker chooses a query from all possible codes other than the three in this state. Note that (e) describes the case where the size of the state is still 3 after the query in this ply is made, where EPL' is the external path length of the following state. In other words, such a query achieves nothing except increasing the EPL by 3.

Figure 12. All possible game trees for a state with a size of 3 (panels: (a) EPL = 5; (b) EPL = 6; (c) and (d) carry the labels EPL = 6 and EPL = 7; (e) EPL = EPL' + 3)

Looking at the overall figure, the EPLs of the left trees are 5 and 6 respectively, while those of the right trees are 6 or more. Hence an optimal strategy can be found by considering only the two left trees. In this case, the correlations among the three eligible codes should be considered. Assume that the three queries (secret codes) are named g1, g2, and g3, and that their correlations are r12, r23, and r31 respectively. A correlation here is the response given by the codemaker when one of these three codes is his secret code and the codebreaker queries one of the two remaining codes. Note that the optimal EPL for a state with a size of 3 is 6 if r12, r23, and r31 are all equal, i.e. the situation of Figure 12(b). Otherwise, the optimal EPL must be 5, as shown in Figure 12(a). With this observation, the optimal EPL can be easily calculated without searching all the 5040 valid queries.

From the above theoretical analyses, we know that the EPL of a state can be easily determined if the state is able to be analyzed. In other words, theoretical pruning of valid queries is feasible if the size of a state is 2 or 3.
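The rules for states of size 2 and 3 translate directly into a lookup that needs no search. The sketch below is our own illustration: `small_state_epl` is a hypothetical name, the `response` helper assumes the [x, y] convention with x counting digits in the correct position, and the value 1 for a state of size 1 (one query to name the only remaining code) is our reading of Figure 11.

```python
def response(query, secret):
    """(x, y): x digits in the right position, y digits in a wrong position."""
    x = sum(q == s for q, s in zip(query, secret))
    return x, len(set(query) & set(secret)) - x

def small_state_epl(state):
    """Optimal EPL of a state with 1, 2, or 3 eligible codes, without searching."""
    if len(state) == 1:
        return 1                     # query the only eligible code
    if len(state) == 2:
        return 3                     # 1 query for one code, 2 for the other
    if len(state) == 3:
        g1, g2, g3 = state
        r12, r23, r31 = response(g1, g2), response(g2, g3), response(g3, g1)
        # If all correlations coincide, no eligible query separates the other two.
        return 6 if r12 == r23 == r31 else 5
    raise ValueError("only states of size 1 to 3 are handled analytically")

print(small_state_epl(["0123", "0124", "0125"]))  # all correlations are [3, 0] -> 6
print(small_state_epl(["0123", "0124", "4567"]))  # correlations differ -> 5
```

In the second example, querying “0124” yields [3, 0] for “0123” and [0, 1] for “4567”, so the two remaining codes are separated and the total is 1 + 2 + 2 = 5.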

• Practical pruning

In accordance with the previous analyses, it is intuitive that DBB will terminate and backtrack earlier if the optimal EPL can be decided as early as possible. Furthermore, a crucial property is revealed by investigating the game tree when the state size is sufficiently small: the full game tree is filled with duplicated states of small sizes.

This observation suggests reducing the search time by storing the EPLs of explored states whose sizes are between 4 and 12. A hash table is implemented for this purpose. Zobrist hashing [74] is adopted as the hash function, and a simple replacement scheme, in which a new record just replaces the value already in the corresponding slot, is employed to resolve collisions. Because Zobrist hashing produces few collisions, the simple replacement policy is highly efficient for our problem. Before Zobrist hashing is used, a random number is generated for each possible secret code and represents that code throughout the search. Suppose that we have a state with n eligible codes, where n is between 4 and 12. The random numbers of the n codes are XORed together, and the result modulo the size of the hash table is taken as the hash key, which gives the position for storing this state in the hash table.

The information of the state and its corresponding optimal EPL are then stored in this position after the state has been explored. Once a collision occurs, the new record just replaces the old one that is already in the corresponding position. When new states are encountered, their EPLs are looked up in the hash table. Although the hash table is designed in a basic manner, it contributes substantial performance improvements, as the experimental results discussed later will show.
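The hashing scheme described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `slot`, `store`, and `lookup`, the 64-bit random numbers, the fixed seed, and the small table size are our own assumptions, and the EPL value stored in the usage example is a placeholder rather than a computed result.

```python
import random
from itertools import permutations

TABLE_SIZE = 1 << 16                 # toy size chosen for this sketch
MIN_SIZE, MAX_SIZE = 4, 12           # only states of these sizes are stored

rng = random.Random(2024)            # fixed seed so the sketch is repeatable
CODES = ["".join(p) for p in permutations("0123456789", 4)]
ZOBRIST = {code: rng.getrandbits(64) for code in CODES}  # one number per code
table = [None] * TABLE_SIZE

def slot(state):
    """XOR the random numbers of all codes, then reduce modulo the table size."""
    key = 0
    for code in state:
        key ^= ZOBRIST[code]
    return key % TABLE_SIZE

def store(state, epl):
    """Record the optimal EPL of an explored state; a new record always replaces."""
    if MIN_SIZE <= len(state) <= MAX_SIZE:
        table[slot(state)] = (frozenset(state), epl)

def lookup(state):
    """Return the stored EPL if this exact state is in its slot, else None."""
    if not MIN_SIZE <= len(state) <= MAX_SIZE:
        return None
    entry = table[slot(state)]
    if entry is not None and entry[0] == frozenset(state):
        return entry[1]
    return None

state = ["0123", "0124", "0125", "0126"]
store(state, 10)                     # 10 is a placeholder, not a computed EPL
print(lookup(state[::-1]))           # same set of codes in another order -> 10
```

Because XOR is commutative, the slot does not depend on the order in which the codes of a state are listed, and keeping the codes in each entry lets a lookup reject a different state that happens to map to the same slot.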

Note that the hash table occupies about 1.6 Gbytes of memory because it has 2^25 entries and each entry contains 13 integers (one for storing the EPL and up to 12 for keeping the secret codes). An informal test showed that the performance is better when an entry stores a state whose size is at most 12.
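The memory figure can be checked with a short calculation, assuming 4-byte integers (the integer width is our assumption; it is not stated explicitly):

$$
2^{25} \times 13 \times 4\ \text{bytes} = 33{,}554{,}432 \times 52\ \text{bytes} = 1{,}744{,}830{,}464\ \text{bytes} = 1.625 \times 2^{30}\ \text{bytes} \approx 1.6\ \text{Gbytes}.
$$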

Remember that the larger the state stored in an entry of the hash table, the more time the program takes to decide whether the current state has already been traversed.

Because larger states contain a huge number of codes and would require a huge amount of memory to store together with their EPLs, states whose sizes exceed 12 are not held in the hash table. A normal search is carried out for them instead.
