Implement normalize function for BD-Complexity Calculate Model

Section 4.1.6: Result of Phase 5: Complexity Sample Mean

Figure 18 validates Complexity Property 1: as the number of solved steps increases, a puzzle’s complexity should increase.

Next, to give clearer evidence for Complexity Property 2, Figure 19 shows the average complexity over all solved steps.

Figure 18 Average complexity of each step

Figure 19 Average complexity over all solved steps.

From the result, cross_out_4 appears to be the most complex. This happens because Property 1 holds: complexity grows with the solved step, which skews the average above.

Figure 20 shows that not every cross_out game can reach solved step 20. We therefore lowered the solved-step limit to step 16 and step 11, after which Complexity Property 2 is confirmed: when the cross-out number is about half of the game board size, complexity should be highest. In this experiment that is cross_out_5 (10 / 2 = 5). See Figure 21 and Figure 22.

Figure 20 More Detail of complexity mean in puzzle game space.

Figure 21 Average complexity before step 16.

Figure 22 Average complexity before step 11.

Section 4.1.7: Conclusion

Figure 23 shows the complexity of the puzzles generated in Phase 1, classified into five groups of basic difficulty. Although the boundaries between groups were not validated, the results are shown here as a convenient way to illustrate the distribution of puzzles in terms of difficulty level.

The results indicate that approximately one-half of the puzzle levels generated by the program could be classified as very easy. According to complexity theory, when a system goes beyond a “complexity barrier”, a behavior pattern emerges. In puzzle games, this pattern is represented by the numbers of branches and dead ends, which increase exponentially. In Crossblock, the boundary value between a periodic and a complex system is approximately 0.125, and about half of the puzzles in the puzzle database lie below it. As the value rises beyond this boundary, branches and dead ends increase dramatically. Figure 24 shows the average complexity of each difficulty level, which supports our observation. Consider the following facts:

1. The complexity interval between very easy and easy is 0.18 – 0.053 = 0.127;
2. between easy and normal it is 0.37 – 0.18 = 0.19;
3. between normal and hard it is 0.59 – 0.37 = 0.22;
4. between hard and very hard it is 0.8 – 0.59 = 0.21.

As the first three items show, the complexity interval gradually increases, which indicates that the system has gone beyond a “complexity barrier”. When the complexity level is “very hard”, the system has almost reached the “chaotic level”, which must have the highest complexity value and beyond which complexity will gradually decrease. That is why the interval between hard and very hard stops increasing.

Total Puzzle Amount: 15155

Figure 23 Ratio of basic difficulty in puzzle database.

This experiment concerns the puzzle game space, not the correct rate of sorting puzzle levels. Although I have shown both Complexity Properties for “Cross Block” in this section, further results are needed to show that the method proposed in this research is practical for real puzzle game sorting problems. In the next experiment I will validate the correct rate between complexity and difficulty.

Figure 24 Complexity Average of each Crossblock difficulty level

Section 4.2: Experiment Two

Section 4.2.1: Phase 1: Select Puzzle Levels

In this experiment, I want to test whether humans can really tell difficulty apart when all puzzle levels have similar complexity. I therefore selected 10 puzzles whose complexity all falls in the easy group, and kept this set fixed when releasing it to players. In this research, we have evaluation data from 17 human players.

Section 4.2.2: Result of Phase 3: Average difficulty and Sorting

Table 2 shows the complexity and difficulty of the puzzles in this test experiment.

The difference between Small Base and Large Base is now clearer: the maximum value in the database affects our normalize function, so the complexity values computed with the large base all fall in the very easy and easy groups. Because what we want to know is the sorting correct rate, we sort the puzzles by the values in Table 2 to obtain a rank for each puzzle, as in Figure 25:

Table 2 Complexity and Difficulty result.

In fact, we get different rankings for small base, large base and human, and it is not convenient to compare them by figure. We therefore need a method that measures the sorting similarity between two rank lists.

Section 4.2.3: Implement: Calculate Sorting Similarity

Here is my method for computing sorting similarity:

• Suppose we have two sorted puzzle lists, listA and listB, which contain the same puzzles but are sorted by different methods.

• listA is sorted by complexity.

• listB is sorted by difficulty (human or static difficulty).

• If a puzzle’s rank is the same in both lists, add 1 to the similarity.

• If a puzzle’s rank differs between the two lists, add (1 – difference between the two ranks) / list size to the similarity.

• Finally, before returning the value, divide it by the rank list size to normalize the result.
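The steps above can be sketched in Python as follows. This is only an illustrative reading, not the thesis's Code 10: the function name `sorting_similarity` is hypothetical, and the partial-credit term is interpreted here as 1 − (rank difference / list size), which is an assumption made so that every contribution stays between 0 and 1.

```python
def sorting_similarity(list_a, list_b):
    """Similarity in [0, 1] between two orderings of the same puzzles.

    list_a: puzzles sorted by complexity.
    list_b: the same puzzles sorted by difficulty.
    """
    if len(list_a) != len(list_b) or set(list_a) != set(list_b):
        raise ValueError("both lists must contain the same puzzles")
    n = len(list_a)
    if n == 0:
        return 1.0

    rank_in_b = {puzzle: rank for rank, puzzle in enumerate(list_b)}
    total = 0.0
    for rank_a, puzzle in enumerate(list_a):
        diff = abs(rank_a - rank_in_b[puzzle])
        # Same rank (diff == 0) adds 1; otherwise the credit shrinks
        # with the rank distance (assumed reading of the rule).
        total += 1.0 - diff / n
    # Normalize by the list size.
    return total / n


# Hypothetical puzzle ids, for illustration only.
by_complexity = ["p1", "p2", "p3", "p4"]
by_difficulty = ["p1", "p3", "p2", "p4"]
print(sorting_similarity(by_complexity, by_difficulty))
```

Under this reading, identical orderings score 1 and the score falls as puzzles move further from their rank in the other list.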

Code 10 is the actual implementation of sorting similarity:

Figure 25 Complexity and Difficulty rank result

Code 10 Implementation of sorting similarity.

Section 4.2.4: Result of Phase 5: Sorting Similarity

We compare two rates for each pair of ranked lists in Table 3. The first is the percentage of matches, that is, the share of puzzles with the same rank in both lists. The second is the similarity, which measures how similarly the two lists are sorted.

Small base and large base produce different ranks because the maximum value in the database affects the normalize function. Furthermore, compared with small base, large base has a higher sorting similarity with the human ranking. Finally, we compared the rankings of individual players with one another: their sorting similarity reaches only 68%. It seems surprising that players perceive difficulty so differently when puzzles have similar complexity levels.

Section 4.2.5: Conclusion

In this experiment, we saw that when puzzles have similar complexity, people tend to rank them differently because of their different skills. I therefore think that the ability to classify a puzzle into a basic difficulty group is more important than determining its exact degree of difficulty.

Table 3 Result of match and sorting similarity.

Section 4.3: Experiment Three

In this experiment, I want to validate the correct rate of complexity sorting using Sudoku puzzles that have already been classified by another method. The puzzle samples we used can be found in TSA (W. Kuang-Chen (巫光楨), 2008).

Section 4.3.1: Phase 1: Select Puzzle in Each Rank

As shown in Figure 26, every puzzle in TSA is marked with a difficulty level. The number of “★” on a puzzle indicates the difficulty rank calculated by TSA, which classifies all Sudoku puzzles into 5 ranks.

Every puzzle is marked with a difficulty level. The number of ★ indicates how difficult it is, up to five stars. Meaning of each column: puzzle id, puzzle, number of challenges, number of successful solves, solved rate, average time, newest record, fastest record, start challenging the puzzle.

TSA measures the difficulty of a “Sudoku” by evaluating the number of solving techniques that a puzzle-solving program requires: the more difficult the techniques a puzzle requires, the more difficult the puzzle. However, because we do not know whether the difficulty levels marked by TSA are really correct, we must take care when choosing puzzles from it. Fortunately, TSA also provides the solved rate for each puzzle in column five, so we can choose puzzles based on this value, which reflects their difficulty more accurately. In this experiment, we select 100 Sudoku puzzles for each difficulty level (5 * 100 = 500 puzzles).

Figure 26 Sudoku puzzles provided in TSA.

Section 4.3.2: Result Phase 3: Calculate Branch and Dead Ends

Before calculating the complexity of each puzzle, we must decide the parameters B and D.

From the results in Figure 27 and Figure 28, we see that as difficulty increases, branches have a positive relation with difficulty, while dead ends have a roughly negative relation (except for normal and hard). We therefore set B to 1 and D to -1.
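As a rough illustration of how B and D enter the calculation, the sketch below assumes that the model is a plain weighted sum, B × branches + D × dead ends, rescaled by the minimum and maximum found in the puzzle database. The exact model is the one defined in Chapter 3 and may differ. The function names, the min-max scaling and the sample counts are all assumptions made for this example.

```python
def raw_complexity(branches, dead_ends, b=1.0, d=-1.0):
    """Weighted combination of branch and dead-end counts (assumed form)."""
    return b * branches + d * dead_ends


def normalize(values):
    """Rescale values to [0, 1] using the database minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical (branches, dead_ends) counts for three puzzles.
puzzles = [(4, 1), (10, 2), (7, 7)]
raw = [raw_complexity(br, de) for br, de in puzzles]
scores = normalize(raw)
print(raw, scores)
```

With B = 1 and D = -1, a puzzle with many branches and few dead ends scores high, while one whose dead ends offset its branches scores low.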

Section 4.3.3: Result Phase 4: Compute Complexity

Figure 27 Average branch for each difficulty level.

Figure 28 Average dead ends for each difficulty level.

Using the complexity calculation model described in Chapter 3, we get the result in Figure 29:

Complexity increases with difficulty level. Our method therefore succeeds in approximating puzzle difficulty at the minimum requirement.

What about the overall success for each puzzle? Let us examine the complexity values in more detail in Figure 30:

Figure 29 Average Degree of Complexity for Each Difficulty Level

Figure 31 simply places every puzzle into a rank from left to right. We can see that this method is weak on puzzles whose branches and dead ends are both high or both low, for which our complexity calculation becomes too high or too low. Another problem concerns the puzzles in normal and hard: we cannot separate these two groups clearly. I think both problems are caused by a property of our method. Because we simply combine branches and dead ends as a polynomial, the way branches and dead ends are counted strongly affects the result. In this experiment, we introduce only one heuristic when doing the calculation: skipping steps solvable by the “unique method”, a technique that every novice player knows. To get more solid results, we may need to work out more concrete heuristics for counting branches and dead ends.

Figure 30 Puzzle Samples Sorted by Complexity

Figure 31 Rank of complexity Sorting for each puzzle samples.

Section 4.3.4: Result Phase 5: Compare Rank Result

Because our puzzle database has five marked difficulty levels, we can randomly select one sample from each difficulty level (5 puzzles in total) as listA, sort these samples by our complexity as listB, and then compute the sorting similarity between the two lists. The selection process is shown in Figure 32:

By repeating this comparison for a large enough number of iterations, we can validate the correctness of our method. Figure 33 shows the similarity result over 50000 iterations:
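The repeated comparison described above can be sketched as a small Monte Carlo loop. This is a minimal sketch under stated assumptions: the database is represented as a hypothetical dictionary from difficulty level to a list of complexity values, `sorting_similarity` uses the assumed reading 1 − (rank difference / list size) for mismatched ranks, and all names and sample values are invented for illustration.

```python
import random


def sorting_similarity(list_a, list_b):
    """Assumed similarity measure: 1 for equal ranks, less with distance."""
    n = len(list_a)
    rank_in_b = {item: rank for rank, item in enumerate(list_b)}
    total = sum(1.0 - abs(rank - rank_in_b[item]) / n
                for rank, item in enumerate(list_a))
    return total / n


def average_similarity(database, iterations=50000, seed=0):
    """database maps difficulty level -> list of complexity values."""
    rng = random.Random(seed)
    levels = sorted(database)
    total = 0.0
    for _ in range(iterations):
        # listA: one random sample per level, in marked-difficulty order.
        sample = [(level, rng.choice(database[level])) for level in levels]
        list_a = [level for level, _ in sample]
        # listB: the same samples sorted by computed complexity.
        list_b = [level for level, _ in sorted(sample, key=lambda s: s[1])]
        total += sorting_similarity(list_a, list_b)
    return total / iterations


# Hypothetical complexity values; the ranges overlap between levels 2 and 3.
toy_database = {1: [0.05, 0.10], 2: [0.20, 0.35], 3: [0.30, 0.50]}
print(average_similarity(toy_database, iterations=1000))
```

Each puzzle in a sample is identified by its marked level, which works because exactly one puzzle is drawn per level.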

Figure 32 Process of selecting samples from the puzzle database as a sorting list.

Average: 0.8

Figure 33 Result of Sorting Similarity

It shows that our sorting looks quite good in most cases, but there is still room for improvement. I will try to adjust the parameters B and D by machine learning to find the best complexity sorting result.

Section 4.3.5: Conclusion

This experiment shows that the method we propose can be applied to different types of puzzle games. However, because different puzzles have different emergent phenomena in their branches and dead ends, sorting correctness depends on the play features of each game. We separate pure puzzle games into three types: Movement type puzzles like “Sokoban”, Elimination type puzzles like “Cross Block”, and Fill Out type puzzles like “Sudoku”. I think the most suitable puzzle games for the method we propose are the Elimination and Movement types. In Fill Out type puzzles, the number of possible actions available to the player is too large, which generates more exceptions than in the other two types.

In the appendix, I collect more puzzle games according to this classification.

Although the complexity measure for Fill Out type puzzles in this “Sudoku” experiment does not perform as well as in the previous “Cross Block” experiment, I think it is good enough for real applications.

Section 4.4: Experiment Four

In this experiment, we use simulated annealing to adjust our parameters B and D in order to get a more accurate complexity evaluation for Experiment Three.

Section 4.4.1: Phase 1: Select Training Sample

Because we use simulated annealing as a machine learning technique, we need training samples before starting our tuning program. Figure 34 shows our training sample selection process: we randomly draw 1000 training samples from the puzzle database.

Section 4.4.2: Implement Phase 2: Parameter Tweak

Because our purpose is to improve sorting similarity, we can implement the energy function for simulated annealing as in Code 11:

Because simulated annealing works by reducing energy (or error, cost) over repeated training iterations, we subtract the similarity from 1 before returning the result.
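An energy function of this kind could look roughly like the sketch below. It is not the thesis's Code 11. It assumes that each training sample is a list of (branches, dead ends) pairs ordered from the easiest to the hardest marked level, that complexity is the weighted sum B × branches + D × dead ends, and that mismatched ranks earn 1 − (rank difference / list size); all names are hypothetical.

```python
def sorting_similarity(list_a, list_b):
    """Assumed similarity measure: 1 for equal ranks, less with distance."""
    n = len(list_a)
    rank_in_b = {item: rank for rank, item in enumerate(list_b)}
    total = sum(1.0 - abs(rank - rank_in_b[item]) / n
                for rank, item in enumerate(list_a))
    return total / n


def energy(b, d, training_samples):
    """Return 1 - average sorting similarity, so lower energy is better."""
    total = 0.0
    for sample in training_samples:
        # listA: puzzle indices in marked-difficulty order.
        list_a = list(range(len(sample)))
        # listB: the same indices sorted by the weighted complexity.
        list_b = sorted(
            list_a, key=lambda i: b * sample[i][0] + d * sample[i][1]
        )
        total += sorting_similarity(list_a, list_b)
    return 1.0 - total / len(training_samples)


# One hypothetical training sample with three difficulty levels.
samples = [[(1, 0), (2, 0), (3, 0)]]
print(energy(1.0, -1.0, samples))
```

An energy of 0 means that sorting by complexity reproduces the marked difficulty order in every training sample.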

Figure 34 Training sample select process.

Code 11 Implementation of the energy function in simulated annealing.

Section 4.4.3: Result of Phase 2: Parameter Tweak

Figure 35 shows the result of the training process: our adjustment successfully converges the error (1 – similarity) to 0.13.

Figure 36 shows how the parameters are adjusted over the iterations of this training run:

Finally, we get B = 18.1952 and D = 2.02334 as one of the states with the lowest error.

The result may change when we start another training run.
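To show why a new run can end at different parameter values, here is a minimal simulated annealing loop. It is a generic sketch, not the tuning program used in this experiment: the linear cooling schedule, the step size, the acceptance rule and the toy energy function are all assumptions; only the 1500 iterations are taken from the caption of Figure 36.

```python
import math
import random


def anneal(energy_fn, start, iterations=1500, start_temp=1.0,
           step=0.5, seed=None):
    """Minimize energy_fn over a tuple of parameters, e.g. (B, D)."""
    rng = random.Random(seed)
    state = start
    e = energy_fn(*state)
    best_state, best_e = state, e
    for i in range(iterations):
        # Linear cooling (assumed); the small offset avoids division by zero.
        temp = start_temp * (1.0 - i / iterations) + 1e-9
        candidate = tuple(x + rng.uniform(-step, step) for x in state)
        ce = energy_fn(*candidate)
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature falls.
        if ce <= e or rng.random() < math.exp((e - ce) / temp):
            state, e = candidate, ce
            if e < best_e:
                best_state, best_e = state, e
    return best_state, best_e


def toy_energy(b, d):
    """Stand-in for 1 - similarity; its minimum is at B = 3, D = -1."""
    return (b - 3.0) ** 2 + (d + 1.0) ** 2


print(anneal(toy_energy, (0.0, 0.0), seed=1))
print(anneal(toy_energy, (0.0, 0.0), seed=2))
```

Because the candidate moves and acceptance tests are random, two runs with different seeds generally stop at different states with similar energy, which is consistent with the remark above.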

Section 4.4.4: Result of Phase 3: Calculate New Complexity

Figure 35 Error and iteration of simulated annealing.

Figure 36 Result of parameters B and D adjusted over 1500 iterations.

Figure 37 shows the average complexity for each difficulty level. The values are closer between levels than the results in Experiment Three:

But the result is actually improved, especially for the low-complexity puzzles in each level. Figure 38 and Figure 39 show the detailed sorting result:

Figure 37 Average complexity for each difficulty level after parameter tweak.

Figure 38 Complexity of each puzzle sample after parameter tweak.

Figure 39 Rank of complexity sorting after parameter tweak.

Section 4.4.5: Result of Phase 4: Compare Rank Result

Figure 40 shows that the average sorting similarity improves from 0.8 to 0.86.

Section 4.4.6: Conclusion

From the result, we can see that although our method is quite simple, it is a general method that can be used to measure difficulty for different puzzles. Some error remains, but I think that if we can work out a complexity measure heuristic for each puzzle game, the sorting correct rate will improve.

Average: 0.86

Figure 40 Sorting Similarity after training.

Chapter 5: Conclusion

Section 5.1: Complexity Sorting and Difficulty Mapping

Determining game difficulty is a challenging issue requiring detailed understanding of game parameters. For puzzle games, Scott Kim has identified branches and dead ends as universal puzzle components; in this project we tried to use the two features to measure puzzle complexity. According to our experiment results, the proposed method holds potential as an efficient method for mapping complexity to static difficulty. We used simulated annealing to identify optimal parameters, but our final sorting similarity data still suffered from a 14% error rate. Since different puzzles have different emergent phenomena on their branches and dead ends, correct sorting depends on play features that differ across different games. To achieve more accurate results using our proposed method, it is therefore necessary to use game-specific features when calculating numbers of branches and dead ends in order to improve the fit between our process and behavior patterns (e.g., the ability to quickly filter out bad choices and dead ends).

For example, Sudoku has several solving techniques, such as Last Digit and Hidden Single in Box. To apply those techniques in our complexity calculation process, it is necessary to find their emergent phenomena in branches and dead ends, which can help us identify which nodes we need to expand or count. We believe that the more difficult the techniques a puzzle requires, the higher its complexity value will be.

However, the use of game-specific features contradicts our goal of creating a method that can be used for all puzzle games. Therefore our plans include designing a more sophisticated complexity calculation model that considers a wider range of search tree behavior features—for example, backtracking rates (indicating incorrect choices) or number of cycled nodes.

Section 5.2: Measuring Digital Game Complexity

Can our proposed method be applied to other games? Generally speaking, our proposed model can be applied to any kind of task: if we formulate the target problem as a search tree, branches and dead ends can be counted to measure the complexity of the task. However, problems may arise when we want to map complexity to difficulty, because many games require many different kinds of skill from the player, which makes subjective feelings about difficulty diverge. For example, Tetris requires eye-hand coordination, and not everyone can keep up with the speed of the falling objects; Boggle requires English ability, so players who are familiar with English have an obvious advantage.

Therefore, our complexity measurement will be limited to a certain high-skill player group and meaningless to others, because players without the required skill or knowledge cannot even start playing the game. Furthermore, for medium-skill players, game-specific skill and knowledge will always be the source of difficulty.

Because different players have different skills, their diverse feelings about difficulty hinder us from mapping complexity to static difficulty. We must therefore first find a method to combine complexity with game-specific features. For example, in Tetris, how do we measure the challenge of the falling objects’ speed? And how do we combine that challenge with complexity in one formula? But, as discussed before, doing so would break the generality of our model.
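The search-tree formulation mentioned at the start of this section can be sketched for an arbitrary task as follows. This is only an illustration with assumed definitions: a branch is counted as a state with more than one legal move, and a dead end as a non-goal state with no legal move. The functions used in the experiments are the ones listed in Appendix D and may count differently. The toy task and all names are hypothetical.

```python
def count_branches_and_dead_ends(start, moves, is_goal):
    """Explore every reachable state and count branches and dead ends.

    moves(state) returns the list of successor states.
    is_goal(state) returns True for solved states.
    """
    branches = dead_ends = 0
    stack = [start]
    seen = {start}  # each distinct state is counted once
    while stack:
        state = stack.pop()
        if is_goal(state):
            continue
        successors = moves(state)
        if not successors:
            dead_ends += 1   # stuck without being solved
        elif len(successors) > 1:
            branches += 1    # the player has to choose here
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return branches, dead_ends


# Toy task: reduce a counter to exactly 0 by removing 2 or 3 at a time.
def toy_moves(n):
    return [n - k for k in (2, 3) if n - k >= 0]


def toy_is_goal(n):
    return n == 0


print(count_branches_and_dead_ends(5, toy_moves, toy_is_goal))
```

Any game that can supply a move generator and a goal test can be plugged into the same loop, which is the sense in which the model is general. Whether the resulting counts reflect perceived difficulty is the separate question discussed above.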

Appendix A: Puzzles in Experiments

A.1: Cross Block

Cross Block is a pure puzzle invented by DJ Trousdale (DJ Trousdale, 2009), whose goal is to clear all squares on the game board by drawing vertical or horizontal lines.

Example of Cross Block. Each line must cross out exactly the specified number of squares. (a) Crossing out 2 squares at a time; it requires 2 steps to solve. (b) Crossing out 7 squares at a time; it requires 8 steps to solve.

Generally, we can increase the difficulty of this puzzle simply by putting more squares on the game board. As in the example in Figure 1.5, when the number of solved steps increases, the difficulty also increases. There are some exceptions, but we do not discuss them in detail here. I will show the overall puzzle game space results for Cross Block in Section 4.1. Next, let us return to our problem: how do we measure the difficulty of a puzzle?

(a) (b)

A.2: Sudoku

Another puzzle I use in my experiments is “Sudoku”, a very famous puzzle.

The goal of “Sudoku” is to fill every square with a number from 1 to 9, subject to the following rules: 1. the numbers in each row and column cannot repeat; 2. the numbers in each 3*3 box region cannot repeat. For example, the figure has 9 box regions, marked in yellow and white.

Example of Sudoku Puzzle
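The two rules can be checked mechanically. The sketch below tests whether a completely filled 9*9 grid satisfies them; the function name and the pattern used to build the sample grid are illustrative choices, not part of the experiments.

```python
def is_valid_solution(grid):
    """Return True if a filled 9x9 grid obeys the row, column and box rules."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column and 3x3 box must contain each digit exactly once.
    return all(group == digits for group in rows + cols + boxes)


# A valid grid built from a simple shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
          for r in range(9)]

# Copy the grid and repeat a number in the first row to break rule 1.
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]

print(is_valid_solution(solved), is_valid_solution(broken))
```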

Appendix B: Collection of Pure Puzzle

In this appendix, I collect some puzzles from the internet according to the following classification: movement type, fill out type, elimination type.

B.1: Elimination Type

Marble Solitaire, Minim

NingPo Mahjong

B.2: Movement Type

Exorbis 2, Flashmaz

Mummy Maze, Open Doors 2

Telescope, Rush Hour

Sliding Puzzle, Sokoban

B.3: Fill Out Type

3D Logic 2, Cross word

Appendix C: More Result of Experiment One

Appendix D: Calculate Branch and DeadEnds

Appendix E: Game Data Format

The following are examples of the data formats I used to store game data in my experiments.

E.1: Cross Block

[id]
4782
[cross out]
2
[sizex]
10
[sizey]
[game state]

Implement Branch and Dead Ends calculate function for Cross Block.

Appendix F: Puzzles in Experiment Two

The following puzzles were used in Experiment Two:

Reference

Adamatzky. Andrew. (2010). Game of Life Cellular Automata: Springer-Verlag New York Inc.

Amanita Design (Producer). (2009). machinarium. Retrieved from http://machinarium.net/demo/

Apple. App Store. Retrieved Jun 23, 2011, from http://app-store.appspot.com/?url=viewGrouping%3Fmt%3D8%26id%3D25204%26ign-mscache%3D1

B. R. Clarke. (1994). Puzzles for Pleasure. Cambridge, England: Cambridge University Press.

Ben Weber. (2010). Infinite Mario with dynamic difficulty adjustment. Retrieved April, 2011, from http://users.soe.ucsc.edu/~bweber/dokuwiki/doku.php?id=infinite_adaptive_mario, http://www.youtube.com/watch?v=kYbKNAmZ1z4

Bernard Suits. (2005). The Grasshopper: Games, Life and Utopia: Broadview Press.

C. Crawford. (1984). Art of Computer Game Design. New York: McGraw-Hill/Osborne Media.

C. D. Güss, E. Glencross, Ma. T. Tuason, L. Summerlin, & F. D. Richard. (2004). Task Complexity and Difficulty in Two Computer-Simulated Problems: Cross-cultural Similarities and Differences. Paper presented at the Proc. 26th Annual Conf. Cognitive Science Society, Mahwah.

C. E. Shannon. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379-423, 623-656.

C. Pedersen, J. Togelius, & G. N. Yannakakis. (2010). Modeling Player Experience for Content Creation. IEEE Trans. Computational Intelligence and AI in Games, 2(1), 54-67.

Christopher G. Langton (Ed.). (1995). Artificial Life: An Overview: Cambridge: MIT Press.

DJ Trousdale (Producer). (2009). Cross block. Retrieved from http://djtrousdale.com/games/crossblock/

Elina M.I. Koivisto. (2006). Mobile Games 2010. Paper presented at the CyberGames '06: Proceedings of the 2006 international conference on Game research and development.

Erich Gamma, Richard Helm, Ralph Johnson, & John M. Vlissides. (1994). Design Patterns: Elements of Reusable Object-Oriented Software: Addison-Wesley.

Gerald M. Edelman, & Joseph A. Gally. (2001). Degeneracy and complexity in biological systems. Paper presented at the Proceedings of the National Academy of Sciences of the United States of America.

H. Robin, & C. Vernell. (2004). AI for dynamic difficulty adjustment in games. Paper presented at the Proc. of the Challenges in Game AI Workshop, Nineteenth National Conf. on Artificial Intelligence, San Jose.

Holling, C. S. (2001). Understanding the Complexity of Economic, Ecological, and Social Systems. Ecosystems, 4(5), 390-405. doi: 10.1007/s10021-001-0101-5

J. Kim. (2005). Task Difficulty in Information Searching Behavior: Expected Difficulty and Experienced Difficulty. Paper presented at the Proc. 5th ACM/IEEE-CS Joint Conf., New York.

James Paul Gee. (2005). Why Video Games Are Good for Your Soul: Common Ground.

Jeremy Campbell. (1982). Grammatical Man: Information, Entropy, Language, and Life. New York: Simon & Schuster.

Johan Huizinga. (1954). Homo Ludens: A Study of the Play-Element in Culture.

John H. Holland. (1999). Emergence: From Chaos to Order: Basic Books.

Kai-Ju Chen, & Kou-Yuan Huang. (2007). Simulated Annealing for Pattern
