
National Taiwan Normal University
Graduate Institute of Information and Computer Education
Doctoral Dissertation

Advisor: Dr. 何榮桂 (Ho, Rong-Guey)

應用虛擬題庫理論-電腦化方塊計數測驗之實作
Designing a Computerized Cube Enumeration Test System Based on Virtual Item Bank Theory

Doctoral student: 廖文偉 (Liao, Wen-Wei)

January 2013

Acknowledgements

This dissertation could not have been completed without the careful guidance of my advisor, Dr. 何榮桂. I also thank the members of my oral defense committee, Professors 簡茂發, 郭靜姿, 劉長萱, and 鄭海蓮, for the many suggestions they gave me during the defense. I am grateful to my parents and family for their encouragement and help throughout my years of study, and to my classmate 顏永進, who discussed this work with me late into the night every day. There are too many people to thank to list them all, but I am sincerely grateful to each of you. This dissertation is dedicated to you all. Thank you.

應用虛擬題庫理論-電腦化方塊計數測驗之實作

Abstract

The cube is the basic unit of volume and is therefore commonly used to introduce the concept of volume. In testing, cube enumeration tests are often used to measure or promote the ability to form and manipulate mental images of 3-D geometric objects. Cube enumeration tests are usually kept confidential and not open to the public; most people's understanding of them is limited to the exercise books published by commercial bookstores. The main reason for keeping them closed is to maintain test security. This study combined Classical Test Theory (CTT), computer-based testing (CBT), automatic item generation (AIG), Virtual Item Bank (VIB) theory, and theories related to cube enumeration to build a virtual item bank for cube enumeration and a cube enumeration learning system. The virtual item bank contains no actual items; instead, it stores the objects and the methods for generating items. During a test, the items an examinee receives are generated directly from these objects in accordance with test theory, so the item bank has no item exposure problem. The subjects of this study were 267 sixth-grade elementary school students. The study found that:

1. Allowing items to be rotated lowers item difficulty, but rotatable items are helpful for teaching spatial ability.
2. Both the integrity of cubes and the number of invisible cubes affect item difficulty.
3. The test results show no significant difference between male and female preadolescent students in spatial visualization ability.
4. The computerized VIB-based cube enumeration test can also measure examinees' spatial visualization ability.
5. Learning on the cube enumeration learning platform also helps improve examinees' spatial ability.

Keywords: AIG, CBT, CTT, VIB, spatial ability, cube enumeration

Designing a Computerized Cube Enumeration Test System Based on Virtual Item Bank Theory

Abstract

The cube is the basic unit of volume and hence is often used to introduce the concept of volume. In testing, cube enumeration is often used to measure or promote the mental imaging and manipulation abilities for 3-D geometric objects. Cube enumeration tests are usually not open to the public, whose understanding of them is limited to exercise books found in general bookstores. The main reason for this limitation is to maintain test security. This study combined Classical Test Theory (CTT), Computer-Based Testing (CBT), Automatic Item Generation (AIG), Virtual Item Bank (VIB) theory, and theories related to cube enumeration to construct a VIB-based cube enumeration test and a learning system. The item bank of the VIB-based cube enumeration test was a virtual one: the VIB contained only the basic element, the cube. From these cubes, items were generated directly, and therefore the item bank had no item exposure problem. 267 sixth graders from a New Taipei City elementary school participated in this experiment. Their midterm math grades were used as the external criterion in the test. This study found that:

1. Both the number of invisible cubes and the integrity of cubes were positively correlated with item difficulty.
2. In addition to the number of invisible cubes and the integrity of cubes, whether test items could be rotated also influenced item difficulty.
3. Based on the results of both tests, there is no significant difference in spatial visualization ability between male and female preadolescent students.
4. The scores of the computerized cube enumeration test also indicated that this system can measure examinees' spatial visualization ability.
5. Applying the cube enumeration learning system did improve users' spatial ability.

Key Words: AIG, CTT, VIB, Spatial Ability, Cube Enumeration

Table of Contents

Chapter 1. Introduction
1.1. Background and Motivation
1.2. Purposes
1.3. Research Questions
Chapter 2. Literature Review
2.1. Classical Test Theory (CTT)
2.1.1. Item Difficulty
2.1.2. Item Discrimination
2.2. Item Response Theory
2.3. Computer-Based Testing
2.4. Automatic Item Generation (AIG)
2.5. Virtual Item Bank Theory
2.6. Cube Enumeration
Chapter 3. Methods
3.1. Definition
3.2. Experimental Design
3.3. Procedures
3.4. Subjects
3.5. Research Hypotheses
3.6. Research Tools
Chapter 4. Results
4.1. Results of stage 1: Constructing CAT and analyzing the results
4.2. Results of stage 2: Constructing CBT and analyzing the results

4.3. Results of stage 3: The performance of the learning system
4.4. Analyzing the security of the VIB
4.5. Discussion
Chapter 5. Conclusion and Suggestion
5.1. Conclusion
5.2. Suggestion
5.3. Limitation
References
Appendix A: Midterm math quiz
Appendix B: Items of the invisible cube enumeration test
Appendix C: Items of the cube enumeration test

List of Tables

Table 3-1 The distribution of the subjects
Table 3-2 Distribution of the subjects
Table 3-3 Descriptive statistics of the midterm math quiz (N = 267)
Table 3-4 Examinees' answering status of the pilot test (number of items = 25, N = 30)
Table 3-5 Descriptive statistics of item difficulty and discrimination of the pilot test (number of items = 25, N = 30)
Table 3-6 Descriptive statistics of item difficulty and discrimination of the invisible cube enumeration test (number of items = 20, N = 30)
Table 3-7 Descriptive statistics of item difficulty and discrimination of the rotatable invisible cube enumeration test (number of items = 10, N = 30)
Table 3-8 The contents of the invisible cube enumeration test
Table 3-9 Statistical properties of test items of the invisible cube enumeration test (number of items = 20, N = 267)
Table 3-10 The examinees' answering status of the invisible cube enumeration test
Table 3-11 Descriptive statistics of item difficulty and discrimination of the invisible cube enumeration test (number of items = 10, N = 267)
Table 3-12 Item difficulty and discrimination of the cube enumeration test (N = 30)
Table 3-13 Descriptive statistics of item difficulty and discrimination of the cube enumeration test (number of items = 36, N = 30)
Table 3-14 Descriptive statistics of item difficulty and discrimination of the cube enumeration test (number of items = 30, N = 30)
Table 3-15 The contents of the cube enumeration test

Table 3-16 The overall 30 items' difficulty parameters of the Rasch model
Table 3-17 Statistical properties of the test items of the cube enumeration test (number of items = 30, N = 267)
Table 3-18 The CTT item difficulty and discrimination of the cube enumeration test
Table 3-19 Statistical properties of the test items of the cube enumeration test (number of items = 30, N = 267)
Table 3-20 Mean item difficulty and discrimination of groups of invisible cube numbers of the cube enumeration test
Table 3-21 Average item difficulty and discrimination of groups with different integrity of cubes in the cube enumeration test
Table 3-22 Descriptive statistics of male and female examinees and correct items of the invisible cube enumeration test
Table 3-23 t-test results of correct items of males and females in the invisible cube enumeration test
Table 4-1 Pearson's product-moment correlation coefficients of independent and dependent variables of the regression of item difficulty
Table 4-2 Statistics of the item difficulty regression analysis using the Enter method
Table 4-3 Parameter test of the item difficulty regression using the Enter method
Table 4-4 Descriptive statistics of the cube enumeration test score and midterm math score (N = 267)
Table 4-5 Correlation coefficient of the cube enumeration test score and midterm math score
Table 4-6 Summary of ANCOVA of adjusted posttest scores (N = 60)
Table 4-7 Item exposure rate of each rule

Table 4-8 Correlation coefficient of the computer adaptive cube enumeration test score and midterm math score (N = 29)

List of Figures

Figure 2-1. Example of 1PL ICC with b = 0
Figure 2-2. Two examples of 2PL ICC with b = 0 while a = 0.5 and 1.5 respectively
Figure 2-3. A typical ICC for the 3PL model with b = 0, a = 1, and c = 0.1
Figure 2-4. The procedure of CBT
Figure 2-5. An example item of APM (Liu et al., 2001)
Figure 2-6. Flow chart of VIB
Figure 2-7. Item generation rules without binary operations
Figure 2-8. Deciding the location of normal figural objects
Figure 2-9. Deciding on the image processing rules
Figure 2-10. Choosing the next figural objects and storing them into the VIB
Figure 2-11. CAT system
Figure 3-1. Illustration of item rotation
Figure 3-2. An example item where the total cube number is 8
Figure 3-3. Two-variable Karnaugh map
Figure 3-4. Three-variable Karnaugh map
Figure 3-5. Four-variable Karnaugh map
Figure 3-6. An item divided into 3 layers by the horizontal method
Figure 3-7. Calculating the integrity of cubes
Figure 3-8. An item divided into 3 layers by the vertical method
Figure 3-9. Calculating the integrity of cubes
Figure 3-10. The research framework of this study
Figure 3-11. The flowchart of taking the invisible cube enumeration test

Figure 3-12. Flowchart of taking the cube enumeration test
Figure 3-13. The test characteristic curve
Figure 3-14. The line chart of mean item difficulty and discrimination of groups of invisible cube numbers of the cube enumeration test
Figure 3-15. The line chart of mean item difficulty and discrimination of groups with different integrity of cubes in the cube enumeration test
Figure 3-16. Using the cube enumeration learning system to help learning
Figure 3-17. An item stack example using the cube enumeration learning system
Figure 3-18. Definition of spatial orientation
Figure 3-19. The item definition file format
Figure 3-20. The item definition example
Figure 3-21. An item with 24 cubes and 8 invisible cubes
Figure 3-22. The development procedure of the VIB

Chapter 1. Introduction

This study attempted to investigate the factors that affect the item difficulty of cube enumeration tests, and to evaluate the performance of a VIB-based computerized cube enumeration test (VIB-based CBT) and of a learning system for improving students' cube enumeration ability. Chapter 1 of this thesis introduces the background, motivation, and purposes of the research. Chapter 2 reviews the literature on classical test theory (CTT), item response theory (IRT), the virtual item bank (VIB), automatic item generation (AIG), and modern technologies used to generate items. Chapter 3 describes a series of experiments designed to investigate the factors that affect the item difficulty of the cube enumeration test, the item exposure rate of the VIB-based CBT, and the performance of the learning system. Chapter 4 presents the results of the experiments. Chapter 5 answers and discusses the proposed research questions and offers suggestions for further research.

1.1. Background and Motivation

Just as the Zhouyi was used to predict one's fate in ancient China, IQ tests are used in modern days to evaluate one's IQ, future direction, and potential for development. The present study builds two cube enumeration tests and a VIB-based cube enumeration test to evaluate an examinee's spatial ability, and develops a learning system to enhance students' spatial ability. Moreover, just as the eight trigrams of the Zhouyi form different combinations, the present study combines different basic graphic elements for different tests, hoping to estimate an examinee's spatial ability accurately with this item generation system.

Spatial ability is not only an essential part of everyday life but also an important skill for mechanics, architects, chemists, surgeons, surveyors, and cartographers. Nowadays, mathematics curricula in elementary and junior high schools, both at home and abroad, list spatial ability as part of the geometry course

materials. Researchers have been interested in how to measure and improve spatial ability. Currently there are plenty of related tests that can be used to evaluate spatial ability. However, such tests are very difficult to develop, and only very experienced test constructors can do the job. Moreover, their feasibility can only be evaluated through massive test administration. For the improvement of spatial ability, there are now many learning systems and software packages available, but their effectiveness in improving spatial ability has to be further analyzed and verified.

According to IAEP (1992), fifth and sixth graders used cube enumeration to learn the concept of volume, because spatial ability is an important concept in geometry. The IAEP indicated that spatial ability was positively correlated with the concept of volume. Since the cube is the basic unit of volume, cube enumeration tests are often used to assess one's mental imaging and operational skills for 3-D geometric objects. Cube enumeration tests are usually kept confidential and exempt from public access. The main reason for this limitation is to keep the item bank private. Another important reason is that it is highly difficult to develop a cube enumeration test. To develop one, there is a specific process to follow to ensure the consistency of test content and test purpose, and to accurately measure examinees' ability. In other words, much manpower, time, and budget are required to construct enough items to avoid the reduction of test quality due to item exposure.

As information technology rapidly evolves, how to apply it in research becomes a significant issue. This study applied information technology to the evaluation and learning of spatial ability. By combining the VIB, CBT, and AIG, this study intended to establish a whole set of cube enumeration tests. With the VIB system, the test no longer consists of any substantial form of items. Within this system, the item bank consists only of cubes as the basic component and no longer runs the risk of item exposure, because the items are dynamically generated from the basic components and the generation methods.

1.2. Purposes

Based on the above-mentioned background, this study aimed to investigate the related literature and develop a virtual item bank for a computerized cube test. Using this item bank, the researcher could further build a system to measure spatial ability. The research purposes of this study were:

1. To analyze the influence of the allowance of item rotation on item difficulty.
2. To analyze the influence of the number of invisible cubes on item difficulty.
3. To analyze the influence of the integrity of cubes on item difficulty.
4. To develop a difficulty formula.
5. To develop a VIB-based cube enumeration test and evaluate its performance.
6. To evaluate the item exposure rate of the VIB.
7. To investigate whether examinees' performance on the computerized cube enumeration test is related to their spatial ability.
8. To construct an online cube learning system and teach using the strategy of rotatable cubes.
9. To evaluate examinees' learning performance with the online learning system.

1.3. Research Questions

According to the above research purposes, the following research questions would be answered in this study:

1. Does the allowance of cube rotation influence item difficulty?
2. Does the integrity of cubes influence item difficulty?

3. Does the number of invisible cubes influence item difficulty?
4. What difficulty formula can be derived from the above data?
5. Can we calculate examinees' spatial ability using the VIB-based cube enumeration test constructed and derived from the above data?
6. Is there a gender difference in examinees' spatial ability as calculated by the cube enumeration test?
7. Does the virtual item bank constructed from the above formula have a good item exposure rate?
8. Is the learning system constructed using the rotation strategy effective?

Chapter 2. Literature Review

This chapter explores the literature relevant to understanding CTT, IRT, CBT, AIG, the cube enumeration test, and the VIB in computer-based figural testing. The first two parts of this chapter introduce the main issues associated with the rationale of CTT and IRT. Next, a literature review on modern technologies for automatic item generation and on CBT is provided. The last two parts of this chapter address the VIB and the cube enumeration test.

2.1. Classical Test Theory (CTT)

Classical test theory (CTT), the most common measurement theory of the last century, is a simple and useful measurement model that describes how errors in measurement can influence observed scores. An essential premise of CTT is that any observed score for person j on a test is the composite of a true score and a random error for that person on the test. It is usually represented by the following formula:

$X_j = T_j + E_j$  (Equation 2-1)

where $X_j$ is the observed score for person j on a test, $T_j$ is the true score, and $E_j$ is the error score (also known as "measurement error"). CTT is based on the assumption that an individual's scores contain error that can only be decreased but never totally eliminated (Meadows & Billington, 2005). The theory assumes that the error is random and normally distributed.

In CTT, the total score of a test is considered the sum of scores on the individual items, and an individual item is of interest through its effect on the total test score (Lord & Novick, 1968). Thus, item analysis in CTT focuses on the degree to which each item influences the whole measurement. According to Hambleton and Jones

(1993), "the major advantages of CTT are its relatively weak theoretical assumptions, which make CTT easy to apply in many testing situations."

2.1.1. Item Difficulty

CTT collectively considers a group of examinees and examines their success rate on an item. For dichotomously scored items, the success rate, expressed as the ratio of the number of examinees getting the item right to the total number of examinees, is known as the p-value of the item and is used as the index of item difficulty. The p-value of an item is usually represented by the following formula:

$P = \frac{R}{N} \times 100\%$  (Equation 2-2)

where P is the item difficulty index, N is the total number of responses, and R is the number of correct responses. The proportion always ranges from .00 to 1.00. The higher the p-value, the less difficult the item.

In general, for an item to discriminate well between examinees, the item difficulty should not be too high or too low. Extremely low values may indicate that the question is too difficult, poorly written, or has problems with its content. Questions with a very high item difficulty index are avoided, as they may be too easy and fail to measure knowledge acquisition. The observed value of item difficulty is affected both by the examinees' true scores and by the effect of their guessing.
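As a concrete illustration of Equation 2-2, the p-value computation can be sketched in a few lines of Python. This is a minimal sketch, not part of the thesis system; the 0/1 response list is a hypothetical input.

```python
def item_difficulty(responses):
    """CTT item difficulty (p-value): the proportion of correct responses.

    responses -- dichotomous scores for one item (1 = correct, 0 = wrong).
    """
    return sum(responses) / len(responses)

# Example: 18 of 24 examinees answered the item correctly -> p = 0.75.
print(item_difficulty([1] * 18 + [0] * 6))
```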

2.1.2. Item Discrimination

Using the p-value, a discrimination index (D) can be calculated for each dichotomous item. Item discrimination is an index of how effectively an item separates examinees who vary in their degree of the knowledge tested and their ability to use it. The higher the D, the more the item discriminates. Items with p levels in the midrange usually have the best D values.

There are three simple steps to calculate D (Kline, 2005). First, those who have the highest and lowest overall test scores are grouped into upper and lower groups: the upper group is made up of the 25%-33% who are the best performers, and the lower group is made up of the bottom 25%-33% who are the poorest. Step two is to examine each item and determine the p-value for the upper and lower groups, respectively. Step three is to subtract the p-values of the two groups. According to Kelly (1939) and Cureton (1957), the most appropriate percentage for creating these extreme groups is the top and bottom 27% of the distribution. Therefore, in this study the discrimination index was given by the formula

$D = P_H - P_L$  (Equation 2-3)

where D is the item discrimination index, $P_H$ is the proportion correct among examinees whose total scores ranked in the top 27%, and $P_L$ is the proportion correct among examinees whose total scores ranked in the bottom 27%.

In general, items are considered appropriate when they exhibit the proper difficulty and discrimination values in terms of the intended purpose of the test. Parameters suggested in the psychometric literature are used; these "rules of thumb" can be adjusted up or down according to circumstances. For this study, a good item is identified by an item difficulty index of 0.20 to 0.80 and a discrimination index ≥ 0.30.
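The three-step procedure for D can be sketched the same way. This is an illustrative sketch assuming each examinee is given as a (total score, item score) pair, with the 27% cut of Kelly and Cureton as the default.

```python
def item_discrimination(total_scores, item_scores, fraction=0.27):
    """CTT discrimination index D = P_H - P_L (Equation 2-3).

    total_scores -- each examinee's total test score
    item_scores  -- the same examinees' 0/1 scores on the item in question
    fraction     -- proportion used to form the extreme groups (27% default)
    """
    # Step 1: rank examinees by total score and form the lower/upper groups.
    ranked = sorted(zip(total_scores, item_scores), key=lambda pair: pair[0])
    n = max(1, round(fraction * len(ranked)))
    lower, upper = ranked[:n], ranked[-n:]
    # Step 2: the item's p-value within each group.
    p_upper = sum(score for _, score in upper) / n
    p_lower = sum(score for _, score in lower) / n
    # Step 3: subtract the two p-values.
    return p_upper - p_lower
```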

CTT is straightforward and easy to understand and apply in testing practice. Many useful models and methods have been formulated within the CTT framework and have effectively addressed important issues in measurement. The mathematical analyses required for CTT are usually simple compared with those required in IRT, and they do not require strict goodness-of-fit studies to ensure the good fit of a model to actual test data. Unlike IRT models, CTT does not require large sample sizes for analyses (Hambleton & Jones, 1993).

2.2. Item Response Theory

On the other hand, IRT is a model-based paradigm: it starts by modeling the relationship between the latent variable being measured and the item responses. IRT is generally regarded as an improvement over CTT. The name item response theory reflects the focus of the theory on the item, as opposed to the test-level focus of CTT. Thus IRT models the response of each examinee of a given ability to each item in the test.

The IRT model rests on two basic postulates (Hambleton, Rogers, & Swaminathan, 1995): (a) the performance of an examinee on a test item can be predicted or explained by a set of factors called abilities or latent traits, expressed mathematically by θ; and (b) the relationship between examinees' item performance and the set of traits underlying item performance can be described by a monotonically increasing function called an item characteristic function (ICF). Though some research indicated that CTT methods and the IRT Rasch model function almost identically (Lawson, 1991), most research has shown that IRT provides a more accurate and precise estimate than CTT (Schulz, Kolen, & Nicewander, 1999; Pommerich, 2006; Schulz & Lee, 2002). The primary advantages of IRT over CTT are: (a) examinees can be placed on the same scale, and (b) item parameter estimates obtained from different samples are equivalent within sampling fluctuation up to a linear transformation (Lord, 1980).

2.2.1. IRT Assumptions

To achieve the useful characteristics described above, the IRT models currently in use require stringent assumptions, including unidimensionality, local independence, and nonspeededness, which need to be sustained before a model can be used to analyze the data (Ackerman, 1989; Hambleton & Swaminathan, 1985).

1. Unidimensionality: An IRT model measures a unidimensional trait denoted by θ. The trait is further assumed to be measurable on a scale, typically a standard scale with a mean of .00 and a standard deviation of 1.00. This assumption is very strict and hard to meet in reality, since several cognitive, personality, and test-taking factors can affect test performance. Fortunately, research designed to assess the impact of violations of the unidimensionality assumption has suggested that unidimensional IRT models are relatively robust with respect to moderate violations of strict unidimensionality, and that the most important issue concerns the relative degree to which the item pool is dominated by a single latent trait (Harvey & Hammer, 1999).

2. Local independence: This assumption means that items are not related except for the fact that they measure the same trait. It means that the abilities specified in the model are the only factors influencing examinees' responses to test items.

3. Nonspeededness: A special case of the preceding assumptions is nonspeededness. That is, examinees who fail to answer test items correctly do so because of limited ability and not because they failed to reach the test items (Hambleton & Swaminathan, 1985). However, in most educational assessment settings, tests are designed to be power tests, which means that even given unlimited time, not every student would get a near-perfect score. Since it is unrealistic to provide examinees with unlimited time in educational tests, the time limit still has an effect on examinees' performance (Yen, 2010).

2.2.2. One-, two-, and three-parameter logistic IRT models

According to the number of parameters describing the item, IRT models can be classified into three categories: the one-parameter logistic (1PL), two-parameter logistic (2PL), and three-parameter logistic (3PL) IRT models.

If one wishes to construct a measurement scale with a limited sample size, the 1PL model may be the most appropriate model to use (Lord, 1983). The 1PL, or Rasch (Lord & Novick, 1968), model is one of the simplest IRT models; as its name implies, it assumes that only a single item parameter is required to represent the item response process. In the 1PL model, the probability that an examinee with ability θ answers an item with difficulty b correctly can be mathematically expressed as

$P_{1PL}(\theta) = \frac{1}{1 + \exp[-D(\theta - b)]}$  (Equation 2-4)

where the difficulty parameter, b, indicates how difficult the item is, and D is a scaling factor (usually equal to 1.702) that makes the logistic function as close as possible to the normal ogive function (Baker, 1992). Figure 2-1 illustrates an example of the ICC for the 1PL IRT model. The x-axis represents the examinee's latent ability scale and the y-axis represents the probability that the examinee answers an item correctly.

Figure 2-1. Example of 1PL ICC with b = 0

One main potential drawback of the 1PL IRT model is its assumption that all items in the test share identically shaped ICCs, which would be quite unusual in most applied assessment situations (Yen, 2010). A slightly more complex IRT model is the 2PL model. According to the 2PL model, an examinee's response to an item is determined by his/her ability (θ), the item difficulty (b), and the item discrimination (a). The mathematical form of the 2PL model can be written as (Lord, 1980)

$P_{2PL}(\theta) = \frac{1}{1 + \exp[-Da(\theta - b)]}$  (Equation 2-6)

The new parameter a allows items to discriminate differently among the examinees. Figure 2-2 shows two examples of the ICC for the 2PL IRT model with b = 0 while a = 0.5 and 1.5, respectively. $P_{2PL}(\theta)$ is the probability that an examinee with ability θ answers an item with difficulty b correctly. In both the 1PL and 2PL models, the probability of passing ranges from 0 to 1 as θ goes from -∞ to ∞.

Figure 2-2. Two examples of 2PL ICC with b = 0 while a = 0.5 and 1.5 respectively

For the 1PL and 2PL models it is a tacit assumption that as examinee ability levels become very low (approaching negative infinity), the probability of a correct response approaches zero. For many assessments, however, this may not be appropriate. For example, on multiple-choice assessments a low-ability examinee may get an item correct simply by guessing. The 3PL model allows for this possibility through the inclusion of a guessing parameter. The resulting 3PL model is

$P_{3PL}(\theta) = c + (1 - c)\,P_{2PL}(\theta)$  (Equation 2-7)

where c represents the probability that an extremely low-ability examinee gets the item correct.

If $P_{3PL}(\theta)$ is plotted as a function of ability θ, the result is an ICC as shown in Figure 2-3. The greater the ability, the greater the possibility of getting the item right; and, of course, the more accurate the item parameters, the more precise the estimation of the examinee's ability.

Figure 2-3. A typical ICC for the 3PL model with b = 0, a = 1, and c = 0.1
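Since the 1PL and 2PL models are special cases of the 3PL model (a = 1 for 1PL, c = 0 for 1PL/2PL), a single function covers Equations 2-4, 2-6, and 2-7. The following is a minimal sketch, not code from the thesis, using the scaling factor D = 1.702 mentioned above.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0, D=1.702):
    """Probability of a correct response under the logistic IRT models.

    theta -- examinee ability; b -- item difficulty; a -- discrimination
    (fix a = 1 for the 1PL model); c -- pseudo-guessing parameter
    (fix c = 0 for the 1PL and 2PL models); D -- scaling factor that
    brings the logistic curve close to the normal ogive.
    """
    p_2pl = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))  # Equations 2-4/2-6
    return c + (1.0 - c) * p_2pl                          # Equation 2-7

# The 3PL item of Figure 2-3 (b = 0, a = 1, c = 0.1) at theta = 0:
print(irt_probability(theta=0.0, b=0.0, a=1.0, c=0.1))    # 0.55
```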

2.3. Computer-Based Testing

Generally speaking, computer-based testing (CBT) refers to tests in which examinees take the examination on computers. The common type of CBT transforms paper-and-pencil tests into computerized ones. Examinees get their scores after they take all testing items, sequentially or randomly. As testing is an important component of an online learning environment, a number of testing systems, with the rapid development of e-learning, have functions to grade test takers' performance, to provide them with scores and fitting feedback, and to determine their knowledge proficiency at the beginning of their study (Gibson et al., 1995; Roever, 2001; Gao & Liu, 2003; Chen, 2003). No matter how simple or complex a CBT system is, there are three main stages in the procedure of CBT: starting, continuing, and stopping, as shown in Figure 2-4.

Figure 2-4. The procedure of CBT (starting: first item selection; continuing: item presentation, the examinee's response, collection of the examinee's responses, and selection of the next item until the stopping rule is satisfied; stopping: estimation of the examinee's latent traits)

In the starting stage, the CBT system picks an initial item for examinees. Two common strategies for the selection of the first item are as follows.

Sequential selection: Based on classical test theory, a test is composed of items ordered by item difficulty index from high to low. The item difficulty index is defined in Equation 2-2; the higher the item difficulty index, the easier the item. In this method, examinees take the easiest item first.

Random selection: Examinees take a first item selected randomly from the item pool. Each item has an equal chance of being the first item, whether its item difficulty index is high or low.

The second stage is the iteration of collecting the examinee's responses, estimating the examinee's latent traits, and selecting items. After each item is answered, the CBT system records the examinee's response and then selects the next item, sequentially or randomly. After the examinee finishes all the items, the system scores the examinee's latent traits. Finally, in the stopping stage, there are two situations in which the testing stops: either when the examinee finishes all testing items or when the test time is up.
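The three stages can be summarized as a loop. This is a schematic sketch only, not the thesis system: the `ask` callback and the `estimate_ability` placeholder are hypothetical stand-ins for item presentation and the scoring method, and only the sequential strategy is shown.

```python
import time

def run_cbt(item_pool, ask, time_limit_sec=1800):
    """Schematic CBT session: starting, continuing, and stopping stages.

    item_pool -- items ordered easiest first (sequential strategy); a random
                 strategy would shuffle the pool before the session instead
    ask       -- callback that presents one item and returns the 0/1 response
    """
    start = time.time()
    items = list(item_pool)
    responses = []
    current = items.pop(0)               # starting stage: easiest item first
    while True:                          # continuing stage
        responses.append((current, ask(current)))
        # Stopping rules: all items answered, or the test time is up.
        if not items or time.time() - start > time_limit_sec:
            break
        current = items.pop(0)
    return estimate_ability(responses)   # stopping stage: score the traits

def estimate_ability(responses):
    """Placeholder scoring step, e.g. number-correct scoring under CTT."""
    return sum(score for _, score in responses)
```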

2.4. Automatic Item Generation (AIG)

In recent years, automatic item generation has become a hot topic in the testing field. Current automatic item generation systems have been widely applied in tests of vocabulary, reading, grammar, cloze, and figural items. This study investigated the literature related to AIG, discussed methods of producing distractors, and surveyed the fields in which AIG can be applied. In addition, this study aimed to understand whether AIG is suitable for teaching systems and whether systems produced with AIG can help students' learning. Finally, this study also investigated figural tests produced with AIG theory to understand their system architecture, performance, and methods of generating figural items using image processing, expecting that these could assist the system construction of this study.

The following sections review related studies with a focus on their distractor generation mechanisms, in the hope that they can be applied in this study. Steven (1991) proposed a method for constructing vocabulary tests; he used the concordance technology of natural language processing to produce vocabulary items from general corpora. Wilson (1997), on the other hand, suggested automatically generating practice items for CALL (Computer Assisted Language Learning) from electronic dictionaries and a parsed corpus.

Coniam (1997) used word frequency information from corpus statistics to generate multiple-choice cloze items automatically. The method was to obtain a large corpus, use the Automatic Grammatical Tagging System (AGTS) to mark the lexical category of the words, and compute statistics on the part of speech of each word. Using these two kinds of information, Coniam's system accepted three different item generation conditions: (I) always remove the nth word in each sentence and use it as the answer (to allow students to form a preliminary concept of the whole article, the answer was usually removed starting from the second sentence), (II) limit the answer to certain ranges of word frequency, and (III) limit the part of speech of the answer. After deciding the answers of the cloze test, the system chose, for each item, three words with the same part of speech and similar word frequency as the answer to serve as distractors, as sketched below. Coniam provided a basic process for generating cloze tests automatically from a corpus. However, the generated items still needed human review, mainly for two reasons. First, the strategy for deciding the answer was not optimized, which could make items too easy or too hard to answer. Second, after the distractors were generated, the system did not examine whether any of them could replace the answer as the best choice, which could leave a multiple-choice item with more than one correct option.

Poel and Weatherly (1997) proposed another method of constructing cloze tests. Their strategy for selecting answers was to remove a word every 6 or 7 words, starting from the 3rd sentence of an article. The item stems with removed answers were then answered, as fill-in-the-blank items, by a group of students of a certain ability, and the three wrong words that these students gave most frequently were taken as distractors. This method had a more serious problem of choosing inappropriate answers: when the answer was a definite article, the item was too easy and students could answer it without understanding the whole article, and when the answer was a proper noun, students could not answer it at all. Besides, whether students' wrong answers could serve as distractors needed to be proved by experiments.
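Coniam's selection rule for distractors (same part of speech, similar corpus frequency) can be sketched as follows; the tagged lexicon is a hypothetical stand-in for the AGTS-tagged corpus statistics, not Coniam's actual implementation.

```python
def choose_distractors(answer, lexicon, n=3):
    """Pick n distractors sharing the answer's part of speech and having
    the closest corpus frequency, in the spirit of Coniam (1997).

    answer  -- (word, part_of_speech, frequency) of the removed word
    lexicon -- hypothetical list of (word, part_of_speech, frequency) tuples
    """
    word, pos, freq = answer
    # Keep candidates with the same part of speech, excluding the answer.
    candidates = [entry for entry in lexicon
                  if entry[1] == pos and entry[0] != word]
    # Rank by how close the corpus frequency is to the answer's frequency.
    candidates.sort(key=lambda entry: abs(entry[2] - freq))
    return [entry[0] for entry in candidates[:n]]
```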

AWETS (Automatic Web-based English Testing System), developed by Kao (2000), integrated sub-systems for the automatic generation of vocabulary items, test delivery, automatic scoring, and grade recording. The process of automatic vocabulary item generation was to collect articles from Project Gutenberg (http://promo.net/pg/) and Taiwan Panorama magazine (http://www.sinorama.com.tw/) on the internet and save the sentences in a corpus after preprocessing with natural language technology. Test editors could choose the answers or the difficulty parameter of items, retrieve sentences complying with those conditions from the corpus, and use information provided by electronic dictionaries and word classification to generate distractors. According to the word frequency ranking of the answers, the difficulty parameter was broadly divided into hard, middle, and easy. Kao's method still could not ensure that distractors would not confuse the structure of a multiple-choice item, due to the lack of a mechanism for examining distractors. In addition, the absence of monitoring of the words' parts of speech could also lead to inconsistency between the tenses of distractors and answers.

Mitkov (2003) developed a system that automatically generates English multiple-choice tests, in which the source items were essay questions and the answer was limited to nouns/noun phrases because the corpus was not a large one. Mitkov used electronic teaching materials as the source of items, took nouns/noun phrases from sentences that stated facts as answers, changed the original statements into interrogative sentences to serve as question stems, and finally selected words or phrases semantically similar to the answer as distractors. For instance, from the sentence "A prepositional phrase at the beginning of a sentence constitutes an introductory modifier," we can take "introductory modifier" as the answer and change the sentence into an interrogative one, producing the following item:

What does a prepositional phrase at the beginning of a sentence constitute?
1. a modifier that accompanies a noun
2. an associated modifier
3. an introductory modifier
4. a misplaced modifier

Mitkov's experimental results indicated that items generated by this algorithm were not significantly different from man-made ones in difficulty and discrimination parameters. Moreover, the quality of the distractors was improved and the efficiency of question generation was increased severalfold.

The following sections discuss studies related to the application of AIG in teaching. Li (2011) developed a Chinese idiom practice system based on an ontology. He first analyzed multiple messages from the literature to construct the idiom ontology. After generalizing common reasons why students used wrong idioms when making sentences, a diagnosis mechanism workable on computers was designed. Finally, he constructed the ontology-based Chinese idiom practice system with a system development method.

Li's system could provide online idiom teaching materials and generate true/false, multiple-choice, and matching questions automatically. Besides, it could use the method of situational sentence making (using idioms and words to form a complete sentence in a given situation) to let users practice idioms. This method could also judge the answer, diagnose whether the sentence was reasonable, and provide immediate feedback according to the user's blind spots in using idioms. Li's system broke the limitation that existing online idiom practice systems could only generate questions from a pre-set item bank, and it also provided an environment for applying idioms in situational sentence making.

As for system performance and satisfaction evaluation, the results showed that the system could improve students' performance, especially for those with middle and low achievement. In using idioms to make sentences, students assisted by the system had a significantly higher level of correctness in the syntax and meaning of idioms than those who received traditional teaching.

Students also felt that the system was easy to use and useful, and they were satisfied with it. Finally, as for system feasibility, the teachers who were interviewed all thought that the system could be applied to students' self-study of idioms.

In 2010, a template-based item generation system was proposed. By specifying multiple item templates that use different subjects, verbs, objects, a question sentence, and variable numbers to substitute the elements of a template, Jiang (2010) proposed an automatic item generation system based on templates. To keep the generated sentences smooth and fluent, the amount of manual intervention was decreased and the rationality among elements was considered. The proposed system can quickly create a series of items and reduce the time and manpower costs.
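A template-based generator in this spirit can be sketched as a sentence pattern with slots for subject, verb, object, and numbers. The template and slot values below are invented for illustration and are not Jiang's actual templates.

```python
import random

def generate_item(template, slots):
    """Fill the template's slots with randomly chosen elements."""
    return template.format(**{k: random.choice(v) for k, v in slots.items()})

# Hypothetical word-problem template with subject/verb/object/number slots.
template = ("{subject} {verb} {x} {object} and then {verb} {y} more. "
            "How many {object} in all?")
slots = {
    "subject": ["Mary", "Tom", "A farmer"],
    "verb": ["buys", "picks"],
    "object": ["apples", "marbles"],
    "x": [2, 3, 4],
    "y": [5, 6, 7],
}
print(generate_item(template, slots))
```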

Next, we discuss the application of AIG in figural tests, in the hope that it can assist the system development in this study. Liu, Liang, and Lin (2001) have researched computer adaptive figural testing since 1998. Their research is based on an analysis of the structure of Raven's Advanced Progressive Matrices (APM). Lin was also responsible for the development of the New Figure Reasoning Test (NFRT).

The NFRT contains two main systems: the automatic item generation system and the online testing system. The online testing system, based on IRT theory, is just an interface for collecting and evaluating the ability of examinees. The focus here is the automatic item generation system, discussed in the following paragraphs.

The IRT parameters of APM in Lin's study are as follows:

1. Difficulty: According to Hambleton and Swaminathan (1985), the value of the item difficulty parameter is usually set between -2.0 and 2.0. By this criterion, the difficulty of APM items was between -2.0 and 2.0 with an average of -.868. An example APM item is shown in Figure 2-5.

2. Discrimination: In terms of ability tests, the value of the discrimination parameter was greater than 0 and relatively low in APM, and item 8 had the lowest discrimination (.014).

3. Guessing: According to the estimation, the supposed value of the guessing parameter in APM items was .219. Since there are 8 choices in APM, the predicted value should be 12.5%. The average guessing value was higher than expected.

Figure 2-5. An example item of APM (Liu et al., 2001)

The automatic item generation system contains an item generation algorithm and an item generation engine based on APM. The functions, strengths, and restrictions of this system are described as follows.

1. Item generation engine: The engine can automatically generate a specific item with particular content features and combine different types of geometric figures in a systematic fashion to produce an item that measures the intended goal. The purpose of the measurement is to evaluate examinees' reasoning ability in conclusion (inference of relations) and deduction (inference of relativity) through the figure partition characteristics of the item

and the manipulation of the relationships between figures in space. An example APM item is shown in Figure 2-5.

2. Item generation algorithm: The algorithm for item generation was based on an analysis of the features of APM items. The key points were the parameters of IRT theory and the problem-solving processes of APM.

2.5. Virtual Item Bank Theory

Next, this study discusses the literature related to item security. In addition to surveying research on controlling item exposure parameters with algorithms, this section describes in detail the VIB theory used in this study. When VIB theory is used, the item exposure rate and the item overlap rate stay within a secure range, so item security is well controlled.

CBT or CAT is administered by selecting items from an item bank. However, test security problems concerning item overexposure arise when a great number of examinees have participated in the test over time. Test security can be assessed by two key indicators: the item exposure rate and the item overlap rate. Chang (2003) tried to protect test security by randomly selecting items for a more even item distribution; however, no desirable results were seen with this method. Some researchers focused solely on the control of the item exposure rate in the hope that the problem could be solved. One of the most discussed control methods is the SH procedure (Sympson & Hetter procedure) proposed by Sympson and Hetter (1985). This method uses the ability distribution of a group of simulated examinees to calibrate the item exposure control parameters prior to the test. To achieve better control, the ability distribution of this group should be similar to that of the real world. To make this happen, different exposure control parameters are used for examinees with different levels of ability.
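The core of the SH procedure is a probabilistic gate: an item chosen by the selection algorithm is actually administered only with a probability equal to its pre-simulated exposure control parameter; otherwise the next-best item is considered. The following is a minimal sketch, not the thesis implementation, assuming the parameters have already been calibrated by simulation; `select_best` is a hypothetical callback.

```python
import random

def administer_with_sh(select_best, control_params):
    """Sympson-Hetter style exposure gate (sketch).

    select_best    -- hypothetical callback yielding the next-best item id
                      each time it is called
    control_params -- item id -> pre-simulated exposure control parameter k
    """
    while True:
        item = select_best()
        # Administer the selected item with probability k; otherwise reject
        # it for this examinee and fall through to the next-best candidate.
        if random.random() <= control_params.get(item, 1.0):
            return item
```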

In 2003, Chang proposed SHC (the Sympson & Hetter conditional procedure). SHC is a control mode that divides examinees with different levels of ability into different groups, obtains the exposure control parameters of each item at different ability levels, and combines the parameters into an exposure control matrix as the basis of exposure control in a real test. For the fewer examinees with higher and lower ability, the maximum expected exposure parameter should be adjusted higher. On the contrary, for the more numerous examinees with medium ability, the maximum expected exposure parameter should be adjusted lower to increase the usage rate of the items (Chen, 2007). Other methods that can control the item exposure rate include the unconditional multinomial (SL) procedure (Stocking & Lewis, 1995), the conditional multinomial (SLC) procedure (Stocking & Lewis, 1998), the Davey & Parshall procedure (DP, 1995), and the SH online procedure with freeze control (SHOF) (Chen, 2005). However, these methods do not take the item overlap rate into consideration, so item overlap problems remain.

Based on the argument that the item exposure rate and the item overlap rate are not independent but interdependent (Chen, 2004), Chen and Lei (2005) developed SHT, which controls both the item exposure rate and the item overlap rate, to complement SH. Like SH, SHT requires pre-simulated exposure parameters, so both have time-consuming simulation and test scenario problems. To solve this problem, Chen, Lei and Liao (2008) extended SHT into SHTO, which dramatically enhances the efficiency of controlling item exposure problems by controlling the item exposure rate and item overlap instantly online, without pre-simulating exposure parameters. Nevertheless, both SHT and SHTO can only control the item overlap rate between two examinees. In fact, an examinee can obtain test information from more than one person. Therefore, it is necessary to control the item overlap rate between one prospective examinee and a group of examinees who have already taken the test. To control the item overlap rate broadly, Chen (2008) proposed the SHGT control method. Similar to SHTO, SHGT can instantly control the item exposure rate and overlap rate online. They differ in that SHTO can only control the item overlap rate between two examinees, while SHGT can do so between one prospective examinee and α past examinees (α ≥ 1).

Although researchers have come up with different ways to control both the item exposure rate and the item overlap rate, test disclosure remains a problem when there are too many users over time (Chang, 2003). Thus, some researchers use the Automatic Item Generation (AIG) technique to generate items. Although it was proposed 30 years ago, AIG has not been used until recently (Irvine & Kyllonen, 2002). There are numerous approaches to generating items using a computer (Millman & Westman, 1989), but they generally require the existence of an item model. An item model (Bejar, 2002; Drasgow et al., 2006) is a general prototypical representation of the items to be generated. Furthermore, each component of an item model can contain both fixed and variable elements (Lai, Alves & Gierl, 2009). Using item models, AIG can generate countless items to solve the item exposure rate and overlap rate problems. However, this method cannot be applied in CBT or CAT as it stands, because it cannot accurately calculate an examinee's ability.

Designing CAT and CBT is challenging, as it takes a lot of time and resources to create the item bank. According to a study conducted by Chen (2007), only 78 research papers by PhDs and graduate students in Taiwan are on tests (10 on traditional CBT, 35 on CAT, 33 on online testing). It is even rarer to see papers on figural testing. Therefore, it is important for researchers to help test editors design item banks for figural tests using less manpower and fewer resources in a shorter time.

The term Virtual Item Bank was first seen in the research "Design a Virtual Item Bank Based on Image Processing Technique" by Liao (2002). Different from traditional item banks, VIBs are not physical lists of items but innumerable items generated via item generation rules, which avoids the item exposure rate issue of traditional item banks.

Liao's study proposed a new concept, the Virtual Item Bank, and showed how this concept is used in figural tests. The research developed two research tools: the Virtual Item Bank System and the CAT system. The system structure and functions of these two tools are described as follows.

2.5.1. Virtual Item Bank System

In the VIBS, the item database no longer stores large amounts of items; instead, it saves two elements that replace the traditional items:

1. Basic figure elements: The system no longer requires saving a large number of figural items. Instead, items are built upon three basic figure types: line, circle, and polygon. Not only does this lower the memory space requirements, but it also reduces the probability of item exposure.

2. Solution processes: In the VIBS, examinees' solving processes and abilities are defined by specialists and converted into mathematical formulas that can be manipulated by computers and stored in the virtual item database.

The VIB, which replaces the traditional item bank, works as follows: identify the ability to be tested; analyze the ability and the processes needed to solve the item; convert the ability needed by the subject into mathematical formulas and store them in the database; the system then analyzes the subjects' testing needs and produces items according to the formulas in the database.

Figure 2-6. Flow chart of VIB

The VIBS contains three subsystems: the item rule definition subsystem, the item generation subsystem, and the answer retrieval subsystem. Each subsystem has different tasks and functions, described below.

1. Item rule definition subsystem: This subsystem provides test editors with a number of figural objects and processes the information needed to solve the problem. Through the system interface, users can determine a figure's position and choose the method used to process images. The subsystem then estimates the item difficulty and asks the test editors to adjust the difficulty level. Finally, the item initiation subsystem saves this information into the database. The item generation rules are shown in Figure 2-7.

Figure 2-7. Item generation rules without binary operations (Addition Rule, Allocation Change Rule, Diagonal Rule, Oblique Rule, Quantitative Pair Wise, Size Change Rule, Progression Rule, Move Rule, Straight and Oblique Rule, Overlap Rule, Bias Rule, Angle Change Rule, and image operation with no changes)

These 12 item generation rules can be combined with 4 binary operations, yielding 48 different rules, and the difficulty parameters of these 48 item types were then calibrated.

2. Item generation subsystem: The main idea of this subsystem is to generate all kinds of data for item generation in the hope of producing an infinite number of items. It contains three subsystems:
a. Image processing subsystem: This subsystem performs image processing based on the variation demanded for items. In addition to the 2D processes described above, there are object size variations, position movements, and other special processing functions.
b. Data retrieval subsystem: This subsystem can retrieve items and make sure there is only one reasonable solution for each produced item. It deletes unreasonable items so that they do not appear in the test, ensuring item accuracy.
c. Item shape control subsystem: This function can control the item shape and generate the answer zone, as well as make sure that the answer is the only correct one. It also varies the item's body logic from top to bottom, from left to right, from right to left, and from bottom to top in order to provide a variety of items to examinees.

The features of the item generation subsystem are:
a. Defining the abilities and strategies needed to solve the item.
b. Determining the object shape of each item.
c. Identifying the difficulty parameter.
d. Parameter conversion: The system converts the data mentioned above into mathematical formulas and saves them in the VIB.
e. Automatic generation: The item generation subsystem can automatically generate items according to the defined strategy, difficulty level, and selection.

3. Answer retrieval subsystem: The alternative options of each item are generated by image comparison. First, the RGB values of a figure's pixels are computed as its characteristic values. The characteristic values are then saved in a two-dimensional matrix and compared with the figures in the database. The similarity of two figures is measured by the Euclidean distance between their characteristic values, as shown in Equation 2-8, and the three figures with the lowest distances are selected as the alternative options.

$d(Q, I) = \sqrt{\sum_{i} \left( f_i^{Q} - f_i^{I} \right)^{2}}$  (Equation 2-8)

The VIBS is composed of these subsystems, which together control the item shape, item difficulty, answer, and all parameters.
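As a concrete illustration of Equation 2-8, the following minimal sketch (illustrative, not the thesis code) computes the distance between pixel characteristic matrices and keeps the three nearest candidates as alternative options.

```python
# A minimal sketch of the answer retrieval idea: represent each candidate
# figure by a matrix of pixel characteristic values, measure its Euclidean
# distance to the keyed answer figure, and keep the three nearest candidates
# as alternative options (Equation 2-8).
import numpy as np

def euclidean_distance(q: np.ndarray, i: np.ndarray) -> float:
    """d(Q, I) = sqrt(sum((f_Q - f_I)^2)) over all characteristic values."""
    return float(np.sqrt(np.sum((q.astype(float) - i.astype(float)) ** 2)))

def pick_options(answer: np.ndarray, candidates: list[np.ndarray], k: int = 3):
    """Return the k candidate figures closest to the answer figure."""
    ranked = sorted(candidates, key=lambda c: euclidean_distance(answer, c))
    return ranked[:k]
```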

2.5.2. CAT system

The role of the CAT system is to select appropriate items for examinees and to evaluate their abilities based on an IRT model. In producing items, the CAT system is only an application interface; it does not perform image processing, item design or retrieval. Those tasks are done by the VIBS, and the results are sent back to the CAT system to administer tests. Ability evaluation is handled with IRT; since the system simplifies the factors that affect the items, the Rasch model was used in their study. The functions of the editing tool are described as follows.

1. Editing tool: The major functions of this tool are to obtain all initial parameters of figural items and store them in the VIB. When an examinee takes a test, the VIBS generates suitable items for that examinee. The following steps illustrate how elements and functions are input into the VIBS.

Step 1: Decide the locations of the figural objects.

Figure 2-8. Deciding the locations of figural objects

Step 2: Decide the image processing rules.

Figure 2-9. Deciding the image processing rules

Step 3: Choose the next figural objects, save the rule into the VIB, and generate items.

Figure 2-10. Choosing the next figural objects and storing them in the VIB

2. CAT system: The major functions of the CAT system are collecting examinees' response data and evaluating their abilities. The steps above reflect the demands of item bank generation as well as the development of the research tools; these tools helped test editors solve the item exposure rate problem, and a simulation of the item overlap rate was also discussed and verified in their research.

Finally, their research proposed a new technique, the "VIB", to address test security problems. This technique integrates AIG, content-based image retrieval, item exposure rate control, and item overlap rate control. Administering a test with a VIB rules out item exposure and overlap problems, and it also allows an examinee's true ability to be estimated accurately.
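Since the ability evaluation mentioned above rests on the Rasch model, the following is a minimal sketch (an illustrative assumption, not the system's actual routine) of how an examinee's ability could be estimated from Rasch item difficulties by maximum likelihood.

```python
# A minimal sketch of Rasch-model ability estimation as a CAT system might
# perform it: given the difficulties b of the administered items and the 0/1
# responses, find the ability theta that maximizes the likelihood, using
# Newton-Raphson iterations.
import math

def rasch_p(theta: float, b: float) -> float:
    """Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, iters=20):
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in difficulties]
        grad = sum(u - p for u, p in zip(responses, ps))  # dlogL/dtheta
        info = sum(p * (1 - p) for p in ps)               # test information
        if info == 0:
            break
        theta += grad / info                              # Newton step
    return theta

# Example: items of difficulty -1, 0, 1; responses right, right, wrong.
print(estimate_theta([1, 1, 0], [-1.0, 0.0, 1.0]))
```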

Figure 2-11. CAT system

To validate the approach, they conducted a test using the APM as the material, building a VIB for figural testing and using the CAT system to link to the VIB. Drawing on prior research on the APM, their study derived the combination rules of APM items and used image processing operations such as And, Or, Xor, and Sub to establish these rules. Using image processing techniques made it easy and quick to generate items.

To address the technical problems concerning distracters, their research aimed to prevent similar distracters that might confuse examinees. They used a content-based image retrieval technique to analyze the similarity of two options.

Options with a higher level of similarity are removed by the VIB. Likewise, items with similar stems are taken out by the VIB so that the remaining items make more sense to examinees.

Combining all the above techniques, their research developed tools that included the item rule definition subsystem, the item generation subsystem, the answer retrieval subsystem and the CAT system. Using these tools, test editors can easily build a VIB. Their study drew on the APM to build the basic elements of figural testing and transformed the item combination rules of the APM into image processing actions stored in the VIB for final testing and validation.

Their results showed a positive correlation with the paper-based APM, with a desirable correlation coefficient (r = .683, n = 301, p < .001). The item exposure rate was extremely low, ranging from 0 to 1.0128E-4, while the item overlap rate was 2.43488E-10, small enough to be negligible. In conclusion, when a VIB is used, test security is maximized and an examinee's ability can be estimated correctly.

Above all, the VIB building process proposed in their research was well received by both test editors and experts, and researchers can use this technique to build a VIB for other tests. With regard to tool manipulation, both test experts involved in the study considered the tools easy to use: building objects and rules with graphic design techniques makes it easy to build a VIB, and on the test interface the CAT system can generate an item quickly. In addition, neither test operators nor experts saw any duplicate items during the study, which indicates that test security was maintained throughout.
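For reference, the two security indices reported above can be computed as follows. This sketch uses assumed definitions (exposure rate = share of tests containing an item; overlap rate = average proportion of items shared by two administered tests), since the computation is not spelled out here.

```python
# A minimal sketch (assumed definitions) of the item exposure rate and the
# item overlap rate over a set of administered tests.
from itertools import combinations

def exposure_rates(administered: list[list[str]]) -> dict[str, float]:
    """Share of tests in which each item appears."""
    n_tests = len(administered)
    counts: dict[str, int] = {}
    for test in administered:
        for item in set(test):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n_tests for item, c in counts.items()}

def overlap_rate(administered: list[list[str]]) -> float:
    """Average proportion of items shared by two administered tests."""
    pairs = list(combinations(administered, 2))
    shares = [len(set(a) & set(b)) / max(len(a), 1) for a, b in pairs]
    return sum(shares) / len(shares) if shares else 0.0

tests = [["i1", "i2"], ["i3", "i4"], ["i1", "i5"]]
print(exposure_rates(tests), overlap_rate(tests))
```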

2.6. Cube enumeration

This section reviews the literature on the abilities evaluated by cube enumeration tests. It also briefly examines the factors that affect the item difficulty of cube tests, in the expectation that these would assist the evaluation of item difficulty in this study.

The cube, or regular hexahedron, is the basic unit of volume and hence is often used to introduce the concept of volume. Previous studies show that enumerating cube arrays is used to measure or build the mental images and manipulation capability associated with 3D geometric objects.

In the past, most studies of cube enumeration tests used 2D figures of cuboids stacked from cubes, and those researchers mostly explained the examinees' cognitive processes from the perspective of spatial orientation. Ben-Haim (1985) held that the examinees' major difficulties were whether they could understand these 2D figures as 3D objects and whether they could take invisible cubes into account when solving the questions. This cognitive process involves the ability to understand the relative spatial relationships among all the cubes and to create appropriate mental images, which is categorized as spatial orientation ability.

The serial studies of Battista and Clements (Battista, 1999; Battista & Clements, 1996, 1998) described how one coordinates and integrates information coming from different viewing angles and then forms the mental image of a 3D object when structuring an object's position. This mental image can be stored as an object that can be manipulated mentally in the future. Generally speaking, the cube enumeration process that Battista and Clements investigated still belongs to spatial orientation. Understanding 3D figures represented in 2D, forming mental images of 3D objects, and performing spatial orientation on each cube are the spatial abilities that a cube enumeration test tries to measure.

In addition, some scholars indicated that the factors affecting sixth graders' cube enumeration were their abilities to form correct mental images of the invisible cubes, to group scattered cubes by mentally manipulating and moving them, and finally to calculate each group by the volume formula and add the results. The first, the production of a mental image, is spatial orientation; the second, mental manipulation, is spatial visualization; and the calculation ability is related to the mathematical ability of volume calculation. Moreover, in Taiwan's elementary school curriculum, fifth graders' mathematical ability of volume calculation is measured by cube enumeration tests, and teachers also use cube enumeration to teach the concepts of volume. This demonstrates that cube enumeration can serve as teaching material for the mathematical ability of volume calculation. Summarizing the above literature, the abilities measured by a cube enumeration test are closely related to the examinees' spatial orientation, spatial visualization, and mathematical ability of volume calculation.

Next, this study describes the factors affecting the difficulty that examinees perceive and the strategies they use when enumerating cubes, in the expectation that these can provide an important reference for difficulty evaluation in this study. Students often find cube enumeration tests difficult (Battista & Clements, 1996). For example, in research done by the National Assessment of Educational Progress (NAEP), the hit rate for cube enumeration in cuboids was lower than 50% for students at the ages of 9, 13 and 17. These results indicate that cube enumeration tasks, even with regular cuboids, are difficult even for older students (Ben-Haim, Lappan & Houang, 1985).

In order to design a spatial visualization ability test and examine the effects of an instructional intervention, Ben-Haim (1985) referred to the option design of the cube enumeration items of the 1983 Michigan Educational Assessment Program. Each item was composed of a 2D drawing of a cuboid stacked from cubes, and the test asked students to select the number of cubes needed to form the cuboid. The design principles of the four distracters were: (1) the number of visible faces, (2) the number of visible faces multiplied by 2, (3) the number of visible cubes, (4) the number of visible cubes multiplied by 2.
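The key and the four distracter values can be computed directly from a cuboid's dimensions. The sketch below formalizes these design principles for a corner view with three visible faces; the formulas are an assumption for illustration, not taken from Ben-Haim's original materials.

```python
# A minimal sketch of the key and the four distracter values for an
# L x W x H cuboid drawn from a corner view, where three faces (top, front,
# one side) are visible.
def cube_item_options(L: int, W: int, H: int):
    key = L * W * H                                     # correct cube count
    visible_faces = L * W + W * H + L * H               # unit squares on 3 faces
    visible_cubes = key - (L - 1) * (W - 1) * (H - 1)   # cubes on a visible face
    return {
        "key": key,
        "d1_visible_faces": visible_faces,              # distracter (1)
        "d2_visible_faces_x2": 2 * visible_faces,       # distracter (2)
        "d3_visible_cubes": visible_cubes,              # distracter (3)
        "d4_visible_cubes_x2": 2 * visible_cubes,       # distracter (4)
    }

# Example: a 3 x 3 x 3 cuboid -> key 27, 27 visible faces, 19 visible cubes.
print(cube_item_options(3, 3, 3))
```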

Ben-Haim et al. (1985) held that there were two major difficulties for the students. The first was whether they could interpret the 2D figures in a 3D way: options (3) and (4) are cube counts, so students selecting these two options realized that the 2D figures represented 3D objects, while students selecting (1) and (2) did not. The other difficulty was whether the students could take invisible cubes into account: options (2) and (4) multiply the visible face or cube count by 2, so students selecting these two options were considering invisible cubes, while students selecting (1) and (3) were not. That study conducted an instructional experiment on fifth to eighth graders in 3 areas (with between one hundred and four hundred students in each grade). Before the instruction, the students' correctness rates were 25%, 40%, 45% and 50% respectively, which generally agreed with the result (47%) of the NAEP (1977-1978) survey of 127,420 seventh graders. The students' selection rates of the wrong answers (1) to (4) were 17%, 22%, 8% and 8% respectively.

Battista and Clements conducted a series of cube enumeration studies and proposed that spatial structuring, mental models and schemes are related to the reasoning of cube enumeration (Battista, 1999; Battista & Clements, 1996). Spatial structuring is the process of constructing a 3D object; the process determines the shape or nature of an object by identifying and integrating its components. It includes constructing units, constructing the relations among units (such as relative position), and recognizing that the correct repetition of units can produce the whole; for instance, repeating a "row" can produce the whole cuboid. A mental model is a nonverbal, vivid mental image activated by physical situations; an individual can use the image to explain or reason while doing or thinking. A scheme is an organized sequence of actions or operations, abstracted from past experiences, which enables a person to respond in the same way in similar environments.

Battista and Clements (1996) pointed out that there are 4 developmental stages in students' construction of a 3D cuboid in their minds. Stage 1 is a medley of viewpoints, meaning that students pay attention to only one surface at a time and then piece the viewpoints together to perceive a 3D structure; their spatial organization is partial.

In stage 2, composite units, students can form composite units and process them mentally, but their composite units are not necessarily appropriate; it is more appropriate for students to organize by small 3D units than by 2D units. Even though some students can tell the difference between the "square" they see and the "cube" they need to enumerate, the complicated process of enumerating cube faces often makes them lose track of the difference, particularly when they enumerate cube faces near the edges, because different faces there represent the same cube. They therefore need the ability of stage 3, coordination, to coordinate the different angles presented by the cuboid's surfaces, such as coordinating the right angles between the front and the right side, or the top and the left side; they enumerate correctly once they understand that the cube faces on an edge represent the same cube.

In stage 4, integration, students have to collect all the separate viewpoints to form a whole picture of the object, and this stage involves two psychological mechanisms. The first is recollection, which enables an individual to activate mental models from past experience: facing an object, an individual first compares the different viewpoints of the real object with existing mental models in order to activate the appropriate one. If no suitable mental model exists, the individual performs a series of integrations and transformations to produce new psychological objects through the second mechanism, the scheme.

Battista and Clements (1996, 1998) interviewed 45 third graders and 78 fifth graders about their enumeration performance; the materials were figures of cuboids stacked from cubes, and they identified 5 different strategies. In strategy 1, students think in units of horizontal or vertical "layers" of the cuboid; a layer may be enumerated cube by cube, by repeated addition, or by multiplication, and students using this strategy had the highest correctness rate. In strategy 2, students see the cuboid as a filled-up space and try to enumerate the cubes inside and outside of it; they may use the "row" as the unit for enumeration or proceed unsystematically and randomly, but the error rate is usually more than 50%.

In strategy 3, students enumerate only the cubes on the visible surfaces of the cuboid and ignore the cubes inside; they often double-count the cubes on the edges, and hence the error rate is high. In strategy 4, although students can calculate with the volume formula L × W × H, some of them cannot explain why these 3 numbers have to be multiplied, so using the formula does not by itself indicate that they understand its meaning. Strategy 5 refers to strategies other than the above four, including misuse of the volume formula; for instance, using the wrong side lengths of the cuboid in the multiplication (strategies 1 and 3 are contrasted computationally in the sketch at the end of this section).

Battista and Clements (1996) found that students who could enumerate by "layer" had a higher correctness rate; about 60% of the fifth graders and fewer than 20% of the third graders used this strategy. Across the 3 items, however, only 29% of the fifth graders and 7% of the third graders could use this strategy to answer correctly, which shows that enumeration by "layer" was still difficult for fifth graders. By contrast, about 60% of the third graders and 20% of the fifth graders used strategy 3, counting only the cubes on the cuboid's surfaces. Repetitive (double) counting was quite common: 64% of the third graders and 21% of the fifth graders did it at least once, and 33% of the third graders and 6% of the fifth graders did it on all items.

Battista and Clements (1996, 1998), Ben-Haim et al. (1985) and Olkun and Knaupp (2000) used cuboids stacked from cubes, or frameworks defined by length, width and height, as their materials, and their purpose was to investigate how individuals construct 3D psychological objects from 2D figures. From the review of spatial ability it can be seen that this is generally categorized as spatial orientation. In other words, the above researchers attributed cube enumeration errors to people being unable to transform 2D figures into 3D ones, to coordinate viewpoints of the cuboid's right angles correctly, or to take invisible cubes into account. Nevertheless, more items were cuboids stacked from cubes
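As referenced above, the difference between strategy 1 and strategy 3 can be made concrete with a short sketch (assumed formulas, not from the cited studies): strategy 1 accumulates complete layers, while strategy 3 adds up only the unit squares on the three visible faces.

```python
# A minimal sketch contrasting strategy 1 (enumeration by layers, correct)
# with strategy 3 (counting only the cubes seen on the three visible faces,
# which double-counts along edges and misses the interior).
def strategy1_layers(L: int, W: int, H: int) -> int:
    """Count L x W cubes per horizontal layer, then add over H layers."""
    return sum(L * W for _ in range(H))

def strategy3_faces(L: int, W: int, H: int) -> int:
    """Add the unit squares on the three visible faces; cubes along shared
    edges are counted more than once and inner cubes are missed."""
    return L * W + W * H + L * H

# For a 3 x 4 x 5 cuboid: strategy 1 gives the true count, 60;
# strategy 3 gives 47, an undercount here despite the double counting.
print(strategy1_layers(3, 4, 5), strategy3_faces(3, 4, 5))
```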
