Margaret Wu, Relative Strengths of Western and Asian Students
Journal of Research in Education Sciences 2011,56(1), 67-89
Using PISA and TIMSS Mathematics
Assessments to Identify the Relative
Strengths of Students in Western and Asian
Countries
Margaret Wu
Assessment Research Centre
University of Melbourne, Australia
Associate Professor
Abstract
A study was carried out that compared Programme for International Student Assessment (PISA) 2003 Mathematics results with Trends in International Mathematics and Science Study (TIMSS) 2003 Grade 8 Mathematics results (Wu, 2009). It was found that Western countries generally performed better in PISA than in TIMSS, and Eastern European and Asian countries generally performed better in TIMSS than in PISA. In this paper, TIMSS released items are divided into two sets: one that fits the PISA framework and one that does not. In particular, many geometry and algebra items in TIMSS are “inner mathematics” (mathematics without a real-life context), and such items do not appear in the PISA test. Differential performances of six countries on each set of items are examined. In this way, the relative strengths and weaknesses of Western and Asian countries are identified. These strengths and weaknesses are then linked back to the contents of the PISA and TIMSS tests. There is strong evidence that differential performance between Western and Asian countries in PISA and TIMSS can be directly attributed to the types of items in the respective tests.
Keywords: international study, Mathematics, PISA, TIMSS
Introduction
A study was carried out that compared Programme for International Student Assessment (PISA) 2003 mathematics results with Trends in International Mathematics and Science Study (TIMSS) 2003 Grade 8 mathematics results, using country mean scores for the 22 countries that participated in both studies (Wu, 2009). It was found that Western countries generally performed better in PISA than in TIMSS, and Eastern European and Asian countries generally performed better in TIMSS than in PISA. Furthermore, two factors, content balance and years of schooling, accounted for most of the variation in PISA country mean scores after controlling for TIMSS country mean scores. Consequently, the rankings of countries in the two studies can be reconciled to a reasonable degree of accuracy. Further, the Organisation for Economic Co-operation and Development (OECD) publication, Learning mathematics for life (OECD, 2009, chap. 4), showed that the relative performances of countries by traditional mathematics content areas (Algebra, Data, Geometry, Measurement and Number) differ significantly, and that countries are grouped according to their similarities in students' responses to the items. Gronmo and Olsen (2008) also showed that the compositions of the PISA and TIMSS tests differ considerably in terms of mathematics content and item format. They concluded that content differences between PISA and TIMSS contributed to the observed differences in the rankings of countries in PISA and TIMSS.
The fact that content balance has a significant effect on country performance suggests that students in different countries have particular relative strengths and weaknesses. If these specific strengths and weaknesses are identified beyond the level of broad content categories, mathematics educators in each country can be informed of the specific skills students have or lack. This will also provide further insight into what PISA and TIMSS are each assessing, beyond the usual rhetoric about curriculum-based and non-curriculum-based focus.
This paper attempts to examine item-level skills in order to identify strengths and weaknesses, thus providing specific instructional feedback to mathematics educators as well as test designers. The item analysis is based on data from six countries: Australia, England, United States, Hong Kong, Taiwan and Korea. The results for the three Western countries show a similar pattern, as do the results for the three Asian countries. There is strong evidence that cultural traditions of learning mathematics have an impact on students' performances on different types of test items.
PISA and TIMSS Mathematics Frameworks and Tests
Comparisons of PISA and TIMSS mathematics frameworks can be found in a number of publications (American Institutes for Research, 2005; Hutchison & Schagen, 2007; National Center for Education Statistics, 2008; Neidorf, Binkley, Gattis, & Nohara, 2006). These comparisons tend to produce a descriptive list of similarities and differences between the two published frameworks, such as the classifications of the content domains and the cognitive domains. However, few published comparisons critically examined the differences. For example, would one framework lead to a test that is completely different from a test based on the other framework? Is one framework a subset of the other? If so, what is missing in that framework? Are the two frameworks essentially the same, other than the nomenclature of the classifications? This paper looks at these issues and, in doing so, it is hoped that a better understanding is gained of the key differences between PISA and TIMSS at the level of the tests and items, not just at the level of broad aims and orientations as reflected in the frameworks.
At first glance, most would conclude that both PISA and TIMSS mathematics frameworks are comprehensive. It does not appear that one framework is necessarily a subset of the other, or that something is glaringly missing from either framework. However, the PISA mathematics framework signals a more inclusive approach, with the following line at the beginning of the framework document:
Rather than being limited to the curriculum content students have learned, the assessments focus on determining if students can use what they have learned in the situations they are likely to encounter in their daily lives. (OECD, 2003, p. 24)
The word “limited” suggests that PISA is attempting to be more inclusive in terms of coverage of the mathematics domain. It also suggests that the school curriculum, whether intended, implemented, or attained, does not focus on whether students can use what they have learned. Does this mean that TIMSS, being more curriculum based, does not assess whether students can use what they have learned? The TIMSS mathematics framework states the following:
While the assessment of abilities such as solving nonroutine problems and reasoning with [...] that form the initial base for the development and implementation of these skills will also be assessed. Problem solving and communications are key outcomes of mathematics education that are associated with many of the topics in the content domains. They are regarded as valid behaviors to be elicited by test items in most topic areas. (International Association for the Evaluation of Educational Achievement, 2003, p. 11)
It would appear that TIMSS does not preclude problem solving in the framework, but there may be less emphasis on this than in PISA, where the mathematics framework is almost entirely built on the application of mathematics competencies that may reasonably be expected of students. In short, one may expect a different balance in the two tests in terms of problem solving items and fact/procedural items, but one might not necessarily expect a complete absence of one type of item.
However, a close examination of the test items in the two surveys reveals that there is one class of items in TIMSS that does not appear in PISA. One might label this class of items as “content rich,” “formal mathematics,” or “naked mathematics.” Some of these items are illustrated in Figure 1.

Item A: In square EFGH, which of these is FALSE? [diagram and answer options not recoverable]
Item B: If n is a negative integer, which of these is the largest number?
A. 3 + n
B. 3 × n
C. 3 − n
D. [not recoverable]

Figure 1. Examples of TIMSS items not appropriate for the PISA test
[...] of competency list number 8: Using symbolic, formal, and technical language and operations (OECD, 2003, p. 41). While these items could fit the PISA framework at a theoretical level, they are not tasks that “(students) are likely to encounter in their daily lives (outside the classroom context).” These may be items that an adult working in a scientific field may encounter in his or her daily life, but not 15-year-old students studying at school. Consequently, items focusing on technical language and formal mathematics are typically not in the PISA test. This is not because the PISA framework does not cover such competencies, but because the operationalisation of the PISA framework as applied to 15-year-old students precludes these kinds of items. So, at one level one could say the PISA framework is comprehensive, but at another level, typically at the level of the construction of test items, a set of typical school mathematics items is excluded.
In contrast, when PISA items are examined against the TIMSS framework, no group of items appears to misfit the TIMSS framework to the extent that such items could not be included in the TIMSS test. However, there are more PISA items than TIMSS items where, within an item, the competencies tested cover multiple content domains.
Because formal mathematics items are not included in the PISA test, one may ask if, and how, country results may be affected, particularly in the context of differential performance of countries in PISA and TIMSS. This paper attempts to answer this question by examining the differential performance of countries on the set of TIMSS items that do not fit with the PISA test.
Methodology
Because there is a large pool of released items from the TIMSS 2003 Grade 8 assessment (99 items in total), we examined these items and divided them into two sets: one that fits the PISA test and one that does not. In particular, many geometry and algebra items in TIMSS are “inner mathematics” (mathematics without a real-life context), and such items do not appear in the PISA test. Owing to the amount of work involved in rescaling the item responses for all countries, we selected six countries in particular to examine their differential performances on the two sets of items. The six countries were Australia, England, and the United States (Western countries, abbreviated as West); and Hong Kong, Taiwan, and Korea (Asian countries, abbreviated as East). Item response modeling (Rasch model) was used to calibrate the item difficulties. The item difficulties for the 99 released items were estimated for each country, with the average difficulty for each country centred at zero. These item difficulties were compared across the six countries. If there was no differential item functioning across the six countries, then an item that was relatively more difficult than another item for one country should also be relatively more difficult in another country. On the other hand, if there was differential item functioning, then one item may be found to be relatively easier in one country, but relatively difficult in another country. We carried out the comparisons of item difficulties particularly with respect to items not fitting the PISA test, and with respect to Western countries and Asian countries.
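The comparison step described above can be illustrated with a minimal Python sketch. It is not the calibration itself (the Rasch estimation is assumed to have been done already); it only shows the centring of each country's item difficulties at zero and the subtraction that exposes differential item functioning. The function names `center` and `dif_contrast`, the item labels, and the logit values are all hypothetical.

```python
from statistics import mean


def center(difficulties):
    """Shift one country's item difficulties so that their mean is zero."""
    m = mean(difficulties.values())
    return {item: d - m for item, d in difficulties.items()}


def dif_contrast(country_a, country_b):
    """Centred difficulty in country A minus centred difficulty in country B.

    A negative value means the item is relatively easier in country A.
    Both dictionaries are assumed to contain the same items.
    """
    a, b = center(country_a), center(country_b)
    return {item: a[item] - b[item] for item in a}


# Hypothetical difficulties in logits for three items (illustrative only).
raw_a = {"item1": 0.5, "item2": 1.0, "item3": 1.5}
raw_b = {"item1": -1.0, "item2": 0.5, "item3": -1.0}

contrast = dif_contrast(raw_a, raw_b)
for item, diff in sorted(contrast.items(), key=lambda kv: kv[1]):
    print(item, round(diff, 2))
```

With no differential item functioning every contrast would be close to zero. In this made-up case item2 is relatively easier in country A and item3 relatively easier in country B, even though the overall ability levels of the two countries play no part in the comparison.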
Results
Of the 99 released items, 42 were deemed not fitting the PISA test. That is, around half of the TIMSS items are not likely to appear in the PISA test. This proportion is surprisingly high. It could mean that a large part of the mathematics taught in schools is not included in the PISA test. Table 1 shows the classifications (made by the author) so that other researchers can cross-check if desired. An “n” in the column headed “In PISA?” means that the TIMSS item is not likely to be in the PISA test, and an “N” means that the item is almost certainly not a PISA item. A “y” means that the TIMSS item could be a PISA item, and a “Y” means that the item is almost certainly a PISA item. This classification was made before any item calibration was carried out, to ensure that the classification process was independent of any knowledge of the relative item difficulties for each country.
The second step in our analysis was to calibrate the items to obtain their difficulty measures for each country, using item response modeling (Rasch model). The average item difficulty for each country was set at zero, so that the calibrated item difficulty for each item was a measure relative to the average item difficulty in that country. In this way, item difficulties across countries can be compared, even if the abilities of students vary across countries. That is, the calibrated item difficulties are relative item difficulties for each country; they are not absolute difficulties such as percentages correct. Table 2 shows the calibrated item difficulties for Australia and Taiwan, arranged in order of the difference in item difficulties. The items at the top of Table 2 are those that Australian students found relatively easier (as compared to other items) than Taiwan students did. The items at the bottom of Table 2 are those that Australian students found relatively more difficult than Taiwan students did. It is worth noting that, as one scans down Table 2, the number of “n” and “N” entries in the column “In PISA?” increases. In fact, of the first 20 items that Australian students found much easier (relative to other items) than Taiwan students did, only one item has an “N” classification (“almost certainly not a PISA item”). In contrast, of the 20 items that Australian students found much more difficult (relative to other items) than Taiwan students did, 12 items are classified as “N” (“almost certainly not a PISA item”).

Table 1 Classification of TIMSS 2003 Released Items into Categories of Appropriateness in the PISA Test
Item Seq  Unique ID  In PISA?    Item Seq  Unique ID  In PISA?    Item Seq  Unique ID  In PISA?
1         M012001    n           34        M022139    n           67        M032261    N
2         M012002    y           35        M022142    N           68        M032271    Y
3         M012003    n           36        M022144    n           69        M032403    N
4         M012004    Y           37        M022146    y           70        M032447    Y
5         M012005    N           38        M022148    y           71        M032489    Y
6         M012006    Y           39        M022154    N           72        M032533    Y
7         M012013    Y           40        M022156    Y           73        M032545    N
8         M012014    Y           41        M022185    N           74        M032557    N
9         M012015    N           42        M022188    N           75        M032570    n
10        M012016    N           43        M022189    Y           76        M032588    n
11        M012017    Y           44        M022191    Y           77        M032609    n
12        M012025    Y           45        M022194    Y           78        M032612    N
13        M012026    N           46        M022196    N           79        M032643    N
14        M012027    n           47        M022198    n           80        M032647    y
15        M012028    N           48        M022199    N           81        M032649A   y
16        M012029    n           49        M022202    N           82        M032649B   y
17        M012030    y           50        M022227A   Y           83        M032652    y
18        M012037    y           51        M022227B   Y           84        M032670    n
19        M012038    n           52        M022227C   Y           85        M032671    y
20        M012039    N           53        M022251    N           86        M032678    N
21        M012040    Y           54        M022252    Y           87        M032689    N
22        M012041    n           55        M022253    N           88        M032690    n
23        M012042    N           56        M022261A   Y           89        M032693    N
24        M022002    N           57        M022261B   Y           90        M032699    Y
25        M022004    n           58        M022261C   Y           91        M032727    Y
26        M022005    y           59        M032036    N           92        M032728    N
27        M022008    n           60        M032044    n           93        M032732    n
28        M022010    y           61        M032046    N           94        M032743    y
29        M022012    n           62        M032079    n           95        M032744    y
30        M022016    n           63        M032208    N           96        M032745    y
31        M022021    y           64        M032210    N           97        M032762    Y
32        M022127    y           65        M032228    y           98        M032763    Y
33        M022135    y           66        M032233    y           99        M032764    Y

Table 2 Calibrated Item Difficulties (in IRT logits) for Australia and Taiwan
Seq  Unique ID  AUS    TWN    Difference  In PISA?
97   M032762    0.06   2.17   -2.108      Y
58   M022261C   2.25   4.28   -2.036      y
38   M022148    -1.22  0.59   -1.811      y
33   M022135    -0.42  1.36   -1.774      y
18   M012037    -0.39  1.23   -1.613      Y
34   M022139    0.29   1.70   -1.402      n
26   M022005    0.17   1.57   -1.399      y
80   M032647    -0.44  0.92   -1.362      y
94   M032743    -1.22  0.06   -1.28       y
90   M032699    -2.16  -0.94  -1.225      Y
96   M032745    2.10   3.27   -1.169      y
2    M012002    -1.51  -0.40  -1.104      y
27   M022008    0.62   1.72   -1.096      n
8    M012014    -2.16  -1.17  -0.992      y
36   M022144    0.20   1.12   -0.921      n
95   M032744    -0.11  0.81   -0.92       y
10   M012016    -0.25  0.58   -0.827      N
98   M032763    1.90   2.72   -0.818      Y
99   M032764    1.82   2.62   -0.802      Y
84   M032670    -2.48  -1.68  -0.8        n
16   M012029    -1.05  -0.26  -0.789      n
62   M032079    0.64   1.42   -0.781      n
45   M022194    -0.88  -0.12  -0.755      y
47   M022198    -0.26  0.43   -0.693      n
30   M022016    0.17   0.83   -0.66       n
11   M012017    -0.78  -0.13  -0.648      y
12   M012025    -1.38  -0.92  -0.462      Y
83   M032652    1.11   1.57   -0.461      y
4    M012004    -0.12  0.34   -0.46       y
85   M032671    -1.49  -1.04  -0.451      y
75   M032570    -0.89  -0.44  -0.447      n
17   M012030    -0.07  0.37   -0.438      y
7    M012013    -1.32  -0.88  -0.433      y
3    M012003    -1.15  -0.74  -0.42       n
63   M032208    -0.98  -0.58  -0.40       N
21   M012040    -1.41  -1.01  -0.40       Y
22   M012041    -1.03  -0.66  -0.38       n
1    M012001    -0.85  -0.59  -0.26       n
44   M022191    -0.81  -0.56  -0.25       y
77   M032609    1.11   -0.91  -0.20       n
82   M032649B   1.67   1.87   -0.20       y
9    M012015    -0.91  -0.72  -0.19       N
91   M032727    -0.16  0.02   -0.18       y
56   M022261A   -0.19  -0.12  -0.06       y
14   M012027    -0.68  -0.65  -0.02       n
69   M032403    -0.17  -0.18  0.01        N
54   M022252    -1.42  -1.44  0.01        y
29   M022012    0.36   0.31   0.05        n
35   M022142    0.13   0.07   0.06        N
49   M022202    1.30   1.21   0.09        N
81   M032649A   0.73   0.64   0.09        y
78   M032612    0.30   0.20   0.10        N
42   M022188    0.34   0.19   0.15        N
65   M032228    -0.21  -0.39  0.18        y
32   M022127    1.24   1.06   0.18        Y
72   M032533    0.06   -0.13  0.18        y
88   M032690    0.17   -0.03  0.20        n
43   M022189    -1.58  -1.78  0.20        y
66   M032233    1.78   1.55   0.23        y
19   M012038    -1.54  -1.84  0.30        n
92   M032728    0.25   -0.06  0.31        N
53   M022251    0.99   0.66   0.33        N
39   M022154    -0.23  -0.57  0.34        N
68   M032271    -0.23  -0.62  0.39        Y
40   M022156    0.10   -0.50  0.40        y
93   M032732    -0.12  -0.55  0.43        n
60   M032044    0.08   -0.35  0.43        n
28   M022010    -0.51  -0.98  0.47        y
57   M022261B   1.15   0.65   0.50        y
67   M032261    0.54   0.02   0.52        N
87   M032689    0.55   -0.03  0.58        N
15   M012028    -0.91  -1.54  0.63        N
61   M032046    1.94   1.29   0.65        N
13   M012026    0.36   -0.31  0.70        N
25   M022004    -0.04  -0.72  0.68        n
41   M022185    0.28   -0.45  0.73        N
6    M012006    -1.07  -1.80  0.73        Y
64   M032210    0.71   -0.03  0.74        N
20   M012039    -0.17  -0.93  0.76        N
86   M032678    0.43   -0.33  0.76        N
24   M022002    1.86   1.04   0.79        N
74   M032557    1.66   0.76   0.90        N
70   M032447    0.10   1.01   0.91        y
79   M032643    0.70   -0.25  0.95        N
46   M022196    -0.06  -1.02  0.96        N
71   M032489    -2.16  -3.13  0.97        y
52   M022227C   2.14   1.16   0.99        y
55   M022253    0.09   -0.92  1.01        N
37   M022146    -0.44  -1.48  1.03        y
89   M032693    1.10   0.06   1.04        N
31   M022021    0.41   -0.64  1.05        y
5    M012005    0.26   -0.88  1.13        N
50   M022227A   0.01   -1.18  1.18        y
76   M032588    -0.26  -1.52  1.26        n
59   M032036    0.49   -0.89  1.38        N
51   M022227B   1.71   0.19   1.53        y
23   M012042    0.67   -1.01  1.68        N
73   M032545    2.40   0.44   1.97        N
48   M022199    1.06   -1.04  2.11        N
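The two tallies quoted for Table 2 (one “N” among the 20 items relatively easiest for Australia, 12 among the 20 relatively hardest) can be checked with a short Python sketch. The two lists below are the “In PISA?” codes as read from the 20 rows at each end of Table 2 when the rows are ordered by the difference column; the variable and function names are hypothetical.

```python
# "In PISA?" codes for the 20 items with the most negative AUS - TWN difference
# (relatively easiest for Australia), in table order.
easier_for_aus = [
    "Y", "y", "y", "y", "Y", "n", "y", "y", "y", "Y",
    "y", "y", "n", "y", "n", "y", "N", "Y", "Y", "n",
]

# Codes for the 20 items with the most positive difference
# (relatively hardest for Australia), in table order.
harder_for_aus = [
    "N", "N", "N", "y", "N", "N", "y", "y", "N", "y",
    "N", "y", "N", "y", "n", "N", "y", "N", "N", "N",
]


def count_code(codes, code):
    """Number of items carrying exactly the given classification code."""
    return sum(1 for c in codes if c == code)


print(count_code(easier_for_aus, "N"))  # items almost certainly not in PISA
print(count_code(harder_for_aus, "N"))
```

The comparison is case-sensitive on purpose, because “n” (not likely) and “N” (almost certainly not) are distinct categories in the classification.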
Figure 2 shows a plot of the average item difficulties for Western countries (AUS, ENG, USA) and average item difficulties for Asian countries (HKG, TWN, KOR), arranged in order by the magnitude of the difference between the average difficulties. That is, the items on the left side of the plot are those that Western countries found relatively easier than Asian countries did. The items on the right side of the plot are those that Western countries found relatively more difficult. Interestingly, the majority of the items on the right side of the plot are items that are deemed unlikely to appear in the PISA test (mostly labeled “N”). The numerical values of item difficulties are shown in Appendix A.
In summary, both Table 2 and Figure 2 show that Asian countries have a tendency to perform relatively better on TIMSS items that are deemed not appropriate for the PISA test. These items are mostly “content rich” items that involve formal mathematics.

[Figure 2: plot of average item difficulties for Western and Asian countries, with items ordered by the difference between the two averages; the plot and its caption are not recoverable.]
Analysis of Variance of Item Difficulty Measures
To formally carry out statistical tests of the effects of item context on the item difficulty measures for the two country groups, an analysis of variance (ANOVA) of item difficulties was carried out with respect to two main effects, East/West and in-PISA, and the interaction term. Table 3 shows the ANOVA results.

Table 3 ANOVA Tests of Effects on Item Difficulties
Source             Sum of Squares  df   Mean Square  F     Sig.
InPISA             1.86            1    1.86         1.47  .23
EastWest           0.03            1    0.03         0.02  .88
InPISA * EastWest  10.82           1    10.82        8.56  .00
Error              745.82          590  1.26
Total              758.51          594
Note. Dependent variable: Item difficulties
The two main effects of the ANOVA are not statistically significant (p value equals .22 and .88 respectively). This is not surprising. For the main effect, InPISA, the ANOVA shows that the overall item difficulties for items that fit the PISA framework are not greatly different from the items not fitting the PISA framework. For the main effect, EastWest, the non-significant result is expected since the item calibrations were carried out by fixing the average item difficulty at zero for all six countries, so that there is no difference in average item difficulty across the East and West groups. However, the interaction term, InPISA by EastWest, is statistically significant (p = .004), showing that the item difficulties are indeed likely to be different between Asian and Western countries for InPISA items, and for non-InPISA items. The statistically significant interaction term provides strong support for the finding that there is differential item functioning across the two country groups.
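As a quick consistency check on Table 3, the F ratios can be recomputed from the reported sums of squares: each F is the effect mean square divided by the error mean square. The Python sketch below does only that arithmetic. The effect degrees of freedom of 1 are an assumption, inferred from the fact that each effect's sum of squares equals its reported mean square; the variable names are hypothetical, and no p-values are computed.

```python
# (sum of squares, degrees of freedom) for each effect, as read from Table 3.
# The df of 1 for each effect is an assumption inferred from SS == Mean Square.
effects = {
    "InPISA": (1.86, 1),
    "EastWest": (0.03, 1),
    "InPISA * EastWest": (10.82, 1),
}
ss_error, df_error = 745.82, 590

# Error mean square: the denominator of every F ratio.
ms_error = ss_error / df_error


def f_ratio(ss, df):
    """F statistic: effect mean square over error mean square."""
    return (ss / df) / ms_error


f_values = {name: f_ratio(ss, df) for name, (ss, df) in effects.items()}
for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
```

Rounded to two decimals, the recomputed values agree with the F column of Table 3, and the error mean square rounds to the reported 1.26.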
Reasons for Differential Item Functioning
[...] differential item functioning between Western and Asian countries. While the items in which Asian countries performed relatively well tend to be those with formal mathematics, what types of items are those that Asian countries performed relatively poorly in, and what are the reasons for this poor performance? There appear to be three main reasons for a relatively lower performance of the three Asian countries on some items:
1. where explanations are required in the responses;
2. where the English language is not easily translated into Asian languages in the question;
3. where everyday life knowledge and experience can be drawn upon to answer the question.
We take a look at some example items as an illustration of the above three points.
The item with the largest difference between item difficulties for Australia and Taiwan was item 97 (M032762) in the TIMSS released item set, as shown in Figure 3.
Betty talks for less than 2 hours per month. Which plan would be less expensive for her?
Less expensive plan ______
Explain your answer in terms of both the monthly fee and free minutes.
Figure 3. TIMSS 2003 released item M032762
This item follows a piece of stimulus material where phone plans with different fee structures are presented. The coding scheme for this item is given in Table 4.
Table 4 Coding Scheme of Item M032762
Code  Response
Correct Response
20    Plan B with explanation that includes free minutes used and explicit reference to lower monthly fee for Plan B
Partial Response
10    Plan B with explicit reference to lower monthly fee and no reference to free minutes
Incorrect Response
70    Plan B with inadequate (only free minutes) or no explanation
71    Plan A with or without explanation
79    Other incorrect
Non Response
99    Blank
A comparison of the percentages of students in each coding category between the six countries is given in Table 5.

Table 5 Percentages of Students by Coding Categories of Item M032762
Code      Australia  England  USA  Hong Kong  Taiwan  Korea
20        44%        45%      37%  28%        27%     40%
10        5%         4%       9%   5%         6%      6%
70        30%        33%      32%  56%        48%     42%
71        12%        12%      14%  10%        15%     8%
79        3%         3%       6%   1%         2%      3%
99        6%         5%       2%   1%         3%      2%
20+10+70  79%        82%      78%  89%        81%     88%

Codes 20, 10 and 70 all relate to the correct choice of Plan B, but they differ in the extent to which adequate explanations are given. It is striking that the percentages for Code 70 (correct answer, but inadequate explanations) differ greatly between Western countries and Asian countries, in that Asian countries have higher percentages for Code 70 than Western countries. In the case of Hong Kong and Taiwan, the percentages correct (Code 20) are much lower than for the Western countries. If we ignore the explanation part of the answer and just focus on whether students selected the correct plan (Plan B), we can look at the sum of the percentages for Codes 20, 10 and 70. In fact, the three Asian countries obtained comparable, if not higher, percentages for choosing the correct plan. Where Asian students fell short was mostly in providing adequate explanations for the correct answer. Recently I was involved in a project in Macao where teachers constructed PISA-like mathematics items and administered them to the students. A number of reports from the teachers included comments like “students could not be bothered to read long worded questions or to provide extended responses for explaining their answers.” This would seem to support the results shown in Table 5.

The second type of item where Asian students performed relatively less well is one where the English language is not readily translatable into Asian languages. Item 33 (M022135) in the TIMSS released item set is such an example. This item is shown in Figure 4.
Table 6 shows the percentage of students choosing each multiple choice option of the question.

A beaker of water which has reached boiling point is allowed to cool. The temperature of the water is recorded at five minute intervals, and a temperature-time graph is drawn.
[Graph: Cooling Curve, temperature plotted against time (minutes); not recoverable]
About how many minutes did it take for the water to cool the first 20 degrees?
[answer options not recoverable]
Figure 4. TIMSS 2003 released item M022135

Table 6 Percentage of Students in Each Response Category of Item M022135
MC Option  Australia  England  USA  Hong Kong  Taiwan  Korea
A          59%        55%      52%  26%        45%     11%
B          12%        14%      8%   6%         6%      4%
C          6%         12%      11%  8%         9%      6%
D          21%        19%      27%  60%        40%     79%
From Table 6, it can be seen that about twice as many students in Western countries chose the correct option A as chose option D. In contrast, for Asian countries, a large proportion of students chose the incorrect option D compared to option A.

[...] degrees” instead of “to cool the first 20 degrees.” This is most likely because of language conventions, where the phrase “the first 20 degrees” is not a commonly used structure in the Asian languages. In Chinese, it would typically be said as “to cool 20 degrees” (i.e., there are no explicit words “the first”). In Hong Kong, the question has been translated as “From the start till cooled the initial 20 degrees” (從開始至冷卻了最初的二十度).¹ While this is a correct literal translation, it is not a sentence structure that people would normally use in speech. The word “initial” in this context sounds out of place in Chinese. This word can also mean “original,” or “the very beginning.” The word “till” may lead to an interpretation of “to cool to 20 degrees.” Simply put, in Chinese, there is no suitable literal translation of “to cool the first 20 degrees.” One would need to drop the words “the first” to make it sound like natural speech, but the translator clearly thought that this might disadvantage the students. Adding the word “initial,” while more “correct” in matching the English version, made the sentence sound unnatural and foreign to Chinese speakers. The result was that students were still confused. In general, such differences in language usage are likely the cause of differential item functioning. This is not strictly a translation verification issue, because the translation may be literally correct, yet the sentence structure cannot be the same across languages, so the propensity to misunderstand the question will vary between language groups. The discrimination index of .18 for Hong Kong and .41 for Australia further supports the conjecture that the question was confusing to Hong Kong students.
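To make the discrimination index concrete, the sketch below computes a point-biserial correlation between a 0/1 item score and a total score, which is one common definition of item discrimination. That the indices quoted above were computed in exactly this way is an assumption; the function name `point_biserial` and the six-student data set are hypothetical and purely illustrative.

```python
from math import sqrt


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and a total score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / sqrt(var_x * var_y)


# Hypothetical total scores for six students, ordered from low to high.
totals = [2, 4, 5, 7, 8, 10]

# An item answered correctly only by the stronger students.
clear_item = [0, 0, 0, 1, 1, 1]
# An item whose correct answers are only loosely tied to overall performance,
# as might happen when the wording misleads some able students.
confusing_item = [0, 1, 0, 1, 0, 1]

r_clear = point_biserial(clear_item, totals)
r_confusing = point_biserial(confusing_item, totals)
print(round(r_clear, 2), round(r_confusing, 2))
```

A low value means that getting the item right says little about overall proficiency, which is the pattern one would expect if able students were misled by the wording.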
The third type of item in which students from the three Asian countries performed relatively less well is related to everyday life experience and skills, such as estimation and the understanding of measuring units. These problems generally do not require high levels of mathematical skills to solve. For this type of item, students from Western countries have particular strengths as compared to students from Asian countries. Three such items (M032699, M022005, and M032647) are shown in Figure 5 to Figure 7. Appendix A gives the differences in average item difficulties for East and West countries.
We will take Item M032647 (Figure 7) as an example and compare the item statistics in more detail. This is a real-life application of mathematics. The item statistics for the six countries are shown in Table 7. The three Western countries performed relatively better on this item than the three Asian countries did (see item delta logits). One interesting observation about this item is that there is evidence that some higher ability students chose the incorrect Option D (labeled 4 in the item analysis). For all six countries, the average ability (column headed Mean Ab in Table 7) for Option D is the closest, among the three incorrect options, to the average ability for the correct answer (Option B, labeled 2). In the case of Korea, the average ability for Option D is even higher than the average ability for Option B.

Which of these units would usually be used for an area the size of a soccer field?
A. square [rest not recoverable]
B. cubic centimeters
C. square meters
D. cubic meters
Figure 5. TIMSS 2003 released item M032699

The number of 250 milliliter bottles that can be filled from 400 liters of water is
A. 16
B. 160
C. 1600
D. 16000
Figure 6. TIMSS 2003 released item M022005

Oranges are packed in boxes. The average diameter of the oranges is 6 cm, and the boxes are 60 cm long, 36 cm wide, and 24 cm deep.
Which of these is the BEST approximation of the number of oranges that can be packed in a box?
A. [not recoverable]
B. 240
C. [not recoverable]
D. 1920
Figure 7. TIMSS 2003 released item M032647

¹ In the Taiwan version, the translation is “從一開始最初冷卻了二十度.” This translation is a little less confusing than the Hong Kong version, but it is still problematic.
Table 7 Item Statistics for Item M032647 for Western and Asian Countries

Australia, item 80 (M032647)
Cases for this item: 781    Discrimination: 0.26    Item Delta(s): -0.43
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00     127     16.26   -0.18   -0.54
2      1.00     447     57.23    0.26    0.26
3      0.00     143     18.31   -0.13   -0.42
4      0.00      42      5.38    0.01   -0.11
6      0.00       4      0.51   -0.08   -0.66
9      0.00      18      2.30   -0.06   -0.54

England, item 80 (M032647)
Cases for this item: 446    Discrimination: 0.27    Item Delta(s): -0.33
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00      74     16.59   -0.15   -0.40
2      1.00     248     55.61    0.27    0.31
3      0.00      85     19.06   -0.17   -0.48
4      0.00      24      5.38    0.06    0.10
6      0.00       8      1.79   -0.14   -1.41
9      0.00       7      1.57   -0.09   -0.55

USA, item 80 (M032647)
Cases for this item: 1498    Discrimination: 0.17    Item Delta(s): 0.01
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00     282     18.83   -0.11   -0.34
2      1.00     741     49.47    0.17    0.29
3      0.00     354     23.63   -0.12   -0.32
4      0.00     102      6.81    0.08    0.19
6      0.00       5      0.33   -0.08   -2.35
7      0.00       1      0.07    0.02    0.91
9      0.00      13      0.87   -0.04   -0.43

Hong Kong, item 80 (M032647)
Cases for this item: 819    Discrimination: 0.19    Item Delta(s): 1.13
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00      43      5.25   -0.06    0.71
2      1.00     421     51.40    0.19    1.52
3      0.00     201     24.54   -0.17    0.77
4      0.00     148     18.07   -0.02    0.94
6      0.00       1      0.12   -0.02    0.30
7      0.00       1      0.12    0.00    0.39
9      0.00       4      0.49   -0.07    0.24

Taiwan, item 80 (M032647)
Cases for this item: 902    Discrimination: 0.32    Item Delta(s): 0.92
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00      56      6.21   -0.10    0.17
2      1.00     496     54.99    0.32    1.79
3      0.00     208     23.06   -0.23    0.35
4      0.00     135     14.97   -0.09    0.73
6      0.00       5      0.55    0.03    1.58
9      0.00       2      0.22   -0.04    0.49

Korea, item 80 (M032647)
Cases for this item: 891    Discrimination: 0.18    Item Delta(s): 0.96
Label  Score  Count  % of tot  Pt Bis  meanAb
1      0.00      86      9.65   -0.27    0.11
2      1.00     493     55.33    0.18    1.51
3      0.00     204     22.90   -0.14    0.71
4      0.00     101     11.34    0.16    1.60
6      0.00       1      0.11   -0.06   -0.77
9      0.00       6      0.67    0.02    1.39
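The comparison of Option D with Option B described for Table 7 can be sketched as follows. This is a minimal illustration rather than the analysis software used in the study. The counts and mean abilities are transcribed from Table 7, and it is assumed here that label 2 corresponds to Option B and label 4 to Option D; the function name `summarize` is hypothetical.

```python
# Per country: (count choosing label 4, total cases,
#               meanAb for label 2, meanAb for label 4), transcribed from Table 7.
# Assumption: label 2 is Option B (correct) and label 4 is Option D.
table7 = {
    "Australia": (42, 781, 0.26, -0.11),
    "England": (24, 446, 0.31, 0.10),
    "USA": (102, 1498, 0.29, 0.19),
    "Hong Kong": (148, 819, 1.52, 0.94),
    "Taiwan": (135, 902, 1.79, 0.73),
    "Korea": (101, 891, 1.51, 1.60),
}


def summarize(rows):
    """Return {country: (percent choosing D, meanAb(B) minus meanAb(D))}."""
    summary = {}
    for country, (n_d, n_total, ability_b, ability_d) in rows.items():
        percent_d = round(100 * n_d / n_total, 2)
        ability_gap = round(ability_b - ability_d, 2)
        summary[country] = (percent_d, ability_gap)
    return summary


summary = summarize(table7)
for country, (percent_d, ability_gap) in summary.items():
    print(f"{country:10s} chose D: {percent_d:5.2f}%  meanAb(B) - meanAb(D): {ability_gap:+.2f}")
```

A negative gap means that students choosing Option D had a higher average ability than students choosing the correct option, which in these data occurs only for Korea.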
Solving this item does not require advanced mathematical skills. It involves simple division and multiplication, and a quick estimation, rather than a complex calculation of volume, is enough to obtain the answer. The incorrect answer of 1,920 (Option D) most likely arises from using the radius of the orange instead of the diameter. Confusion between radius and diameter is likely when students compute the volumes of shapes. It is puzzling that some high-ability students made this mistake, and more students from Asian countries than from Western countries did so. Further, the answer of 1,920 is almost nonsensical, because that is an enormous number of oranges to place in one box. From the item statistics one may conjecture that more high-ability students in Asian countries than in Western countries tried to work through a formal mathematical solution to this question and did not check whether the answer they obtained made sense. This item again illustrates the contrast between the practical approach of Western students and the theoretical approach of Asian students to solving mathematical problems. Because PISA contains more applied questions, Western students may have an advantage in the PISA test, provided that the questions can be solved using simple mathematics and a common-sense approach.
Many other items that show differential item functioning between Western and Asian countries also reveal interesting observations about students in these countries. Owing to the limited space for this paper, we are not able to provide more examples here. Appendix A lists the items ordered by differential item functioning between East and West. Further explorations of item performance can be carried out following the order of items in this list.
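The estimation for item M032647 and the conjectured radius error can be checked with a short calculation. This is a sketch under stated assumptions: the box width of 36 cm is inferred from the two answers discussed (240 and 1,920), each orange is assumed to occupy a cube whose side equals its diameter, and the function name `oranges_per_box` is hypothetical.

```python
def oranges_per_box(length_cm, width_cm, depth_cm, spacing_cm):
    """Count oranges on a simple grid, one orange per cube of side spacing_cm."""
    return (length_cm // spacing_cm) * (width_cm // spacing_cm) * (depth_cm // spacing_cm)


# Box dimensions in cm; the width of 36 cm is an assumption inferred from the answers.
box = (60, 36, 24)

# Correct reasoning: spacing equals the 6 cm diameter, giving 10 * 6 * 4.
correct = oranges_per_box(*box, spacing_cm=6)

# Conjectured error: spacing taken as the 3 cm radius, giving 20 * 12 * 8.
radius_error = oranges_per_box(*box, spacing_cm=3)

print(correct, radius_error)
```

Halving the spacing doubles the count along each of the three dimensions, so the radius error inflates the estimate by a factor of eight.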
Discussions and Conclusions
A review of TIMSS 2003 released items showed that almost half of the items were deemed unlikely to appear in the PISA test, owing to the lack of application contexts for the items. These are typically content-rich items that involve formal mathematics. An analysis of relative item difficulties showed that the three Asian countries performed relatively better on these formal mathematics items than the three Western countries did. This may be a contributing factor to the observation that Western countries generally performed relatively better in PISA than in TIMSS, because TIMSS contains more context-free items involving formal mathematics.
Further, three interesting observations emerge from a closer examination of differential item functioning on items where Western countries performed relatively better. First, students from the three Asian countries had more difficulty explaining how they arrived at an answer, even when they found the correct answer. Second, some cultural and linguistic issues were identified. Because the source text is in English, some English sentence structures and vocabulary are not readily translatable into other languages. As a consequence, students in Asian countries were confused by unusual phrases and terms, and misunderstood some of the questions. Third, students from Western countries appear to perform relatively better on mathematics items set in everyday real-life contexts, where students bring in their knowledge and experience from outside the classroom. In contrast, students from Asian countries tend to rely on knowledge gained in the mathematics class. Consequently, because most items in PISA are word problems in an everyday life context, it is not surprising that Western countries performed relatively better in PISA than they did in TIMSS.
The observation that many students, particularly in Asian countries, disconnect mathematics problems from everyday life sends an important message to mathematics educators about the importance of linking mathematics to the real world. This message has been actively promoted by PISA. However, caution is needed. An almost exclusive emphasis on real-life mathematics, particularly at the 15-year-old level, will likely restrict mathematics assessment to a set of items with lower mathematical content, and thus lead to an assessment that does not reflect all the mathematics topics taught in schools (some of which may be for future use by the students).
More generally, the findings from this paper not only help identify the reasons for the differential performance of countries in PISA and TIMSS but also throw some light on important issues in test construction, particularly in the context of international studies. The presence of differential item functioning gives an interesting insight into the mathematical thinking of students and the cultural and linguistic characteristics of different countries, yet it threatens the validity of the studies at the same time. Steps must be taken to ensure a fair assessment, both for the countries involved and for mathematics education.
Acknowledgement
I am grateful to Professor Frederick Leung at the University of Hong Kong,and Dr. Andrew Jen at National Taiwan Normal University, for providing the Chinese version of the TIMSS released items.
References
American Institutes for Research. (2005). Reassessing U.S. international mathematics performance: New findings from the 2003 TIMSS and PISA. Washington, DC: Author.
Gronmo, L., & Olsen, R. (2008). TIMSS versus PISA: The case of pure and applied mathematics. Retrieved February 24, 2008, from http://www.timss.no/publications/IRC2006_Gronmo&Olsen.pdf
Hutchison, G., & Schagen, I. (2007). Comparisons between PISA and TIMSS - Are we the man with two watches? In T. Loveless (Ed.), Lessons learned - What international assessments tell us about math achievement (pp. 227-261). Washington, DC: The Brookings Institution.
International Association for the Evaluation of Educational Achievement. (2003). TIMSS assessment frameworks and specifications, 2003. Chestnut Hill, MA: TIMSS International Study Centre.
National Center for Education Statistics. (2008). Comparing NAEP, TIMSS, and PISA in mathematics and science. Retrieved May 1, 2008, from http://nces.ed.gov/timss/pdf/naep_timss_pisa_comp.pdf
Neidorf, T. S., Binkley, M., Gattis, K., & Nohara, D. (2006). Comparing mathematics content in the National Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and Program for International Student Assessment (PISA) 2003 assessments (NCES 2006-029). Washington, DC: National Center for Education Statistics, U.S. Department of Education.
Organisation for Economic Co-operation and Development. (2003). The PISA 2003 assessment framework - Mathematics, reading, science and problem solving knowledge and skills. Paris: Author.
Organisation for Economic Co-operation and Development. (2009). Learning mathematics for life - A perspective from PISA. Paris: Author.
Wu, M. (2009). A comparison of PISA and TIMSS 2003 achievement results in mathematics.
Appendix: Average Item Difficulties (in IRT logits) for Western and Asian Countries

Seq  Unique ID            West         East         Difference   In PISA?
33   M022135              -0.28         2.56        -2.84         Y
97   M032762               0.12         1.88        -1.76         Y
38   M022148              -1.05         0.56        -1.61         Y
90   M032699              -1.84        -0.58        -1.26         Y
80   M032647              -0.25         1.01        -1.26         Y
30   M022016              -0.11         1.04        -1.15         N
58   M022261C              2.40         3.50        -1.11         Y
96   M032745               1.99         3.04        -1.05         Y
26   M022005               0.55         1.54        -0.99         Y
27   M022008               0.58         1.57        -0.99         N
94   M032743              -1.25        -0.31        -0.94         Y
18   M012037              -0.54         0.39        -0.93         Y
2    M012002              -1.43        -0.54        -0.89         Y
34   M022139               0.43         1.29        -0.86         N
98   M032763               1.85         2.63        -0.78         Y
36   M022144              -0.05         0.68        -0.73         N
95   M032744               0.01         0.67        -0.66         Y
4    M012004               0.03         0.67        -0.64         Y
99   M032764               1.76         2.39        -0.63         Y
45   M022194              -0.63        -0.08        -0.55         Y
54   M022252              -1.51        -0.96        -0.54         Y
82   M032649B              1.46         2.00        -0.54         Y
84   M032670              -2.22        -1.71        -0.51         N
16   M012029              -0.95        -0.45        -0.50         N
83   M032652               0.95         1.45        -0.50         Y
62   M032079               0.76         1.20        -0.43         N
10   M012016              -0.22         0.21        -0.43         N
81   M032649A              0.28         0.71        -0.43         Y
85   M032671              -1.49        -1.06        -0.43         Y
8    M012014              -1.97        -1.55        -0.43         Y
77   M032609              -1.27        -0.90        -0.37         N
63   M032208              -0.84        -0.49        -0.36         N
47   M022198               0.03         0.35        -0.31         N
75   M032570              -1.09        -0.78        -0.31         N
11   M012017              -0.44        -0.13        -0.31         Y
43   M022189              -1.76        -1.46        -0.29         Y
17   M012030               0.20         0.49        -0.29         Y
88   M032690               0.35         0.60        -0.26         N
42   M022188               0.29         0.54        -0.25         N
12   M012025              -1.57        -1.34        -0.23         Y
14   M012027              -0.77        -0.55        -0.22         N
60   M032044              -0.09         0.07        -0.16         N
21   M012040              -1.47        -1.33        -0.14         Y
66   M032233               1.50         1.58        -0.08         Y
76   M032588              -0.87        -0.81        -0.06         N
91   M032727               0.00         0.03        -0.03         Y
39   M022154              -0.34        -0.31        -0.03         N
22   M012041              -1.20        -1.17        -0.03         N
56   M022261A             -0.14        -0.15        [illegible]   Y
92   M032728               0.13         0.12         0.01         N
32   M022127               1.47         1.43         0.04         Y
29   M022012               0.34         0.30         0.04         N
3    M012003              -1.18        -1.25         0.07         N
72   M032533              -0.12        -0.19         0.07         Y
40   M022156              -0.07        -0.16         0.09         Y
93   M032732              [illegible]   0.11         0.10         N
44   M022191              -0.61        -0.74         0.14         Y
9    M012015              -0.79        -0.93         0.14         N
1    M012001              -0.74        -0.90         0.16         N
28   M022010              -0.61        -0.77         0.16         Y
78   M032612               0.69         0.51         0.18         N
68   M032271              -0.29        -0.47         0.18         Y
65   M032228              -0.31        -0.51         0.20         Y
87   M032689               0.53         0.27         0.26         N
69   M032403              -0.19        -0.49         0.30         N
53   M022251               1.17         0.86         0.30         N
19   M012038              -1.53        -1.86         0.33         N
71   M032489              -2.04        -2.45         0.41         Y
57   M022261B              1.20         0.78         0.42         Y
70   M032447              -0.07        -0.52         0.45         Y
64   M032210               0.61         0.13         0.48         N
25   M022004              -0.06        -0.55         0.50         N
7    M0120[illegible]     -0.61        -1.13         0.53         Y
61   M032046               1.61         1.07         0.54         N
67   M032261               0.28        -0.29         0.57         N
49   M022202               1.40         0.79         0.60         N
35   M022142               0.30        -0.36         0.66         N
6    M012006              -0.95        -1.62         0.67         Y
24   M022002               1.82         1.10         0.72         N
46   M022196              -0.33        -1.07         0.75         N
15   M012028              -0.83        -1.59         0.76         N
86   M032678               0.38        -0.39         0.77         N
37   M022146              -0.49        -1.27         0.78         Y
79   M032643               0.61        -0.17         0.78         N
20   M012039              -0.02        -0.92         0.90         N
52   M022227C              2.13         1.16         0.97         Y
55   M022253              -0.02        -1.01         0.99         N
59   M032036               0.48        -0.52         1.01         N
74   M032557               1.74         0.72         1.02         N
89   M032693               1.22         0.20         1.02         N
13   M012026               0.41        -0.64         1.05         N
31   M022021               0.51        -0.55         1.06         Y
5    M012005               0.18        -0.89         1.07         N
50   M022227A             -0.08        -1.19         1.10         Y
23   M012042               0.11        -1.01         1.13         N
41   M0221[illegible]      0.30        -0.83         1.13         N
73   M032545               1.90         0.46         1.45         N
51   M022227B              1.54         0.03         1.51         Y
48   M022199               0.96        -0.56         1.52         N
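As an illustration of how an entry in this appendix may relate to the country-level statistics, the following sketch recomputes the row for item M032647 from the item deltas reported in Table 7. It assumes that the West and East columns are simple means of the three country estimates in each group and that Difference is West minus East; this is an assumption made for illustration, and the function name `regional_difference` is hypothetical.

```python
from statistics import mean

# Item deltas for M032647 (item 80), transcribed from Table 7.
west_deltas = {"Australia": -0.43, "England": -0.33, "USA": 0.01}
east_deltas = {"Hong Kong": 1.13, "Taiwan": 0.92, "Korea": 0.96}


def regional_difference(west, east):
    """Return (West mean, East mean, West minus East) in logits.

    Assumption: each regional value is the unweighted mean of its country deltas.
    """
    west_mean = mean(west.values())
    east_mean = mean(east.values())
    return west_mean, east_mean, west_mean - east_mean


west_mean, east_mean, difference = regional_difference(west_deltas, east_deltas)
print(f"West {west_mean:.2f}  East {east_mean:.2f}  Difference {difference:.2f}")
```

The result agrees with the appendix row for item 80 (West -0.25, East 1.01, Difference -1.26) to within about 0.01 logits; the small discrepancy is consistent with the appendix having been computed from unrounded country estimates.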
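Journal of Research in Education Sciences, 2011, 56(1), 67-89
Using PISA and TIMSS Assessments to Examine the Relative Strengths in Mathematics of Western and Asian Students
Margaret Wu (張立民)
Assessment Research Centre, University of Melbourne, Australia
Associate Professor
Abstract
Wu (2009) compared mathematics performance in the 2003 Programme for International Student Assessment (PISA) with the mathematics performance of grade 8 students in the Trends in International Mathematics and Science Study (TIMSS), and found that Western countries generally performed better in PISA than in TIMSS. In this study, the TIMSS released items are divided into two sets: one that fits the PISA framework and one that does not. In particular, many geometry and algebra items in TIMSS are "pure mathematics" items (mathematics items not set in a real-life context), and such items do not appear in the PISA assessment. This study examines the differences in the performance of six countries on these two sets of items, thereby identifying the relative strengths and weaknesses of Western and Asian countries, and then links these strengths and weaknesses to the contents of the PISA and TIMSS assessments. There is evidence that the differential performance of Western and Asian countries in PISA and TIMSS can be attributed to the different types of items in the two assessments.
Keywords: international study, mathematics, Programme for International Student Assessment, Trends in International Mathematics and Science Study
Corresponding author: Margaret Wu, E-mail: m.wu@unimelb.edu.au
Received: 2010/09/30; Revised: 2010/12/13, 2011/02/25; Accepted: 2011/03/10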