
Using PISA and TIMSS Mathematics Assessments to Identify the Relative Strengths of Students in Western and Asian Countries


Margaret Wu: Relative Strengths of Western and Asian Students

Journal of Research in Education Sciences, 2011, 56(1), 67-89


Margaret Wu

Assessment Research Centre

University of Melbourne, Australia

Associate Professor

Abstract

A study was carried out that compared Programme for International Student Assessment (PISA) 2003 Mathematics results with Trends in International Mathematics and Science Study (TIMSS) 2003 Grade 8 Mathematics results (Wu, 2009). It was found that Western countries generally performed better in PISA than in TIMSS, and Eastern European and Asian countries generally performed better in TIMSS than in PISA. In this paper, TIMSS released items are divided into two sets: one that fits the PISA framework and one that does not. In particular, many geometry and algebra items in TIMSS are "inner mathematics" (mathematics without a real-life context), and such items do not appear in the PISA test. Differential performances of six countries on each set of items are examined. In this way, the relative strengths and weaknesses of Western and Asian countries are identified. These strengths and weaknesses are then linked back to the contents of the PISA and TIMSS tests. There is strong evidence that differential performance between Western and Asian countries in PISA and TIMSS can be directly attributed to the types of items in the respective tests.

Keywords: international study, Mathematics, PISA, TIMSS


Introduction


A study was carried out that compared Programme for International Student Assessment (PISA) 2003 mathematics results with Trends in International Mathematics and Science Study (TIMSS) 2003 Grade 8 mathematics results, using country mean scores for the 22 countries that participated in both studies (Wu, 2009). It was found that Western countries generally performed better in PISA than in TIMSS, and Eastern European and Asian countries generally performed better in TIMSS than in PISA. Furthermore, two factors, content balance and years of schooling, accounted for most of the variation in PISA country mean scores after controlling for TIMSS country mean scores. Consequently, the rankings of countries in the two studies can be reconciled to a reasonable degree of accuracy. Further, the Organisation for Economic Co-operation and Development (OECD) publication, Learning mathematics for life (OECD, 2009, chap. 4), showed that the relative performances of countries by traditional mathematics content areas (Algebra, Data, Geometry, Measurement and Number) differ significantly, and that countries are grouped according to their similarities in students' responses to the items. Gronmo and Olsen (2008) also showed that the compositions of the PISA and TIMSS tests differ considerably in terms of mathematics content and item format. They concluded that content differences between PISA and TIMSS contributed to the observed differences in the rankings of countries in PISA and TIMSS.

The fact that content balance has a significant effect on country performance suggests that students in different countries have particular relative strengths and weaknesses. If these specific strengths and weaknesses are identified beyond the level of broad content categories, mathematics educators in each country can be informed of the specific skills students have or lack. This will also provide further insight into what PISA and TIMSS are each assessing, beyond the usual rhetoric about curriculum-based and non-curriculum-based focus.

This paper attempts to examine item-level skills in order to identify strengths and weaknesses, thus providing specific instructional feedback to mathematics educators as well as test designers. The item analysis is based on data from six countries: Australia, England, United States, Hong Kong, Taiwan and Korea. The results for the three Western countries show a similar pattern, as do the results for the three Asian countries. There is strong evidence that cultural traditions of learning mathematics have an impact on students' performances on different types of test items.


PISA and TIMSS Mathematics Frameworks and Tests

Comparisons of PISA and TIMSS mathematics frameworks can be found in a number of publications (American Institutes for Research, 2005; Hutchison & Schagen, 2007; National Center for Education Statistics, 2008; Neidorf, Binkley, Gattis, & Nohara, 2006). These comparisons tend to produce a descriptive list of similarities and differences between the two published frameworks, such as the classifications of the content domains and the cognitive domains. However, few published comparisons critically examined the differences. For example, would one framework lead to a test that is completely different from a test based on the other framework? Is one framework a subset of the other? If so, what is missing in that framework? Are the two frameworks essentially the same, other than the nomenclature of the classifications? This paper looks at these issues and, in doing so, it is hoped that a better understanding is gained of the key differences between PISA and TIMSS at the level of the tests and items, not just at the level of broad aims and orientations as reflected in the frameworks.

At first glance, most would conclude that both the PISA and TIMSS mathematics frameworks are comprehensive. It does not appear that one framework is necessarily a subset of the other, or that something is glaringly missing from either framework. However, the PISA mathematics framework suggests that its approach is the more inclusive one, with the following line at the beginning of the framework document:

Rather than being limited to the curriculum content students have learned, the assessments focus on determining if students can use what they have learned in the situations they are likely to encounter in their daily lives. (OECD, 2003, p. 24)

The word "limited" suggests that PISA is attempting to be more inclusive in terms of coverage of the mathematics domain. It also suggests that the school curriculum, whether intended, implemented, or attained, does not focus on whether students can use what they have learned. Does this mean that TIMSS, being more curriculum based, does not assess whether students can use what they have learned? The TIMSS mathematics framework states the following:

While the assessment of abilities such as solving nonroutine problems and reasoning with [...] that form the initial base for the development and implementation of these skills will also be assessed. Problem solving and communications are key outcomes of mathematics education that are associated with many of the topics in the content domains. They are regarded as valid behaviors to be elicited by test items in most topic areas. (International Association for the Evaluation of Educational Achievement, 2003, p. 11)

It would appear that TIMSS does not preclude problem solving in the framework, but there may be less emphasis on this than in PISA, where the mathematics framework is almost entirely built on the application of mathematics competencies that may reasonably be expected of students. In short, one may expect a different balance in the two tests in terms of problem solving items and fact/procedural items, but one might not necessarily expect a complete absence of one type of item.

However, a close examination of the test items in the two surveys reveals that there is one class of items in TIMSS that does not appear in PISA. One might label this class of items as "content rich," "formal mathematics," or "naked mathematics." Some of these items are illustrated in Figure 1.

Item A: In square EFGH, which of these is FALSE? [The diagram of square EFGH and the four answer options are not recoverable from the source.]

Item B: If n is a negative integer, which of these is the largest number?
(A) 3 + n
(B) 3 × n
(C) 3 − n
(D) [not legible in the source]

Figure 1. Examples of TIMSS items not appropriate for the PISA test

[...] of competency list number 8: Using symbolic, formal, and technical language and operations (OECD, 2003, p. 41). While these items could fit the PISA framework at a theoretical level, they are not tasks that "(students) are likely to encounter in their daily lives (outside the classroom context)." These may be items that an adult working in a scientific field may encounter in his or her daily life, but not 15-year-old students studying at school. Consequently, items focusing on technical language and formal mathematics are typically not in the PISA test. This is not because the PISA framework does not cover such competencies, but because the operationalisation of the PISA framework as applied to 15-year-old students precludes these kinds of items. So, at one level one could say the PISA framework is comprehensive, but at another level, typically at the level of the construction of test items, a set of typical school mathematics items is excluded.

In contrast, when PISA items are examined against the TIMSS framework, no group of items appears to misfit the TIMSS framework to the extent that such items could not be included in the TIMSS test. However, there are more PISA items than TIMSS items where, within an item, the competencies tested cover multiple content domains.

Because formal mathematics items are not included in the PISA test, one may ask if, and how, country results may be affected, particularly in the context of differential performance of countries in PISA and TIMSS. This paper attempts to answer this question by examining the differential performance of countries on the set of TIMSS items that do not fit with the PISA test.

Methodology

Because there is a large pool of released items from the TIMSS 2003 Grade 8 assessment (99 items in total), we examined these items and divided them into two sets: one that fits the PISA test and one that does not. In particular, many geometry and algebra items in TIMSS are "inner mathematics" (mathematics without a real-life context), and such items do not appear in the PISA test. Owing to the amount of work involved in rescaling the item responses for all countries, we selected six countries in particular to examine their differential performances on the two sets of items. The six countries were Australia, England, and the United States (Western countries, abbreviated as West); and Hong Kong, Taiwan, and Korea (Asian countries, abbreviated as East). Item response modeling (Rasch model) was used to calibrate the item difficulties. The item difficulties for the 99 released items were estimated for each country, with the average difficulty for each country centred at zero. These item difficulties were compared across the six countries. If there was no differential item functioning across the six countries, then an item that was relatively more difficult than another item for one country should also be relatively more difficult in another country. On the other hand, if there was differential item functioning, then one item may be found to be relatively easier in one country, but relatively difficult in another country. We carried out the comparisons of item difficulties particularly with respect to items not fitting the PISA test, and with respect to Western countries and Asian countries.
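The comparison logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calibration software used in the study: the Rasch response function is the standard one, but the function names, item labels and difficulty values are hypothetical and were chosen solely to show how centring at zero makes relative difficulties comparable across countries.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct response (both arguments in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def center_at_zero(difficulties):
    """Shift one country's item difficulties so that their mean is zero."""
    mean = sum(difficulties.values()) / len(difficulties)
    return {item: d - mean for item, d in difficulties.items()}

def relative_differences(country_a, country_b):
    """Centred difficulty in country A minus centred difficulty in country B."""
    a = center_at_zero(country_a)
    b = center_at_zero(country_b)
    return {item: a[item] - b[item] for item in a if item in b}

# Hypothetical difficulties (logits) for three items in two countries.
# Country B finds every item harder overall, but centring removes that shift.
country_a = {"item1": -0.5, "item2": 0.3, "item3": 1.1}
country_b = {"item1": 1.0, "item2": 1.2, "item3": 1.4}

diffs = relative_differences(country_a, country_b)
for item, diff in sorted(diffs.items(), key=lambda pair: pair[1]):
    # A negative value means the item is relatively easier in country A.
    print(item, round(diff, 2))
```

With no differential item functioning, all differences would be close to zero; items with large negative or positive values are the ones that behave differently in the two countries.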

Results

Of the 99 released items, 42 were deemed not fitting the PISA test. That is, around half of the TIMSS items are not likely to appear in the PISA test. This proportion is surprisingly high. It could mean that a large part of the mathematics taught in schools is not included in the PISA test. Table 1 shows the classifications (made by the author) so that other researchers can cross-check if desired. An "n" in the column headed "In PISA?" means that the TIMSS item is not likely to be in the PISA test, and an "N" means that the item is almost certainly not a PISA item. A "y" means that the TIMSS item could be a PISA item, and a "Y" means that the item is almost certainly a PISA item. This classification was made before any item calibration was carried out, to ensure that the classification process was independent of any knowledge of the relative item difficulties for each country.

The second step in our analysis was to calibrate the items to obtain their difficulty measures for each country, using item response modeling (Rasch model). The average item difficulty for each country was set at zero, so that the calibrated item difficulty for each item was a measure relative to the average item difficulty in that country. In this way, item difficulties across countries can be compared, even if the abilities of students vary across countries. That is, the calibrated item difficulties are relative item difficulties for each country; they are not absolute difficulties such as percentages correct. Table 2 shows the calibrated item difficulties for Australia and Taiwan, arranged in order of the difference in item difficulties. The items at the top of Table 2 are those that Australian students found relatively easier (as compared to other items) than Taiwan students did. The items at the bottom of Table 2 are those that Australian students found relatively more difficult than Taiwan students did. It is worth noting that, as one scans down Table 2 from top to bottom, the number of "n" and "N" entries in the column "In PISA?" increases. In fact, of the first 20 items that Australian students found much easier (relative to other items) than Taiwan students did, only one item has an "N" classification ("almost certainly not a PISA item"). In contrast, of the 20 items that Australian students found much more difficult (relative to other items) than Taiwan students did, 12 items are classified as "N" ("almost certainly not a PISA item").

Table 1 Classification of TIMSS 2003 Released Items into Categories of Appropriateness in the PISA Test

Item Seq  Unique ID  In PISA?    Item Seq  Unique ID  In PISA?    Item Seq  Unique ID  In PISA?
1         M012001    n           34        M022139    n           67        M032261    N
2         M012002    y           35        M022142    N           68        M032271    Y
3         M012003    n           36        M022144    n           69        M032403    N
4         M012004    Y           37        M022146    y           70        M032447    Y
5         M012005    N           38        M022148    y           71        M032489    Y
6         M012006    Y           39        M022154    N           72        M032533    Y
7         M012013    Y           40        M022156    Y           73        M032545    N
8         M012014    Y           41        M022185    N           74        M032557    N
9         M012015    N           42        M022188    N           75        M032570    n
10        M012016    N           43        M022189    Y           76        M032588    n
11        M012017    Y           44        M022191    Y           77        M032609    n
12        M012025    Y           45        M022194    Y           78        M032612    N
13        M012026    N           46        M022196    N           79        M032643    N
14        M012027    n           47        M022198    n           80        M032647    y
15        M012028    N           48        M022199    N           81        M032649A   y
16        M012029    n           49        M022202    N           82        M032649B   y
17        M012030    y           50        M022227A   Y           83        M032652    y
18        M012037    y           51        M022227B   Y           84        M032670    n
19        M012038    n           52        M022227C   Y           85        M032671    y
20        M012039    N           53        M022251    N           86        M032678    N
21        M012040    Y           54        M022252    Y           87        M032689    N
22        M012041    n           55        M022253    N           88        M032690    n
23        M012042    N           56        M022261A   Y           89        M032693    N
24        M022002    N           57        M022261B   Y           90        M032699    Y
25        M022004    n           58        M022261C   Y           91        M032727    Y
26        M022005    y           59        M032036    N           92        M032728    N
27        M022008    n           60        M032044    n           93        M032732    n
28        M022010    y           61        M032046    N           94        M032743    y
29        M022012    n           62        M032079    n           95        M032744    y
30        M022016    n           63        M032208    N           96        M032745    y
31        M022021    y           64        M032210    N           97        M032762    Y
32        M022127    y           65        M032228    y           98        M032763    Y
33        M022135    y           66        M032233    y           99        M032764    Y

Table 2 Calibrated Item Difficulties (in IRT logits) for Australia and Taiwan

Seq  Unique ID  AUS    TWN    Difference  In PISA?
97   M032762    0.06   2.17   -2.108      Y
58   M022261C   2.25   4.28   -2.036      y
38   M022148    -1.22  0.59   -1.811      y
33   M022135    -0.42  1.36   -1.774      y
18   M012037    -0.39  1.23   -1.613      Y
34   M022139    0.29   1.70   -1.402      n
26   M022005    0.17   1.57   -1.399      y
80   M032647    -0.44  0.92   -1.362      y
94   M032743    -1.22  0.06   -1.28       y
90   M032699    -2.16  -0.94  -1.225      Y
96   M032745    2.10   3.27   -1.169      y
2    M012002    -1.51  -0.40  -1.104      y
27   M022008    0.62   1.72   -1.096      n
8    M012014    -2.16  -1.17  -0.992      y
36   M022144    0.20   1.12   -0.921      n
95   M032744    -0.11  0.81   -0.92       y
10   M012016    -0.25  0.58   -0.827      N
98   M032763    1.90   2.72   -0.818      Y
99   M032764    1.82   2.62   -0.802      Y
84   M032670    -2.48  -1.68  -0.8        n
16   M012029    -1.05  -0.26  -0.789      n
62   M032079    0.64   1.42   -0.781      n
45   M022194    -0.88  -0.12  -0.755      y
47   M022198    -0.26  0.43   -0.693      n
30   M022016    0.17   0.83   -0.66       n
11   M012017    -0.78  -0.13  -0.648      y
12   M012025    -1.38  -0.92  -0.462      Y
83   M032652    1.11   1.57   -0.461      y
4    M012004    -0.12  0.34   -0.46       y
85   M032671    -1.49  -1.04  -0.451      y
75   M032570    -0.89  -0.44  -0.447      n
17   M012030    -0.07  0.37   -0.438      y
7    M012013    -1.32  -0.88  -0.433      y
3    M012003    -1.15  -0.74  -0.42       n
63   M032208    -0.98  -0.58  -0.40       N
21   M012040    -1.41  -1.01  -0.40       Y
22   M012041    -1.03  -0.66  -0.38       n
1    M012001    -0.85  -0.59  -0.26       n
44   M022191    -0.81  -0.56  -0.25       y
77   M032609    -1.11  -0.91  -0.20       n
82   M032649B   1.67   1.87   -0.20       y
9    M012015    -0.91  -0.72  -0.19       N
91   M032727    -0.16  0.02   -0.18       y
56   M022261A   -0.19  -0.12  -0.06       y
14   M012027    -0.68  -0.65  -0.02       n
69   M032403    -0.17  -0.18  0.01        N
54   M022252    -1.42  -1.44  0.01        y
29   M022012    0.36   0.31   0.05        n
35   M022142    0.13   0.07   0.06        N
49   M022202    1.30   1.21   0.09        N
81   M032649A   0.73   0.64   0.09        y
78   M032612    0.30   0.20   0.10        N
42   M022188    0.34   0.19   0.15        N
65   M032228    -0.21  -0.39  0.18        y
32   M022127    1.24   1.06   0.18        Y
72   M032533    0.06   -0.13  0.18        y
88   M032690    0.17   -0.03  0.20        n
43   M022189    -1.58  -1.78  0.20        y
66   M032233    1.78   1.55   0.23        y
19   M012038    -1.54  -1.84  0.30        n
92   M032728    0.25   -0.06  0.31        N
53   M022251    0.99   0.66   0.33        N
39   M022154    -0.23  -0.57  0.34        N
68   M032271    -0.23  -0.62  0.39        Y
40   M022156    0.10   -0.50  0.40        y
93   M032732    -0.12  -0.55  0.43        n
60   M032044    0.08   -0.35  0.43        n
28   M022010    -0.51  -0.98  0.47        y
57   M022261B   1.15   0.65   0.50        y
67   M032261    0.54   0.02   0.52        N
87   M032689    0.55   -0.03  0.58        N
15   M012028    -0.91  -1.54  0.63        N
61   M032046    1.94   1.29   0.65        N
13   M012026    0.36   -0.31  0.70        N
25   M022004    -0.04  -0.72  0.68        n
41   M022185    0.28   -0.45  0.73        N
6    M012006    -1.07  -1.80  0.73        Y
64   M032210    0.71   -0.03  0.74        N
20   M012039    -0.17  -0.93  0.76        N
86   M032678    0.43   -0.33  0.76        N
24   M022002    1.86   1.04   0.79        N
74   M032557    1.66   0.76   0.90        N
70   M032447    0.10   1.01   0.91        y
79   M032643    0.70   -0.25  0.95        N
46   M022196    -0.06  -1.02  0.96        N
71   M032489    -2.16  -3.13  0.97        y
52   M022227C   2.14   1.16   0.99        y
55   M022253    0.09   -0.92  1.01        N
37   M022146    -0.44  -1.48  1.03        y
89   M032693    1.10   0.06   1.04        N
31   M022021    0.41   -0.64  1.05        y
5    M012005    0.26   -0.88  1.13        N
50   M022227A   0.01   -1.18  1.18        y
76   M032588    -0.26  -1.52  1.26        n
59   M032036    0.49   -0.89  1.38        N
51   M022227B   1.71   0.19   1.53        y
23   M012042    0.67   -1.01  1.68        N
73   M032545    2.40   0.44   1.97        N
48   M022199    1.06   -1.04  2.11        N

Note. Difference = AUS difficulty minus TWN difficulty.

Figure 2 shows a plot of the average item difficulties for Western countries (AUS, ENG, USA) and the average item difficulties for Asian countries (HKG, TWN, KOR), arranged in order by the magnitude of the difference between the average difficulties. That is, the items on the left side of the plot are those that Western countries found relatively easier than Asian countries did. The items on the right side of the plot are those that Western countries found relatively more difficult. Interestingly, the majority of the items on the right side of the plot are items that are deemed unlikely to appear in the PISA test (mostly labeled "N"). The numerical values of item difficulties are shown in Appendix A.

In summary, both Table 2 and Figure 2 show that Asian countries have a tendency to perform relatively better on TIMSS items that are deemed not appropriate for the PISA test. These items are mostly "content rich" items that involve formal mathematics.
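The counts quoted for the two ends of Table 2 can be checked with a short script. The sketch below is an illustration only: the item sequence numbers and "In PISA?" codes are transcribed from Table 2 (the 20 items with the most negative and the 20 items with the most positive AUS minus TWN differences), while the variable and function names are assumptions introduced here.

```python
# (item sequence number, "In PISA?" code), in the order the items appear in Table 2.
# First 20 rows: items relatively easiest for Australia compared with Taiwan.
easier_for_australia = [
    (97, "Y"), (58, "y"), (38, "y"), (33, "y"), (18, "Y"),
    (34, "n"), (26, "y"), (80, "y"), (94, "y"), (90, "Y"),
    (96, "y"), (2, "y"), (27, "n"), (8, "y"), (36, "n"),
    (95, "y"), (10, "N"), (98, "Y"), (99, "Y"), (84, "n"),
]

# Last 20 rows: items relatively hardest for Australia compared with Taiwan.
harder_for_australia = [
    (86, "N"), (24, "N"), (74, "N"), (70, "y"), (79, "N"),
    (46, "N"), (71, "y"), (52, "y"), (55, "N"), (37, "y"),
    (89, "N"), (31, "y"), (5, "N"), (50, "y"), (76, "n"),
    (59, "N"), (51, "y"), (23, "N"), (73, "N"), (48, "N"),
]

def count_code(items, code):
    """Number of items carrying exactly the given classification code."""
    return sum(1 for _, item_code in items if item_code == code)

n_easier = count_code(easier_for_australia, "N")
n_harder = count_code(harder_for_australia, "N")
print("N among the 20 items relatively easier for Australia:", n_easier)
print("N among the 20 items relatively harder for Australia:", n_harder)
```

The classification codes are case sensitive ("N" and "n" are different categories), so the comparison deliberately matches the exact code rather than folding case.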

[Figure 2. Average item difficulties for Western countries (AUS, ENG, USA) and Asian countries (HKG, TWN, KOR), with items ordered by the difference between the two averages. The plot itself is not recoverable from the source.]

Analysis of Variance of Item Difficulty Measures

To formally carry out statistical tests of the effects of item context on the item difficulty measures for the two country groups, an analysis of variance (ANOVA) of item difficulties was carried out with respect to two main effects, East/West and in-PISA, and the interaction term. Table 3 shows the ANOVA results.

Table 3 ANOVA Tests of Effects on Item Difficulties

Source              Sum of Squares  df   Mean Square  F     Sig.
InPISA              1.86            1    1.86         1.47  .23
EastWest            0.03            1    0.03         0.02  .88
InPISA * EastWest   10.82           1    10.82        8.56  .00
Error               745.82          590  1.26
Total               758.51          594

Note. Dependent variable: Item difficulties

The two main effects of the ANOVA are not statistically significant (p values equal .22 and .88 respectively). This is not surprising. For the main effect, InPISA, the ANOVA shows that the overall item difficulties for items that fit the PISA framework are not greatly different from those of the items not fitting the PISA framework. For the main effect, EastWest, the non-significant result is expected since the item calibrations were carried out by fixing the average item difficulty at zero for all six countries, so that there is no difference in average item difficulty across the East and West groups. However, the interaction term, InPISA by EastWest, is statistically significant (p = .004), showing that the item difficulties are indeed likely to be different between Asian and Western countries for InPISA items, and for non-InPISA items. The statistically significant interaction term provides strong support for the finding that there is differential item functioning across the two country groups.
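The F statistics in Table 3 follow from the sums of squares by the usual ANOVA arithmetic, F = (SS_effect / df_effect) / (SS_error / df_error). The sketch below reproduces them from the tabulated values. It assumes one degree of freedom per effect (each factor has two levels, and the tabulated mean squares equal the sums of squares); the variable names are introduced here for illustration only.

```python
# Sums of squares as reported in Table 3.
sum_of_squares = {
    "InPISA": 1.86,
    "EastWest": 0.03,
    "InPISA * EastWest": 10.82,
}
error_ss = 745.82
error_df = 590

# Assumption: each effect has 1 degree of freedom (two levels per factor).
effect_df = 1

mean_square_error = error_ss / error_df

f_ratios = {
    source: (ss / effect_df) / mean_square_error
    for source, ss in sum_of_squares.items()
}

print("Mean square error:", round(mean_square_error, 2))
for source, f_value in f_ratios.items():
    print(source, "F =", round(f_value, 2))
```

Only the interaction F ratio is large relative to the error mean square, which is the pattern the text describes.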

Reasons for Differential Item Functioning

[...] differential item functioning between Western and Asian countries. While the items in which Asian countries performed relatively well tend to be those with formal mathematics, what types of items are those that Asian countries performed relatively poorly in, and what are the reasons for this poor performance? There appear to be three main reasons for a relatively lower performance of the three Asian countries on some items:

1. where explanations are required in the responses;
2. where the English language is not easily translated into Asian languages in the question;
3. where everyday life knowledge and experience can be drawn upon to answer the question.

We take a look at some example items as an illustration of the above three points.

The item where there was the largest difference between item difficulties for Australia and Taiwan was item 97 (M032762) in the TIMSS released item set, as shown in Figure 3.

Betty talks for less than 2 hours per month. Which plan would be less expensive for her?

Less expensive plan: ______

Explain your answer in terms of both the monthly fee and free minutes.

Figure 3. TIMSS 2003 released item M032762

This item follows a piece of stimulus material where phone plans with different fee structures are presented. The coding scheme for this item is given in Table 4.

Table 4 Coding Scheme of Item M032762

Code  Response

Correct Response
20    Plan B with explanation that includes free minutes used and explicit reference to lower monthly fee for Plan B.

Partial Response
10    Plan B with explicit reference to lower monthly fee and no reference to free minutes

Incorrect Response
70    Plan B with inadequate (only free minutes) or no explanation
71    Plan A with or without explanation
79    Other incorrect

Non Response
99    Blank

A comparison of the percentages of students in each coding category between the six countries is given in Table 5.

Table 5 Percentages of Students by Coding Categories of Item M032762

Code      Australia  England  USA  Hong Kong  Taiwan  Korea
20        44%        45%      37%  28%        27%     40%
10        5%         4%       9%   5%         6%      6%
70        30%        33%      32%  56%        48%     42%
71        12%        12%      14%  10%        15%     8%
79        3%         3%       6%   1%         2%      3%
99        6%         5%       2%   1%         3%      2%
20+10+70  79%        82%      78%  89%        81%     88%

Codes 20, 10 and 70 all relate to the correct choice of Plan B, but they differ in the extent to which adequate explanations are given. It is striking that the percentages for Code 70 (correct answer, but inadequate explanations) differ greatly between Western countries and Asian countries, in that Asian countries have higher percentages for Code 70 as compared to Western countries. In the case of Hong Kong and Taiwan, the percentages correct (Code 20) are much lower than for the Western countries. If we ignore the explanation part of the answer and just focus on whether students selected the correct plan (Plan B), we can look at the sum of the percentages for Codes 20, 10 and 70. In fact, the three Asian countries obtained comparable, if not higher, percentages for choosing the correct plan. Where Asian students fell down was mostly in providing adequate explanations for the correct answer. Recently I was involved in a project in Macao where teachers constructed PISA-like mathematics items and administered them to the students. A number of reports from the teachers included comments like "students could not be bothered to read long worded questions or to provide extended responses for explaining their answers." This would seem to support the results shown in Table 5.

The second type of item where Asian students performed relatively less well is that where the English language is not readily translatable into Asian languages. Item 33 (M022135) in the TIMSS released item set is such an example. This item is shown in Figure 4.

Table 6 shows the percentage of students choosing each multiple choice option of the question, [...]

A beaker of water which has reached boiling point is allowed to cool. The temperature of the water is recorded at five minute intervals, and a temperature-time graph is drawn.

[Graph titled "Cooling Curve": temperature (°C) plotted against time (minutes). The graph and the four answer options are not recoverable from the source.]

About how many minutes did it take for the water to cool the first 20 degrees?

Figure 4. TIMSS 2003 released item M022135

Table 6 Percentage of Students in Each Response Category of Item M022135

MC Option  Australia  England  USA  Hong Kong  Taiwan  Korea
A          59%        55%      52%  26%        45%     11%
B          12%        14%      8%   6%         6%      4%
C          6%         12%      11%  8%         9%      6%
D          21%        19%      27%  60%        40%     79%

From Table 6, it can be seen that about twice as many students in Western countries chose the correct option A as chose option D. In contrast, for Asian countries, a large proportion of students chose the incorrect option D compared to option A.
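The "about twice as many" comparison can be made concrete by dividing the percentage choosing option A by the percentage choosing option D in each country. The sketch below uses the percentages from Table 6; the variable names are assumptions made for this illustration.

```python
# Percentages choosing option A (correct) and option D, taken from Table 6.
option_a = {"Australia": 59, "England": 55, "USA": 52,
            "Hong Kong": 26, "Taiwan": 45, "Korea": 11}
option_d = {"Australia": 21, "England": 19, "USA": 27,
            "Hong Kong": 60, "Taiwan": 40, "Korea": 79}

# Ratio above 1: more students chose the correct option A than option D.
a_to_d_ratio = {country: option_a[country] / option_d[country]
                for country in option_a}

for country, ratio in a_to_d_ratio.items():
    print(f"{country}: A/D = {ratio:.2f}")
```

A ratio near or above 2 corresponds to the Western pattern described in the text, while the three Asian countries have clearly lower ratios.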

[...] degrees" instead of "to cool the first 20 degrees." This is most likely because of language conventions, where the phrase "the first 20 degrees" is not a commonly used structure in the Asian languages. In Chinese, it would typically be said as "to cool 20 degrees" (i.e., there are no explicit words "the first"). In Hong Kong, the question has been translated as "From the start till cooled the initial 20 degrees" (從開始至冷卻了最初的二十度).1 While this is a correct literal translation, it is not a sentence structure that people would normally use in speech. The word "initial" in this context sounds out of place in Chinese. This word can also mean "original," or "the very beginning." The word "till" may lead to an interpretation of "to cool to 20 degrees." Simply put, in Chinese, there is no suitable literal translation of "to cool the first 20 degrees." One would need to drop the words "the first" to make it sound like natural speech, but the translator clearly thought that this might disadvantage the students. Adding the word "initial," while more "correct" in matching the English version, made the sentence sound unnatural and foreign to Chinese speakers. The result was that students were still confused. In general, such differences in language usage are likely the cause of differential item functioning. This is not strictly a translation verification issue, because the translation may be literally correct, yet the sentence structure cannot be the same across languages, so the propensity to misunderstand the question will vary between language groups. The discrimination index of .18 for Hong Kong and .41 for Australia further supports the conjecture that the question was confusing to Hong Kong students.

The third type of item in which students from the three Asian countries performed relatively less well is related to everyday life experience and skills, such as estimation and the understanding of measuring units. These problems generally do not require high levels of mathematical skill to solve. For this type of item, students from Western countries have particular strengths as compared to students from Asian countries. Three such items (M032699, M022005, and M032647) are shown in Figures 5 to 7. Appendix A gives the differences in average item difficulties for East and West countries.

We will take Item M032647 (Figure 7) as an example and compare the item statistics in more detail. This is a real-life application of mathematics. The item statistics for the six countries are shown in Table 7. The three Western countries performed relatively better on this item than the three Asian countries did (see item delta logits). One interesting observation about this item is that there is evidence that some higher ability students chose the incorrect Option D (labeled 4 in the item analysis). For all six countries, the average ability (column headed meanAb in Table 7) for Option

1 In the Taiwan version, the translation is "從一開始最初冷卻了二十度." This translation is a little less confusing than the Hong Kong version, but it is still problematic.

Which of these units would usually be used for an area the size of a soccer field?
(A) square centimeters
(B) cubic centimeters
(C) square meters
(D) cubic meters

Figure 5. TIMSS 2003 released item M032699

The number of 250 milliliter bottles that can be filled from 400 liters of water is
(A) 16
(B) 160
(C) 1600
(D) 16000

Figure 6. TIMSS 2003 released item M022005

Oranges are packed in boxes. The average diameter of the oranges is 6 cm, and the boxes are 60 cm long, 36 cm wide, and 24 cm deep.

Which of these is the BEST approximation of the number of oranges that can be packed in a box?
(A) 30
(B) 240
(C) 360
(D) 1920

Figure 7. TIMSS 2003 released item M032647

D is somewhat closer to the average ability for the correct answer (Option B, labeled 2). In the case of Korea, the average ability for Option D is even higher than the average ability for Option B.

(17)

Margaret Wu Relative Strengths of West巴rn and Asian Students

.

Table 7 Item Statistics for Item M032647 for Western and Asian Countries

Item 80 (M032647): summary by country

| Country | Cases for this item | Discrimination | Item Delta |
|---|---|---|---|
| Australia | 781 | 0.26 | -0.43 |
| England | 446 | 0.27 | -0.33 |
| USA | 1498 | 0.17 | 0.01 |
| Hong Kong | 819 | 0.19 | 1.13 |
| Taiwan | 902 | 0.32 | 0.92 |
| Korea | 891 | 0.18 | 0.96 |

Item 80 (M032647): statistics by response label

| Country | Label | Score | Count | % of tot | Pt Bis | meanAb |
|---|---|---|---|---|---|---|
| Australia | 1 | 0.00 | 127 | 16.26 | -0.18 | -0.54 |
| Australia | 2 | 1.00 | 447 | 57.23 | 0.26 | 0.26 |
| Australia | 3 | 0.00 | 143 | 18.31 | -0.13 | -0.42 |
| Australia | 4 | 0.00 | 42 | 5.38 | 0.01 | -0.11 |
| Australia | 6 | 0.00 | 4 | 0.51 | -0.08 | -0.66 |
| Australia | 9 | 0.00 | 18 | 2.30 | -0.06 | -0.54 |
| England | 1 | 0.00 | 74 | 16.59 | -0.15 | -0.40 |
| England | 2 | 1.00 | 248 | 55.61 | 0.27 | 0.31 |
| England | 3 | 0.00 | 85 | 19.06 | -0.17 | -0.48 |
| England | 4 | 0.00 | 24 | 5.38 | 0.06 | 0.10 |
| England | 6 | 0.00 | 8 | 1.79 | -0.14 | -1.41 |
| England | 9 | 0.00 | 7 | 1.57 | -0.09 | -0.55 |
| USA | 1 | 0.00 | 282 | 18.83 | -0.11 | -0.34 |
| USA | 2 | 1.00 | 741 | 49.47 | 0.17 | 0.29 |
| USA | 3 | 0.00 | 354 | 23.63 | -0.12 | -0.32 |
| USA | 4 | 0.00 | 102 | 6.81 | 0.08 | 0.19 |
| USA | 6 | 0.00 | 5 | 0.33 | -0.08 | -2.35 |
| USA | 7 | 0.00 | 1 | 0.07 | 0.02 | 0.91 |
| USA | 9 | 0.00 | 13 | 0.87 | -0.04 | -0.43 |
| Hong Kong | 1 | 0.00 | 43 | 5.25 | -0.06 | 0.71 |
| Hong Kong | 2 | 1.00 | 421 | 51.40 | 0.19 | 1.52 |
| Hong Kong | 3 | 0.00 | 201 | 24.54 | -0.17 | 0.77 |
| Hong Kong | 4 | 0.00 | 148 | 18.07 | -0.02 | 0.94 |
| Hong Kong | 6 | 0.00 | 1 | 0.12 | -0.02 | 0.30 |
| Hong Kong | 7 | 0.00 | 1 | 0.12 | 0.00 | 0.39 |
| Hong Kong | 9 | 0.00 | 4 | 0.49 | -0.07 | 0.24 |
| Taiwan | 1 | 0.00 | 56 | 6.21 | -0.10 | 0.17 |
| Taiwan | 2 | 1.00 | 496 | 54.99 | 0.32 | 1.79 |
| Taiwan | 3 | 0.00 | 208 | 23.06 | -0.23 | 0.35 |
| Taiwan | 4 | 0.00 | 135 | 14.97 | -0.09 | 0.73 |
| Taiwan | 6 | 0.00 | 5 | 0.55 | 0.03 | 1.58 |
| Taiwan | 9 | 0.00 | 2 | 0.22 | -0.04 | 0.49 |
| Korea | 1 | 0.00 | 86 | 9.65 | -0.27 | 0.11 |
| Korea | 2 | 1.00 | 493 | 55.33 | 0.18 | 1.51 |
| Korea | 3 | 0.00 | 204 | 22.90 | -0.14 | 0.71 |
| Korea | 4 | 0.00 | 101 | 11.34 | 0.16 | 1.60 |
| Korea | 6 | 0.00 | 1 | 0.11 | -0.06 | -0.77 |
| Korea | 9 | 0.00 | 6 | 0.67 | 0.02 | 1.39 |
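As a quick check of the pattern noted for Option D, the short Python sketch below takes the meanAb values for labels 1 to 4 from Table 7 and asks, for each country, which incorrect option has an average ability closest to that of the correct Option B (label 2). The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# meanAb values from Table 7 for response labels 1-4 (Options A-D) on Item M032647.
# Label 2 (Option B) is the correct answer; labels 1, 3 and 4 are incorrect options.
MEAN_AB = {
    "Australia": {1: -0.54, 2: 0.26, 3: -0.42, 4: -0.11},
    "England":   {1: -0.40, 2: 0.31, 3: -0.48, 4: 0.10},
    "USA":       {1: -0.34, 2: 0.29, 3: -0.32, 4: 0.19},
    "Hong Kong": {1: 0.71, 2: 1.52, 3: 0.77, 4: 0.94},
    "Taiwan":    {1: 0.17, 2: 1.79, 3: 0.35, 4: 0.73},
    "Korea":     {1: 0.11, 2: 1.51, 3: 0.71, 4: 1.60},
}
CORRECT = 2


def closest_distractor(abilities, correct=CORRECT):
    """Return the incorrect label whose meanAb is nearest to the correct option's."""
    target = abilities[correct]
    distractors = [label for label in abilities if label != correct]
    return min(distractors, key=lambda label: abs(abilities[label] - target))


for country, abilities in MEAN_AB.items():
    gap = abilities[4] - abilities[CORRECT]  # positive if Option D exceeds Option B
    print(f"{country:10s} closest distractor: {closest_distractor(abilities)}, "
          f"meanAb(D) - meanAb(B) = {gap:+.2f}")
```

With these values, label 4 (Option D) is the closest incorrect option in all six countries, and Korea is the only country in which the gap is positive.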

To obtain the solution to this item, the mathematical skills required are not very high. It involves simple division and multiplication, where a quick estimation, rather than complex mathematical calculations of volume, can be used to obtain the answer. To get the incorrect answer of 1,920 (Option D), the likely error is that the radius of the orange is used instead of the diameter. Confusion between radius and diameter is likely to arise if the volumes of the shapes are computed. It is baffling why some high-ability students made this mistake. Nevertheless, more students from Asian countries made this mistake (between 11% and 18% chose Option D in the three Asian countries, compared with between 5% and 7% in the three Western countries). Further, the answer of 1,920 is almost nonsensical, because that is an enormous number of oranges to be placed in one box. From the item statistics, one may conjecture that more higher ability students in Asian countries than in Western countries tried to work through a formal mathematical solution to this question and paid no attention to whether the answer they obtained made sense. This item again illustrates the contrast between the theoretical approach of Asian students and the practical approach of Western students to solving mathematical problems. Because PISA contains more applied questions, Western students may have an advantage in doing the PISA test, provided that the questions can be solved using simple mathematics and a common sense approach.

Many other items that show differential item functioning in relation to Western and Asian countries also reveal interesting observations about students in these countries. Unfortunately, owing to the limited space for this paper, we are not able to provide more examples here. The Appendix lists the items in order of differential item functioning between East and West. Further explorations of item performance can be carried out following the order of items in this list.
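The estimation for Item M032647 discussed above can be written out in a few lines. The sketch below assumes that each orange takes up a cube whose side equals the spacing used (6 cm when the diameter is used, 3 cm when the radius is used by mistake). The box is taken to be 60 cm long, 36 cm wide, and 24 cm deep, where the width of 36 cm is an assumption chosen to be consistent with the keyed answer of 240. The function name is hypothetical.

```python
def oranges_per_box(length_cm, width_cm, depth_cm, spacing_cm):
    """Estimate how many oranges fit when each one takes up a cube of side spacing_cm."""
    return (length_cm // spacing_cm) * (width_cm // spacing_cm) * (depth_cm // spacing_cm)


BOX = (60, 36, 24)  # length, width, depth in cm (width is an assumption, see above)

with_diameter = oranges_per_box(*BOX, spacing_cm=6)  # intended estimate
with_radius = oranges_per_box(*BOX, spacing_cm=3)    # radius used in place of diameter

print(with_diameter)                 # 10 * 6 * 4 = 240 (Option B)
print(with_radius)                   # 20 * 12 * 8 = 1920 (Option D)
print(with_radius // with_diameter)  # 8
```

Halving the spacing doubles the count along each of the three dimensions, so the radius error inflates the estimate by a factor of eight, which is why the resulting figure is implausibly large for one box.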

Discussions and Conclusions

A review of the TIMSS 2003 released items showed that almost half of the items were deemed unlikely to appear in the PISA test, owing to the lack of application contexts for the items. These are typically content-rich items that involve formal mathematics. An analysis of relative item difficulties showed that the three Asian countries performed relatively better on these formal mathematics items than the three Western countries did. This may be a contributing factor to the observation that Western countries generally performed relatively better in PISA than in TIMSS, because TIMSS contains more context-free items involving formal mathematics.

Further, three interesting observations are made after a closer examination of differential item functioning on items where Western countries performed relatively better. First, students from the three Asian countries have more difficulty in explaining how they arrived at an answer, even when they found the correct answer. Second, some cultural and linguistic issues are identified. Because the source text is in English, some sentence structures and vocabulary used in English are not readily translatable into other languages. The consequence is that students in Asian countries are confused by unusual phrases and terms, and misunderstand some of the questions. Third, students from Western countries appear to perform relatively better on mathematics items set in everyday real-life contexts, where students bring their knowledge and experience from outside the classroom. In contrast, students from Asian countries tend to rely on knowledge gained in the mathematics class. Consequently, because most items in PISA are word problems in an everyday life context, it is not surprising that Western countries performed relatively better in PISA than they did in TIMSS.

The observation that many students, particularly in Asian countries, disconnect mathematics problems from everyday life sends an important message to mathematics educators about the importance of linking mathematics to the real world. This message has been actively promoted by PISA. However, a caution is needed. An almost exclusive emphasis on real-life mathematics, particularly at the 15-year-old level, will likely restrict mathematics assessment to a set of items with lower mathematical content, and thus lead to an assessment that does not reflect all the mathematics topics taught in schools (topics that may be for future use by the students).

More generally, the findings from this paper not only help us identify the reasons for the differential performance of countries in PISA and TIMSS, but also throw some light on important issues in test construction, particularly in the context of international studies. The presence of differential item functioning gives an interesting insight into the mathematical thinking of students and the cultural and linguistic characteristics of different countries, yet at the same time it threatens the validity of the studies. Steps must be taken to ensure a fair assessment, both for the countries involved and for mathematics education.

Acknowledgement

I am grateful to Professor Frederick Leung at the University of Hong Kong, and Dr. Andrew Jen at National Taiwan Normal University, for providing the Chinese version of the TIMSS released items.


References

American Institutes for Research. (2005). Reassessing U.S. international mathematics performance: New findings from the 2003 TIMSS and PISA. Washington, DC: Author.

Gronmo, L., & Olsen, R. (2008). TIMSS versus PISA: The case of pure and applied mathematics. Retrieved February 24, 2008, from http://www.timss.no/publications/IRC2006_Gronmo&Olsen.pdf

Hutchison, G., & Schagen, I. (2007). Comparisons between PISA and TIMSS - Are we the man with two watches? In T. Loveless (Ed.), Lessons learned - What international assessments tell us about math achievement (pp. 227-261). Washington, DC: The Brookings Institution.

International Association for the Evaluation of Educational Achievement. (2003). TIMSS assessment frameworks and specifications, 2003. Chestnut Hill, MA: TIMSS International Study Centre.

National Center for Education Statistics. (2008). Comparing NAEP, TIMSS, and PISA in mathematics and science. Retrieved May 1, 2008, from http://nces.ed.gov/timss/pdf/naep_timss_pisa_comp.pdf

Neidorf, T. S., Binkley, M., Gattis, K., & Nohara, D. (2006). Comparing mathematics content in the National Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and Program for International Student Assessment (PISA) 2003 assessments (NCES 2006-029). Washington, DC: National Center for Education Statistics, U.S. Department of Education.

Organisation for Economic Co-operation and Development. (2003). The PISA 2003 assessment framework - Mathematics, reading, science and problem solving knowledge and skills. Paris: Author.

Organisation for Economic Co-operation and Development. (2009). Learning mathematics for life - A perspective from PISA. Paris: Author.

Wu, M. (2009). A comparison of PISA and TIMSS 2003 achievement results in mathematics.


Appendix: Average Item Difficulties (in IRT logits) for Western and Asian Countries

| Seq | Unique ID | West | East | Difference | In PISA? |
|---|---|---|---|---|---|
| 33 | M022135 | -0.28 | 2.56 | -2.84 | Y |
| 97 | M032762 | 0.12 | 1.88 | -1.76 | Y |
| 38 | M022148 | -1.05 | 0.56 | -1.61 | Y |
| 90 | M032699 | -1.84 | -0.58 | -1.26 | Y |
| 80 | M032647 | -0.25 | 1.01 | -1.26 | Y |
| 30 | M022016 | -0.11 | 1.04 | -1.15 | N |
| 58 | M022261C | 2.40 | 3.50 | -1.11 | Y |
| 96 | M032745 | 1.99 | 3.04 | -1.05 | Y |
| 26 | M022005 | 0.55 | 1.54 | -0.99 | Y |
| 27 | M022008 | 0.58 | 1.57 | -0.99 | N |
| 94 | M032743 | -1.25 | -0.31 | -0.94 | Y |
| 18 | M012037 | -0.54 | 0.39 | -0.93 | Y |
| 2 | M012002 | -1.43 | -0.54 | -0.89 | Y |
| 34 | M022139 | 0.43 | 1.29 | -0.86 | N |
| 98 | M032763 | 1.85 | 2.63 | -0.78 | Y |
| 36 | M022144 | -0.05 | 0.68 | -0.73 | N |
| 95 | M032744 | 0.01 | 0.67 | -0.66 | Y |
| 4 | M012004 | 0.03 | 0.67 | -0.64 | Y |
| 99 | M032764 | 1.76 | 2.39 | -0.63 | Y |
| 45 | M022194 | -0.63 | -0.08 | -0.55 | Y |
| 54 | M022252 | -1.51 | -0.96 | -0.54 | Y |
| 82 | M032649B | 1.46 | 2.00 | -0.54 | Y |
| 84 | M032670 | -2.22 | -1.71 | -0.51 | N |
| 16 | M012029 | -0.95 | -0.45 | -0.50 | N |
| 83 | M032652 | 0.95 | 1.45 | -0.50 | Y |
| 62 | M032079 | 0.76 | 1.20 | -0.43 | N |
| 10 | M012016 | -0.22 | 0.21 | -0.43 | N |
| 81 | M032649A | 0.28 | 0.71 | -0.43 | Y |
| 85 | M032671 | -1.49 | -1.06 | -0.43 | Y |
| 8 | M012014 | -1.97 | -1.55 | -0.43 | Y |
| 77 | M032609 | -1.27 | -0.90 | -0.37 | N |
| 63 | M032208 | -0.84 | -0.49 | -0.36 | N |
| 47 | M022198 | 0.03 | 0.35 | -0.31 | N |
| 75 | M032570 | -1.09 | -0.78 | -0.31 | N |
| 11 | M012017 | -0.44 | -0.13 | -0.31 | Y |
| 43 | M022189 | -1.76 | -1.46 | -0.29 | Y |
| 17 | M012030 | 0.20 | 0.49 | -0.29 | Y |
| 88 | M032690 | 0.35 | 0.60 | -0.26 | N |
| 42 | M022188 | 0.29 | 0.54 | -0.25 | N |
| 12 | M012025 | -1.57 | -1.34 | -0.23 | Y |
| 14 | M012027 | -0.77 | -0.55 | -0.22 | N |
| 60 | M032044 | -0.09 | 0.07 | -0.16 | N |
| 21 | M012040 | -1.47 | -1.33 | -0.14 | Y |
| 66 | M032233 | 1.50 | 1.58 | -0.08 | Y |
| 76 | M032588 | -0.87 | -0.81 | -0.06 | N |
| 91 | M032727 | 0.00 | 0.03 | -0.03 | Y |
| 39 | M022154 | -0.34 | -0.31 | -0.03 | N |
| 22 | M012041 | -1.20 | -1.17 | -0.03 | N |
| 56 | M022261A | -0.14 | -0.15 | 0.00 | Y |
| 92 | M032728 | 0.13 | 0.12 | 0.01 | N |
| 32 | M022127 | 1.47 | 1.43 | 0.04 | Y |
| 29 | M022012 | 0.34 | 0.30 | 0.04 | N |
| 3 | M012003 | -1.18 | -1.25 | 0.07 | N |
| 72 | M032533 | -0.12 | -0.19 | 0.07 | Y |
| 40 | M022156 | -0.07 | -0.16 | 0.09 | Y |
| 93 | M032732 | 0.11 | 0.00 | 0.10 | N |
| 44 | M022191 | -0.61 | -0.74 | 0.14 | Y |
| 9 | M012015 | -0.79 | -0.93 | 0.14 | N |
| 1 | M012001 | -0.74 | -0.90 | 0.16 | N |
| 28 | M022010 | -0.61 | -0.77 | 0.16 | Y |
| 78 | M032612 | 0.69 | 0.51 | 0.18 | N |
| 68 | M032271 | -0.29 | -0.47 | 0.18 | Y |
| 65 | M032228 | -0.31 | -0.51 | 0.20 | Y |
| 87 | M032689 | 0.53 | 0.27 | 0.26 | N |
| 69 | M032403 | -0.19 | -0.49 | 0.30 | N |
| 53 | M022251 | 1.17 | 0.86 | 0.30 | N |
| 19 | M012038 | -1.53 | -1.86 | 0.33 | N |
| 71 | M032489 | -2.04 | -2.45 | 0.41 | Y |
| 57 | M022261B | 1.20 | 0.78 | 0.42 | Y |
| 70 | M032447 | -0.07 | -0.52 | 0.45 | Y |
| 64 | M032210 | 0.61 | 0.13 | 0.48 | N |
| 25 | M022004 | -0.06 | -0.55 | 0.50 | N |
| 7 | M0120?? | -0.61 | -1.13 | 0.53 | Y |
| 61 | M032046 | 1.61 | 1.07 | 0.54 | N |
| 67 | M032261 | 0.28 | -0.29 | 0.57 | N |
| 49 | M022202 | 1.40 | 0.79 | 0.60 | N |
| 35 | M022142 | 0.30 | -0.36 | 0.66 | N |
| 6 | M012006 | -0.95 | -1.62 | 0.67 | Y |
| 24 | M022002 | 1.82 | 1.10 | 0.72 | N |
| 46 | M022196 | -0.33 | -1.07 | 0.75 | N |
| 15 | M012028 | -0.83 | -1.59 | 0.76 | N |
| 86 | M032678 | 0.38 | -0.39 | 0.77 | N |
| 37 | M022146 | -0.49 | -1.27 | 0.78 | Y |
| 79 | M032643 | 0.61 | -0.17 | 0.78 | N |
| 20 | M012039 | -0.02 | -0.92 | 0.90 | N |
| 52 | M022227C | 2.13 | 1.16 | 0.97 | Y |
| 55 | M022253 | -0.02 | -1.01 | 0.99 | N |
| 59 | M032036 | 0.48 | -0.52 | 1.01 | N |
| 74 | M032557 | 1.74 | 0.72 | 1.02 | N |
| 89 | M032693 | 1.22 | 0.20 | 1.02 | N |
| 13 | M012026 | 0.41 | -0.64 | 1.05 | N |
| 31 | M022021 | 0.51 | -0.55 | 1.06 | Y |
| 5 | M012005 | 0.18 | -0.89 | 1.07 | N |
| 50 | M022227A | -0.08 | -1.19 | 1.10 | Y |
| 23 | M012042 | 0.11 | -1.01 | 1.13 | N |
| 41 | M0221?? | 0.30 | -0.83 | 1.13 | N |
| 73 | M032545 | 1.90 | 0.46 | 1.45 | N |
| 51 | M022227B | 1.54 | 0.03 | 1.51 | Y |
| 48 | M022199 | 0.96 | -0.56 | 1.52 | N |
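The Difference column in the Appendix is the West average minus the East average, so a negative value marks an item that was relatively easier for the Western countries. The Python sketch below recomputes it for the three items shown in Figures 5 to 7, using the West and East values listed above. The variable and function names are illustrative, and a tolerance of about 0.01 logits is allowed because the published averages are rounded.

```python
# (Unique ID, West average, East average, published difference), in IRT logits,
# copied from the Appendix for the items shown in Figures 5 to 7.
ROWS = [
    ("M032699", -1.84, -0.58, -1.26),
    ("M022005", 0.55, 1.54, -0.99),
    ("M032647", -0.25, 1.01, -1.26),
]


def west_minus_east(west, east):
    """Difference in average item difficulty; negative means easier for the West."""
    return round(west - east, 2)


for item_id, west, east, published in ROWS:
    diff = west_minus_east(west, east)
    # Published averages are rounded, so allow a small discrepancy.
    assert abs(diff - published) <= 0.011, item_id
    print(item_id, diff)
```

All three everyday-life items have a negative difference, which is the direction expected for items on which Western students were relatively stronger.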


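Journal of Research in Education Sciences, Volume 56, Issue 1, 2011, 56(1), 67-89

Using PISA and TIMSS Assessments to Examine the Relative Mathematical Strengths of Western and Asian Students

Margaret Wu

Assessment Research Centre, University of Melbourne, Australia

Associate Professor

Abstract

Wu (2009) compared the mathematics results of the 2003 Programme for International Student Assessment (PISA) with the Grade 8 mathematics results of the Trends in International Mathematics and Science Study (TIMSS), and found that Western countries generally performed better in PISA than in TIMSS. In this study, the released TIMSS items are divided into two sets: one that fits the PISA framework and one that does not. In particular, many geometry and algebra items in TIMSS are "pure mathematics" items (mathematics items without a real-life context), and such items do not appear in the PISA assessment. This study examines the differential performance of six countries on the two sets of items, thereby identifying the relative strengths and weaknesses of Western and Asian countries, and then links these strengths and weaknesses back to the contents of the PISA and TIMSS assessments. There is evidence that the differential performance of Western and Asian countries in PISA and TIMSS can be attributed to the different types of items in the two assessments.

Keywords: international study, mathematics, Programme for International Student Assessment, Trends in International Mathematics and Science Study

Corresponding author: Margaret Wu (張立民), E-mail: m.wu@unimelb.edu.au

Received: 2010/09/30; Revised: 2010/12/13, 2011/02/25; Accepted: 2011/03/10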
