
Reviewing the results of the SCT Study (2004-2008)

12.1 At the conclusion of the SCT study it would seem appropriate to attempt some overall synthesis of the results that have been gathered over four school years, particularly those that measure pupils’ attainment. Since its inception, the study has had to cope continually with problems arising from the initial research design. Ideally, from a research perspective, comparing small and large classes would have required two samples: one with large and one with small classes. Schools allotted small classes would have been chosen at random, with some initial stratification to take account of key variables such as the proportion of disadvantaged pupils, band level, etc. Pupils would then have been tested on entry to P1 in September and again in June at the end of each year up to the completion of P4.

12.2 In practice aided and government schools were invited to apply for participation in the Study via a circular memorandum dated 14 May 2004. The selection criteria were: (a) schools’ readiness to try out SCT, based on their knowledge and experience in curriculum development, participation in pedagogy-related initiatives/projects, teachers’ commitment, etc.; (b) the prospect of a stable and adequate P1 intake throughout the period of the Study; and (c) the availability of classrooms to accommodate the additional classes.

The circular memorandum also indicated that, in the event that the number of schools satisfying the above criteria exceeded the quota of 40, priority would be given to schools with a sizable number of students in need of stronger school support. Since the number of eligible schools fell short of the target number, it was not necessary to apply this additional criterion.

12.3 Given the circumstances stated in para 12.2, a compromise design was therefore employed whereby the small and the control classes came from the same schools, all of which had volunteered to participate. This decision gave rise to several important limitations. First, it meant that the equivalent classes from the experimental and control cohorts had to be tested in different years. Thus P1 small classes were tested in 2004/05 (Cohort 1) and 2005/06 (Cohort 2), but P1 controls (Cohort 3) were not tested until 2006/07. Second, only Cohort 1 pupils could be followed up to the end of P4 without extending the study. Making comparisons therefore required different samples of pupils to provide controls in P1, P2, P3 and P4. The P2 normal classes could then be followed through P3 up to the end of P4, while pupils from Cohort 3 could be followed up to the end of P2.

12.4 This would have mattered less if the subsequent scores used as post-tests could have been adjusted for intake. But since the tests had to be created at the same time that the study began, it was only possible to test the Cohort 1 small classes in October 2004 and the P2 and P3 control classes in December 2004. It was not until the 2006/07 school year that a straightforward comparison of performance in P1 classes could be made over the same time interval (Cohort 2 v Cohort 3). When this was done it was the pupils in the small classes who made the greater academic progress, although the effect size was in every case small. Irrespective of whether the pupil was a boy or a girl, and regardless of the subject (except mathematics for girls), pupils in the small classes did better than those in the control classes.

12.5 When the various comparisons were undertaken in the P2, P3 and P4 classes the advantages of the Cohort 2 small classes in P1 appeared to decrease over time. The fact that the P1 gains were not replicated in Cohort 1 restricts the degree of confidence that can be placed in this finding, although it is reasonable to assume that the schools and P1 teachers in the 2004/05 school year would still have been adjusting to the changed circumstances and therefore less likely to have been able to take full advantage of the class size reductions. However, a significant difference between the small classes and those in the control group emerged when the effects across different ability bands were examined. It appeared that teachers in small classes were able to improve the performance of pupils more evenly, while in some cases teachers in the larger classes improved overall performance by boosting that of the more able pupils at the expense of the children in the bottom third of the class. In all cases, although these differences were statistically significant, they again gave rise to only very small effect sizes, equivalent to one to two months’ additional schooling. However, such results were confounded in that in some cases the same teachers taught both the small and the normal classes.

12.6 Mainly for this latter reason an additional group of ‘reference schools’ was added. Again, however, there were problems in the timing of testing and in the fact that three different samples of pupils, representing P1, P2 and P3, had to be used in order to follow their progress in the subsequent year. The decision to conduct the pre-test in September at the beginning of the school year was the logical one, but again this meant that in some cases, when comparing P2 or P3 classes, the pre-test (the end of P1 or P2 test) had been administered to the small class pupils in the previous June.

12.7 In seeking to carry out a review of these results over the four year period two approaches can be taken. First, only the results where the comparisons are across similar time intervals could be used. The disadvantage of proceeding in this way is that it would eliminate some results. The second approach is to consider all the results obtained at the end of each school year and to attempt to estimate the effect of administering the test at different points in time. Table 12.1 sets out the means and standard deviations for the various test administrations from the end of P1 to the end of P3. Dates marked † in Table 12.1 indicate administrations after the move back to normal classes.

Table 12.1 Means and standard deviations for the test administrations (P1 to P3)

CHINESE
Group  Test       Date      Mean    Adjusted   s.d.    N
EXP    End of P1  June 05   36.39      -       19.03   3684
EXP    End of P1  June 06   38.26      -       19.97   3217
EXP    End of P2  June 06   51.50      -       18.42   3889
EXP    End of P2  June 07   52.37      -       19.55   3282
EXP    End of P3  June 07   47.62      -       18.40   3974
NORM   End of P1  Dec 04    54.28    35.04     20.05   4276
NORM   End of P1  June 07   36.56      -       19.75   3395
NORM   End of P2  Dec 04    62.82    52.90     17.32   4541
NORM   End of P2  June 05   53.67      -       18.33   4320
NORM   End of P2  June 08   52.97      -       19.82   3389
NORM   End of P3  June 05   43.97      -       16.19   4613
NORM   End of P3  June 06   46.23      -       17.04   3844
NORM   End of P3  June 08†  49.17      -       18.86   3353
REF    End of P1  Sept 06   42.95    37.81     20.13   2713
REF    End of P1  June 07   39.08      -       19.97   2656
REF    End of P2  Sept 06   55.04    54.02     18.00   2717
REF    End of P2  June 07   54.25      -       19.20   2676
REF    End of P2  June 08   55.90      -       19.22   2505
REF    End of P3  June 07   49.42      -       18.59   2686
REF    End of P3  June 08   51.96      -       18.86   2516

ENGLISH
Group  Test       Date      Mean    Adjusted   s.d.    N
EXP    End of P1  June 05   52.44      -       22.37   3699
EXP    End of P1  June 06   51.85      -       23.36   3210
EXP    End of P2  June 06   57.11      -       23.95   3899
EXP    End of P2  June 07   53.01      -       24.03   3261
EXP    End of P3  June 07   32.69      -       21.84   3963
NORM   End of P1  Dec 04    64.23    51.43     21.23   4242
NORM   End of P1  June 07   51.89      -       22.94   3369
NORM   End of P2  Dec 04    68.04    57.29     22.27   4565
NORM   End of P2  June 05   58.35      -       23.63   4311
NORM   End of P2  June 08   54.47      -       24.62   3405
NORM   End of P3  June 05   28.49      -       19.31   4615
NORM   End of P3  June 06   30.99      -       20.91   3866
NORM   End of P3  June 08†  33.76      -       22.39   3356
REF    End of P1  Sept 06   54.77    52.37     23.30   2713
REF    End of P1  June 07   53.59      -       23.51   2668
REF    End of P2  Sept 06   60.26    57.58     23.64   2710
REF    End of P2  June 07   54.04      -       24.26   2667
REF    End of P2  June 08   58.61      -       24.24   2507
REF    End of P3  June 07   36.21      -       23.65   2692
REF    End of P3  June 08   35.97      -       23.92   2522

MATHEMATICS
Group  Test       Date      Mean    Adjusted   s.d.    N
EXP    End of P1  June 05   43.68      -       24.71   3702
EXP    End of P1  June 06   44.98      -       24.48   3201
EXP    End of P2  June 06   53.40      -       20.86   3892
EXP    End of P2  June 07   54.46      -       21.80   3275
EXP    End of P3  June 07   59.58      -       21.73   3973
NORM   End of P1  Dec 04    60.20    44.21     23.17   4268
NORM   End of P1  June 07   44.40      -       25.68   3354
NORM   End of P2  Dec 04    57.80    54.54     20.30   4578
NORM   End of P2  June 05   51.20      -       20.62   4311
NORM   End of P2  June 08   55.21      -       22.49   3406
NORM   End of P3  June 05   58.53      -       21.31   4607
NORM   End of P3  June 06   60.42      -       20.57   3925
NORM   End of P3  June 08†  60.15      -       21.98   3330
REF    End of P1  Sept 06   52.07    45.82     24.44   2715
REF    End of P1  June 07   44.76      -       25.74   2667
REF    End of P2  Sept 06   56.36    54.89     20.79   2717
REF    End of P2  June 07   56.10      -       21.61   2660
REF    End of P2  June 08   57.81      -       21.63   2500
REF    End of P3  June 07   61.52      -       21.58   2689
REF    End of P3  June 08   61.27      -       21.47   2508

Adjusted = June-equivalent score for tests taken in September or December (see para 12.8).
† Administration after the cohort concerned had moved back to normal classes.

12.8 The first step in the analysis was to calculate the average percentage gains that resulted from taking the tests in September and December respectively. This was done by calculating the percentage differences in each case and then taking an average across the three subjects. For the end of P1 scores, taking the test in September added approximately 12% to the average June score and taking the test in December a further 23.5%. For the end of P2 scores the corresponding figures were 4.4% and 11.4% respectively. The adjusted scores are shown in the ‘Adjusted’ column of Table 12.1.
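
As an illustration, the adjustment described above can be read as removing the average inflation associated with a September or December sitting from the raw mean. The exact per-subject formula is not given in this report, so the sketch below simply applies the quoted average percentages; the function name and the simple subtraction of the combined percentage are assumptions.

    # Illustrative sketch only: applies the average percentage adjustments quoted in
    # para 12.8, assuming the quoted percentage is simply removed from the raw mean.
    SEPT_P1, DEC_P1 = 0.12, 0.235    # approximate additions for end of P1 tests
    SEPT_P2, DEC_P2 = 0.044, 0.114   # approximate additions for end of P2 tests

    def adjusted_score(raw_mean, month, level="P1"):
        """Return an approximate June-equivalent mean for an early-year sitting."""
        sept, dec = (SEPT_P1, DEC_P1) if level == "P1" else (SEPT_P2, DEC_P2)
        inflation = sept if month == "Sept" else sept + dec   # December adds a further increment
        return raw_mean * (1 - inflation)

    # Example: the December 2004 Chinese sitting (raw mean 54.28 in Table 12.1)
    print(round(adjusted_score(54.28, "Dec", "P1"), 2))   # about 35.0, close to the 35.04 shown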

12.9 The standard deviations in Table 12.1 were then pooled to give a standard error of approximately 0.50. To reach statistical significance at the 5% level any difference between pairs of means in the table should therefore be greater than 1.96, while a value greater than 2.58 would indicate a result significant at the 1% level. Examining differences between Cohort 1 and Cohort 2 at the end of P1 and P2 shows only one significant difference: that in English at the end of P2. At the end of P3, when Cohort 2 returned to normal classes, there are no significant differences from Cohort 1 in any of the subjects. For the comparisons with the control and reference classes there are no clear trends. In so far as significant differences do occur they appear to represent random fluctuations, possibly due either to variations in the intake to P1 or, more likely, to the quality of teaching. In every case, the significant differences between pairs of means correspond to extremely small effect sizes.
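
The comparisons above can be reproduced, at least approximately, with a standard two-sample test on the group means in Table 12.1 together with an effect size. The report does not state exactly how the standard deviations were pooled, so the sketch below uses the usual formulae; the function name is illustrative.

    import math

    def compare_means(mean1, sd1, n1, mean2, sd2, n2):
        """Two-sample z statistic and Cohen's d for a pair of group means
        (one plausible way of making the comparisons described in para 12.9)."""
        se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
        z = (mean1 - mean2) / se_diff                    # compare with 1.96 (5%) or 2.58 (1%)
        pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
        d = (mean1 - mean2) / pooled_sd                  # effect size (Cohen's d)
        return z, d

    # Cohort 1 v Cohort 2, English, end of P2 (the one difference reported as significant)
    z, d = compare_means(57.11, 23.95, 3899, 53.01, 24.03, 3261)
    print(round(z, 1), round(d, 2))   # z well beyond 2.58, yet d is only about 0.17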

12.10 There are also three end of P4 comparisons. The P2 control classes of 2004/05 took the end of P4 test in June 2007, while Cohort 1 (having returned to normal classes) and the reference classes took the same test in June 2008. In Chinese the means were 54.72 (control), 51.42 (Cohort 1) and 54.04 (reference group). The score of Cohort 1 pupils is significantly lower than that of both other samples (1% level but very small effect size). Moving back to normal classes in P4 appears to have a negative effect. In English the corresponding means were 40.11 (control), 38.35 (Cohort 1) and 42.75 (reference group) respectively. Here, although the trend is similar to that for Chinese, only the performance of pupils in the reference group differs significantly (1% level; small effect size). In mathematics, however, Cohort 1 pupils do outscore the control sample (52.26 against 50.88), although the result is not statistically significant. The pupils in the reference group do even better, with a mean of 54.39. This is a significant improvement over the performance of Cohort 1 pupils (5%) and of control pupils (1%), although again these are only small effect sizes.

12.11 Confirmation of these trends can be seen in Table 12.2, where the end of P3 scores are used to predict the P4 score. It can be seen that in every case Cohort 1 does less well than expected in comparison to the control group from the same schools (Chinese and English) and compared to the reference group in mathematics, although the magnitude of the differences in the latter subject is not as great as in the two languages. The return of Cohort 1 to normal classes, having spent the previous three years in small ones, appears therefore to have retarded rather than enhanced pupils’ progress.

Table 12.2 End of P3 to end of P4 residual gains for each subject

Sample      Chinese                   English                   Mathematics
            Mean     s.d.   N         Mean     s.d.   N         Mean     s.d.   N
Cohort 1    -1.892   11.31  3754      -1.285   12.56  3757      -0.849   12.82  3758
Control      2.34**  11.63  3624       1.995** 13.58  3638       0.394   12.96  3676
Reference   -0.583   11.72  2365      -1.030   12.79  2363       0.747** 13.04  2333

**p<0.01; small to very small effect size
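
The residual gains in Tables 12.2 to 12.4 are, presumably, each pupil’s actual score minus the score predicted from the earlier test. The exact model is not specified, so the sketch below assumes a simple least-squares regression fitted to the combined samples; the data generated here are synthetic and serve only to make the example runnable.

    import numpy as np

    def residual_gains(prior, outcome):
        """Residual gain: outcome score minus the value predicted from the prior score
        by a simple linear regression (assumed model)."""
        slope, intercept = np.polyfit(prior, outcome, 1)   # least-squares fit over all pupils
        return outcome - (intercept + slope * prior)       # positive = better than expected

    # Synthetic illustration: p3 and p4 stand in for end of P3 and end of P4 scores
    rng = np.random.default_rng(0)
    p3 = rng.normal(50, 18, size=1000)
    p4 = 0.8 * p3 + rng.normal(10, 12, size=1000)
    gains = residual_gains(p3, p4)
    print(round(gains.mean(), 3))   # close to zero when averaged over the whole fitting sample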

12.12 It has previously been shown that, compared with Cohort 3 and the reference group, pupils in the Cohort 2 small classes did better on all three attainment tests at the end of P1. In the 2007/08 school year, Cohort 3 and the P1 reference classes have both been followed into P2. It is thus possible to determine whether the progress made by Cohort 2 pupils in P1 continues in the following year. The residual gains for the three samples (Cohort 2, Cohort 3 and reference group) are shown in Table 12.3. In each case pupils in the Cohort 2 small classes do less well than expected at the end of P2 when compared to pupils in normal classes. Thus the advantage gained by Cohort 2 at the end of P1 is eroded in the following year.

Table 12.3 Start of P1 to end of P2 residual gains for each subject

Sample      Chinese                   English                   Mathematics
            Mean     s.d.   N         Mean     s.d.   N         Mean     s.d.   N
Cohort 2    -1.219   13.0   280       -1.339   16.7   280       -0.795   14.2   2851
Cohort 3     0.305   12.9   301       -0.713   17.0   300        0.210   14.6   2998
Reference    1.090*  13.2   229        1.860*  16.9   229        0.706*  14.1   2323

**p<0.01; small to very small effect size

12.13 It is also possible to deduce the effect on pupils in Cohort 2 when they move back to normal classes by using end of P2 scores to predict the P3 results. This time there are four comparison groups: the control classes, which took the end of P2 test in June 2005; Cohort 1, which took it in June 2006; Cohort 2, which took it in June of the following year (2007); and the reference group, which also took it in that year. The residual gains for each subject are shown in Table 12.4. Here the results illustrate the difficulty of providing straightforward answers to some of the questions posed at the start of this report, given the complexity of the research design. Compared with Cohort 3, pupils in the small classes appeared to regress during the P2 year. However, when the comparison involves different samples the situation of pupils in Cohort 2 appears to have improved during the P3 year. In both languages pupils in Cohort 2, on returning to a large class, make significant progress (1% level) compared to the pupils in the control groups and Cohort 1. In mathematics, however, Cohort 2 pupils make significantly less progress (1% level) than might have been predicted from their end of P2 score in comparison with the control group, and the reference group has the lowest mean residual gain. Thus there is no overall consistency, suggesting, once again, that what matters more than the size of the class is the expertise and experience of the teacher.

Table 12.4 End of P2 to end of P3 residual gains for each subject

Sample      Chinese                   English                   Mathematics
            Mean     s.d.   N         Mean     s.d.   N         Mean     s.d.   N
Cohort 1     0.433   12.0   364       -0.842   13.9   364       -0.405   13.5   3457
Cohort 2     1.164   12.3   299        2.927   14.0   302       -0.866   13.1   2868
Control     -2.692   11.3   354       -4.115   14.2   356        1.980*  12.5   3181
Reference    1.880*  12.3   238        3.755*  14.3   238       -1.320   13.4   2366

**p<0.01; small to very small effect size

12.14 A simple regression analysis was also undertaken using aggregated scores across the three subjects to predict the end of P2 scores. The predictive variables included the start and end of P1 scores, pupil gender, SEN classification, learning orientation at the end of P2, class membership (small or normal) and school. The strongest influence on end of P2 attainment was found to be the end of P1 scores, which accounted for 67.4% of the total variance explained by the regression equation. Scores on the P1 pre-test on entry to primary school contribute another 3.7%. Being classified SEN (unstandardised regression coefficient -6.24) lowers the end of P2 score, while end of P2 orientation to learning and being a girl (regression coefficients 1.08 and 0.70 respectively) make small positive contributions. Along with the above variables, 23 schools with significant regression coefficients explain a further 2.7% of the total variation, 12 of them making a positive contribution to the predicted end of P2 score. Of these, 7 are reference schools. This is balanced out by the fact that, of the 11 making a negative contribution, 7 are also reference schools.
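
A minimal sketch of this kind of regression is given below. The column names, the synthetic data and the coefficients used to generate them are all hypothetical; only the structure of the model (end of P1 scores, pupil characteristics and school dummies predicting the end of P2 score) follows the description above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 2000
    df = pd.DataFrame({
        "p1_start": rng.normal(40, 15, n),           # pre-test on entry to P1
        "girl": rng.integers(0, 2, n),
        "sen": rng.binomial(1, 0.08, n),
        "orientation_p2": rng.normal(3.5, 0.8, n),   # end of P2 learning orientation rating
        "small_class": rng.integers(0, 2, n),        # class membership (small or normal)
        "school": rng.integers(1, 38, n),            # hypothetical number of schools
    })
    df["p1_end"] = 0.8 * df["p1_start"] + rng.normal(15, 10, n)
    df["p2_end"] = (0.7 * df["p1_end"] + 0.1 * df["p1_start"] - 6 * df["sen"]
                    + 0.7 * df["girl"] + 1.0 * df["orientation_p2"] + rng.normal(0, 8, n))

    # End of P1 score as the dominant predictor, plus pupil characteristics and school dummies
    model = smf.ols("p2_end ~ p1_end + p1_start + sen + girl + orientation_p2"
                    " + small_class + C(school)", data=df).fit()
    print(round(model.rsquared, 3))                                     # total variance explained
    print(model.params.filter(like="C(school)").sort_values().tail())   # largest school effects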

12.15 Repeating the analysis using multilevel regression (pupil characteristics and schools as the two levels) confirms the results of the simple regression model. The contribution of the various pupil characteristics to the total predicted variance is 95.55 (standard error = 1.47), while the school contribution is 5.42 (standard error = 1.92). Thus around 5% of pupils’ end of P2 test performance can be attributed to differences between schools. Of the pupil characteristics with significant regression coefficients, the start and end of P1 scores, end of P2 learning orientation and being a girl all featured in the simple regression analysis. The only addition using the multilevel model was membership of Cohort 3, confirming that the advantage gained by Cohort 2 pupils in P1 had been eroded by the end of P2.
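
The multilevel check can be sketched as a mixed model with pupils nested in schools, the school share of variance being read off the fitted variance components. Again the column names and data are hypothetical; only the two-level structure follows the description above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n, n_schools = 2000, 37
    school = rng.integers(0, n_schools, n)
    school_effect = rng.normal(0, 2, n_schools)[school]   # between-school differences
    p1_end = rng.normal(45, 18, n)
    girl = rng.integers(0, 2, n)
    p2_end = 0.8 * p1_end + 0.7 * girl + school_effect + rng.normal(0, 8, n)
    df = pd.DataFrame({"p2_end": p2_end, "p1_end": p1_end, "girl": girl, "school": school})

    md = smf.mixedlm("p2_end ~ p1_end + girl", df, groups=df["school"]).fit()
    school_var = md.cov_re.iloc[0, 0]   # variance attributable to schools
    pupil_var = md.scale                # residual (pupil-level) variance
    print(round(school_var / (school_var + pupil_var), 3))   # share of variance due to schools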

12.16 A similar analysis can be used to chart the progress of Cohort 1 pupils as they move through P2 to P4. This combines the P2 control classes, who took the end of P2 test in June 2005 and completed the P4 year in June 2007, and the Cohort 1 SCT ‘experimental’ classes, who followed the same pathway from June 2006 to June 2008. The end of P2 scores are the largest predictor of P4 attainment (unstandardised regression coefficient = 0.83), accounting for 70.9% of the explained variance. The pupils’ orientation to learning at the end of P4 accounts for just over another 1% of this variation. Being a girl, and having an experienced Principal with strong beliefs in the value of SCT, also contribute positively, but being in Cohort 1 reduces the pupils’ P4 mean score by 1.8%. This reflects the adverse effect of returning to large classes. Seventeen schools have significant regression coefficients, of which 6 make negative contributions to the P4 scores. The school contribution to the explained variance was again tested using the multilevel regression model. The pupil contribution now has 4 significant regression coefficients, consisting of the end of P2 scores, end of P4 orientation to learning, being a girl and membership of Cohort 1, the latter making a negative contribution (unstandardised regression coefficient = -2.08). These measures contribute 81.49 to the total variation (standard error = 1.55) while schools contribute 4.15 (standard error = 1.19), so that approximately 5% of pupil attainment at the end of P4 is attributable to schools.

12.17 It is also of some interest to conduct the same analysis separately for the experimental sample and the controls to see if the schools’ contribution differs when classes are smaller. Table 12.5 displays the comparison using the aggregated scores from the Chinese, English and mathematics tests. In both cases the end of P3 test is the main predictor of P4 attainment, followed by the end of P2 test. These two variables account for 81.8% of the explained variance in the experimental sample and 77.9% in the controls. Girls appear to gain a bigger advantage in the small classes, and the active leadership of the Principal also makes a contribution. In the control classes, Principals who have either a negative (bottom quartile in ratings) or a positive view (top quartile in ratings) of the value of small classes feature in the regression equation. Presumably, a Principal with somewhat negative views of SCT accepts the existing conditions in normal classrooms and encourages staff to cope as best they can. Principals with positive views may be more likely to devise strategies which compensate for larger classes, such as deploying existing resources (including teachers) more effectively to take account of individual circumstances within a particular class. For example, they may employ a non-specialist teacher in a certain subject because of his or her expertise in coping with certain pupils with behavioural problems, and this may, in part, explain the findings in para 9.9.

12.18 In the experimental sample 10 schools have significant unstandardised regression coefficients, while in the control group there are 13. The relative contributions can be determined using the multilevel regression model, where for the experimental sample just over 4% of the predicted P4 attainment is attributable to schools, against just over 11% for the control group. Again, using the multilevel model adds little to the simpler regression analysis. Looking at the individual schools involved, it is interesting to note that only three schools feature in both the analysis of the experimental sample and that of the control sample. These are Schools 34 and 15, which contribute positively to the predicted P4 score, and School 16, which makes a negative contribution when classes are small but a positive one when they are normal. What this seems to suggest is that it is the teacher’s expertise that has by far the biggest influence, irrespective of whether the class is of small or normal size.

Table 12.5 Significant predictors of end of P4 attainment (Cohort 1 v Control)

Small Class sample
Significant variable    Regression coefficient    Square multiple correlation
Constant                 -2.176
P3 end test               0.513                    0.768
P2 end test               0.317                    0.818
Being a girl              1.452                    0.820
P1 end test               0.085                    0.822
P4 attitudes              1.162                    0.824
School 34                 4.614                    0.826
School 15                 2.222                    0.827
+ Leadership              1.661                    0.829
School 19                -4.010                    0.829
School 10                -2.865                    0.830
School 2                 -2.855                    0.831
School 16                -3.208                    0.832
School 7                 -3.718                    0.832
School 36                -2.394                    0.833
School 33                -1.918                    0.833
School 23                -2.210                    0.834

Control sample
Significant variable    Regression coefficient    Square multiple correlation
Constant                -11.099
P3 end test               0.604                    0.761
P2 end test               0.196                    0.779
P4 attitudes              2.889                    0.790
P1 end test               0.146                    0.794
School 17                10.300                    0.798
School 10                -1.778                    0.801
School 27                 7.088                    0.804
School 34                 6.095                    0.806
School 15                 2.050                    0.807
School 9                  5.453                    0.809
Being a girl              1.476                    0.811
School 26                 4.006                    0.812
School 16                 2.740                    0.813
School 35                 3.603                    0.814
- SCT view                3.047                    0.815
+ SCT view                2.032                    0.816
School 11                 5.424                    0.816
School 24                -3.516                    0.817
School 31                -2.993                    0.818
School 3                 -4.448                    0.818

12.19 As a final check the regression equations were modified, first to exclude the contribution of the end of P3 scores, then those of P2 and finally the end of P1, when predicting P4 attainment. In the control group removing the P3 scores reduces the proportion of explained variance accounted for from 81.8% to 75.5%, while also removing the end of P2 scores reduces the figure to 69.3%. When no attainment data are included the figure drops to 14.2%. In the experimental group excluding P3 attainment reduces the total variance accounted for from 83.4% to 78.5%. When the end of P2 scores are excluded from the regression equation the figure is reduced to 48.9%, and with no attainment measure it falls to 27.1%. Ten schools contribute about 3.5% to this variation (14 in the control group contributing 8%). In the control group, being one of the schools with a high proportion of disadvantaged pupils enters the regression equation (unstandardised regression coefficient = -7.537), confirming the earlier analysis that being in a smaller class in these schools makes a positive contribution to pupils’ P4 attainment. In the experimental sample, when no attainment measure is included in the regression equation it is being SEN which contributes 17.2% of the total explained variation, with parental support contributing a further 3.3%.
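
The final check amounts to refitting the regression with the attainment predictors successively removed and comparing the proportions of variance explained. The sketch below shows this with hypothetical column names and synthetic data; the figures it prints bear no relation to those quoted above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 2000
    df = pd.DataFrame({"girl": rng.integers(0, 2, n),
                       "sen": rng.binomial(1, 0.08, n),
                       "school": rng.integers(1, 38, n)})
    df["p2_end"] = rng.normal(50, 18, n) - 5 * df["sen"]
    df["p3_end"] = 0.8 * df["p2_end"] + rng.normal(10, 9, n)
    df["p4_end"] = 0.6 * df["p3_end"] + 0.2 * df["p2_end"] + df["girl"] + rng.normal(8, 8, n)

    def variance_explained(formula):
        """R-squared of an ordinary least-squares fit for the given formula."""
        return smf.ols(formula, data=df).fit().rsquared

    # Drop the attainment predictors one step at a time and compare the explained variance
    for label, formula in [
            ("full model", "p4_end ~ p3_end + p2_end + girl + sen + C(school)"),
            ("without end of P3", "p4_end ~ p2_end + girl + sen + C(school)"),
            ("no attainment data", "p4_end ~ girl + sen + C(school)")]:
        print(label, round(variance_explained(formula), 3))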

12.20 These results support the conclusions of earlier analyses. Whether pupils are in a small or a normal class, what matters most is their prior attainment at the end of the previous year. Small classes may compensate, in part, for prior attainment on entry to primary school, particularly for pupils from disadvantaged backgrounds, to the extent that they have matched the performance of children in classes with standard populations of pupils by the end of P1. However, any such advantages are gradually eroded year by year, so that by the end of P4, on returning to normal classes, they fall behind. School differences tend to be greater in normal classes, although the same schools do not always feature in the top and bottom quartiles in successive years. This points to differential intakes at the start of primary school and the greater expertise of some teachers as pupils move from class to class.