
Chapter 2 Literature Review


2.1 Brief Introduction of Sample Size - Large Sample, Small Sample and Relationship between Them

2.1.1 Brief Introduction of Sample Size

In scientific research in general, and in educational measurement in particular, statistical sample size plays a very important role because it must balance two opposing concerns: the first is statistical power; the second comprises cost, effort, and data-collection conditions. Statistical power is the probability that a statistical test will indicate a significant difference when a difference truly exists. Statistical power is analogous to the sensitivity of a diagnostic test, and one could mentally substitute the word “sensitivity” for the word “power” during statistical discussions. The second concern arises because some measurements contain a large amount of information about the parameter of interest, while others contain little or none. Since the product of research is information, its “purchase” should be made at minimum cost. Therefore, the sample size should be estimated when a study is designed. The process of estimating sample size depends on several assumptions and parameters, namely the following five basic elements: research model, sampling variation and variance, effect size, significance level, and power (Eng, 2003; Florey, 1993; Henry, 1990; Noordzij et al., 2010).

The ever-increasing demand for research has created a need for an efficient method of determining the sample size needed to be representative of a given population. Krejcie and Morgan (1970) published a formula for determining sample size; the calculation is easy to carry out using the following general form (Krejcie & Morgan, 1970):

In the case where the population size is unknown:

$$ n = \frac{Z_{\alpha/2}^{2}\, P(1-P)}{d^{2}} \qquad (2-1) $$

where P is the population proportion, d is the confidence limit (margin of error) around the point estimate, and Z_{α/2} is the Z-score corresponding to the desired level of statistical significance α.

In the case where the population size is known (less than 10,000), the sample size is corrected as follows:

$$ n_c = \frac{N\, n}{N + n - 1} \qquad (2-2) $$

where N is the population size and n is the sample size calculated in (2-1).
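As a worked illustration, formulas (2-1) and (2-2) can be sketched in Python. The inputs are illustrative assumptions, not values from the study: P = 0.5 (the most conservative proportion), d = 0.05, Z = 1.96 (95% confidence) and a hypothetical population of N = 1000. Rounding up to the next whole subject is also an assumed convention.

```python
import math

def sample_size_unknown_population(p, d, z):
    """Formula (2-1): n = z^2 * p * (1 - p) / d^2 (unrounded)."""
    return z ** 2 * p * (1 - p) / d ** 2

def sample_size_known_population(n, population_size):
    """Formula (2-2): finite-population correction n_c = N * n / (N + n - 1)."""
    return population_size * n / (population_size + n - 1)

# Illustrative inputs (assumptions): p = 0.5, d = 0.05, z = 1.96, N = 1000.
n = sample_size_unknown_population(p=0.5, d=0.05, z=1.96)
n_c = sample_size_known_population(n, population_size=1000)

# Round up, since a fractional subject cannot be sampled.
print(round(n, 2), math.ceil(n))      # 384.16 385
print(round(n_c, 2), math.ceil(n_c))  # 277.74 278
```

The correction in (2-2) matters most when n is a sizeable fraction of N; for very large N it leaves n almost unchanged.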

In specific cases the formulae differ; these variants can be found in (Bartlett, Kotrlik, & Higgins, 2001; Cochran, 1977; Desmond & Glover, 2002; Hayes & Bennett, 1999; Kerry & Bland, 1998; Kerry & Bland, 1998).

2.1.2 Relationship between Large and Small Samples

Although it is difficult to draw a clear-cut line of demarcation between large and small samples, statisticians generally agree that a sample should be regarded as large only if its size exceeds 30. After World War II, for doctoral dissertations and most other purposes, the accepted sample size when comparing groups was 30 cases per group. The number 30 arose from the understanding that with fewer than 30 cases one was dealing with “small” samples, which required specialized handling with “small-sample statistics” instead of the then-accepted critical-ratio approach (Cohen, 1990).

Hogg, Tanis, and Rao (1977) wrote that a sample smaller than 25 or 30 would be considered small, and a sample larger than that would be considered large.

The reason is that when the sample size exceeds 30, Student's t-distribution closely approximates the normal distribution. This study adopts the same convention: a sample whose size is less than 30 is considered a small sample.
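The claim that Student's t-distribution approaches the normal distribution as the sample grows can be checked numerically. The sketch below assumes a one-sample setting, so that a sample of size n gives n - 1 degrees of freedom, and measures the largest gap between the t density and the standard normal density on a grid over [-4, 4]; the grid and the sample sizes are arbitrary illustrative choices. The cut-off of 30 remains a rule of thumb: the code only shows that the gap shrinks steadily as n increases.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Log of the normalizing constant, computed with lgamma for stability.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap_to_normal(df, lo=-4.0, hi=4.0, steps=801):
    """Largest |t density - standard normal density| on an evenly spaced grid."""
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    gap = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(t_pdf(x, df) - normal.pdf(x)))
    return gap

# Assumption: one-sample case, so a sample of size n has df = n - 1.
gaps = {n: max_gap_to_normal(n - 1) for n in (6, 11, 31, 101)}
for n, gap in gaps.items():
    print(f"n = {n:3d}  max density gap = {gap:.4f}")
```

Using `math.lgamma` rather than `math.gamma` keeps the normalizing constant finite for large degrees of freedom.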

2.1.3 Minimum Size of Sample for Statistics

For a normal distribution, about 68% of values lie within one standard deviation (σ) of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. This fact is known as the 68-95-99.7 (empirical) rule or the 3-sigma rule (Govindaraju & Lai, 2004; Maronna, Martin, & Yohai, 2006; Pukelsheim, 1994). It is illustrated in Figure 2-1 (Moore, 2010).

Note: Adapted from “The basic practice of statistics” Fifth ed., (p. 75) by Moore, 2010, United States of America: Palgrave Macmillan.

Fig. 2-1 The 68–95–99.7 rule for normal distributions

In mathematical notation, these facts can be expressed as follows, where x is an observation from a normally distributed random variable, μ is the mean of the distribution, and σ is its standard deviation:

683 . 0 ) Pr( x 

955 . 0 ) 2 2

Pr(  x   997 . 0 ) 3 3

Pr(  x  

where Pr is abbreviated by probability
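The probabilities in the 68-95-99.7 rule can be reproduced from the normal cumulative distribution function. The minimal sketch below uses Python's standard `statistics.NormalDist`; the second check, with an arbitrary mean of 50 and standard deviation of 10, illustrates that the probabilities do not depend on μ or σ.

```python
from statistics import NormalDist

def within_k_sigma(k, mu=0.0, sigma=1.0):
    """Pr(mu - k*sigma <= x <= mu + k*sigma) for x ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Same probability for an arbitrary normal distribution (illustrative values).
print(round(within_k_sigma(2, mu=50, sigma=10), 4))  # 0.9545
```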

Next, a minimum number of subjects must be determined in order to obtain trustworthy data from a normal distribution; following Yamaguchi et al. (2006), the derivation below shows this number to be eleven.

The derivation uses a property of the standard deviation. When d_i = x_i - \bar{x}, it follows that \sum_{i=1}^{n} d_i = 0, and

$$ \sigma^{2} = \frac{1}{n-1}\sum_{i=1}^{n} d_i^{2} \qquad (2-4) $$

where n is the number of subjects considered.

The minimum number of subjects for a normal distribution is estimated using (2-5) above.

Formula (2-7) shows that x_k satisfies (2-3); with a sample of this size, however, outliers cannot be detected when n ≤ 9. In other words, although the standard deviation can be computed when n ≤ 9, it cannot be determined whether there are outliers (messy data) outside the 3σ range.

(2) Consider the case of 10n

Let n10, the left side of (2-5) is equal to 10². As expanding the equation (2-5) by the same procedure above (case of (1)),

2 2 satisfy (2-3). It is possible to detect as an outlier if x_k jumps out of other data, range of



3 . Of course the number of subjects can be more if all of them satisfy (2-3) for normal distribution.

Because denominator of right side of (2-4) is n1, so n10 is applicable to the case (1), and n11 corresponds to the case (2). The conclusion is that the minimum subject number required is eleven for normal distribution in range of the 3-sigma rule.

2.1.4 Necessity of Nonparametric Statistical Methods

Statistical science usually tends to focus on what are called parametric statistics. These techniques are termed parametric because they focus on specific parameters of the population, commonly the mean and variance. In order to use these techniques, the following assumptions about the nature of the population from which the data are drawn must be satisfied (Pett, 1997; Tomkins, 2006):

(a) Normal distribution of the dependent variable
(b) A certain level of measurement: interval data
(c) Adequate sample size (more than 30 recommended per group)
(d) Independence of observations, except with paired data
(e) Observations for the dependent variable have been randomly drawn
(f) Equal variance among sample populations
(g) Hypotheses usually made about numerical values, especially the mean

In measurement practice, and in educational measurement specifically, one or all of these parametric assumptions are often violated. In many cases, the solution to this problem is another group of inferential tests that do not make strict assumptions about the population, known as nonparametric (distribution-free) statistics (Gibbons & Chakraborti, 2011; Siegel, 1957). This study proposes a new assessment method, a nonparametric statistical method named RaschGSP IRT, and applies it to educational measurement in order to address this pressing problem.
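As a generic illustration of what "distribution-free" means in practice (this is not the RaschGSP IRT method proposed in this study), the sketch below implements the exact two-sided sign test for paired differences. It assumes only independent pairs and a null hypothesis of zero median difference; no normality or interval-scale assumption is needed, because under the null hypothesis the number of positive signs follows a Binomial(n, 0.5) distribution. The score differences are hypothetical.

```python
from math import comb

def sign_test(differences):
    """Exact two-sided sign test for H0: median difference = 0.

    Zero differences are dropped. Under H0 the count of positive signs
    is Binomial(n, 0.5), so no distributional form is assumed for the data.
    Returns (number of positive signs, number of nonzero pairs, p-value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    smaller = min(positives, n - positives)
    # One-tail probability of a split at least this extreme, then doubled.
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)

# Hypothetical post-test minus pre-test score differences for 10 students.
diffs = [3, 5, -1, 2, 4, 6, 1, 2, 0, 3]
print(sign_test(diffs))  # (8, 9, 0.0390625)
```

Only the signs of the differences are used, which is why the test remains valid for ordinal scores and heavily skewed data, at the cost of some power when the parametric assumptions do hold.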

2.2 Some Theories Commonly Applied to Large Samples

From the document: Application of RaschGSP IRT Theory to Educational Testing with Large-Scale Data (pp. 31-36)