Chapter 2 Literature Review
2.1 Brief Introduction of Sample Size - Large Sample, Small Sample and Relationship between Them
2.1.1 Brief Introduction of Sample Size
In scientific research in general, and in educational measurement in particular, the statistical sample size plays a very important role because of two opposing concerns: the first is statistical power; the second comprises cost, effort, and data collection conditions. Statistical power is the probability that a statistical test will indicate a significant difference when there truly is one. Statistical power is analogous to the sensitivity of a diagnostic test, and one could mentally substitute the word “sensitivity” for the word “power” during statistical discussions. The second concern reflects the fact that some measurements contain a large amount of information about the parameter of interest, while others contain little or none. Since the product of research is information, its “purchase” should be made at minimum cost. The sample size should therefore be estimated when a study is designed. The estimation depends on several assumptions and parameters, namely the following five basic elements: research design, sampling variation and variance, effect size, significance level, and power (Eng, 2003; Florey, 1993; Henry, 1990; Noordzij et al., 2010).
The ever-increasing demand for research has created a need for an efficient method of determining the sample size needed to be representative of a given population. Krejcie and Morgan (1970) published a formula for determining sample size. Its results are easy to consult and can be constructed using the following general formula (Krejcie & Morgan, 1970):
In the case of population size unknown:
$$ n = \frac{Z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}} \tag{2-1} $$
where P: population proportion; d: confidence limit (margin of error) around the point estimate; Z: Z-score corresponding to the desired level of statistical significance.
When the population size is known (less than 10,000), the sample size is corrected as follows:
$$ n_c = \frac{nN}{N + n - 1} \tag{2-2} $$
where N: population size; n: sample size calculated in (2-1); n_c: corrected sample size.
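As a rough illustration of formulas (2-1) and (2-2), the following Python sketch computes both sample sizes. The function names, the default values (P = 0.5, d = 0.05, 95% confidence), and the rounding up to whole subjects are assumptions made for this example only. The finite-population correction is written in the form n_c = nN / (N + n − 1), which is consistent with the general formula of Krejcie and Morgan (1970) but should be checked against the original source.

```python
import math
from statistics import NormalDist


def sample_size_unknown_population(p=0.5, d=0.05, confidence=0.95):
    """Formula (2-1): n = Z^2 * P * (1 - P) / d^2, returned unrounded."""
    alpha = 1.0 - confidence
    # Two-sided Z-score: the (1 - alpha/2) quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z ** 2 * p * (1.0 - p) / d ** 2


def sample_size_known_population(n, population_size):
    """Assumed form of (2-2): n_c = n * N / (N + n - 1), returned unrounded."""
    return n * population_size / (population_size + n - 1.0)


n = sample_size_unknown_population()
n_c = sample_size_known_population(n, population_size=1000)

# Round up, since a fraction of a subject cannot be sampled.
print("unknown population:", math.ceil(n))
print("population of 1000:", math.ceil(n_c))
```

Setting P = 0.5 maximizes P(1 − P) and therefore gives the most conservative (largest) sample size when the true proportion is unknown.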
In specific cases the formulae differ; they can be found in Bartlett, Kotrlik, and Higgins (2001), Cochran (1977), Desmond and Glover (2002), Hayes and Bennett (1999), and Kerry and Bland (1998; 1998).
2.1.2 Relationship between Large and Small Samples
Although it is difficult to draw a clear-cut line of demarcation between large and small samples, statisticians normally agree that a sample is to be regarded as large only if its size exceeds 30. Cohen (1990) recalls that after World War II, for doctoral dissertations and most other purposes, the accepted sample size when comparing groups was 30 cases per group. The number 30 arose from the understanding that with fewer than 30 cases one was dealing with “small” samples, which required specialized handling with “small-sample statistics” instead of the accepted critical-ratio approach (Cohen, 1990).
Hogg, Tanis, and Rao (1977) wrote that a sample size of less than 25 or 30 is considered small, and anything larger is considered large.
The reason is that when the sample size exceeds 30, Student’s t-distribution closely approximates the normal distribution. This study adopts the same criterion: a sample whose size is less than 30 is considered a small sample.
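The convergence of the t-distribution toward the normal distribution can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it compares the two density functions on a grid over [-5, 5] and reports the largest absolute gap, and it takes the degrees of freedom to be n − 1, as in a one-sample setting. The function names and the grid are hypothetical choices for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def max_density_gap(df, lo=-5.0, hi=5.0, steps=1001):
    """Largest absolute difference between the two densities on a grid."""
    grid = (lo + i * (hi - lo) / (steps - 1) for i in range(steps))
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)


# Degrees of freedom df = n - 1 for sample sizes n = 5, 10, 30, 100.
for n in (5, 10, 30, 100):
    print(n, round(max_density_gap(n - 1), 4))
```

The gap shrinks steadily as n grows; the threshold of 30 is a convention about when the gap is small enough to ignore, not a sharp boundary.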
2.1.3 Minimum Size of Sample for Statistics
For a normal distribution, about 68% of values lie within one standard deviation (σ) of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. This fact is known as the 68-95-99.7 (empirical) rule or the 3-sigma rule (Govindaraju & Lai, 2004; Maronna, Martin, & Yohai, 2006; Pukelsheim, 1994). It is illustrated in Figure 2-1 (Moore, 2010).
Note: Adapted from “The basic practice of statistics” Fifth ed., (p. 75) by Moore, 2010, United States of America: Palgrave Macmillan.
Fig. 2-1 The 68–95–99.7 rule for normal distributions
In mathematical notation, these facts can be expressed as follows, where x is an observation from a normally distributed random variable, μ is the mean of the distribution, and σ is its standard deviation:
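$$ \Pr(\mu - \sigma \le x \le \mu + \sigma) \approx 0.683 $$
$$ \Pr(\mu - 2\sigma \le x \le \mu + 2\sigma) \approx 0.955 $$
$$ \Pr(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx 0.997 $$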
where Pr denotes probability.
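These three probabilities can be checked with a few lines of Python. The sketch below relies on the standard identity that, for a normal variable, the probability of falling within k standard deviations of the mean equals erf(k / √2); the function name is a hypothetical choice for this example.

```python
import math


def coverage_within(k):
    """P(mu - k*sigma <= x <= mu + k*sigma) for a normally distributed x."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage_within(k):.4f}")
```

The result does not depend on the particular values of μ and σ, which is why the rule applies to every normal distribution.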
A minimum number of subjects must now be determined so that trustworthy data for a normal distribution can be acquired efficiently; the number proposed in the following derivation is eleven (Yamaguchi et al., 2006).
The following expansion uses the properties of the standard deviation:
When x, 0;di xi x, there is
and n is the number of subjects considered
2
The minimum number of subjects for a normal distribution is estimated using (2-5) above.
Formula (2-7) shows that x_k satisfies (2-3); with this sample size, outliers cannot be detected in the case of n ≤ 9. Although the standard deviation can be computed when n ≤ 9, it cannot be determined whether there are outliers (messy data) relative to the 3σ range.
(2) Consider the case of 10 ≤ n
Let n = 10; the left side of (2-5) is then equal to 10σ². Expanding equation (2-5) by the same procedure as in case (1),
2 2 satisfy (2-3). An observation x_k can be detected as an outlier if it falls outside the 3σ range of the other data. Of course, the number of subjects can be larger, as long as all of them satisfy (2-3) for a normal distribution.
Because the denominator on the right side of (2-4) is n − 1, n = 10 corresponds to case (1) and n = 11 corresponds to case (2). The conclusion is that the minimum number of subjects required for a normal distribution under the 3-sigma rule is eleven.
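The conclusion can be cross-checked against a bound often attributed to Samuelson, which is not part of the derivation cited here: in a sample of size n, no observation can lie more than (n − 1)/√n sample standard deviations (computed with the n − 1 denominator) from the sample mean. The Python sketch below is illustrative only; the function names are hypothetical. It evaluates the bound and confirms it on the extreme case in which one observation differs from all the others.

```python
import math
import statistics


def max_attainable_z(n):
    """Largest possible |x_k - mean| / s in a sample of size n (s uses n - 1)."""
    return (n - 1) / math.sqrt(n)


def z_of_extreme_sample(n):
    """Standardized deviation of the odd point when one value is 1, the rest 0."""
    sample = [1.0] + [0.0] * (n - 1)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 denominator
    return (sample[0] - mean) / s


for n in (9, 10, 11, 12):
    bound = max_attainable_z(n)
    print(n, round(bound, 3), "can exceed 3 sigma" if bound > 3 else "cannot exceed 3 sigma")
```

Under this bound, an observation more than three standard deviations from the mean is arithmetically impossible for n ≤ 10 and first becomes possible at n = 11, which agrees with the minimum of eleven subjects stated above.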
2.1.4 Necessity of Nonparametric Statistical Methods
Statistical science usually tends to focus on what are called parametric statistics.
These techniques are termed parametric because they focus on specific parameters of
the population, commonly the mean and variance. In order to use these techniques, the following assumptions about the population from which the data are drawn must be satisfied (Pett, 1997; Tomkins, 2006):
(a) Normal distribution of the dependent variable
(b) A certain level of measurement: Interval data
(c) Adequate sample size (more than 30 recommended per group)
(d) An independence of observations, except with paired data
(e) Observations for the dependent variable have been randomly drawn
(f) Equal variance among sample populations
(g) Hypotheses usually made about numerical values, especially the mean
In the practice of measurement, and of educational measurement specifically, one or more of these parametric assumptions is often violated. In many cases, the solution to this problem is another group of tests for statistical inference that do not make strict assumptions about the population, known as nonparametric or distribution-free statistics (Gibbons & Chakraborti, 2011; Siegel, 1957). This study proposes a new assessment method named RaschGSP IRT, which is considered a nonparametric statistical method, and applies it to educational measurement to address this pressing problem.