4.6 Research Design
4.6.3 Questionnaire Pilot Study
4.6.3.1 Item Analysis
Item analysis was used to assess the suitability of each item for the study (DeVellis, 2003). The survey, generated from the initial focus group on the service expectations of experienced TPB spectators, needed to undergo a series of examinations, particularly in its second section. This process developed the pilot questionnaire into the formal one by applying a series of statistical examinations to the items, using the pilot sample of 252 individuals. Item analysis examined the appropriateness of the questions used in the pilot test against five quantitative criteria: “Missing Data Examination”, “Description Statistics Value”, “Comparisons of Extreme Groups”, “Item-Total Correlations” and “Factor Analysis”. These criteria were then used to determine the validity of each item (DeVellis, 2003; Netemeyer, Bearden and Sharma, 2003). The study employed a relatively large number of analysis methods in order to test all data rigorously and to increase the study’s reliability and effectiveness.
Missing Data Examination
A Missing Data Examination analyses the proportion of responses in which a question is left unanswered. It is used to identify questions that are hard to answer or that give respondents a reason to withhold their answer. Hard-to-answer questions are often skipped, either because respondents are unsure of their answer or because the answer may be personal. A question with a high proportion of missing data can therefore be regarded as poorly targeted and inappropriate for a formal questionnaire. In the pilot test results, no item was left unanswered more than 5% of the time, indicating that all items were appropriately targeted and that respondents had no difficulty answering them (see Appendix D-1).
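The missing-data check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, the toy responses and the use of None to encode a skipped question are hypothetical and are not taken from the pilot data.

```python
def missing_ratios(responses):
    """Return, for each item, the share of respondents who left it unanswered.

    `responses` is a list of dicts mapping item name -> answer; None marks a
    skipped question (an assumed encoding, chosen for this sketch).
    """
    n = len(responses)
    items = responses[0].keys()
    return {item: sum(r[item] is None for r in responses) / n for item in items}


def flag_items(ratios, threshold=0.05):
    """List the items whose missing ratio exceeds the threshold (5% here)."""
    return [item for item, ratio in ratios.items() if ratio > threshold]


# Hypothetical toy data: 20 respondents answering two items.
responses = [{"q1": 4, "q2": 5} for _ in range(18)]
responses += [{"q1": 3, "q2": None}, {"q1": None, "q2": None}]

ratios = missing_ratios(responses)   # q1 skipped once, q2 skipped twice
flagged = flag_items(ratios)         # items skipped more than 5% of the time
print(ratios)
print(flagged)
```

In this toy case only the second item is skipped by more than 5% of respondents, so it alone would be queried under the criterion used in the study.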
Description Statistics Value
Items are tested on their descriptive statistics in order to identify items that show a centralised effect and to avoid extreme performance results (DeVellis, 2003). Three quantitative criteria are used to judge the appropriateness of any given item: the item mean, the item’s standard deviation and its coefficient of skewness (SK value). These three criteria were all used to identify the range over which item performance fell on the scale.
The ‘mean’ is simply the arithmetic average of a distribution of scores, and gives researchers a rough summary of the distribution (Timothy, 2001). The ‘standard deviation’ provides a measure of the average or standard distance from the mean (Frederick and Wallnau, 1992). Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. In a normal distribution, approximately 68% of the values lie within one standard deviation of the mean and approximately 95% lie within two standard deviations of the mean. In other words, these three criteria together describe how an item’s scores are distributed, and the closer the item’s values lie to its mean, the more appropriate the item (Rohatgi and Saleh, 2008).
The generally accepted standard of item appropriateness is a mean value within ±1.5 standard deviations of the scale; a standard deviation value higher than 0.75; and a Coefficient of Skewness (SK) value within ±2. These standards are generally considered indicative of well-targeted items in a scale (DeVaus, 2002). In this study, the mean values ranged from 3.046345 to 4.848055, each item’s standard deviation was lower than 0.75 and the SK values were all within ±2 on the importance scale, meaning that each item fell inside what has been identified as a suitable range (see Appendix D-2).
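As an illustration of the three statistics, the sketch below computes the mean, the sample standard deviation and a moment-based coefficient of skewness for one item. The scores are hypothetical, and the skewness formula shown is the simple third-moment ratio; statistical packages often report a bias-adjusted variant, so values may differ slightly from those in the appendix tables.

```python
import statistics


def describe_item(scores):
    """Return the mean, sample standard deviation and skewness of one item."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample SD, n - 1 in the denominator
    # Second and third central moments (population form, n in the denominator).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    # Moment-based skewness; defined as 0.0 here when every score is identical.
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {"mean": mean, "sd": sd, "skew": skew}


# Hypothetical answers of ten respondents to one item on a 5-point scale.
scores = [5, 4, 4, 5, 3, 4, 5, 5, 2, 4]
item_stats = describe_item(scores)
skew_ok = abs(item_stats["skew"]) <= 2  # the ±2 skewness criterion
print(item_stats, skew_ok)
```

A negative skewness, as in this toy item, indicates that most answers sit near the top of the scale with a tail of lower scores.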
Comparisons of Extreme Groups (CR Value)
The comparison of extreme groups is similar in spirit to the description statistics test. Netemeyer, Bearden, et al. (2003) demonstrated the “Extreme Groups Comparisons” test by using the pilot samples’ total scores on the scale. They assigned samples to two ‘extreme’ groups, the “High 27%” and the “Low 27%”, and computed the mean of each item within each extreme group. A t-test of the difference between the two groups’ means yields the CR value of the item; if the CR value does not reach significance, the item fails to distinguish the high group from the low group, suggesting that the question is either unstable or easily confuses people. Either way, it is inappropriate for the questionnaire.
However, attention must be paid to the percentage cut-off at which a group is labelled an extreme group. Although 27% is the standard used by the majority of researchers and in this study, different standards are employed in different situations (DeVellis, 2002). For instance, a wider cut-off may be used when the number of valid samples in a group is too small, or a narrower one when the sample pool is very large. Netemeyer, Bearden, et al. (2003), however, suggested that 27% is a suitable standard regardless of the number of samples, due to the ‘effect of over-rejection’. In order to avoid the over-rejection effect during t-testing, the study adopted α = .01 (CR ≥ 2.58) as the criterion for qualified discrimination (good), or α = .001 (CR ≥ 3.29) for well discrimination (acceptable). All items showed good discrimination (see Appendix D-3 for a full table of results).
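The extreme-groups procedure can be sketched as follows: respondents are ranked by total score, the top and bottom 27% are taken as the two groups, and a t statistic is computed for each item. The sketch makes several assumptions that are not stated in the study: the toy data are invented, and the t statistic uses the unequal-variance (Welch) form rather than whichever form the statistical package reported.

```python
import math
import statistics


def critical_ratios(data, fraction=0.27):
    """Return one CR (t statistic) per item from a high/low group comparison.

    `data` is a list of respondents, each a list of item scores.
    """
    ranked = sorted(data, key=sum)                  # rank by total score
    k = max(2, round(len(ranked) * fraction))       # size of each extreme group
    low, high = ranked[:k], ranked[-k:]
    ratios = []
    for j in range(len(data[0])):
        lo = [row[j] for row in low]
        hi = [row[j] for row in high]
        diff = statistics.fmean(hi) - statistics.fmean(lo)
        se = math.sqrt(statistics.variance(hi) / k + statistics.variance(lo) / k)
        if se > 0:
            ratios.append(diff / se)
        else:
            # No spread in either group: the ratio is undefined, so report
            # 0.0 for equal means and infinity otherwise.
            ratios.append(math.copysign(math.inf, diff) if diff else 0.0)
    return ratios


# Hypothetical data: items 1 and 2 follow the respondent's overall level,
# while item 3 alternates between 3 and 4 regardless of that level.
data = []
for i in range(20):
    level = 1 + i // 4
    data.append([level, min(5, level + i % 2), 3 + i % 2])

crs = critical_ratios(data)
discriminating = [abs(cr) >= 2.58 for cr in crs]    # the α = .01 criterion
print(crs, discriminating)
```

In the toy data the first two items separate the high and low groups clearly, whereas the third does not reach the 2.58 criterion and would be a candidate for removal.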
Item-Total Correlation
Individual items need to be substantially correlated with the other items that share the same measuring objective, taken as a group, and DeVellis (2003) stated that examining each item’s item-scale correlation achieves this. The correlation coefficient measures the degree of relationship between two interval variables. Netemeyer et al. (2003) also illustrated that, when determining the item-scale correlation, it is best to compute the overall score across the whole scale first and then compute the coefficient of correlation between the overall score and each item’s individual score.
In the reliability analysis function of SPSS, this coefficient is labelled the ‘corrected item-total correlation’, in accordance with DeVellis’s (2003) suggestions. The corrected item-total correlation correlates the item being evaluated with the other scale items, and so clearly shows the relationship between each individual item and the others. The standard used for item-total correlation in the study was 0.3. Netemeyer, Bearden, et al. (2003) also supported Cronbach's assertion that a value of α higher than 0.7 shows a high level of correlation, a value between 0.3 and 0.7 shows an average level of correlation, and a value lower than 0.3 shows a poor level of correlation that can impair the internal consistency of the items.
This study followed the suggestions of Netemeyer, Bearden, et al. (2003) and identified each item’s corrected item-total correlation value using Cronbach's α values. All of the items in the study’s questionnaire measure the same thing, and all had an α value higher than 0.35, making them appropriate for the questionnaire (see Appendix D-4 for full results).
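To make the ‘corrected’ part of the corrected item-total correlation concrete, the sketch below correlates each item with the sum of the remaining items, so that an item is never correlated with itself. The data are hypothetical and the Pearson coefficient is written out by hand; this is an illustration of the calculation, not a reproduction of the SPSS output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data):
    """Correlate each item with the total of all the other items."""
    result = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]  # total excluding item j
        result.append(pearson(item, rest))
    return result


# Hypothetical answers of eight respondents to three items.
data = [
    [1, 2, 3],
    [2, 2, 4],
    [3, 3, 2],
    [4, 3, 5],
    [5, 5, 3],
    [2, 1, 4],
    [4, 5, 2],
    [5, 4, 4],
]

correlations = corrected_item_total(data)
weak_items = [j for j, r in enumerate(correlations) if r < 0.3]
print(correlations, weak_items)
```

Here the third item falls below the 0.3 standard and would be flagged, while the first two items correlate acceptably with the rest of the scale.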
Factor Analysis
Factor analysis is a common technique for testing or evaluating the relationships among factors and individual items, and many researchers use it to estimate items’ explanatory power (DeVellis, 2003). Factor loadings give a more objective view of an item’s appropriateness than judging items merely on their total scores. This is especially so because the questionnaire comprised a psychological test consisting of a series of questions with many different components or factors behind them: while there is some correlation between the factors, different factors influence each item in different ways. The study employed factor analysis to evaluate the relationship between each item and the relevant factors.
Two common methods of factor analysis are ‘Principal Component Analysis’ and ‘Principal Factor Analysis’. According to Howitt and Cramer (2007), the most basic sort of factor analysis is the principal components method. This is a mathematically based technique with the following characteristics: (a) the factors are extracted in order of magnitude, from the largest to the smallest, in terms of the amount of variance explained by the factor (factors, as variables, have a certain amount of variance associated with them); (b) each of the factors explains the maximum amount of variance (Howitt and Cramer, 2007: 331).
There are two advantages to conducting a factor analysis. Firstly, factor loadings categorise the varied service items into several major factors, making the items easier to classify; secondly, the classification developed via factor loadings can be used to eliminate items that prove inconsistent with all of the factors (unless the item is considered important enough to become a single-item factor). This second aspect of factor loading was utilised in this study to assess the consistency of items.
Consequently, the study used ‘Principal Component Analysis’ as the extraction method to estimate the loadings of all questions. A question with a loading lower than 0.3 would be identified as inappropriate. In this study, all questions reached the qualifying value (see Appendix D-5). In addition, ‘Principal Component Analysis’ and ‘Varimax with Kaiser Normalization’ were used as the extraction method and rotation method respectively, and identified 6 factors in this study. These factors were “participant characteristics”, “venue service”, “subsidiary service”, “game affair service”, “medical, sanitation and disability service” and “social and educated service”, each named after its characteristics (see Appendix D-6).
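The loading criterion can be illustrated with a deliberately reduced sketch. It extracts only the first, unrotated principal component of the item correlation matrix by power iteration and compares each loading with 0.3; it does not reproduce the six-factor extraction or the Varimax rotation used in the study, and the data and function names are hypothetical. A statistical package would normally perform the full decomposition.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of the items (columns) in `data`."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p                    # uniform starting vector
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue belonging to the unit vector v.
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    # A loading is the eigenvector entry scaled by sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical data: items 1 and 2 move together, item 3 is unrelated to both.
data = [
    [1, 1, 3], [1, 2, 4], [2, 2, 4], [2, 3, 3],
    [4, 3, 3], [4, 4, 4], [5, 4, 4], [5, 5, 3],
]

loadings = first_component_loadings(correlation_matrix(data))
low_loading_items = [j for j, l in enumerate(loadings) if abs(l) < 0.3]
print(loadings, low_loading_items)
```

In this toy example the first two items load strongly on the component, while the third item's loading is close to zero and would fall below the 0.3 criterion.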