
Reliability

The internal consistency form of reliability was assessed in this study. Internal consistency is the extent to which items within a dimension are correlated with each other. It was examined by two methods: item-scale correlation (Streiner, 1990) and Cronbach's alpha (Cronbach, 1951). Item-scale correlations, which assess the extent to which an item is related to the remainder of its scale, should exceed 0.4 (Kline, 1986), whereas Cronbach's alpha, which measures the overall correlation between items within a scale, should exceed 0.7 (Nunnally, 1994) or 0.8 (Ware, 1993) to be considered acceptable.
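As an illustration of the two reliability checks described above, the following minimal sketch computes Cronbach's alpha and the correlation of each item with the remainder of its scale. The function names and the simulated data are assumptions made for the example; they are not taken from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1.0 - item_var.sum() / total_var)

def item_rest_correlations(items):
    """Correlation of each item with the sum of the remaining items in its scale."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical data: 200 respondents, 4 items driven by one latent trait plus noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
scores = trait + 0.8 * rng.normal(size=(200, 4))

alpha = cronbach_alpha(scores)
item_rest = item_rest_correlations(scores)
# Apply the thresholds quoted in the text (0.7 for alpha, 0.4 for item-scale correlation).
acceptable = bool(alpha > 0.7 and np.all(item_rest > 0.4))
```

Removing the item from the scale total before correlating avoids inflating the coefficient by the item's correlation with itself.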

Validity

Six aspects of validity will be evaluated: Internal validity (convergent and discriminant), criterion validity, construct validity, clinical test of validity, relative validity, and factorial validity.

Internal validity

The convergent and discriminant validity of SF-36 was examined by the multitrait-multimethod matrix (Campbell & Fiske, 1959). For convergent validity, the correlation between comparable dimensions on SF-36 and the Chinese Health Questionnaire (CHQ), for example, between mental health and depression and poor family relations, should be higher than the correlations between less comparable dimensions, for example, physical functioning and social dysfunction. Discriminant validity was tested by comparing each item's correlation with its own scale to its correlations with the other scales. The item-to-own-scale correlation should be higher if the categories within the SF-36 questionnaire are valid.

Construct, Criterion, and Clinical tests of validity

Construct validity assesses the extent to which a measure is related to criteria derived from an established clinical or social theory or “construct”. One method of examining construct validity is to test, with the measure being validated, hypotheses concerning the expected distribution of health between groups (Streiner, 1989; McDowell, 1987). Therefore, the scales will be compared to assessments of physical and mental health based on information independent of SF-36.

The study population was stratified along three variables corresponding to mental and physical health. The mental health variable has 3 levels:

Group 1: number of life events ≤ 1; Group 2: number of life events 2-5; Group 3: number of life events ≥ 6

Physical health was represented by two variables. The first has 2 levels:

Group 1: no chronic condition; Group 2: any chronic condition

The second has 3 levels:

Group 1: 18-34 years old; Group 2: 35-49 years old; Group 3: ≥ 50 years old

For the clinical test of validity, one-way ANOVA will be applied to compare these groups.

Relative Validity

The relative validity of each scale in measuring each dimension of health was assessed by the ratio of the variance explained by the scale of interest (i.e., the scale coefficient squared) to the variance explained by the “best” scale (McHorney, 1993; Liang, 1985). Relative validity was also assessed by the ratio of F-statistics derived from one-way ANOVA models for comparisons between mental health groups and among physical health groups. Again, the scale with the highest F-statistic was the reference, with a relative validity of 1.
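The F-statistic ratio described above can be sketched as follows. This is a minimal illustration with made-up scores for two hypothetical scales (“A” and “B”); the function names are assumptions and the numbers are not study data.

```python
import numpy as np

def one_way_f(groups):
    """F-statistic of a one-way ANOVA over a list of 1-D score arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def relative_validity(f_by_scale):
    """Divide each scale's F by the largest F, so the best scale has RV = 1."""
    best = max(f_by_scale.values())
    return {scale: f / best for scale, f in f_by_scale.items()}

# Hypothetical scores on two scales for two groups (e.g. without / with a condition).
scale_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
scale_b = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]

f_stats = {"A": one_way_f(scale_a), "B": one_way_f(scale_b)}
rv = relative_validity(f_stats)   # A: 1.0, B: 1.5 / 13.5
```

In this toy example scale A separates the groups more sharply (F = 13.5 versus 1.5), so it becomes the reference and scale B receives a relative validity of about 0.11.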

Exploratory Factor Analysis

In addition to the five aspects of validity described above, factorial validity will be tested in this study. Exploratory factor analysis (Child, 1990), a technique of psychometric validation, assesses the agreement between the hypothetical factors that make up the measure and the scales designed to assess those factors. If the Chinese version of SF-36 is a valid measure, the scales defined by its authors should emerge from a factor analysis of the two samples in this study, and items relating to a particular scale should be grouped together within a single factor. Within such an assessment, a factor should be considered relevant only if its eigenvalue (a statistical measure of its power to explain variation between subjects) exceeds 1.1 (Jolliffe, 1986).
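The eigenvalue criterion can be illustrated with a short sketch that counts the eigenvalues of an item correlation matrix exceeding a chosen threshold. The simulated six-item data set and the function name are assumptions for the example only; the threshold is passed as a parameter so that the value stated in the text (1.1) can be substituted by another cut-off.

```python
import numpy as np

def retained_factors(data, threshold):
    """Eigenvalues of the item correlation matrix, and how many exceed threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first
    return eigenvalues, int((eigenvalues > threshold).sum())

# Hypothetical data: 6 items, the first three driven by one latent factor
# and the last three by another.
rng = np.random.default_rng(1)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
items = np.hstack([f1 + 0.5 * rng.normal(size=(300, 3)),
                   f2 + 0.5 * rng.normal(size=(300, 3))])

eigenvalues, n_factors = retained_factors(items, threshold=1.1)
```

Because the eigenvalues of a correlation matrix sum to the number of items, each eigenvalue divided by that number gives the proportion of total variance attributed to the corresponding factor.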

Results

Random Sample of General Population

Table 1 provides information on the distributions of sociodemographic characteristics, number of chronic diseases, and illness during the past 6 months. Of 426 respondents, 155 (36.4%) were 18-34 years old, 231 (54.2%) were male, 220 (55.8%) had more than 12 years of education, 283 (69.4%) had an income of more than 3,5000 NT dollars, 53 (12.4%) did not have any chronic disease, and 32 (7.5%) had been ill during the past 6 months.

The abbreviated English and Chinese content for each SF-36 item and its scale assignment are shown in Table 2. These scales were constructed to be multidimensional. The SF-36 survey includes a single-item measure of health transition, which is not used to score any of the eight multi-item scales.

The number and percentage of participants missing each of the 36 items are presented in Table 3 for the random sample of the Taichung city population. Missing-value rates for the 36 items were consistently low, ranging from 0.2% to 1.2% and averaging 0.66%.

Table 4 presents the percentage of items within each scale that were computable. For the total sample, these percentages were very high across scales, ranging from a low of 96.0% (RE) to a high of 99.5% (PF). Data completeness did not differ significantly across scales among the subgroups. The older subgroup (>50 years old) and the subgroup with chronic disease had slightly higher rates of complete items on all eight scales, and the subgroup with illness also had slightly higher rates of complete items on all scales except PF and GH.

Average scores and quartiles of score distributions (Table 5) indicated that the population was generally in good health. Substantial ceiling effects were observed for 5 of the 8 scales, while no substantial floor effect was observed for any of the 8 scales. The scales with substantial ceiling effects were physical functioning, role-physical, social functioning, and role-emotional.
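Floor and ceiling effects of this kind are usually quantified as the percentage of respondents at the lowest and highest possible scale score. The sketch below assumes scale scores on the 0-100 metric conventionally used for the SF-36 and uses made-up values; the text does not state the cut-off used to call an effect “substantial”, so none is applied here.

```python
import numpy as np

def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Percent of respondents at the lowest and at the highest possible score."""
    scores = np.asarray(scores, dtype=float)
    scores = scores[~np.isnan(scores)]          # ignore missing scale scores
    floor_pct = 100.0 * np.mean(scores == lowest)
    ceiling_pct = 100.0 * np.mean(scores == highest)
    return floor_pct, ceiling_pct

# Hypothetical scale scores; np.nan marks a respondent whose score is missing.
example = [100, 100, 75, 50, 0, 100, np.nan, 25]
floor_pct, ceiling_pct = floor_ceiling(example)   # 1 of 7 at floor, 3 of 7 at ceiling
```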

Table 6 presents the item means and standard deviations and the item-scale correlation coefficients. Standard deviations of items belonging to a given scale were fairly homogeneous. A possible exception was the physical functioning scale, where standard deviations varied from 0.27 to 0.68. This was due to a higher proportion of respondents answering “limited a little” for “vigorous activities” than for other items. Three patterns were observed in the correlation coefficients. First, the correlation coefficients between an item and its hypothesized scale were fairly homogeneous. Second, almost all correlation coefficients between an item and its hypothesized scale indicated strong associations (≥0.7). Third, the correlation coefficients between an item and other scales were much smaller than the coefficient between that item and its hypothesized scale.

Results of the scaling tests for item-discriminant and item-convergent validity, based on the matrix in Table 6, are summarized in Table 7. Perfect scaling success rates for item-discriminant and item-convergent validity were achieved across all eight SF-36 scales. In 277 comparisons out of 280, the correlation between an item and its hypothesized scale exceeded its correlations with all other scales by more than 2 standard errors. In addition, all items satisfied the criterion set a priori for convergent validity, i.e., a correlation with own scale ≥0.4. Thus, the success rate for discriminant validity was 98.9%, and for convergent validity, 100.0%.
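The counting behind a scaling success rate can be sketched as below. The correlations are invented for the example, and the standard error of a correlation is approximated as 1/sqrt(n); that approximation is an assumption of this sketch, since the text does not state how the standard errors were computed.

```python
import numpy as np

def scaling_success(own_corr, other_corr, n, n_se=2.0):
    """Percent of item-versus-other-scale comparisons counted as successes."""
    own = np.asarray(own_corr, dtype=float)[:, None]
    other = np.asarray(other_corr, dtype=float)
    se = 1.0 / np.sqrt(n)      # assumed approximate standard error of a correlation
    successes = (own - other) > n_se * se
    return 100.0 * successes.mean(), int(successes.sum()), successes.size

# Hypothetical correlations for 3 items, each compared with 2 other scales.
own = [0.70, 0.65, 0.45]
other = [[0.30, 0.25],
         [0.40, 0.35],
         [0.42, 0.20]]
rate, n_success, n_tests = scaling_success(own, other, n=400)   # 5 of 6 succeed

# The rate reported in the text follows from the same arithmetic.
reported = round(100 * 277 / 280, 1)   # 98.9
```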

Table 8 presents Cronbach’s α across scales for the overall group and 15 subgroups. These subgroups differed in terms of sociodemographic characteristics and chronic conditions. Overall, Cronbach’s α ranged from 0.63 to 0.97. Minimum standards of reliability for group comparisons (≥0.5 or ≥0.7) were satisfied for the overall group on all SF-36 scales in this population, while 4 subgroup values of Cronbach’s α did not meet these minimum standards (the vitality and mental health scales for subjects with 9-12 years of education, and the social functioning scale for subjects >65 years old and for males). Among the scales, social functioning had the lowest values of Cronbach’s α, possibly because this scale contains only two items. It also varied more across subgroups than the other scales, particularly by gender. The role-physical and physical functioning scales had the highest internal consistency for the overall group and all subgroups, and had more homogeneous coefficients across subgroups. Role-physical was also the only scale that consistently exceeded the minimum standard of 0.90 for comparisons of scores for individual patients, while physical functioning exceeded this standard except in the subgroups aged 35-49 years and with >12 years of education. In general, Cronbach’s α values of all scales were consistent across subgroups.

Validation by Factor Analysis

Factor analysis identified seven relevant factors, with eigenvalues ranging from 1.01 to 14.02 and proportions of total variance ranging from 2.87% to 40.05% (Table 9). The proportion of each item's variance explained by these seven factors ranged from 59.0% (MH2) to 87.4% (RP3) (not shown in the table). The

factors (factors 2 and 5). Factor 1 was formed by 8 items of physical functioning and 1 item of social functioning. The other social functioning item (SF2) did not have any coefficient higher than 0.4, indicating little contribution to any factor. The highest coefficient of SF2 was 0.36, for factor 1. This might imply that factor 1 corresponded to a combination of physical functioning and social functioning. The remaining 2 items of physical functioning combined with bodily pain to form factor 7. The other 3 factors corresponded to 3 scales of the SF-36: role-physical, general health perception, and role-emotional.

Validation by the Hypothesized Dimensionality of the SF-36 scales

We used principal component analysis to test the hypothesized dimensionality of the SF-36 scales. Because we hypothesized that two dimensions underlie the structure of the eight scales, we extracted two principal components. To facilitate interpretation, we further rotated the components to an orthogonal structure using the varimax method.
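A minimal sketch of this two-step procedure (extracting two principal components from a scale correlation matrix, then applying a varimax rotation) is given below. The 4 x 4 correlation matrix is hypothetical, and the rotation uses the common SVD-based iteration, which is an assumption of the sketch rather than a description of the software used in the study.

```python
import numpy as np

def principal_loadings(corr, n_components):
    """Unrotated loadings: eigenvectors scaled by the square root of eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvectors[:, order] * np.sqrt(eigenvalues[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of an (n_scales, n_components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break                              # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical correlation matrix for 4 scales forming two loose clusters.
corr = np.array([[1.0, 0.7, 0.3, 0.2],
                 [0.7, 1.0, 0.3, 0.2],
                 [0.3, 0.3, 1.0, 0.5],
                 [0.2, 0.2, 0.5, 1.0]])

unrotated = principal_loadings(corr, n_components=2)
rotated, rotation = varimax(unrotated)

# Proportion of total variance attributed to each rotated component.
explained = (rotated ** 2).sum(axis=0) / corr.shape[0]
```

Because the rotation is orthogonal, it redistributes variance between the two components without changing the total variance explained or the share of each scale's variance that the two components account for together.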

To evaluate the factorial validity of each scale as a measure of each component, we first squared each factor loading (scale-component correlation) to estimate the proportion of variance shared with that component (common-factor variance). We defined the scale sharing the most variance with each component as the most valid measure of that component. For each component, we then estimated relative validity (RV) for each scale by dividing the variance shared with the component by that estimate for the most valid scale. These ratios indicate in proportional terms how much less valid each scale is relative to the most valid scale. The higher the RV of a scale, the more precisely or efficiently it measures the underlying construct of interest as defined by the most valid scale.
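The relative validity calculation described above reduces to squaring the loadings and dividing by the largest squared loading in each column. A minimal sketch with hypothetical rotated loadings for three scales on two components:

```python
import numpy as np

def relative_validity_from_loadings(loadings):
    """Square each loading and divide by the largest squared loading per component."""
    shared = np.asarray(loadings, dtype=float) ** 2     # variance shared with component
    return shared / shared.max(axis=0, keepdims=True)

# Hypothetical rotated loadings: rows are scales, columns are the two components.
loadings = np.array([[0.8, 0.2],
                     [0.6, 0.4],
                     [0.2, 0.9]])
rv = relative_validity_from_loadings(loadings)
# First column:  0.64/0.64, 0.36/0.64, 0.04/0.64 -> 1.0, 0.5625, 0.0625
# Second column: 0.04/0.81, 0.16/0.81, 0.81/0.81
```

In this toy example the first scale is the reference for the first component and the third scale is the reference for the second, so each has an RV of 1 on its own component.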

Factor analysis of the eight health scales produced 2 principal components. The first (“physical health”) explained 56.5% of total variance, while the second (“mental health”) explained 13.1%, for a total of 69.6%. The proportion of variance in each scale explained by the two components varied between 45% and 86%. Only 6 out of the 16 observed correlations between individual scales and principal components followed the pattern hypothesized by McHorney et al. (Table 10). We found that the general health, social functioning, role-emotional, and mental health scales correlated more strongly with the “physical” component than was predicted. The physical functioning, role-physical, bodily pain, and general health scales correlated more strongly with the “mental” component than was expected, while role-emotional correlated slightly less strongly with the “mental” component than was expected. Even though the concordance rate with the hypothesized correlations was low, the order of correlations within each component was generally consistent with the a priori hypotheses of McHorney et al. The relative validity of a scale was given by the ratio of its explained variance to that of the best scale: physical functioning for the “physical” component, and mental health for the “mental” component. In general, the patterns of relative validity were consistent with prediction.

Validation by Norm-based Interpretation

Lower scores on the SF-36 reflect a poorer health state. Table 11 shows normative data in the form of means and standard deviations, broken down by age, gender, education, income, chronic disease, and illness. Overall, older subjects reported significantly poorer health than younger subjects on all scales of the SF-36 except mental health (p<0.001 on all significant scales except role-emotional, p=0.0346). Women reported poorer health than men only on the vitality scale (p=0.0146). Scores differed significantly among subjects with different levels of education on all scales of the SF-36 except role-emotional and mental health (p<0.001 on vitality and physical functioning; p<0.01 on bodily pain, general perception of health, and social functioning; and p<0.05 on role-physical). Subjects with lower income reported poorer health on physical functioning, role-physical, general perception of health, and vitality (p<0.001 on general perception of health; p<0.01 on physical functioning and vitality; and p<0.05 on role-physical). Subjects with chronic disease had significantly lower scores on all scales than those without (p<0.001 on all scales except role-emotional, p<0.01). Subjects reporting an illness during the previous 6 months had significantly lower scores on all scales than those without (p<0.001 on all scales except physical functioning, social functioning, and role-emotional, p<0.01, and mental health, p<0.05).

Construct Validation

Table 12 shows the mean scores in the group with no chronic disease, the mean differences between the groups with and without chronic disease, F-statistics, and estimates of RV. Subjects with any chronic disease scored significantly lower on all eight scales than subjects with no chronic disease. The general health scale was the most valid in detecting differences between subjects with and without chronic disease. The vitality scale was the second most valid, followed by role-physical, social functioning, physical functioning, and bodily pain. As hypothesized, the best mental health scales (mental health and role-emotional) performed most poorly in this test.

Primary Care Sample

Table 13 provides information on the distributions of sociodemographic characteristics, number of life events, medication use, and chronic disease among outpatients. Of 284 outpatients, 140 (49.3%) were 18-34 years old, 133 (47.0%) were male, 170 (73.9%) had more than 12 years of education, 228 (80.6%) had more than one life event during the past month, 138 (49.1%) were taking medicine, and 117 (41.8%) had a chronic disease.

The number and percentage of outpatients missing each of the 36 items are presented in Table 14 for the outpatient sample from the primary care setting. Missing-value rates for the 36 items were consistently low, ranging from 0.0% to 2.4% and averaging 1.44%.

Table 15 presents the percentage of items within each scale that were computable for the outpatient sample from the primary care setting. For the total sample, these percentages were very high across scales, ranging from a low of 97.6% (RE) to a high of 99.3% (MH). Data completeness did not differ significantly across scales among the subgroups. In general, the 35-49 age group, the group with less than 9 years of education, and the subgroup with more than 5 life events had slightly lower rates of complete items on all eight scales.

Average scores and quartiles of score distributions (Table 16) indicated that the population was not in good health. Substantial ceiling effects were observed for 4 of the 8 scales: physical functioning, role-physical, bodily pain, and role-emotional. Moderate floor effects were observed for the role-physical and role-emotional scales.

Table 17 presents the item means and standard deviations and the item-scale correlation coefficients. Standard deviations of items belonging to a given scale were fairly homogeneous. A possible exception was the physical functioning scale, where standard deviations varied from 0.27 to 0.68. This was due to a higher proportion of respondents answering “limited a little” for “vigorous activities” than for other items. We also observed three patterns in the item-scale correlation coefficients. First, the correlation coefficients between an item and its hypothesized scale were fairly homogeneous. Second, almost all correlation coefficients between an item and its hypothesized scale indicated moderate to strong associations (0.3 to 0.7). Third, the correlation coefficients between an item and other scales were much smaller than the coefficient between that item and its hypothesized scale.

Results of the scaling tests for item-discriminant and item-convergent validity, based on the matrix in Table 17, are summarized in Table 18. Perfect scaling success rates for item-discriminant and item-convergent validity were achieved across 6 and 5 of the eight SF-36 scales, respectively. In 270 comparisons out of 280, the correlation between an item and its hypothesized scale exceeded its correlations with all other scales by more than 2 standard errors. In addition, all items except 2, one for physical functioning and the other for mental health, satisfied the criterion set a priori for convergent validity, i.e., a correlation with own scale ≥0.4. Thus, the success rate for discriminant validity was 96.4%, and for convergent validity, 94.3%.

Table 19 presents Cronbach’s α across scales for the overall group and 15 subgroups. These subgroups differed in terms of sociodemographic characteristics, life events, and chronic conditions. Overall, Cronbach’s α ranged from 0.61 to 0.89. The minimum standard of reliability for group comparisons (≥0.5) was satisfied for the overall group on all SF-36 scales in this outpatient sample, while 4 subgroup values of Cronbach’s α did not meet this standard (the bodily pain scale for subjects with life events ≤ 1 and for those not taking any medicine, and the mental health scale for subjects with education ≤ 9 years and for those with life events ≤ 1). The scales with values below the minimum standard also varied more across subgroups. Among the scales, social functioning had the highest values of Cronbach’s α, followed by role-physical, physical functioning, and role-emotional. In general, Cronbach’s α values of all scales were consistent across subgroups.

Validation by Factor Analysis

Factor analysis identified 8 relevant factors, with eigenvalues ranging from 1.12 to 10.61 and proportions of total variance ranging from 3.21% to 30.30% (Table 20). The proportion of each item's variance explained by these 8 factors ranged from 41.6% (PF6) to 86.3% (BP1) (not shown in the table). The physical functioning scale separated into 2 factors (factors 1 and 3). The mental health and vitality scales combined and then separated into two factors (factors 2 and 4). Factor 5 was formed by 3 items of role-emotional and 1 item of social functioning. Although the coefficient of the other social functioning item (SF2) did not exceed 0.4 on factor 5, it was highest on that factor. The other 3 factors corresponded to 3 scales of the SF-36: role-physical, general health perception, and role-emotional.

Validation by the Hypothesized Dimensionality of the SF-36 scales

We used principal component analysis to test the hypothesized dimensionality of the SF-36 scales in this outpatient sample. Factor analysis of the eight health scales produced 2 principal components. The first (“physical health”) explained 50.5% of total variance, while the second (“mental health”) explained 13.0%, for a total of 63.5%. The proportion of variance in each scale explained by the two components varied between 52.0% and 77.9%. Only 8 out of the 16 observed correlations between individual scales and principal components followed the pattern hypothesized by McHorney et al. (Table 21). We found that the physical functioning and bodily pain scales correlated less strongly with the “physical” component than was predicted, while role-emotional and mental health correlated slightly more strongly with the “physical” component than was predicted. The physical functioning, role-physical, bodily pain, and vitality scales correlated more strongly with the “mental” component than was expected. Even though the concordance rate with the hypothesized correlations was not high, the order of correlations within each component was generally consistent with the a priori hypotheses of McHorney et al. The relative validity of a scale was given by the ratio of its explained variance to that of the best scale: physical functioning for the “physical” component, and mental health for the “mental” component. In general, the patterns of relative validity were consistent with prediction.

Validation by Distinguishing Subgroups

Lower scores on the SF-36 reflect poorer health state. Table 22 shows means and standard deviations, broken down by age, gender, education, life event, taking medicine, and chronic disease. Overall, older subjects reported significantly poorer health on physical functioning and role-physical than did younger subjects (p<0.001 for physical functioning and p<0.01 for role-physical). Women only reported poor

