
A New Instrument for Measuring Medical Outcomes: A Validity Evaluation Study of the Chinese-Version SF-36 Health Survey


Academic year: 2021


The Emergence of a New Instrument for Medical Outcomes: A Study of a Chinese-Version Short Form 36

Project number: NSC 87-2314-B-039-023

Project period: July 1, 1997 to June 30, 1998

English Abstract

It is now recognized that the objectives of medical care include not only prolonging life but also improving quality of life, achieving a more effective life, and preserving function and well-being. As a result, there is increasing consensus about the central importance of the patient's point of view in monitoring the quality of medical care outcomes. A main effort of the past decades has therefore been to use standardized patient surveys to collect information on general health outcomes in order to serve research effectively. A new generic health-status measure, the SF-36, has been introduced and has attracted considerable interest. The SF-36 has shown substantial reliability and validity in four fields of application: monitoring population health, estimating the burden of different conditions, clinical trials of treatment effects, and monitoring outcomes in clinical practice.

The Chinese-version SF-36 was developed following the guidelines for cross-cultural adaptation. The primary objectives of this study were to estimate the reliability and validity of the Chinese-version SF-36 among individuals selected from the general population and among primary care attenders, and to compare health status between the two groups.

A cross-sectional study with two samples was designed to collect information on the SF-36, life events, clinical diagnosis, and sociodemographic factors. One sample was recruited from primary care attenders at a large teaching hospital in Taichung; the other was selected from the general population of Taichung using a multistage sampling method.

Analyses were conducted across 15 subgroups differing in sociodemographic characteristics and chronic conditions. In both samples, item-completion rates for each scale were high across all subgroups (97.6% to 99.8%), but tended to be somewhat lower among the elderly and those with chronic disease. On average, surveys were complete enough to compute scale scores for more than 97% of the sample. For the random sample, the scales passed tests of item-internal consistency (100% passed) and item-discriminant validity (98.9% passed). For the outpatient sample, the scales passed tests of item-internal consistency (96.4% passed) and item-discriminant validity (94.3% passed). Reliability coefficients ranged from 0.76 to 0.93 across scales for the random sample and from 0.61 to 0.93 for the outpatient sample. In both samples, validation by factor analysis yielded results remarkably similar to those reported by the developers of the SF-36. In comparisons of all SF-36 scales between the random and outpatient samples, subjects from the primary care setting reported significantly poorer health status than subjects from the general population after adjusting for age, gender, education, and chronic conditions.

Keywords: Short Form 36 (SF-36); health status; validity; reliability; primary care; general population


Table of Contents

Introduction
The Need of Medical Outcome Instrument
Advances in Developing Medical Outcome Measures
The Short Form 36 (SF-36)
The Applications of the SF-36
Monitoring Population Health
Estimating the Burden of Different Conditions
Clinical Trials of Treatment Effects
Monitoring Outcomes in Clinical Practice
The Needs of Chinese Version of SF-36
1. Research for health policy and health behavior:
2. Cross-cultural adaptation of medical outcome measures:
Translation of Chinese-version SF-36
The Need of Validation of Chinese Version SF-36
Reliability
Validity
a. Content Validity
b. Construct Validity
c. Internal Validity: Convergent and Discriminant Validity
d. Criterion validity
e. Factorial Validity
f. Clinical tests of Validity
Specific Aims
Methods
Study Design
Subjects
Primary Care Sample
General Population Sample
Evaluation of Non-response Bias
Administration of the Questionnaire
Measurement
SF-36
Clinical Criteria
Sociodemographic factors
Life event
Statistical Analysis
Reliability
Validity
Internal validity
Construct, Criterion, and Clinical tests of validity
Relative Validity
Exploratory Factor Analysis
Results
Random Sample of General Population
Validation by Factor Analysis
Validation by the Hypothesized Dimensionality of the SF-36 scales
Validation by Norm-based Interpretation
Construction Validation
Primary Care Sample
Validation by Factor Analysis
Validation by the Hypothesized Dimensionality of the SF-36 scales
Validation by Distinguishing Subgroups
Construction Validation
Regression Model of 8 Scales of SF-36
Discussion
Reference

Introduction

The Need of Medical Outcome Instrument

It is now recognized that the objectives of medical care include not only prolonging life but also improving quality of life, achieving a more effective life (McDermott, 1981), and preserving function and well-being (American College of Physicians, 1988; Cluff, 1981; Ellwood, 1988; Schroeder, 1987; Tarlov, 1983). As a result, there is increasing consensus about the central importance of the patient's point of view in monitoring the quality of medical care outcomes (Geigle & Jones, 1990; Ware & Sherbourne, 1992).

However, information from patients about their experiences of disease and treatment has not been routinely collected in clinical research or medical practice, nor is it part of the medical record. It is therefore unavailable for analysis in current health care databases.

Another reason a medical outcome instrument is needed is that traditional measures of morbidity and mortality are generally agreed to be too narrow to capture the potential benefits of health care interventions, which can influence a wide range of variables such as physical mobility, emotional well-being, social life, and overall well-being (Brazier, 1992). Many questionnaires have therefore been developed to encompass aspects of physical, psychological, and social well-being in evaluating the outcomes of different forms of treatment and care (Wilkin, 1992; McDowell, 1987; Bowling, 1991). These include both disease-specific measures, which are designed to be sensitive to the outcomes of particular disease processes and to characterize the impact of the disease, and generic measures, which are designed to be applicable across a wide range of medical conditions. Disease-specific clinical measures, however, do not provide a complete picture of the impact of the disease on patients. Most importantly, they fail to address the impact of the illness on patients' subjectively assessed function and well-being (Longstreth, 1992). The inclusion of generic measures has come to be treated as central to the evaluation of treatment regimens and the surveillance of disease progression (Fitzpatrick, 1994). However, few generic questionnaires are easy to administer, acceptable to patients, short, and fully validated. The Nottingham health questionnaire has been one of the more widely used because of its acceptability and ease of administration, but it has been criticized for being unable to detect low levels of disability, which are important not only clinically but also to respondents (Ware & Sherbourne, 1992).

Policy analysts also need information about functional status, well-being, and other important health outcomes to compare the costs and benefits of competing ways of organizing and financing health care services, and managers of health care organizations seek to produce the best value for each health care dollar. Health outcome information is also used by clinical investigators to evaluate new treatments and by practicing physicians and other providers to achieve the best possible patient outcomes. A main effort of the past decades has therefore been to use standardized patient surveys to collect information on general health outcomes in order to serve research effectively (Ware & Sherbourne, 1992).

Advances in Developing Medical Outcome Measures

Significant advances during the past decade in methods for assessing patient perspectives on functional status, well-being, and other important health care outcomes include (1) an improved understanding of the major dimensions of health and of the validity of specific scales in relation to those dimensions (Hays & Stewart, 1990; Liang, 1986; Ware et al., 1981); and (2) demonstrations of the usefulness of standardized health surveys in clinical trials (Bombardier et al., 1986; Croog et al., 1986; Fowler et al., 1988), health policy evaluation (Brook et al., 1983; Ware et al., 1986), general population health surveys (Bergner et al., 1981; Stewart et al., 1988, 1989; Ware et al., 1986), and medical practice (Nelson & Berwick, 1989).

The Short Form 36 (SF-36)

A new generic health-status measure, the SF-36, has been introduced and has attracted considerable interest. The SF-36 is referred to as a generic measure because it assesses health concepts that represent basic human values relevant to everyone's functional status and well-being (Ware, 1987, 1990a). Such measures are called generic not only because they are universally valued but also because they are not specific to any age, disease, or treatment. Generic health measures assess health-related quality of life outcomes, namely, those known to be most directly affected by disease and treatment.

Generic health measures are not designed to substitute for traditional measures of clinical endpoints; rather, the intent is to test them in parallel with clinical measures (Ware & Sherbourne, 1992). The potential of such comparisons is illustrated by profiles of functional status and well-being for patients with different medical and psychiatric conditions, contrasted with the general U.S. population. Such comparisons support the validity of generic health measure scales in describing groups of patients known to differ in functional status and well-being. They also help clinicians understand the meaning of differences in generic health measure scale scores, because these diagnostic groups are familiar.

A long battery of health measures performs well by the traditional psychometric standards of reliability, validity, and precision. By the newer standards of feasibility and practicality, however, these advantages are offset by considerable data collection costs and respondent burden. In developing the SF-36, the researchers attempted to reduce respondent burden without letting measurement precision fall below a critical level (Ware & Sherbourne, 1992). This reduction was accomplished by constructing scales from more efficient items. In the Health Insurance Experiment (HIE), for example, 25 items were necessary to define seven levels of physical functioning (Stewart et al., 1978). With the SF-36 Physical Functioning scale, only 10 items are necessary to define 21 levels of functioning (Stewart & Kamberg, 1992). In addition, empirical data have confirmed its acceptability, speed of completion, comprehensibility, and the appropriateness and coverage of its items among elderly people, young adults, and patients with Parkinson's disease (Hayes, 1994). The SF-36 has also been compared with the Nottingham health profile in “normal” populations, and has been reported to be preferable for measuring improvements in health in populations with relatively minor conditions, such as in general practice or in the community (Brazier, 1992). This is because subjects use a wider range of scores, which gives greater power to discriminate between groups.

Dimensions of SF-36

Although the SF-36 aims to reduce respondent burden, it still includes one multi-item scale for each of eight health concepts. Together these scales cover four categories of operational definitions: (a) behavioral functioning, (b) perceived well-being, (c) social and role disability, and (d) personal evaluations (perceptions) of health in general. Table 1 shows the physical and mental health phenomena assumed to be represented by the eight scales: (1) physical functioning (PF), (2) role limitations due to physical health problems (RP), (3) bodily pain (BP), (4) general health (GH), (5) vitality (energy/fatigue) (VT), (6) social functioning (SF), (7) role limitations due to emotional problems (RE), and (8) mental health (MH).

Table 1. Summary of health phenomena captured by SF-36 scales.

PF RP BP GH VT SF RE MH

Function well-being disability perception mental physical mental physical mental physical mental physical

x x x x x x x x x x x x x

The SF-36 has undergone considerable testing for reliability and validity among patient populations in the USA and has been shown to detect differences in health status among patients with different types and severity of medical conditions (Ware, 1992; Jenkinson, 1993; Brazier, 1992; Garratt, 1993; Lyons, 1994; McHorney, 1992; 1993). This provides evidence of the potential value of the measure in identifying sickness-related dysfunction among patients. It has been adapted for use in the UK, with the wording of six questions slightly altered, and has also been shown to achieve high levels of reliability and construct validity among community and patient populations in the UK (Brazier, 1992; Jenkinson C, 1993; Garratt, 1993).

Fourteen studies of reliability have reported estimates that all exceed accepted standards for measures used in group comparisons. For each scale, the median of the reliability coefficients across studies equals or exceeds 0.80, with the exception of the Social Functioning scale (the median for this two-item scale is 0.76). These results support the use of the SF-36 scales in studies of health status based on group-level analyses. Only the Physical Functioning scale consistently exceeded the 0.90 standard of reliability, which some consider a minimum requirement for comparing scores of individual patients (Ware, 1992). Table 2 summarizes information about the SF-36. The scales in Table 2 are ordered according to their validity, from PF, the scale known to be the most valid measure of the physical component of health status, to MH, the last scale in the table and the most valid measure of the mental component of health status. Interestingly, MH is the poorest measure of the physical component, and PF is the poorest measure of the mental component. Scales between PF and MH are ordered according to their validity in measuring the physical and mental components of health status.

The SF-36 survey of generic health concepts is a promising tool for monitoring the results of care. Prior to the SF-36, no measure of generic functional status and well-being had received widespread adoption, and none had been shown to be suitable for use across diverse populations and health care settings. As a result, the opportunity to describe differences in functioning and well-being for both the sick and the well was lost. Little was known about how patients with one chronic medical or psychiatric condition differed from patients with another in functional status and well-being. The SF-36 provides a common yardstick for comparing patients with chronic health problems with people sampled from the general population.

In summary, factors limiting the rate of progress in monitoring health outcomes from the patient point of view have included the absence of measurement tools with good psychometric properties that are easily administered and well documented. The SF-36 offers one approach for achieving these objectives. Standardization of SF-36 content and scoring will make meaningful interpretation and comparisons of results across studies possible.

Table 2. Information about SF-36 health status scales.

Scale | No. of items | No. of levels | Reliability | Validity P | Validity M | Meaning of low score | Meaning of high score
PF | 10 | 21 | 0.93 | * | - | Limited a lot in performing all physical activities | Performs all types of physical activities
RP | 4 | 5 | 0.89 | * | - | Problems with work or other daily activities | No problems with work or other daily activities
BP | 2 | 11 | 0.90 | * | - | Very severe and extremely limiting pain | No pain or limitations due to pain
GH | 5 | 21 | 0.81 | + | + | Evaluates personal health as poor and believes it is likely to get worse | Evaluates personal health as excellent
VT | 4 | 21 | 0.86 | + | + | Feels tired and worn out all of the time | Feels full of pep and energy all of the time
SF | 2 | 9 | 0.68 | + | * | Extreme and frequent interference with normal social activities due to physical or emotional problems | Performs normal social activities without interference
RE | 3 | 4 | 0.82 | - | * | Problems with work or other daily activities as a result of emotional problems | No problems with work or other daily activities as a result of emotional problems
MH | 5 | 26 | 0.84 | - | * | Feelings of nervousness and depression all of the time | Feels peaceful, happy, and calm all of the time
RHT | 1 | 5 | a | a | a | Believes general health is much better now than one year ago | Believes general health is much worse now than one year ago

a: validity is not available.

The Applications of the SF-36

Four fields of application of the SF-36 are discussed below: (1) monitoring the health of the general population, (2) estimating the burden of different conditions, (3) clinical trials of treatment effects, and (4) monitoring outcomes in clinical practice.

Monitoring Population Health

The health of the general population in developed countries cannot be well understood from analyses of treatment survival rates or from population mortality statistics (Elinson & Mattson, 1984). Application of standardized generic measures of physical and mental function and well-being, social and role disability, and general health perceptions will make comprehensive monitoring of the health of the general population possible.

To measure the health of the general population and to compare different population groups, SF-36 profiles can be compared across populations. Differences in health between populations are revealed by differences in the mean score on each scale. Standardization of the SF-36 for use in all countries will facilitate further study of population differences, specific treatment benefits, and various health care policy issues (Aaronson et al., 1992).
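The profile comparison described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names and the input layout (one dictionary of scale scores per respondent) are assumptions, and any scores passed in would be hypothetical.

```python
from statistics import mean

# The eight SF-36 scale abbreviations used in this report.
SCALES = ("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")

def profile(respondents):
    """Mean score on each scale for a list of {scale: score} dictionaries."""
    return {s: mean(r[s] for r in respondents) for s in SCALES}

def profile_difference(group_a, group_b):
    """Per-scale difference in mean scores (group_a minus group_b)."""
    pa, pb = profile(group_a), profile(group_b)
    return {s: pa[s] - pb[s] for s in SCALES}
```

A positive difference on a scale means the first group reports a higher mean score on that scale than the second.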

Estimating the Burden of Different Conditions

The SF-36 and other standardized assessment methods offer a number of advantages to providers. By standardizing questions, answers, and scoring, reliable and valid comparisons can be made to determine the relative burden of different conditions by comparing health profiles of each scale.

Clinical Trials of Treatment Effects

The SF-36 has been used to evaluate the burden of specific conditions, such as heart disease, and the benefits of treatments such as heart valve replacement (Ware & Sherbourne, 1992). To date, we are aware of more than a dozen publications reporting results from clinical studies that included the SF-36, and about 150 topics are under study in clinical trials using the SF-36 health survey.

Monitoring Outcomes in Clinical Practice

The SF-36 and other patient-based instruments have the potential to serve as “laboratory tests” of functioning and well-being in everyday medical practice (ACP, 1988). Their routine administration would be useful in detecting and explaining decreased functional capacity and well-being, tracking changes in function over time, making it possible to consider the patient's total functioning in choosing among therapies, guiding the efficient use of community resources and social services, and predicting the course of chronic disease more accurately.

The Needs of Chinese Version of SF-36

In addition to the need for medical outcome measures for monitoring population health, estimating the burden of different conditions, clinical trials of treatment effects, and monitoring outcomes in clinical practice, two further urgent demands call for a Chinese version of the SF-36 in Taiwan. They are as follows:

1. Research for health policy and health behavior:

In Taiwan, the National Health Insurance (NHI) program was officially implemented on March 1, 1995. It provides universal coverage, reduces the price of medical care, and removes barriers to health care utilization. People may therefore use more medical care now that the economic barriers have been removed. It is thus necessary to observe the effects of NHI on health care utilization and expenditure in order to provide information for health policy makers. To evaluate health care utilization at a given state of health, measuring health status in the general population is an important issue.

2. Cross-cultural adaptation of medical outcome measures:

To compare health status outcomes across countries and conduct multinational trials of drug therapies and other treatments (Anderson, 1994), it is necessary to have standardized questionnaires and scoring methods as well as proof that the same health attributes are being measured in each country. A recent comparison of international health statistics underscored the consequences of a lack of standardized health status information; the report concluded that “there are virtually no population-based data available with which to make meaningful international comparisons on the prevalence of disease and disability” (Aaronson, 1988).

Translation of Chinese-version SF-36

The translation of the Chinese-version SF-36 followed the guidelines for cross-cultural adaptation proposed by Guillemin et al. (1993). Preliminary Chinese versions were obtained through a 6-month translation procedure. A group of bilingual collaborators, all researchers in health behavior and health policy, took part in the translation during this period. Because of cultural differences, the Chinese and English versions differ in the wording of several items so that the instrument is acceptable to Chinese subjects. These differences have been discussed with the researchers who developed the original version.

Back-translations were used during the translation process. A committee then reviewed the different translations in order to produce a final version of the modified measure based on the various translations and back-translations. The four committee members are experts in health behavior and in the intent of the measure and the concepts to be explored. The final version of the SF-36 has been used among colleagues to examine its acceptability and content validity.

The Need of Validation of Chinese Version SF-36

Previous studies indicate that inadequate language translation may lead to reduction of the content validity of measurement (Berkanovic, 1980; Deyo, 1984), so it is necessary to validate the Chinese version SF-36 before it is widely used.

To make sure the data we collect provide correct information, two crucial aspects of correctness should be considered: reliability, the extent to which measures give consistent or accurate results, and validity, the extent to which the results pertain directly to the attribute or characteristic being measured.

Validity studies help us to understand what a difference or a change in a score means. When enough evidence has been accumulated to show that a scale measures the intended health concept and does not measure other concepts, the scale is said to be validated. As long as the process of validation continues, new information is produced about the interpretation and meaning of scores.

Reliability

Evaluating the reliability of any measurement procedure consists of estimating how much of the variation in a score is real, or true, as opposed to chance or random error (Selltiz et al., 1976). Reliability coefficients are therefore proportions: a reliability of 0.70 indicates that 70% of the measured variance is reliable. Reliability examines the consistency of results from different measures designed to evaluate the same variable. The acceptable level of reliability depends on what is being analyzed. Comparisons among individuals, or across administrations to the same individual, require high reliability (values above 0.90); group comparisons, such as comparisons of average health status scores between diagnostic or treatment groups, do not require as high a reliability (values of 0.50 or 0.70 or higher are acceptable) (Helmstadter, 1964; Nunnally, 1978).
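The proportion-of-variance reading of a reliability coefficient can be made concrete with a small computation. The following is a minimal sketch, assuming the coefficient is Cronbach's alpha (the text above does not name the estimator). The function names are hypothetical, and using 0.70 as the group-level cutoff is an illustrative choice between the two values quoted above.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: one list of responses per item, each with one entry per respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    # Scale score of each respondent is the sum of that respondent's item scores.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

def meets_standard(alpha, use="group"):
    """Compare a coefficient with the cutoffs quoted in the text (assumed mapping)."""
    if use == "individual":
        return alpha > 0.90   # comparisons of individual patients
    return alpha >= 0.70      # group-level comparisons
```

For example, `meets_standard(cronbach_alpha(items), use="group")` would indicate whether a scale is reliable enough for comparing group means under these assumed cutoffs.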

Validity

a. Content Validity

The validity of questionnaires in the health field has most often been evaluated by means of content, construct, or criterion validation. Content validity (whether the test offers an adequate sample of the construct) is a challenge in the health field because of the breadth of health variables. Content validation requires a defining standard against which the content of a measure can be compared. Standards can be based on well-accepted theoretical definitions, on published standards, or on interviews with those who are experiencing the types of health problems under study. When construction of the SF-36 began, Ware (1987) published a set of standards for evaluating the content validity of general health measures intended to be comprehensive. These standards were applied in constructing the SF-36.

b. Construct Validity

When construct validation is used, both the test and the underlying theory must be evaluated. There are three steps in accumulating evidence of validity related to theoretical constructs: (1) specify the domain of variables, that is, prepare a blueprint for the constructs; (2) establish the internal structure of the observed variables; and (3) verify theoretical relationships between scale scores and external criteria (Ware, 1992). One method of testing the underlying theory is to test the differences between two patient groups known to differ in some way. For example, patients with a relatively minor and uncomplicated medical condition should score better in mental health (the theorized construct) than patients with a psychiatric illness, and the average mental health scores of these patient groups should differ significantly. The mean difference between such groups in the Medical Outcomes Study (MOS) was very large: 30.78 points on a 100-point scale (McHorney et al., 1993). The comparison demonstrates validity for the Mental Health (MH) scale because the mental health scores were much lower for patients with psychiatric disease (known to have poor mental health by definition of their disease).

c. Internal Validity: Convergent and Discriminant Validity

Convergent and discriminant validity are both aspects of construct validation. Convergent validity is supported when different methods of measuring the same construct provide similar results. Discriminant validity examines whether a measure of one underlying construct can be differentiated from another construct. For example, in the MOS, measures of physical functioning, mobility, and satisfaction with physical abilities were expected to yield results that “converge” at least moderately with one another because they are all hypothesized to assess physical health. In tests of discriminant validity, different measures are expected to yield different results. For example, one would not expect a measure of physical functioning to be highly related to a measure of depression or of loneliness (Ware, 1992).
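At the item level, the same logic underlies tests of item-internal consistency and item-discriminant validity. The following is a minimal sketch under stated assumptions: an item "converges" if its correlation with the sum of the other items in its own scale reaches a cutoff, and "discriminates" if that correlation is higher than its correlation with every other scale. The 0.40 cutoff, the function names, and the input layout are illustrative assumptions and are not taken from this report.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def scale_total(scale_items, skip=None):
    """Per-respondent sum of a scale's items, optionally leaving one item out."""
    names = [name for name in scale_items if name != skip]
    n = len(next(iter(scale_items.values())))
    return [sum(scale_items[name][i] for name in names) for i in range(n)]

def item_scaling_tests(scales, min_r=0.40):
    """scales: {scale_name: {item_name: [responses]}}; returns results per item."""
    results = {}
    for scale, items in scales.items():
        for item, scores in items.items():
            # Correct for overlap: exclude the item from its own scale total.
            own = pearson(scores, scale_total(items, skip=item))
            others = [pearson(scores, scale_total(other_items))
                      for other, other_items in scales.items() if other != scale]
            results[item] = {
                "own_r": own,
                "convergent": own >= min_r,
                "discriminant": all(own > r for r in others),
            }
    return results
```

The share of items with `convergent` or `discriminant` set to true corresponds to a "percent passed" summary of the kind reported for scaling tests.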

d. Criterion validity

Criterion validity demonstrates that test scores are systematically related to one or more outcome criteria. This technique can be used when external evidence is available as a criterion against which the results of the test can be compared. The “criteria” are selected because they (1) are important (clinically or socially); (2) represent plausible outcomes of the variations in functioning and well-being measured by the scales; and (3) are measured independently of the scale in question. Examples of correlations with external evidence occur when (1) health status and resource use are negatively correlated, (2) age and physical health are negatively correlated (according to the theory that physical function declines with increasing age), or (3) physical and mental health each correlate positively with general health (Ware, 1992).

e. Factorial Validity

Factor analysis provides an empirical test of the construct validity of the SF-36 in relation to its hypothesized structure. In the absence of agreed-upon “criteria” for validating a scale, the validity of each scale can be tested using factor analytic methods.

The SF-36 was constructed to represent two major dimensions of health, physical and mental, that have been confirmed empirically in previous studies (Hays & Stewart, 1990; Ware, Davies-Avery, & Brook, 1980). Thus, two principal components can be extracted from the correlations among the SF-36 health scales and rotated to orthogonal simple structure. The orthogonal solution has the advantage of permitting correlations across components to be interpreted as estimates of the factor content of each scale.

The two components are then interpreted on the basis of their correlations with the SF-36 scales. If the pattern of results across scales is consistent with expectations for the physical and mental “dimensions” of health as summarized in Table 2, the components are labeled “physical” and “mental” accordingly. If the two-dimensional structure is not confirmed, or the interpretation of the factors turns out to be ambiguous, the components cannot be used as “criteria” in testing the validity of each scale.
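The extraction and rotation steps can be illustrated with a small pure-Python sketch. It assumes that the orthogonal rotation is a raw varimax rotation (the text says only "orthogonal simple structure"), and it uses power iteration with deflation to obtain the first two principal components of a correlation matrix. All function names are hypothetical, and the sketch is suitable only for small, well-conditioned matrices.

```python
from math import atan2, cos, sin, sqrt

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for a symmetric positive semidefinite matrix."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec

def two_component_loadings(corr):
    """Loadings of each scale on the first two principal components."""
    n = len(corr)
    val1, vec1 = dominant_eigenpair(corr)
    # Deflate: remove the first component, then extract the next one.
    deflated = [[corr[i][j] - val1 * vec1[i] * vec1[j] for j in range(n)]
                for i in range(n)]
    val2, vec2 = dominant_eigenpair(deflated)
    return [(vec1[i] * sqrt(val1), vec2[i] * sqrt(val2)) for i in range(n)]

def varimax_two_factors(loadings):
    """Rotate two loading columns by the angle that maximizes raw varimax."""
    p = len(loadings)
    u = [a * a - b * b for a, b in loadings]
    v = [2 * a * b for a, b in loadings]
    sum_u, sum_v = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    phi = atan2(d - 2 * sum_u * sum_v / p,
                c - (sum_u * sum_u - sum_v * sum_v) / p) / 4
    return [(a * cos(phi) + b * sin(phi), -a * sin(phi) + b * cos(phi))
            for a, b in loadings]
```

After rotation, each row holds one scale's correlations with the two components; which column is labeled "physical" or "mental", and the sign of each column, are arbitrary and have to be read off from the pattern of loadings.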

f. Clinical tests of Validity

Clinical tests of validity are based on criteria used to form mutually exclusive patient groups (McHorney et al., 1992; 1993). These groups differ in the severity of their conditions as defined by clinical measures of physical and mental (psychological) morbidity. In the McHorney studies (1992, 1993), the least severe comparison group was limited to patients with only a minor medical condition, such as uncomplicated hypertension. The group with serious physical morbidity included patients with congestive heart failure (CHF) and complications (e.g., edema, orthopnea); myocardial infarction survivors with substantial morbidity (e.g., noteworthy and recurring angina and/or severe CHF symptomatology); hypertension patients with a history of stroke; and diabetic patients with noteworthy complications (e.g., severe autonomic neuropathy). The group used to test validity in relation to clinical criteria of mental health was limited to patients with severe mental morbidity, such as current unipolar affective disorder (major depression or dysthymia) or serious depressive symptoms.

1. Mean differences in SF-36 scale scores are for comparisons between the least severe group and the groups with either severe physical or severe mental morbidity. The standardized effect size (ES) is the group difference divided by the general population standard deviation (SD). The relative validity (RV) is the ratio of pair-wise F-statistics, specifically the F for the comparison scale divided by the F for the most valid scale based on the same two-group comparison. The F-ratios analyzed in estimating RV are those for the difference between group means; an F-ratio is larger when the difference between group means is larger and the error term (within-group variance) is smaller. Thus, a larger F reflects greater discriminant validity and/or greater precision in estimating group means. RV estimates indicate how valid each scale is in discriminating between clinical groups, relative to the most valid SF-36 scale. RV is useful in addressing the issue of “conceptual relevance” and answers the question: how sensitive is each SF-36 health concept to differences in the levels of physical and mental morbidity defined by these clinical groups, relative to the best scale?
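As a minimal sketch of the quantities defined above, the following Python code computes a two-group F-statistic, the standardized effect size (ES), and the relative validity (RV). The function names and the input layout are illustrative assumptions and are not taken from the original analysis.

```python
from statistics import mean, variance


def f_statistic(group_a, group_b):
    """One-way ANOVA F for two groups: between-group MS / within-group MS."""
    n_a, n_b = len(group_a), len(group_b)
    grand = mean(list(group_a) + list(group_b))
    ss_between = (n_a * (mean(group_a) - grand) ** 2
                  + n_b * (mean(group_b) - grand) ** 2)
    ss_within = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ms_between = ss_between / 1            # df = number of groups - 1
    ms_within = ss_within / (n_a + n_b - 2)
    return ms_between / ms_within


def effect_size(group_a, group_b, population_sd):
    """Group mean difference divided by the general population SD."""
    return (mean(group_a) - mean(group_b)) / population_sd


def relative_validity(f_by_scale):
    """Divide each scale's F by the largest F, so the best scale has RV = 1."""
    best = max(f_by_scale.values())
    return {scale: f / best for scale, f in f_by_scale.items()}
```

For example, with hypothetical F-statistics of 40, 20, and 4 for three scales, `relative_validity` would assign RV values of 1, 0.5, and 0.1.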

Specific Aims

There are two aims in this study. The first aim of this study is to test the validity and reliability of a Chinese-Language version of the MOS 36-item short form health survey (SF-36) for measuring the health status among general population and primary care attenders. The second aim of this study is to compare the health status between these two populations. Therefore, the specific aims of this study are:

1. To provide estimates of the reliability and validity of the Chinese version of the SF-36 in two samples, one from the general population and the other from primary care attenders.

2. To provide health profiles from the Chinese version of the SF-36 for the general population and for primary care attenders, and to compare these two groups.

Methods

Study Design

A cross-sectional study design with two samples was used in the present study. The SF-36, the Chinese Health Questionnaire (CHQ), clinical diagnosis, and sociodemographic factors were measured at the same time point. The structure of the study design is shown in Figure 1.

[Diagram: study population at the beginning of the study, consisting of the primary care sample and the general population sample; all variables are measured at the same time.]

Figure 1: Study design of the present study.

Subjects

The Chinese version of the SF-36 was tested in two samples: (1) one from primary care attenders in a large teaching hospital; (2) the other from a general population in Taichung city.

Primary Care Sample

Six hundred consecutive patients who attended general practice at China Medical College Hospital in Taichung were recruited. All study subjects were administered questionnaires to collect information about the SF-36, CHQ, sociodemographic factors, medical history, and so on. The detailed steps for the administration of questionnaires are described in the section Administration of the Questionnaire. Patients were included if they were able and willing to complete the self-rating questionnaires, and were excluded if they had cognitive problems.

General Population Sample

The target population was all individuals who resided in Taichung City at the beginning of the study. The sampling frame was the set of all family records from the Bureau of Household. Since Taiwan has good household registration, we believe that this sampling frame provides good reliability. A multistage sample design was used, consisting of 4 strata and 2 sampling methods. The four strata were the district, Li, household, and resident. A total of 600 residents were selected, and 425 agreed to participate, for an overall response rate of 70.83%. The sample size and sampling probabilities adopted in this study were determined by several considerations: power, available resources, and heterogeneity of the population. Within the first three strata, systematic sampling was applied with probability proportionate to the number of households in each sampling unit, whereas in the last stratum, the Kish procedure was applied to select a resident from a household.
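Systematic selection with probability proportionate to size, as described above, can be sketched as follows. This is only an illustration under assumed inputs: the unit names, the household counts, and the helper name `systematic_pps` are hypothetical, and the report does not give the exact procedure that was used.

```python
import random


def systematic_pps(sizes, n, rng=None):
    """Pick n units systematically, with probability proportional to size.

    sizes maps each sampling unit to its number of households.
    A unit larger than the sampling interval can be picked more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval           # random start in [0, interval)
    points = iter(start + k * interval for k in range(n))
    selected, cumulative = [], 0.0
    point = next(points, None)
    for unit, size in sizes.items():
        cumulative += size
        # Every selection point falling inside this unit's range picks the unit.
        while point is not None and point < cumulative:
            selected.append(unit)
            point = next(points, None)
    return selected
```

With hypothetical districts of 100, 300, and 600 households and 10 selections, the sampling interval is 100 households, so the three districts receive 1, 3, and 6 selections respectively.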

Evaluation of Non-response Bias

Data were also collected from non-respondents to evaluate the possibility of non-response bias. To evaluate differences between the response and non-response groups, key variables in the questionnaire were collected by face-to-face interview for the primary care sample and by telephone for the general population sample. These questions were formulated to approximate the corresponding questions in the written survey. The variables were calculated for responders and non-responders in order to evaluate the potential effects of non-response bias.

Administration of the Questionnaire

The questionnaire was self-administered in the primary care sample and was mailed out and mailed back in the general population sample. The steps for administration of the questionnaire to primary care attenders followed the guidelines suggested by Ware et al. (1992).

The purpose of these guidelines is to establish rapport with the respondent and encourage completion of the questionnaire. The administrator can emphasize to respondents the importance of their answers for the completion of the study or as an addition to their medical records. The administrator can also answer questions and address concerns about the SF-36, and ensure that the questionnaire is filled out correctly and completely.

Measurement

SF-36

The SF-36 is a short questionnaire with 36 items that measure eight multi-item variables: physical functioning (10 items), social functioning (2 items), role limitations due to physical problems (4 items), role limitations due to emotional problems (3 items), mental health (5 items), energy and vitality (4 items), pain (2 items), and general perception of health (5 items). There is a further unscaled single item on changes in respondents' health over the past year. For each variable, item scores are coded, summed, and transformed to a scale from 0 (worst possible health state measured by the questionnaire) to 100 (best possible health state). For the SF-36, a higher score indicates a better perceived health state.
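The report does not spell out the transformation to the 0-100 scale. A minimal sketch, assuming a linear rescaling of the summed item scores between the lowest and highest possible raw sums, is shown below; the function name and the example coding are illustrative assumptions.

```python
def transform_scale(raw_sum, lowest, highest):
    """Rescale a raw scale sum linearly to 0 (worst) .. 100 (best).

    lowest and highest are the smallest and largest possible raw sums
    for the scale (assumed known from the item coding).
    """
    return (raw_sum - lowest) / (highest - lowest) * 100.0
```

For instance, if the 10 physical functioning items were each coded 1 to 3, the raw sum would range from 10 to 30, and a raw sum of 25 would be transformed to 75.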

Clinical Criteria

Using clinical criteria, two mutually exclusive groups were formed: Group 1, no minor or uncomplicated chronic medical conditions; and Group 2, minor or serious (uncomplicated or complicated) chronic medical conditions. This classification was used to evaluate the clinical test of validity.

Chronic medical conditions included hypertension, diabetes mellitus, heart disease, anemia, incontinence of urine, duodenal ulcer, chronic hepatitis B, hepatitis C, tuberculosis, and so on.

1. Greet and evaluate the potential respondent.
2. Can he/she read the questionnaire? If so, introduce the questionnaire and give it to the respondent; if not, use interviewer administration.
3. Instruct the respondent how to fill out the questionnaire.
4. Answer the respondent's questions.
5. Retrieve the questionnaire.
6. Check the questionnaire for completeness.
7. Thank the respondent.

Figure 1: SF-36 administration flow chart for primary care attenders.

Sociodemographic factors

Age, gender, and level of education were collected in the questionnaire.

Life event

This variable was measured by a self-report questionnaire that consisted of 60 items grouped into 10 problem domains covering housing, work, financial status, legal matters, social and leisure activities, family status, child-parent interaction and marital relationship. For each of the 10 domains the presence of social problems was determined and the total score was then computed by adding up the number of

Statistical Analysis

Reliability

The internal consistency form of reliability was assessed in this study. Internal consistency is the extent to which items within a dimension are correlated with each other. It was examined by two methods: item-scale correlation (Streiner, 1990) and Cronbach alpha (Cronbach, 1951). Item-scale correlations, which assess the extent to which an item is related to the remainder of its scale, should exceed 0.4 (Kline, 1986), whereas Cronbach alpha, which measures the overall correlation between items within a scale, should exceed 0.7 (Nunnally, 1994) or 0.8 (Ware, 1993) to be considered acceptable.
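The two reliability statistics named above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study; the function names and the data layout (one list of scores per item, with respondents in the same order) are assumptions.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per item, all of the same length."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]      # scale score per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_scale_correlation(items, index):
    """Correlate one item with the sum of the remaining items of its scale."""
    rest = [sum(s for j, s in enumerate(row) if j != index)
            for row in zip(*items)]
    return pearson(items[index], rest)
```

The item is removed from the scale total before correlating, so that the item-scale correlation is not inflated by the item's overlap with itself.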

Validity

Six aspects of validity were evaluated: internal validity (convergent and discriminant), criterion validity, construct validity, clinical test of validity, relative validity, and factorial validity.

Internal validity

The convergent and discriminant validity of the SF-36 was examined by the multitrait-multimethod matrix (Campbell & Fiske, 1959). For convergent validity, the correlation between comparable dimensions on the SF-36 and the Chinese Health Questionnaire (CHQ) - for example, between mental health and depression and poor family relation - should be higher than the correlations between less comparable dimensions - for example, physical functioning and social dysfunction. Discriminant validity was tested by comparing each item's correlation with its own scale against its correlations with the other scales. The item-to-own-scale correlation should be higher if the categories within the SF-36 questionnaire are valid.

Construct, Criterion, and Clinical tests of validity

Construct validity assesses the extent to which a measure is related to criteria derived from an established clinical or social theory or “construct”. One method of examining construct validity is to test hypotheses or constructs concerning the expected distribution of health between groups with the measure being validated (Streiner, 1989; McDowell, 1987). Therefore, the scales were compared to assessments of physical and mental health based on information independent of the SF-36. The study population was stratified along three variables corresponding to mental and physical health. The mental health variable has 3 levels:

Group 1: number of life events ≤ 1
Group 2: number of life events 2-5
Group 3: number of life events ≥ 6

Two variables were used for physical health. The first has 2 levels:

Group 1: no chronic condition
Group 2: any chronic condition

The second has 3 levels:

Group 1: 18-34 years old
Group 2: 35-49 years old
Group 3: ≥ 50 years old

For the clinical test of validity, one-way ANOVA was applied to make comparisons between these groups.

Relative Validity

The relative validity of each scale in measuring each dimension of health was assessed by the ratio of the variance explained by the scale of interest (i.e., the scale coefficient squared) to the variance explained by the “best” scale (McHorney, 1993; Liang, 1985). Relative validity was also assessed by the ratio of F-statistics, derived from one-way ANOVA models, for comparisons between mental health groups and among physical health groups. Again, the scale with the highest F-statistic was the reference, with a relative validity of 1.

Exploratory Factor Analysis

In addition to the five aspects of validity described above, factorial validity was tested in this study. Exploratory factor analysis (Child, 1990), a technique of psychometric validation, assesses the agreement between the hypothetical factors that make up the measure and the scales designed to assess those factors. If the Chinese version of the SF-36 is a valid measure, the scales defined by its authors should emerge from a factor analysis of the two samples, and items relating to a particular scale should be grouped together within a single factor. Within such an assessment, a factor should be considered relevant only if its eigenvalue (a statistical measure of its power to explain variation between subjects) exceeds 1.1 (Jolliffe, 1986).
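The eigenvalue rule can be illustrated with a short sketch. It assumes that the analysis is run on the item correlation matrix, so that the total variance equals the number of items; the function name, the threshold argument, and the example eigenvalues are hypothetical.

```python
def relevant_factors(eigenvalues, threshold, n_items):
    """Keep factors whose eigenvalue exceeds the threshold.

    Returns (factor number, eigenvalue, percent of total variance),
    assuming total variance equals the number of items analysed.
    """
    return [
        (number, value, 100 * value / n_items)
        for number, value in enumerate(eigenvalues, start=1)
        if value > threshold
    ]
```

Under these assumptions, a factor with an eigenvalue of 14.0 in a 35-item analysis accounts for 40% of the total variance.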

Results

Random Sample of General Population

Table 1 provides information on the distributions of sociodemographic characteristics, number of chronic diseases, and illness during the past 6 months. Of 426 respondents, 155 (36.4%) were 18-34 years old, 231 (54.2%) were male, 220 (55.8%) had more than 12 years of education, 283 (69.4%) had an income of more than 35,000 NT dollars, 53 (12.4%) did not have any chronic disease, and 32 (7.5%) had been ill during the past 6 months.

The abbreviated English and Chinese content for each SF-36 item and scale assignment are shown in Table 2. These scales were constructed to be multidimensional. The SF-36 survey includes a single-item measure of health transition, which is not used to score any of the eight multi-item scales.

The number and percent of participants missing each of the 36 items are presented in Table 3 for the random sample of the Taichung city population. Missing-value rates for the 36 items were consistently low, ranging from 0.2% to 1.2% and averaging 0.66%.

Table 4 presents the percentage of items within each scale that were computable. For the total sample, these percentages were very high across scales, ranging from a low of 96.0% (RE) to a high of 99.5% (PF). Data completeness did not differ significantly across scales among the subgroups. The older subgroup (>50 years old) and the subgroup with chronic disease had slightly higher rates of complete items on all eight scales, and the subgroup with illness also had slightly higher rates of complete items on all scales except PF and GH.

Average scores and quartiles of score distributions (Table 5) indicated that the population was generally in good health. Substantial ceiling effects were observed for 5 of the 8 scales, while no substantial floor effect was observed for any of the 8 scales. The scales with substantial ceiling effects were physical functioning, role-physical, social functioning, and role-emotional.

Table 6 presents the item means and standard deviations and the item-scale correlation coefficients. Standard deviations of items belonging to a given scale were fairly homogeneous. A possible exception was the physical functioning scale, where standard deviations varied from 0.27 to 0.68. This was due to a higher proportion of respondents answering “limited a little” for “vigorous activities” than for other items. Three phenomena were observed in the correlation coefficients. First, the correlation coefficients between an item and its hypothesized scale were fairly homogeneous. Second, almost all correlation coefficients between an item and its hypothesized scale indicated strong associations (≥0.7). Third, the correlation coefficients between an item and other scales were much smaller than those between an item and its hypothesized scale.

Results of scale tests of item-discriminant validity and item-convergent validity based on the matrix in Table 6 are summarized in Table 7. Perfect scaling success rates for item-discriminant and item-convergent validity were achieved across all eight SF-36 scales. In 277 comparisons out of 280, the correlation between an item and its hypothesized scale exceeded the correlations with all other scales by more than 2 standard errors. In addition, all items satisfied the criterion set a priori for convergent validity, i.e. a correlation with own scale ≥0.4. Thus, the success rate for discriminant validity was 98.9%, and for convergent validity, 100.0%.
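The scaling success rate for item-discriminant validity can be sketched as below. This is a hedged illustration: the standard error of a correlation is approximated by 1/sqrt(n), which is an assumption not stated in the report, and the function name and input layout are hypothetical.

```python
import math


def scaling_success(own, others, n):
    """Count comparisons where an item correlates more with its own scale.

    own:    {item: correlation with its hypothesized scale}
    others: {item: [correlations with each of the other scales]}
    n:      sample size, used for the assumed standard error 1/sqrt(n)
    """
    threshold = 2 / math.sqrt(n)          # two standard errors
    comparisons = successes = 0
    for item, r_own in own.items():
        for r_other in others[item]:
            comparisons += 1
            if r_own - r_other > threshold:
                successes += 1
    return successes, comparisons, 100 * successes / comparisons
```

The reported rate follows from the same ratio: 277 successful comparisons out of 280 gives 98.9%.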

Table 8 presents Cronbach’s α across scales for the overall group and 15 subgroups. These subgroups differed in terms of sociodemographic characteristics and chronic conditions. Overall, Cronbach’s α ranged from 0.63 to 0.97. Minimum standards of reliability for purposes of group comparisons (≥0.5 or ≥0.7) were satisfied by the overall group for all SF-36 scales in this population, while 4 Cronbach’s α values among the 15 subgroups did not satisfy these minimum standards (the vitality and mental health scales for 9-12 years of education, and the social functioning scale for age >65 years old and for males). Among the scales, the social functioning scale had the lowest values of Cronbach’s α, possibly because this scale contains only two items. It also had more variation across subgroups relative to the other scales, particularly for gender. The role-physical and physical functioning scales had the highest internal consistency relative to the other scales for the overall group and all subgroups, and had more homogeneous coefficients across subgroups. Role-physical was also the only scale that consistently exceeded the minimum standard of 0.90 for comparisons of scores for individual patients, while physical functioning exceeded this standard except for the subgroups of 35-49 years old and >12 years of education. In general, Cronbach’s α values of all scales were consistent across subgroups.

Validation by Factor Analysis

Factor analysis identified seven relevant factors, with eigenvalues ranging from 1.01 to 14.02 and with proportions of total variance ranging from 2.87% to 40.05% (Table 9). The proportion of each item's variance explained by these seven factors ranged from 59.0% (for MH2) to 87.4% (for RP3) (not shown in the table). The

factors (factors 2 and 5). Factor 1 was formed by 8 items of physical functioning and 1 item of social functioning. The other social functioning item (SF2) did not have any coefficient higher than 0.4, indicating little contribution to any factor. The highest coefficient of SF2 was 0.36, for factor 1. This might imply that factor 1 corresponded to the combination of physical functioning and social functioning. The remaining 2 items of physical functioning combined with bodily pain to form factor 7. The other 3 factors corresponded to 3 scales of the SF-36: role-physical, general health perception, and role-emotional.

Validation by the Hypothesized Dimensionality of the SF-36 scales

We used principal component analysis to test the hypothesized dimensionality of the SF-36 scales. Because we hypothesized that two dimensions underlie the structure of the eight scales, we extracted two principal components. To facilitate interpretation, we further rotated the components to orthogonal structure using the varimax method. To evaluate the factorial validity of each scale as a measure of each component, we first squared each factor loading (scale-component correlation) to estimate the proportion of variance shared with that component (common-factor variance). We defined the scale sharing the most variance with each component as the most valid measure of that component. For each component, we then estimated relative validity (RV) for each scale by dividing the variance shared with the component by that estimate for the most valid scale. These ratios indicate in proportional terms how much less valid each scale is relative to the most valid scale. The higher the RV of a scale, the more precisely or efficiently it measures the underlying construct of interest as defined by the most valid scale.
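The relative validity calculation from factor loadings can be written as a short sketch. The function name and the example loadings below are hypothetical and are not values from this study.

```python
def rv_from_loadings(loadings):
    """Relative validity of each scale for one component.

    loadings: {scale: correlation between the scale and the component}
    Each loading is squared to give shared variance, then divided by the
    largest shared variance, so the most valid scale has RV = 1.
    """
    shared = {scale: r ** 2 for scale, r in loadings.items()}
    best = max(shared.values())
    return {scale: value / best for scale, value in shared.items()}
```

For hypothetical loadings of 0.9, 0.6, and 0.3 on one component, the shared variances are 0.81, 0.36, and 0.09, giving RV values of 1, about 0.44, and about 0.11.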

Factor analysis of the eight health scales produced 2 principal components. The first (“physical health”) explained 56.5% of total variance, while the second (“mental health”) explained 13.1%, for a total of 69.6%. The proportion of total variance explained by these 6 scales varied between 45% and 86%. Only 6 out of the 16 observed correlations between individual scales and principal components followed the pattern that was hypothesized by McHorney et al (Table 10). We found that the scales of general health, social functioning, role-emotional, and mental health correlated more strongly with the “physical” component than was predicted. The scales of physical functioning, role-physical, bodily pain, and general health correlated more strongly with the “mental” component than was expected, while role-emotional correlated slightly less strongly with the “mental” component than was expected. Even though the concordance rate with the hypothesized correlations was low, the order of correlations within each component was generally consistent with the a priori hypotheses of McHorney et al. The relative validity of a scale was given by the ratio of explained variance to that of the best scale: physical functioning for the “physical” component, and mental health for the “mental” component. In general, the patterns of relative validity were consistent with prediction.

Validation by Norm-based Interpretation

Lower scores on the SF-36 reflect a poorer health state. Table 11 shows normative data in the form of means and standard deviations, broken down by age, gender, education, income, chronic disease, and illness. Overall, older subjects reported significantly poorer health than younger subjects on all scales of the SF-36 except mental health (p<0.001 on all significant scales, except role-emotional, p=0.0346). Women reported poorer health than men only on the vitality scale (p=0.0146). There were significant differences in scores among subjects with different levels of education on all scales of the SF-36 except role-emotional and mental health (p<0.001 on vitality and physical functioning; p<0.01 on bodily pain, general perception of health, and social functioning; and p<0.05 on role-physical). Subjects with lower income reported poorer health on physical functioning, role-physical, general perception of health, and vitality (p<0.001 on general perception of health; p<0.01 on physical functioning and vitality; and p<0.05 on role-physical). Subjects with chronic disease had significantly lower scores on all scales than those without (p<0.001 on all scales except role-emotional, p<0.01). Subjects reporting an illness during the previous 6 months had significantly lower scores on all scales than those without (p<0.001 on all scales except physical functioning, social functioning, and role-emotional, p<0.01, and mental health, p<0.05).

Construct Validation

Table 12 shows the mean score in the group with no chronic disease, the mean difference between the groups with and without chronic disease, F-statistics, and estimates of RV. Patients with any chronic disease scored significantly lower on all eight scales than patients with no chronic disease. The general health scale was the most valid in detecting differences between patients with and without chronic disease. The vitality scale was the second most valid, followed by role-physical, social functioning, physical functioning, and bodily pain. As hypothesized, the best mental health scales (mental health and role-emotional) performed most poorly in this test.

Primary Care Sample

Table 13 provides information on the distributions of sociodemographic characteristics, number of life events, medication use, and chronic disease among outpatients. Of 284 outpatients, 140 (49.3%) were 18-34 years old, 133 (47.0%) were male, 170 (73.9%) had more than 12 years of education, 228 (80.6%) had more than one life event during the past month, 138 (49.1%) were taking medicine, and 117 (41.8%) had a chronic disease.

The number and percent of outpatients missing each of the 36 items are presented in Table 14 for the outpatient sample from the primary care setting. Missing-value rates for the 36 items were consistently low, ranging from 0.0% to 2.4% and averaging 1.44%.

Table 15 presents the percentage of items within each scale that were computable for the outpatient sample from the primary care setting. For the total sample, these percentages were very high across scales, ranging from a low of 97.6% (RE) to a high of 99.3% (MH). Data completeness did not differ significantly across scales among the subgroups. In general, the 35-49 age group, the group with less than 9 years of education, and the subgroup with more than 5 life events had slightly lower rates of complete items on all eight scales.

Average scores and quartiles of score distributions (Table 16) indicated that the population was not in good health. Substantial ceiling effects were observed for 4 of the 8 scales: physical functioning, role-physical, bodily pain, and role-emotional. Moderate floor effects were observed for the role-physical and role-emotional scales.

Table 17 presents the item means and standard deviations and the item-scale correlation coefficients. Standard deviations of items belonging to a given scale were fairly homogeneous. A possible exception was the physical functioning scale, where standard deviations varied from 0.27 to 0.68. This was due to a higher proportion of respondents answering “limited a little” for “vigorous activities” than for other items. We also observed three phenomena in the item-scale correlation coefficients. First, the correlation coefficients between an item and its hypothesized scale were fairly homogeneous. Second, almost all correlation coefficients between an item and its hypothesized scale indicated strong to moderate associations (0.7-0.3). Third, the correlation coefficients between an item and other scales were much smaller than those between an item and its hypothesized scale.

Results of scale tests of item-discriminant validity and item-convergent validity based on the matrix in Table 17 are summarized in Table 18. Perfect scaling success rates for item-discriminant and item-convergent validity were achieved across 6 and 5 of the eight SF-36 scales, respectively. In 270 comparisons out of 280, the correlation between an item and its hypothesized scale exceeded the correlations with all other scales by more than 2 standard errors. In addition, all items except 2, one for physical functioning and the other for mental health, satisfied the criterion set a priori for convergent validity, i.e. a correlation with own scale ≥0.4. Thus, the success rate for discriminant validity was 96.4%, and for convergent validity, 94.3%.

Table 19 presents Cronbach’s α across scales for the overall group and 15 subgroups. These subgroups differed in terms of sociodemographic characteristics, life events, and chronic conditions. Overall, Cronbach’s α ranged from 0.61 to 0.89. Minimum standards of reliability for purposes of group comparisons (≥0.5) were satisfied by the overall group for all SF-36 scales in this outpatient sample, while 4 Cronbach’s α values among the 15 subgroups did not satisfy this minimum standard (the bodily pain scale for life events ≤ 1 and for those not taking any medicine, and the mental health scale for education ≤ 9 years and for life events ≤ 1). The scales with Cronbach’s α below the minimum standard also varied more across subgroups. Among the scales, the social functioning scale had the highest values of Cronbach’s α, followed by role-physical, physical functioning, and role-emotional. In general, Cronbach’s α values of all scales were consistent across subgroups.

Validation by Factor Analysis

Factor analysis identified 8 relevant factors, with eigenvalues ranging from 1.12 to 10.61 and with proportions of total variance ranging from 3.21% to 30.30% (Table 20). The proportion of each item's variance explained by these 8 factors ranged from 41.6% (for PF6) to 86.3% (for BP1) (not shown in the table). The physical functioning scale separated into 2 factors (factors 1 and 3). The mental health and vitality scales were combined and then separated into two factors (factors 2 and 4). Factor 5 was formed by 3 items of role-emotional and 1 item of social functioning. Although the coefficient of the other social functioning item (SF2) was not greater than 0.4 for factor 5, it was highest for factor 5. The other 3 factors corresponded to 3 scales of the SF-36: role-physical, general health perception, and role-emotional.

Validation by the Hypothesized Dimensionality of the SF-36 scales

We used principal component analysis to test the hypothesized dimensionality of the SF-36 scales in this outpatient sample. Factor analysis of the eight health scales produced 2 principal components. The first (“physical health”) explained 50.5% of total variance, while the second (“mental health”) explained 13.0%, for a total of 63.5%. The proportion of total variance explained by these 6 scales varied between 52.0% and 77.9%. Only 8 out of the 16 observed correlations between individual scales and principal components followed the pattern that was hypothesized by McHorney et al (Table 21). We found that the scales of physical functioning and bodily pain correlated less strongly with the “physical” component than was predicted, while role-emotional and mental health correlated slightly more strongly with the “physical” component than was predicted. The scales of physical functioning, role-physical, bodily pain, and vitality correlated more strongly with the “mental” component than was expected. Even though the concordance rate with the hypothesized correlations was not high, the order of correlations within each component was generally consistent with the a priori hypotheses of McHorney et al. The relative validity of a scale was given by the ratio of explained variance to that of the best scale: physical functioning for the “physical” component, and mental health for the “mental” component. In general, the patterns of relative validity were consistent with prediction.

Validation by Distinguishing Subgroups

Lower scores on the SF-36 reflect a poorer health state. Table 22 shows means and standard deviations, broken down by age, gender, education, life events, medication use, and chronic disease. Overall, older subjects reported significantly poorer health than younger subjects on physical functioning and role-physical (p<0.001 for physical functioning and p<0.01 for role-physical). Women reported poorer health than men only on the physical functioning, role-physical, general health, and role-emotional scales (all p<0.01). There were significant differences in scores among subjects with different levels of education on the physical functioning and role-physical scales of the SF-36 (p<0.001 on physical functioning and p<0.01 on role-physical). Subjects with a higher number of life events reported poorer health on vitality, social functioning, role-emotional, and mental health (all p<0.001 except mental health, p<0.01). Outpatients who were taking medicine had significantly lower scores on all scales except role-emotional (p<0.001 on role-physical, general health, and vitality; p<0.01 on bodily pain and mental health; and p<0.05 on social functioning). Subjects with chronic disease had significantly lower scores than those without on all scales except social functioning, role-emotional, and mental health (p<0.001 on general health, p<0.01 on role-physical, and p<0.05 on physical functioning, bodily pain, and vitality).

Construct Validation

Table 23 shows the mean score in the 18-34 age group, the mean differences between the 18-34 age group and the 35-49 and ≥ 50 age groups, F-statistics, and estimates of RV. Older patients scored significantly lower on physical functioning, role-physical, bodily pain, and general health than younger patients. The physical functioning scale was the most valid in detecting differences between age groups. The role-physical scale was the second most valid. The other scales performed poorly in this test.

Table 24 shows the mean score in the group with ≤ 1 life event, the mean differences between the group with ≤ 1 life event and the groups with 2-5 and ≥ 6 life events, F-statistics, and estimates of RV. Outpatients with more life events scored significantly lower on social functioning, role-emotional, and mental health than those with fewer life events. The role-emotional scale was the most valid in detecting differences between life event groups. The social functioning scale was the second most valid, followed by vitality and mental health. The physical functioning, role-physical, bodily pain, and general health scales performed poorly in this test.

Regression Model of 8 Scales of SF-36

We then examined the differences in the 8 SF-36 scales between the random sample of the general population and the outpatient sample, adjusting for the effects of age, gender, education, and chronic conditions (Table 25). The differences on all 8 scales were statistically significant after controlling for the other variables in the model, ranging from -2.9 (physical functioning) to -17.5 (role-emotional). Those who had chronic conditions also reported significantly lower scores on all scales after adjusting for the other variables in the model, with reductions ranging from 3.9 (physical functioning) to 17.3 (role-physical). Vitality and mental health scores were significantly higher among those with 9-12 years of education than among those with less than 9 years. Those with more than 12 years of education reported significantly higher scores than those with less than 9 years on physical functioning, role-physical, vitality, social functioning, and mental health. Gender did not have a significant effect on any SF-36 scale. Those older than 50 years reported significantly poorer health than those aged 18-34 years on physical functioning, role-physical, bodily pain, and general health. The percentages of 8


outpatients ranged from 8.78% (bodily pain) to 17.31% (physical functioning).

Next, we examined the relative impact (based on the standardized beta coefficients, not shown in the table) of age, gender, education, chronic condition, and outpatient status on each SF-36 scale. The >50 age group had the greatest impact on physical functioning, followed by chronic condition and outpatient status. For role-physical and general health, chronic condition had the greatest impact, followed by outpatient status. Chronic condition also had the greatest impact on bodily pain, followed by the >50 age group. For vitality, social functioning, role-emotional, and mental health, outpatient status had the greatest impact, followed by chronic condition.
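The ranking of predictors by standardized beta coefficients can be sketched as follows. This is a minimal illustration, assuming an ordinary least squares model with indicator variables and the usual conversion of each raw coefficient to a standardized one (raw coefficient times the predictor's standard deviation, divided by the outcome's standard deviation). The predictor coding, the coefficients used to generate the scores, and all function names are hypothetical and are not taken from Table 25.

```python
from itertools import product


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def sd(values):
    """Sample standard deviation."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5


def standardized_betas(rows, y):
    """Raw slopes rescaled by sd(predictor) / sd(outcome)."""
    slopes = ols(rows, y)[1:]  # drop the intercept
    columns = list(zip(*rows))
    return [b * sd(col) / sd(y) for b, col in zip(slopes, columns)]


# Hypothetical indicators: (outpatient, chronic condition, age over 50).
rows = [list(r) for r in product([0, 1], repeat=3)]
# Made-up scale scores generated from assumed effects of -3, -4, and -10.
y = [90.0 - 3 * o - 4 * c - 10 * a for o, c, a in rows]
coefs = ols(rows, y)
betas = standardized_betas(rows, y)
```

Because the standardized coefficients share a common unit, their absolute values can be compared directly across predictors, which is how the ordering of impacts in the paragraph above is read.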

Discussion

The SF-36 is a brief and easy-to-use questionnaire. It has been shown to be suitable for self-administration or face-to-face interview among clinical patients and managed care organization members in several languages. The reliability and validity of the Chinese version of the SF-36, administered through face-to-face interview in a random sample of the general population and through self-administration in primary care settings, have not previously been reported. Our study showed that the Chinese version of the SF-36 was suitable for both face-to-face interviews and self-administration. Face-to-face interviews took just five minutes to complete and achieved a high response rate and a remarkably low missing rate (0.2-1.4%), while self-administration took about 10 minutes and also had a remarkably low missing rate (0.7-2.4%). Therefore, the Chinese version of the SF-36 appears to be an acceptable measure of the health status of a Chinese general population.

Our findings supported the claims of internal consistency of the domains of the SF-36 across diverse groups and also confirmed that its psychometric assumptions have remained intact. For example, success rates were high for convergent and discriminant validity.
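The convergent and discriminant checks mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly used multitrait scaling criteria: an item passes the item-internal consistency test when its correlation with the sum of the other items in its own scale (corrected for overlap) is at least 0.40, and it passes the item-discriminant test when that correlation exceeds its correlation with another scale. The 0.40 threshold, the function names, and the responses are assumptions for illustration and are not values taken from this study.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_scaling_tests(own_items, item_index, other_scale, threshold=0.40):
    """Return (convergent_ok, discriminant_ok) for one item.

    own_items: one list of responses per item in the item's own scale.
    other_scale: total scores of a different scale for the same respondents.
    """
    item = own_items[item_index]
    rest = [col for i, col in enumerate(own_items) if i != item_index]
    # Correct for overlap: sum the scale's items excluding the item itself.
    rest_total = [sum(vals) for vals in zip(*rest)]
    r_own = pearson(item, rest_total)
    r_other = pearson(item, other_scale)
    return r_own >= threshold, r_own > r_other


# Hypothetical responses from five people (not study data).
own_scale_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 5]]
other_scale_totals = [3, 1, 4, 2, 3]
convergent_ok, discriminant_ok = item_scaling_tests(
    own_scale_items, 0, other_scale_totals
)
```

A success rate of the kind reported in this study would be the share of such item-level tests that pass across all items and scales.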

Validation by factor analysis yielded results remarkably similar to those proposed by the authors who developed the SF-36. Two main differences from the hypothesized construct were observed in our population. First, the vitality items were closely correlated with those of the mental health scale, which is similar to the results of Garratt et al. The items of these two scales loaded on two factors in our study, but on only one factor in Garratt et al.'s study. Second, the items of bodily pain


general population while the items of bodily pain formed an independent factor in the primary care sample. In Garratt et al.'s study, the items of role limitations due to physical problems also clustered together with those of bodily pain and social functioning. The items of the other four factors corresponded precisely to the hypothesized scales. Such precise correspondence between factors and scales is rare in factor analysis and thus confirms the validity of the SF-36 in a Chinese general population.

Estimates of internal consistency for the SF-36 scales across different cultural populations have been reported in 9 studies, as shown in Table 26. All estimates exceeded accepted standards for measures used for group comparisons. For each scale, the median of the reliability coefficients across studies exceeds 0.80, with the exception of the social functioning scale (the median for this two-item scale is 0.77). These results support the use of the SF-36 scales in studies of health status that are based on group-level analyses. No scale consistently exceeded the 0.90 standard of reliability, which is considered a minimum standard for comparisons of scores for individual patients. All of the published coefficients exceed the minimum standard of 0.50 suggested by Helmstadter (1964) for group comparisons; all but three exceed the 0.70 standard for individual comparison suggested by Nunnally (1978).
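The internal consistency estimates discussed here are reliability coefficients of the Cronbach's alpha type. A minimal sketch of how such a coefficient is computed from item responses, and how it can be checked against the 0.50, 0.70, and 0.90 standards cited above, is given below. The function names and the responses are hypothetical and are not study data.

```python
def variance(values):
    """Sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one list of item scores per person."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standards_met(alpha, standards=(0.50, 0.70, 0.90)):
    """List the cited reliability standards that a coefficient reaches."""
    return [s for s in standards if alpha >= s]


# Hypothetical responses of six people to a four-item scale (not study data).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(responses)
met = standards_met(alpha)
```

Applied to the median of 0.77 reported for the social functioning scale, `standards_met(0.77)` returns the 0.50 and 0.70 standards but not the 0.90 standard.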

Subjects with these previously identified chronic conditions reported significantly compromised health status compared to similar subjects without any chronic conditions, after adjusting for the effects of age, gender, education, and type of sample. Most of the effects were both statistically and clinically significant. For example, chronic conditions had noticeably negative effects on all 8 SF-36 scales, with scores 3.9 to 17.3 points below those of subjects with no chronic conditions. The reduction in health status associated with chronic conditions was similar in magnitude to that reported for chronic physical illnesses such as low back pain, arthritis, and diabetes (Wells, K.B., et al., 1989), which implies a severe impact of chronic conditions. For instance, the negative effect of chronic conditions on general health perception was 13.6 points, which is comparable to the impact of diabetes and congestive heart failure (about 13 points).

Subjects from primary care settings reported significantly compromised health status compared to subjects from the general population, after adjusting for the effects of age, gender, education, and chronic conditions. Most of the effects were both statistically and clinically significant. For example, attending primary care had noticeably negative effects on all 8 SF-36 scales, with scores 2.9 to 17.5 points below those of subjects from the general population. The reduction in health status

