As a primary hypothesis, we assumed that the physiological/biochemical mechanism does not differ between races or genders. Under this assumption, and to keep things simple, we did not include race and gender as exogenous variables, and the whole dataset was used to pursue a reasonable model-building procedure in SEM.
If differences among genders and/or races are believed to exist, however, a more elaborate fit can be considered. For example, the analysis can be stratified by gender or race.
Full model
Since the inter-relationship between the four factors (reduced from 16 observed variables) and the four risk factors or risk-taking behaviors is of major concern, we first drew all possible paths as an initial construct of the SEM. It is hereafter referred to as the full model. The fit of the full model is poor (GFI=0.6808, NFI=0.0452, CFI=0.0424). Standardized parameter estimates are presented in Figure 4.3.
To improve the fit (in terms of goodness-of-fit indices), significant paths from the exogenous variables to the latent factors (termed γ-paths) need to be identified and non-significant paths deleted. The traditional method for this ‘model-selection’ procedure is to use the Lagrange multiplier test (a parallel of Rao’s score test in the regression set-up when a likelihood exists) or the Wald test in a stepwise manner. However, when the likelihood of an SEM is written down, it involves the whole structure of the SEM, including all the covariance parameters, γ-paths, and λ-paths (from the factors to the observed y variables). This implies that the stepwise procedure involves simultaneous estimation of all parameters, not only the γ-paths (or γ-parameters). In this thesis, we propose to construct the SEM by a two-stage procedure while keeping the estimation simultaneous. In this regard, an alternative (but naïve) algorithm based on this two-stage thinking is proposed, with univariate multiple linear regression as its building block. ‘Univariate’ means that the outcome is a single variable, either (i) an observed variable, y, or (ii) a combined factor score; ‘multiple’ indicates that the explanatory variables are the set of risk factors (AGE, SMK, DRI, and PEA). For (ii), we use the naïve score proposed by Bartlett (1937), in which the factor loadings are replaced by the parameter estimates obtained from the ML estimation of the measurement model. Another factor score is suggested by the SAS system; for the current dataset, scatter-plots of the two factor scores (Figure 4.4) show that they are close surrogates for each other.
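For reference, the Bartlett-type factor score is usually written in the weighted least-squares form below. This is a sketch under assumptions: the symbols are not taken from the thesis ($\hat{\Lambda}$ stands for the estimated λ-loadings of the measurement model, $\hat{\Theta}$ for the estimated error covariance matrix of the observed variables, and $y$ for the centered vector of observed variables), and the naïve score used here is assumed to follow this standard form.

```latex
\hat{f} \;=\; \bigl(\hat{\Lambda}^{\top}\hat{\Theta}^{-1}\hat{\Lambda}\bigr)^{-1}
              \hat{\Lambda}^{\top}\hat{\Theta}^{-1}\, y
```

When each observed variable loads on a single factor and $\hat{\Theta}$ is diagonal, the score for a factor reduces to a weighted average of its own indicators, with weights $\hat{\lambda}_j/\hat{\theta}_{jj}$.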
[Put Figures 4.3 and 4.4 about here.]
Univariate linear regression with observed dependent variables
We used the physiological/biochemical variables as outcome variables and the risk factors as predictors, and carried out regression analyses. In this process, we considered one response (dependent variable) at a time, while the model could contain several predictors (independent variables); it is therefore called a univariate multiple regression analysis. From each analysis, we recorded the significance level and set up criteria for deciding the structural model. Double asterisks denote a p-value less than 0.01, and a single asterisk denotes a p-value between 0.01 and 0.05. In each cell, a double asterisk is treated as full marks. The results of the univariate multiple regression analyses are reported in Table 4.2.
Next, for each factor (f), the total number of asterisks of each x (age, smoking, drinking, and betel nut eating) over the observed variables (y) of that factor was counted. Various criterion rules can then be applied. (Note that each criterion rule corresponds to a construct.) Examples of such rules and their interpretation are as follows.
(1) Additive-1/2 rule: If the total number of asterisks is greater than or equal to half of the possible number of asterisks, the corresponding γ-path is identified as important. For example, in Table 4.2, Factor 1 (f1) consists of 4 variables, so there are 8 possible asterisks in 4 cells for each of the 4 risk factors. The age-f1 relation has 6 asterisks, which indicates that the γ-path from AGE to Factor 1 should be considered. Similarly, the γ-paths from SMOKE and DRINK to Factor 1 are both important, but that from BETEL NUT to Factor 1 is not. This criterion relies on the additive effect of the significance accumulated over the relationships between a risk factor (x) and the distinct observed variables (y).
(2) Relative significance rule: If the number of cells with double-asterisk significance is greater than or equal to the total number of cells (which equals the number of variables related to the factor), the γ-path is considered. This rule is very strict and favors parsimony in the construct of γ-paths.
(3) Strict additive-1/2 rule: The same as rule (1), except that equality is not allowed; the total number of asterisks must exceed half of the possible number.
(4) Absolute significance rule: When the number of cells with double-asterisk significance exceeds 2, it is also reasonable to treat the factor as highly attributable to the x variable, in the sense that x makes genuine contributions to the combined observed variables (y) that constitute the factor (f).
It is important to note that variants of rules (1) to (4), or combinations of them, are also possible. (For details, see the results in Table 4.4.)
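As an illustration, rules (1) to (4) can be written as small predicates on the asterisk counts of one risk factor across the observed variables of one factor. This is a minimal sketch, not the implementation used in the thesis: the function names are hypothetical, each cell is assumed to hold 0, 1, or 2 asterisks, and rule (2) is read literally (the number of double-asterisk cells must reach the total number of cells).

```python
def additive_half(stars, strict=False):
    """Rules (1) and (3): compare the asterisk total with half the possible total.

    stars: asterisk counts (0, 1, or 2), one per observed variable of the factor.
    """
    total = sum(stars)
    half_possible = len(stars)  # half of the possible 2 * len(stars) asterisks
    return total > half_possible if strict else total >= half_possible


def count_double(stars):
    """Number of cells significant at the double-asterisk level."""
    return sum(1 for s in stars if s >= 2)


def relative_significance(stars):
    """Rule (2), literal reading: double-asterisk cells must reach the cell count."""
    return count_double(stars) >= len(stars)


def absolute_significance(stars, cutoff=2):
    """Rule (4): more than `cutoff` cells carry a double asterisk."""
    return count_double(stars) > cutoff


# Hypothetical cell pattern for a 4-variable factor with 6 asterisks in total.
# Only the total (6 out of 8) comes from the text; the split over cells is made up.
age_f1 = [2, 2, 2, 0]
print(additive_half(age_f1))               # True: 6 >= 4
print(additive_half(age_f1, strict=True))  # True: 6 > 4
print(relative_significance(age_f1))       # False: only 3 of 4 cells
print(absolute_significance(age_f1))       # True: 3 > 2
```

Keeping each rule as a separate predicate makes it easy to try variants or combinations of the rules on the same table of asterisks.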
Univariate linear regression with latent factor scores
When the naïve factor scores were used as outcome variables, the case of a p-value less than 0.01 was further partitioned into two sub-cases: 0.001<p<0.01 and p<0.001. We assigned 3 asterisks to p<0.001, 2 asterisks to 0.001<p<0.01, and 1 asterisk to 0.01<p<0.05. The results are presented in Table 4.3.
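The asterisk coding can be sketched as a one-line mapping from a p-value to a count. The function name `p_to_stars` is hypothetical, and the handling of p-values that fall exactly on a cutoff is an assumption, since the text does not specify it.

```python
def p_to_stars(p, cutoffs=(0.05, 0.01, 0.001)):
    """Return the number of asterisks for a p-value.

    One asterisk is earned for every cutoff that p falls strictly below.
    Treating a p-value equal to a cutoff as the weaker level is an assumption.
    """
    return sum(1 for c in cutoffs if p < c)


# Three-level coding used with the factor scores.
print(p_to_stars(0.0004))  # 3
print(p_to_stars(0.004))   # 2
print(p_to_stars(0.03))    # 1
print(p_to_stars(0.20))    # 0

# Two-level coding used with the observed variables.
print(p_to_stars(0.0004, cutoffs=(0.05, 0.01)))  # 2
```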
[Put Tables 4.2 and 4.3 about here.]
Based on the results of the multiple regression analyses, we borrowed the relative significance rule. The γ-path is considered if (i) the number of cells (which equals the number of variables related to a factor) with 2 or more asterisks, or (ii) in a stricter sense, the number of cells with 3 asterisks, is greater than or equal to the total number of cells. The first criterion gives the following goodness-of-fit indices: GFI=0.7853, NFI=0.4782, CFI=0.4828. The second gives GFI=0.8200, NFI=0.5500, CFI=0.5558.
As a final construct combining the above results, we obtained the SEM shown in Figure 4.5, which has the best goodness-of-fit indices (GFI=0.8445, AGFI=0.7920).
[Put Table 4.4 and Figure 4.5 about here.]
Adding/deleting correlated error terms in a stepwise manner
To obtain a satisfactory fit in terms of goodness-of-fit indices, we added paths for the correlations between the error terms of the observed variables (y), in descending order of their marginal correlations (partial correlations could also be considered). The order of correlations was: SBP/DBP (0.769), AST/ALT (0.755), Hb/RBC (0.604), BuN/CRE (0.584), CHO/TRI (0.311), RBC/ALB (0.310), Hb/ALB (0.271), TRI/BS (0.271), TRI/UA (0.250), GLO/AST (0.227), WBC/UA (0.220), WBC/PLA (0.208), CRE/UA (0.206), Hb/UA (0.203). Following this order, we added the paths between variables (y) one at a time, which is why we call the procedure ‘stepwise’. It differs markedly from the standard procedure suggested by most statistical packages, in which the selection of a path is decided by a Wald test or a Lagrange multiplier test. The reason is that, as stated previously, we seek to add the path(s) in a ‘two-stage’ manner: after the global structure has been constructed, correlation terms are added by considering the correlations between the observed variables (y), so as to raise the goodness-of-fit indices (GFI, AGFI, etc.) to an acceptable level. As suggested in this thesis, the pairwise (marginal) correlations can be used in descending order. The result is shown in Table 4.5: adding the marginal correlations greater than 0.2 eventually gives a GFI greater than 0.90.
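The ordering step described above can be sketched as follows. The correlation values are those listed in the text; everything else is an assumption made for illustration: the names `ordered_pairs` and `forward_add` are hypothetical, ranking by absolute value is assumed, and `fit_gfi` is a caller-supplied stand-in for refitting the SEM with a given set of correlated error terms and returning its GFI (no real fitting routine is called here).

```python
# Marginal correlations between observed variables, as listed in the text.
MARGINAL_CORR = {
    ("SBP", "DBP"): 0.769, ("AST", "ALT"): 0.755, ("Hb", "RBC"): 0.604,
    ("BuN", "CRE"): 0.584, ("CHO", "TRI"): 0.311, ("RBC", "ALB"): 0.310,
    ("Hb", "ALB"): 0.271, ("TRI", "BS"): 0.271, ("TRI", "UA"): 0.250,
    ("GLO", "AST"): 0.227, ("WBC", "UA"): 0.220, ("WBC", "PLA"): 0.208,
    ("CRE", "UA"): 0.206, ("Hb", "UA"): 0.203,
}


def ordered_pairs(corr, threshold=0.2):
    """Keep pairs whose |r| exceeds the threshold, sorted from high to low.

    Python's sort is stable, so tied pairs keep their input order.
    """
    kept = [(pair, r) for pair, r in corr.items() if abs(r) > threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


def forward_add(pairs, fit_gfi):
    """Add one correlated-error path at a time and record the GFI after each step.

    fit_gfi: hypothetical callable taking the list of pairs added so far and
    returning the GFI of the refitted model.
    """
    added, history = [], []
    for pair, _ in pairs:
        added.append(pair)
        history.append((pair, fit_gfi(list(added))))
    return history


for pair, r in ordered_pairs(MARGINAL_CORR):
    print(pair, r)
```

Passing the fitting step in as a function keeps the ordering logic independent of whichever SEM software performs the refit.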
On the other hand, a model-building procedure can also be based on the Lagrange multiplier test applied from this stage onward (as suggested by the statistical packages), leaving untouched the parts of the model other than the correlations between error terms. We contrasted the two procedures in terms of the GFI/AGFI indices in Figures 4.6 and 4.7, respectively. The figures show the growth of GFI/AGFI and indicate that our procedure performs better in the early steps, when the higher correlations are included. Nonetheless, the Lagrange multiplier test gives better fits from a certain step onward, although it, too, remains within the framework of ‘two-stage’ modeling. Before a ‘final’ model is obtained, we can still investigate the ‘lack-of-fit’ problem by examining how the fit changes when a correlation between the observed variables (y) is deleted. The results are reported in Table 4.6 and Figure 4.8.
Finally, the magnitude of the GFI change when one correlation path is deleted in a ‘backward’ manner is shown in Figure 4.8, and a ‘final’ model is given in Figure 4.9.
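One possible reading of the deletion check is a drop-one comparison: each correlation path is removed in turn, the model is refitted, and the loss in GFI is recorded. The sketch below follows that reading only; the name `gfi_change_on_deletion` is hypothetical, and `fit_gfi` is again a caller-supplied stand-in for the actual SEM refit.

```python
def gfi_change_on_deletion(pairs, fit_gfi):
    """Return, for each correlation path, the drop in GFI when it alone is removed.

    pairs: list of variable pairs whose error terms are allowed to correlate.
    fit_gfi: hypothetical callable returning the GFI for a given list of pairs.
    """
    full = fit_gfi(list(pairs))
    changes = {}
    for pair in pairs:
        reduced = [p for p in pairs if p != pair]
        changes[pair] = full - fit_gfi(reduced)
    return changes
```

A path whose removal changes the GFI very little is a candidate for deletion from the final model.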
[Put Tables 4.5 and 4.6 about here.]
[Put Figures 4.6, 4.7, 4.8, and 4.9 about here.]