Chapter 3. METHODS
3.7 Statistical analysis
The number of participants excluded from each analysis is detailed in Table 3-1. We excluded those with extreme energy intake (men: <800 kcal/d or >4000 kcal/d; women: <500 kcal/d or >3500 kcal/d) when analyzing dietary components assessed by FFQ, as extreme energy intake may indicate inaccurate responses to the FFQ or an inability of the FFQ to capture the actual diet of the participants. Smokers and habitual alcohol drinkers were excluded from the analysis because smoking may modify the effect of diet on diabetes(101), and alcohol drinking tends to be closely associated with smoking. Those with a self-reported history of cancer, coronary heart disease, or stroke were excluded because diet therapy is likely to be initiated after the diagnosis of these diseases. For analyses of nonalcoholic fatty liver, those with hepatitis B or hepatitis C were further excluded because these conditions may also influence fatty liver(127,128).
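The energy-intake exclusion rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the SAS code used in the study; the record layout (`id`, `sex`, `kcal`) and the function name are hypothetical.

```python
# Illustrative sketch only: thresholds come from the text, but the record
# layout and names below are hypothetical, not taken from the study data.
ENERGY_LIMITS_KCAL = {
    "men": (800, 4000),    # excluded if <800 or >4000 kcal/d
    "women": (500, 3500),  # excluded if <500 or >3500 kcal/d
}


def has_extreme_energy_intake(sex, kcal_per_day):
    """Return True if daily energy intake falls outside the sex-specific limits."""
    low, high = ENERGY_LIMITS_KCAL[sex]
    return kcal_per_day < low or kcal_per_day > high


# Made-up example records for demonstration.
participants = [
    {"id": 1, "sex": "men", "kcal": 750},
    {"id": 2, "sex": "men", "kcal": 2200},
    {"id": 3, "sex": "women", "kcal": 3600},
    {"id": 4, "sex": "women", "kcal": 500},
]

kept = [p for p in participants
        if not has_extreme_energy_intake(p["sex"], p["kcal"])]
kept_ids = [p["id"] for p in kept]
```

Intakes exactly at a limit (for example, 500 kcal/d for women) are retained, because the text defines the exclusion with strict inequalities.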
For comparison of baseline demographic characteristics, continuous variables were compared using independent-sample t-tests (for two groups) or analysis of variance (for more than two groups); categorical variables were compared using the Chi-square test or Fisher’s exact test (when any cell value was less than 5). Nutrient and food intakes were compared using Wilcoxon two-sample tests because of their non-normal distributions.
Binary logistic regression was used to study the association between vegetarian diet and metabolic syndrome, adjusting for age, sex, education, LTPA, smoking, and alcohol drinking. Subgroup analyses of men, premenopausal women, and post-menopausal women were also performed.
Polytomous logistic regression was used to examine the cross-sectional association between vegetarian diet and three stages of glucose metabolism: normal (fasting glucose < 100 mg/dL), impaired fasting glucose (IFG; fasting glucose 100 mg/dL to 125 mg/dL), and diabetes (two fasting glucose values ≥ 126 mg/dL or self-reported diabetes), adjusting for age, family history of diabetes, education, LTPA, smoking (men only), and alcohol drinking (men only) in Model 1. Model 2 additionally adjusted for BMI. Analyses were conducted separately for men, premenopausal women, and post-menopausal women.
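The three-category outcome defined above can be illustrated with a short Python sketch. It is not the study's code. Two points are assumptions made here because the text does not specify them: the first reading in the list decides between normal and IFG, and a single reading of 126 mg/dL or higher without a second confirming value is returned as "unclassified".

```python
def classify_glucose_status(fasting_glucose_mg_dl, self_reported_diabetes=False):
    """Classify glucose metabolism from a list of fasting glucose readings (mg/dL).

    Cut-offs follow the text: normal < 100, IFG 100 to 125, diabetes if two
    readings are >= 126 or diabetes is self-reported.
    """
    n_high = sum(1 for g in fasting_glucose_mg_dl if g >= 126)
    if self_reported_diabetes or n_high >= 2:
        return "diabetes"

    # Assumption (not from the text): the first reading decides normal vs IFG.
    first = fasting_glucose_mg_dl[0]
    if first < 100:
        return "normal"
    if first < 126:  # same as 100 to 125 for whole-number readings
        return "IFG"

    # Assumption (not from the text): one high reading without confirmation
    # is left unclassified rather than forced into a category.
    return "unclassified"


example_status = classify_glucose_status([112])
```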
For the association between nonalcoholic fatty liver and vegetarian diet or food groups, we used binary logistic regression, adjusting for age, gender, education, history of smoking, and history of alcohol drinking in Model 1. Model 2 additionally adjusted for BMI. The effect of substituting one food for another on nonalcoholic fatty liver was also estimated using logistic regression, in which one of the foods and the sum of both foods were included as independent, continuous variables in the model, while adjusting for potential confounders(129):
$$
\operatorname{logit}(P) = \beta_0 + \beta_1 \cdot \text{meat} + \beta_2 \cdot (\text{soy} + \text{meat}) + \sum_i \gamma_i z_i
$$

where P is the probability that a person has fatty liver, z_i is the i-th covariate, and γ_i is its coefficient.

In the above model, β1 represents the effect of increasing meat by 1 serving while holding the total of meat and soy constant (as this total is controlled for in the model). Since the total of meat and soy is held constant, increasing meat by 1 serving means simultaneously decreasing soy by 1 serving. Therefore, β1 represents the effect of replacing a serving of soy (7 g protein equivalent) with a serving of meat (7 g protein equivalent) on the log odds, log_e(P/(1-P)). The same method was applied to all substitution analyses.
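The interpretation of β1 can be checked by expanding the substitution model and comparing it with an ordinary model that has a separate coefficient for each food. The symbols b_meat and b_soy below are introduced here only for illustration and do not appear in the analysis.

$$
\begin{aligned}
\operatorname{logit}(P) &= \beta_0 + \beta_1 \cdot \text{meat} + \beta_2 \cdot (\text{soy} + \text{meat}) + \text{(covariate terms)} \\
&= \beta_0 + (\beta_1 + \beta_2) \cdot \text{meat} + \beta_2 \cdot \text{soy} + \text{(covariate terms)}
\end{aligned}
$$

Matching terms with a model of the form b_meat · meat + b_soy · soy gives

$$
b_{\text{meat}} = \beta_1 + \beta_2, \qquad b_{\text{soy}} = \beta_2, \qquad \beta_1 = b_{\text{meat}} - b_{\text{soy}}.
$$

β1 is therefore the change in log odds from adding one serving of meat and removing one serving of soy, and exp(β1) is the corresponding odds ratio for the substitution.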
A general linear model was used to compare change in weight between dietary patterns while adjusting for baseline age, education, LTPA, and months of follow-up. Analyses for men and women were conducted separately.
Stratified Cox proportional hazards regression was used to analyze the association between dietary patterns and risk of diabetes, with follow-up time as the underlying time scale. The model was stratified by follow-up method and LTPA because the interaction terms of these variables with time indicated a violation of the proportional hazards assumption. Model 1 adjusted for age, sex, education, family history of diabetes, LTPA, and method of follow-up (questionnaire only vs health examination). Model 2 additionally adjusted for BMI to estimate the protective effect independent of BMI (a mediator). Time of disease occurrence was set to the time at which the first abnormal glucose value was identified (HbA1c ≥ 6.5% or fasting blood glucose ≥ 126 mg/dL). For participants who reported a diagnosis of diabetes on the questionnaire but could not remember when it was diagnosed, the time of onset was set to the midpoint between the previous known disease-free time point and the follow-up at which diabetes was reported. For those who did not report having diabetes on the questionnaire but were found to have diabetes at the health examination, the date of the health examination was used as the date of disease onset.
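The midpoint rule for participants who could not recall their diagnosis date can be sketched as follows. This is a minimal illustration, not the study's SAS code; the function name and the example dates are hypothetical.

```python
from datetime import date


def midpoint_onset_date(last_disease_free, reported_at):
    """Return the date half-way between the last known disease-free date and
    the follow-up date at which diabetes was reported.

    Adding a timedelta to a date ignores the sub-day part, so a half-day
    midpoint resolves to the earlier calendar day.
    """
    if reported_at < last_disease_free:
        raise ValueError("report date precedes the last disease-free date")
    return last_disease_free + (reported_at - last_disease_free) / 2


# Hypothetical example: disease-free at baseline, diabetes reported two years later.
onset = midpoint_onset_date(date(2010, 1, 1), date(2012, 1, 1))
```

For participants first identified at a health examination, no imputation of this kind is needed, since the examination date itself serves as the onset date.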
Several sensitivity analyses were performed: (1) Twenty-five unconfirmed diabetes events were treated as diabetes cases. (2) To ensure that our results were not affected by detection bias from the different follow-up methods (health examination vs questionnaire only), we performed another sensitivity analysis in which only self-reported diabetes was counted as a case. (3) We adjusted for metabolic syndrome in addition to the Model 2 covariates. (4) Among those with weight measurements at follow-up, we additionally adjusted for change in weight or change in BMI on top of Model 2, to test whether weight change had any effect on diabetes risk.
Among those with consistent diets (consistent vegetarians and consistent nonvegetarians, excluding those who reverted or converted), we conducted additional analyses of the association between diabetes and food groups (meat, fish, soy, eggs, dairy, whole grains, refined grains, vegetables, fruits). All food groups were adjusted for energy using the residual method(129) and entered simultaneously as independent continuous variables into a Cox regression model, adjusting for sex, education, family history of diabetes, LTPA, follow-up method, calories, and BMI, while excluding participants with extreme caloric intakes and participants with a censor age <50 years (to prevent violation of the proportional hazards assumption). All analyses were conducted using SAS Statistical Software (version 9.4, SAS Institute, Cary, NC).
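As an illustration of the residual method for energy adjustment mentioned above, the following Python sketch regresses a food-group intake on total energy intake and keeps the residuals. It is not the SAS code used in the study. Adding back the expected intake at mean energy is a common convention assumed here, not something the text states, and the function name and example values are hypothetical.

```python
from statistics import mean


def energy_adjusted_intake(intakes, energies):
    """Energy-adjust intakes by the residual method (simple linear regression).

    Each intake is replaced by its residual from a regression of intake on
    total energy, plus the predicted intake at the mean energy level.
    """
    mean_x = mean(intakes)
    mean_e = mean(energies)
    sxx = sum((e - mean_e) ** 2 for e in energies)
    if sxx == 0:
        raise ValueError("energy intake must vary across participants")

    slope = sum((e - mean_e) * (x - mean_x)
                for e, x in zip(energies, intakes)) / sxx
    intercept = mean_x - slope * mean_e

    residuals = [x - (intercept + slope * e)
                 for x, e in zip(intakes, energies)]
    # Assumed convention: add the predicted intake at mean energy, which for
    # ordinary least squares equals the mean intake.
    return [r + mean_x for r in residuals]


# Made-up servings per day and kcal per day for four participants.
servings = [2.0, 1.0, 4.0, 3.0]
kcal = [1500, 2000, 2500, 3000]
adjusted = energy_adjusted_intake(servings, kcal)
```

By construction, the adjusted intakes are uncorrelated with total energy, which is what allows the food groups to be interpreted independently of how much a participant eats overall.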
CHAPTER 4. RESULTS