• 沒有找到結果。

Chapter 3 Materials and Methods

3.4 Covariates

All confirmed cases and related factors were prospectively collected in the database.

Patient characteristics: age, gender, date of birth, admission, diagnosis of HAI,

discharge, transfer to ICU, and death

Infection-related covariates: infection sites, cultures sampled date, indwelling

47

bladder catheter and central venous catheter, microorganism cultures

Disease conditions: the International Classification of Diseases (Ninth Revision)

Clinical Modification (ICD-9-CM) codes of diagnoses, admission types (from emergency or outpatient department), admitted medical departments, numbers of HAIs episodes during each admission, and cause of death.

Intervention programs indicators: Intervention of PDCA, Hygiene programs,

Taiwan Centers for Disease Control (CDC) National Hand Hygiene Campaign and the urinary tract infection quality improvement program of Taiwan Joint Commission on Hospital Accreditation (TJCHA) called CDC/TJCHA, and Bundle care program.

48 Chapter 4 Model specification

Generalized linear time series models were used for analysis of HAIs incidence. Moreover, Bayesian state-space model was used for evaluating the time-dependent incidences and for forecasting future incidences in HAIs. Both were delineated as follows.

4.1 Generalized linear time series model

To accommodate various distributions of the outcome of interest covering from continuous to discrete types, a generalized linear time-series model following Zeger et al74 study and applied to Japanese encephalitis by Hsu et al13 was proposed in my thesis (Figure 4.1.1, Figure 4.1.2).

Under the context of GLM, three components are often defined, including (1) Random component

It is the distribution of the outcome Y and expressed by E(Y) = μ (2) Link function

It is a function defined by h(μ), which can take varies kinds of distributions, mainly including identity function for normal distribution, logistic function for binomial distribution, logarithm function for Poisson distribution and so on.

(3) Systemic components

This part includes the major components covered in time-series analysis such as

49

seasonal variation, time trend, the cyclic change, and autoregressive orders. The

generalized form is written as follows.

h(μ) = η (4-1)

η represents systematic components. If the outcome of Y is specified by a

Poisson distribution, the equation (4-1) is expressed by

log 𝜇 = ln(𝑃𝑌) + 𝛽0+ 𝛽1𝑋1+𝛽2𝑋2+ ⋯ + 𝛽𝑝𝑋𝑝+ 𝑔(𝑡) + 𝑠(𝑡) +

𝑙𝑗=1𝑟𝑗𝑦𝑡−𝑗 (4-2)

𝑋1, . . , 𝑋𝑝 represent covariates such as age and gender.

𝑔(𝑡) is a polynomial function of time trend.

𝑠(𝑡) is a function of seasonality. Trigonometric function is one of choices. Here,

we used dummy variables to denote spring, summer, autumn, and winter denoted by 𝑠1− 𝑠3. Trigonometric function is used in Bayesian approaches for

comparison.

50

4.2 Decomposition method with generalized linear time-series model

We extended the time series to the Poisson seasonal model in order to have a better fit with the HAIs incidence data. The algorithm consists of four steps. Firstly, the incidence was checked for its seasonality and de-seasonalized if any. Secondly, the residual of the first step was used for trend detection. De-trend procedure was applied if trend existed. Third, cycling was checked if any. Finally, the left residual of the above three steps was checked for discernible patterns or autocorrelations.

The algorithm is written as follows:

Step1.

log (𝐸(𝑦𝑡|𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑝, 𝑠1, 𝑠2, 𝑠3, 𝑠4, 𝑡))

= ln(𝑃𝐷) + 𝛼2+ 𝜔1𝑠1+ 𝜔2𝑠2+ 𝜔3𝑠3+ 𝜔4𝑠4 + 𝜀1

(4-3)

Step2.

𝐸(𝜀1|𝑡) = 𝛼2+ 𝛿1(𝑡 − 𝑡̅) + 𝛿2(𝑡 − 𝑡̅)2 + 𝛿3(𝑡 − 𝑡̅)3+ 𝜀2 (4-4)

Step3.

𝐸(𝜀2|𝑐) = 𝛼3+ 𝛽1𝑥1+ ⋯ + 𝛽𝑝𝑥𝑝+ 𝜀3 (4-5)

51 Step4.

Residual plots for 𝜀3 to check

(a) Autocorrelations: autoregressive order, sample autocorrelation function plots (SAC) using ARIMA method, Ljung-Box statistics.

(b) Normality: Normal plot

4.3 Bayesian Dynamic linear models (DLM)

We are not only interested in the population microbiology on understanding temporal fluctuations in abundance of specific microorganisms, but also in the impact on its relation to the diseases. For instance, as the antibiotics-resistant microorganism grows rapidly, it may increase the possibility to invade the hosts, causes diseases, and the incidence of diseases will increase. However, this is the simplified model for microorganism-host interactions. There are still several interactions between microorganisms, environment, and hosts. Factors involving in the interactions make the interpretation of incidence-microorganism abundance status a complex phenomenon. Crucially, our understanding of disease incidence is also affected by errors in the observation process. Separating true microorganism abundance (biological signals) from observation error in data is of great importance.

52

In this study, we aimed at evaluating the population dynamics of HAIs

microorganisms and its relationships with disease incidence. The disease incidence survey of HAIs time series sometimes include errors, and yet most previous studies have not explicitly been able to account for errors. Therefore, analysis that can manage the source of errors are needed in disease control.

Following the dynamic linear model mentioned in the literature in Chapter 2, we have used state-space model based on the equation (2-21) and (2-22).

To simplify the notation, we use state-space model with the form expressed by

𝑦𝑡 = 𝑋𝑡𝑇𝛽 + 𝑒1𝑇𝐶𝑡 (4-6)

, which is so-called observation equation, and

𝐶𝑡 = 𝐺𝐶𝑡−1+ 𝑒1𝑣𝑡 (4-7)

, which is so-called state equation. 𝑒1is a px1 vector with a one in the first row and zeros elsewhere. Note that 𝑌𝑡is the observed number of disease, 𝐶𝑡 is the true pathogen state. 𝑌𝑝is expressed as

𝑌𝑝 = 𝑋𝑝𝛽 + 𝐶𝑝 (4-8)

53

Σ𝑝can be written in terms of ϕ (autocorrelation) with vector form like the

following

Vec(Σ𝑝) = Vec(GΣ𝑝𝐺𝑇) + Vec(𝑒1𝑒1𝑇)

= (G ⊗ G)Vec(Σ𝑝) + Vec(𝑒1𝑒1𝑇)

= [I − (G ⊗ G)]−1Vec(𝑒1𝑒1𝑇) (4-10)

The joint posterior distribution of the parameters is

P(β, ϕ, 𝜎𝑣2) ∝ (1

54 The derivation of P(𝜎𝑣2|y, β, ϕ) is straightforward by assuming that the prior distribution for 𝜎1

The conditional posterior distribution of β and 𝜎𝑣2 can be simulated by Gibbs sampling scheme.

To reflect the dependence of Σ𝑝 on ϕ with Σ𝑝(ϕ), let 𝑦𝑡 = 𝑦𝑡− 𝑋𝑡𝑇𝛽

𝑦𝑡 = 𝜙1𝑦𝑡−1 + ⋯ + 𝜙𝑝𝑦𝑡−𝑝 + 𝑣𝑡

55

Where 𝑆𝜙 is the region in which the process is stationary. Note that this distribution is a non-standard distribution but it may be sampled with an Metropolis-Hasting algorithm after specifying the prior distribution of 𝜙like 𝜙~N(𝜙0, Φ0). A proposal

candidate density may be obtained by multiplying 𝑃(𝜙)by the terms involving 𝑦𝑡, 𝑡 > 𝑝 with the distribution specified by 𝜙~N(𝜙̂, Φ̂ ).

56

stationary region, which is subject to the usual Metropolis-Hasting acceptance region.

The state-space form consists of the observation equation when regression

coefficients evolve randomly through time following the equations (4-6) and (4-7).

𝑦𝑡 = 𝑋𝑡𝑇𝛽𝑡+ 𝑣𝑡 (4-17)

and the transition equation

𝛽𝑡 = 𝛽𝑡−1+ 𝑒𝑡 (4-18)

where t = 1, … , T, 𝑦𝑡 and 𝑣𝑡 are scalars, 𝑋𝑡, 𝛽𝑡, and 𝑒𝑡 are k x 1 vectors,

𝑣𝑡~𝑁(0, 𝜎2), and 𝑒𝑡~𝑁𝑘(0, Σ). 𝑣𝑡 and 𝑒𝑡 are assumed to be independent with each

other through time.

To generalize this model as a vector autoregression, this model is defined as Y = (𝑦𝑇, 𝑦𝑇−1, … , 𝑦1)𝑇

This likelihood function is

𝑓(𝑦|𝛽, 𝜎2) ∝ 1 𝜎2

𝑇2

exp [− 1

2𝜎2(𝑦 − X𝛽)𝑇(𝑦 − X𝛽)] (4-19)

57

The prior distribution based on the transition equation for β can be defined by the kT x kT matrix F

The priors for the remaining parameters are adopted with standard forms such as

1

𝜎2~Gamma (𝛼20,𝛿20) and Σ−1~𝑊𝑘(𝑢0, 𝑠0).

The joint posterior distribution is P(β, 𝜎2, Σ|y) ∝𝜎12

The following conditional posterior distributions can be sampled with a Gibbs

algorithm.

58

4.4 Bayesian Generalized Time-series Model

We can also combine observation equation with state-space equation to develop a unified framework for the extension of DLM model to generalized linear time-series model with Bayesian underpinning.

59

4.4.1 Bayesian Generalized Autoregressive Poisson Regression Model

Under the context of generalized linear model together with autoregressive model proposed in Chapter 2, a p-order autoregressive Poisson regression model is proposed to include autoregressive order, two classical components of time-series model (time trend and seasonal variation),

log μt = β0+ β1S1+ β2S2+ β3S3+ β4f(t) + β5X + ϕ1yt−1+ ϕ2yt−2 + ⋯ + ϕβyt−p+ ut

(4-23)

μt = E(yt): The expected count of HAIs

f(t) represents the p polynomial time trend function

X includes age, gender, intervention, department and site of infection

The Doodle of Bayesian Directed Acyclic Graphic Model (DAG)

The proposed model is framework in the doodle of Bayesian directed acyclic graphic model (DAG).

60 Doodle

f(t) represents the p polynomial time trend function

X includes age, gender, intervention, department and site of infection

γ(t) is the observed count of HAI at time t. It is determined by μ(t), expected numbers of counts, with Poisson distribution (μt). The parent node of γt is the μt. μt is linked through a logical expression governed by the equation(4-23). βs and ϕs are the corresponding regression coefficients.

The posterior distribution of P(θ|y) is formed in proportion to P(θ), prior distribution, and the likelihood function ℓ(y|θ), where θ = (β1− βk, ϕ1− ϕp) .

61

4.4.2 Bayesian Generalized Moving Average Model

In a similar vein, a q-order moving average Poisson regression model is proposed with the model form like the following,

log( μt) = β0+ β1S1+ β2S2+ β3S3+ β4f(t) + β5X + ut− θ1ut−1

− θ2ut−2− ⋯ − θput−p

(4-24)

The doodle of Bayesian DAG model and also the posterior distribution are derived in a similar manner as done for Bayesian AR process.

4.4.3 Bayesian Generalized ARMA Poisson Regression Model Fixed –effect Model correlated property, partly because of hierarchical (multilevel) data and partly because of the spread of transmission infection disease. To cope with this correlated property, we resort to the adoption of random-effect approach expressed as a linear mixed model and used in a longitudinal follow-up study.

The model form is modified from the equation (4-24) and expressed as follows.

62

logit μt = β0k+ β1S1+ β2S2+ β3S3+ β4f(t) + β5X + ϕ1yt−1+ ϕ2yt−2 + ⋯ + ϕβyt−put− θ1ut−1− θ2ut−2− ⋯ − θput−p

(4-26)

β0k~Ν(0, σk2)

k represent department (hierarchy) or site of infection.

This model is often so-called random-intercept model in a classical mixed model.

Doodle

63 4.5 Estimation of parameters

There are two main methods to estimate the parameters of generalized time series model and dynamic linear model, including maximum likelihood method and Bayesian approach.

4.5.1 Maximum Likelihood Estimate (MLEs)

We can specify the probability distribution of the data given the states and parameters for the observation equation and specify the probability distribution of the state conditional on the previous state in the previous time point in the process

equation. The maximum likelihood estimates (MLE) of the parameters can be obtained provided data and initial state available. However, in some situations, in nonlinear and non-Gaussian distributions, MLE is not easily feasible.

4.5.2 Bayesian Markov chain Monte Carlo methods

In Bayesian inference, the posterior distributions of the parameters are often analytically intractable in that it is difficult to derive in closed form summaries of the posterior. To deal with this problem, one of the methods is to resort to simulation methods. Complicated computational problems also arise in the condition of

non-64

linear distributions. Bayesian simulation-based methods such as Markov chain Monte Carlo (MCMC) approach with Gibbs sampling and Metropolis-Hasting algorithm for Bayesian dynamic models.

Markov Chain Monte Carlo (MCMC) Method75

The MCMC method is to generate stationary distribution underpinning the Markov chain model for which the parameter θ in the time j +1 follows the transition kernel K(θ | θ (j)), indicating that the distribution of parameters in the time j +1 is only dependent on the distribution of parameter θ in time j. According to the theory of Markov chain, the Markov chain under regular situations may reach the equilibrium and independent of initial distribution after a long run of transitions. This implies that

if a stationary distribution, S(θ), can be identified one can infer if θj comes from S(θ).

θ j + 1 also comes from S(θ).

Gibbs Sampler

The posterior distribution for the random effect model consists of all parameters involved, 1-5, 1-p, 1-p, b is denoted by

65 p (θ | y) = p(1-5, 1-p, 1-p, b| y) --- (1)

The Gibbs sampler is a method of estimating these marginal posterior distributions.

The procedure is as follows.

Suppose θ is k-dimension denoted by θ = {1-5, 1-p, 1-p, }

1) Assign starting value

θ 0 = {1 (0),..., p (0), τ(0) }

2) Step 1: Update 1 by sampling

1(1) ∼ P{1 | 2(0),..., p(0), τ(0)}

3) Step 2: Update 2 by sampling

66

2(1) ∼ P{2 | 1(1), 3(0),..., p(0), τ(0) }

4) Step 3: Update 3 by sampling

3(1) ∼ P{3|1(1), 2(1), 4(1)... , p(0), τ(0) }

………..

……….

4) Step k: Update τ by sampling

τ(1) ∼ P{τ | 1(1), 2(1), 4(1)... , p(1) }

This completes one iteration of the Gibbs sampler, which yields a new realization for θ which is given by 1(1), 2(1), 3(1)... , p(1), τ(1). The above procedure is then repeated to get a second realization 1(2), 2(2), 3(2)... , p(2), τ(2). This can be obtained by

sampling 1(2) from the full conditional probability of 1

67 P{1 | 2(1), 3(1),..., p(1), τ(1) }

Next, 2(2) is sampled from the full conditional probability:

P{2 | 1(2), 3(1),..., p(1), τ(1) }

This process continues until τ(2) is sampled from the full conditional probability:

P{τ | 1(2), 2(2),... , p(2) }

This completes the second iteration.

This whole process is repeated for r iterations. The final iteration yields the

realization 1(r), 2(r),..., p(r), τ(r). Geman and Geman show that, for very large values of r, the distribution of the sample 1(r), 2(r),..., p(r), τ(r) becomes close to the marginal posterior distribution of 1, 2,..., p, τ.

In addition to marginal posterior density, disease prediction using the Bayesian

68 approach needs predictive distribution.

The fundamental idea of getting predictive distribution is that given the observed

data r one can integrate out relevant regression parameters to get predictive distribution, P(new |), which is given by

p(𝜇𝑛𝑒𝑤|𝜇) = ∫ p(𝜇𝑛𝑒𝑤|𝜃, 𝜇)p(𝜃|𝜇)𝑑𝜃

However, like marginal posterior density, to integrate out relevant parameters requires high-dimension integration. Again, the MCMC method is tailored for such an

intractable computation.

4.5.3 Model selection for Poisson autoregressive model

Akaike’s information criterion (AIC) was used for model selection in the Poisson

autoregressive model 76. The last selected model were based on minimizing AIC.

Additionally, the residuals of the models were checked whether there were

autocorrelations by Ljung-Box statistics methods and any patterns by residual plots 77. Model selection for Bayesian dynamic linear model was based deviance information criteria (DIC).

69 Chapter 5 Results

5.1 Basic results

A total of 552,233 admissions were registered during the period from 1 January, 1994 to 31 December, 2013. 6,776 (1.2%) admissions were excluded either because information on variables (date of birth or department of admission) were unavailable (n=254) or the admission was indicated for a health check-up or consultations only (n=6522). Therefore, 545,457 admissions from 315,209 patients with 4,210,609 patient-days were available for analysis in this study. Of these, there were 10,117 patients with healthcare-associated infection and 20,221 cultures were identified.

There were 2,965 deaths ascertained within 30 days after the onset of HAIs (Figure 5.1.1). Figure 6 shows the incidence rates of HAI by calendar year. The median length of stay for HAI patients was 35 days (IQR: 22-58 days, range:1-4,033 days). The median length of stay was 5 days (IQR: 3-8) for non-HAI patients. Males accounted for 52.29% of cases (8,450 episodes) with a median age of 68.00 years (IQR: 54.73-76.94); and females accounted for 47.71% of cases (7,711 episodes) and had a median age of 71.94 years (IQR: 60.92-80.25).

The HAIs cohort was systematically collected between 1994 and 2013 that was continued in a 921-bed teaching medical center hospital in Taipei. Total admissions

70

was about 25,000-28,000 every year. There was no substantial change in the hospitalized admission by year. The healthcare-associated infectious (HAI) cases among those admissions were around 900-1,200 by year (Table 1). The range of HAIs incidence rates of were about 2.86-5.16 per 1000 patient-days for every calendar year.

The trend of overall HAI incidence between 1994 and 2013 showed obvious decline, especially in the last two years (see Table 5.1.1). The detail statistical results were demonstrated in Table 5.1.2. The incidence rate of HAI in male was slight higher compared with female. Regarding the admission department, the highest HAI incidence rate was revealed in Department of Oncology, followed by GI, Pediatric, and Chest. The Incidence rates were different from seasons. The HAI incidence rate in summer was higher than others. The incidence rate was highest for urinary tract infections (1.59 episodes per 1,000 patient-days), followed by bacteremia (1.09), pneumonia (0.72), and surgical site infection (0.45).

In multi-variable Poisson model in evaluating the causes of incidence rate, age was a significant factor. The relative risk was highest in the age group of over 80 years old. The lowest risk age group was between 20 to 29 years old. Male patients had higher relative risk of contracting HAIs than female patients. Emergency

department staying had higher incidence risk of HAIs compared to other departments.

71

Patients admitted as an emergency had a higher risk of HAIs compared to those admitted from outpatient departments. After adjusting for age, gender, and the department of hospitalization, patient admission type (outpatient or emergency) and sites of infection were still associated with HAIs incidence (Table 5.1.3).

5.2 Decomposition method with Generalized Linear Time-series Model

among selected infection sites and species

Based on Table 5.1.3, we focused on four infection sites and the two species of Gram (-) based on the longitudinal follow-up for the incidence of HAI from 1994 to 2013.

(A) Overall HAI incidence

The longitudinal follow-up for the incidence of HAI from 1994 to 2013 was shown in Figure 5.2.1. The monthly time series of the HAIs count was shown in Figure 5.2.4. Seasonality was noted in the HAIs incidence. The incidence was higher in the summer and spring than that in the winter (p<0.0001) (Table 5.2.1). The relative risk was higher in the summer than in the winter. Figure 5.2.5 is the time series after de-seasonalization of the HAI, and there were five outliers noted in this figure. After

72

deleting the outliers, there was a significant third degree polynomial trend (Table 5.2.2). The autoregression analysis for the de-seasonalized and de-trend time series revealed lacking of autoregressive order and white noise (Table 5.2.3).

(B) Bacteremia incidence

The monthly time series of the healthcare-associated bacteremia count was shown in Figure 5.2.7. Seasonality was noted. The incidence was higher in the summer and spring than that in the winter (p<0.0001) (Table 5.2.4). The relative risk was 19% higher in the summer than in the winter, and 16% higher in the spring than in the winter. Figure 5.2.8 is the time series after de-seasonalization of the HAI bacteremia, and there were five possible outliers noted in this figure. After deleting the outliers, there was a significant linear trend (Table 5.2.5). The analysis for the de-seasonalized and de-trend time series revealed lacking of autoregressive order (Table 5.2.6).

(C) Pneumonia incidence

The monthly time series of the healthcare-associated pneumonia count was shown in Figure 5.2.10. Although the univariate seasonal variation analysis was not

73

significant, the incidence tended to be higher in the summer than in the winter (Table 5.2.7). Figure 5.2.11 is the time series after de-seasonalization of the HAI pneumonia, and there were three outliers noted in this figure. After deleting the outliers, there was a significant linear trend (Table 5.2.8). The analysis for the seasonalized and de-trend time series revealed second-order autoregressive pattern (Table 5.2.9).

(D) Surgical site infection incidence (SSI)

The monthly time series of the healthcare-associated SSI count was shown in Figure 5.2.13. Seasonality was noted. The incidence was higher in the summer and spring than that in the winter (p=0.0029) (Table 5.2.10). The relative risk was 20%

higher in the summer than in the winter, and 24% higher in the spring than in the winter. Figure 5.2.14 was the time series after de-seasonalization of the healthcare-associated SSI, and there were five outliers noted in this figure. After deleting the outliers, there was a significant linear trend (p<0.0001) (Table 5.2.11). There was lacking of autoregressive pattern for the de-seasonalized and de-trend time series (Table 5.2.12).

(E) Urinary tract infection incidence (UTI)

74

The monthly time series of the healthcare-associated UTI count was shown in Figure 5.2.16. Seasonality was noted. The incidence was higher in the summer and spring than that in the winter (p<0.0001) (Table 5.2.13). The relative risk was 16%

higher in the summer than in the winter, and 11% higher in the spring than in the winter. Figure 5.2.17 was the time series after de-seasonalization of the healthcare-associated UTI, and there were four outliers noted in this figure. After deleting the outliers, there was a significant third degree polynomial trend (Table 5.2.14). The analysis for the de-seasonalized and de-trend time series revealed fourth-order autoregressive pattern (Table 5.2.15).

(F) E. coli Bacteremia incidence

The monthly time series of the healthcare-associated E. coli bacteremia count was shown in Figure 5.2.18. Seasonality was not significant (Table 5.2.16). Figure 5.2.19 was the time series after de-seasonalization of the HAI E. coli bacteremia, and there were four outliers noted in this figure. After deleting the outliers, there was a significant quadratic trend (Table 5.2.17). The autoregression analysis for the de-seasonalized and de-trend time series revealed second-order autoregressive pattern (Table 5.2.18).

75 (G) P. aeruginosa Bacteremia incidence

The monthly time series of the healthcare-associated P. aeruginosa bacteremia count was shown in Figure 5.2.20. Seasonal variation was not significant (Table 5.2.19). Figure 5.2.21 is the time series after de-seasonalization of the HAI P.

aeruginosa bacteremia, and there were four outliers noted in this figure. Despite

deleting the outliers, there was no significant trend (Table 5.2.20). The autoregression analysis for the de-seasonalized and de-trend time series revealed lacking of

autoregressive pattern (Table 5.2.21).

Autoregressive integrated moving-average model (ARIMA)

The traditional ARIMA time series analysis for the original overall HAI

incidence revealed an order one moving average after one differencing of the original incidence time series. The model for bacteremia, pneumonia, and surgical site

infections incidence was ARIMA (0,1,1). The ARIMA model for UTI was ARIMA (1.0.2).

In the bacteremia, the E. coli incidence shows no significant autoregressive or moving average order. In P. aeruginosa incidence, both the order two autoregressive

76

and moving average order were significant, the absolute value of the higher order was small than that of the low order are also shown in (Table 5.2.22, Table 5.2.23).

5.3 Generalized Time-series Model with covariates, seasonality, time trend,

and autoregressive order: A MLE approach

and autoregressive order: A MLE approach