最大似然法在時間序列資料分析的應用與全身麻醉藥（三氟氯溴乙烷）在細胞膜內的分子動力學模擬

(1)

國立臺灣大學理學院物理學系碩士論文

Department of Physics College of Science

National Taiwan University Master Thesis

最大似然法在時間序列資料分析的應用與全身麻醉藥

（三氟氯溴乙烷）在細胞膜內的分子動力學模擬

Ｍaximum Likelihood Method in Time Series Data Analysis and Molecular Dynamics Simulations of a

General Anesthetic inside a Membrane

涂楷旻 Kai-Ming Tu

指導教授：龐寧寧博士 Advisor: Ning-Ning Pang, Ph.D.

共同指導教授：梁國淦博士 Coadvisor: Kuo-Kan Liang, Ph.D.

中華民國 97 年 7 月 July, 2008

(2)

Acknowledgments

I am very grateful to my supervisor, Prof. Pang for her guidance, to Dr. Liang for his kind assistance, to Dr. Chau for his patience and help, to Prof. Tzeng and Prof. Lu for their advice and to Prof. Matubayasi for his clear explanation of his method. Also, I would like to thank my family and my girlfriend for their unwavering support.

For the second part of this thesis, Dr. Chau wrote the program and did most of the calculations for 1 atm and 400 atm, and Prof. Matubayasi did all the analysis from energy distribution histograms to energy values. This work has been accepted for publication in Chemical Physics Letters on July 11th, 2008.

(3)

致致致謝謝謝

非非

非常常常感感感謝謝謝我我我的的的指指指導導導老老老師師師，，，龐龐龐教教教授授授對對對我我我的的的諄諄諄諄諄諄教教教誨誨誨，，，梁梁梁老老老師師師給給給我我我的的的慷慷慷慨慨慨協協協助助助，，，周周周博博博士

士

士對對對我我我的的的耐耐耐心心心幫幫幫忙忙忙，，，兩兩兩位位位口口口試試試委委委員員員曾曾曾教教教授授授及及及陸陸陸教教教授授授提提提供供供的的的寶寶寶貴貴貴意意意見見見，，，以以以及及及松松松林林林教教教授

授

授對對對於於於他他他發發發展展展的的的方方方法法法的的的詳詳詳細細細解解解釋釋釋。。。我我我也也也要要要感感感謝謝謝我我我的的的家家家人人人以以以及及及女女女朋朋朋友友友對對對我我我不不不變變變的的的支支支持

持

持，，，你你你們們們是是是我我我前前前進進進的的的力力力量量量。。。此

此此篇篇篇論論論文文文第第第二二二部部部份份份裡裡裡，，，程程程式式式的的的撰撰撰寫寫寫以以以及及及 1 大大大氣氣氣壓壓壓、、、400 大大大氣氣氣壓壓壓大大大部部部分分分的的的計計計算算算是是是由

由由周周周博博博士士士完完完成成成，，，而而而松松松林林林教教教授授授則則則負負負責責責由由由能能能量量量分分分佈佈佈得得得出出出自自自由由由能能能的的的分分分析析析。。。這這這部部部份份份的的的工工工作作作已

已已在在在 2008 年年年 7 月月月 11 日日日被被被 Chemical Physics Letters 接接接受受受.

(4)

Abstract

Two topics are covered by this thesis: the maximum likelihood method and molecular dynamics simulations of general anesthetics, halothane.

For the first topic, I introduce the maximum likelihood method and its application in analyzing time series data, and then test it with simulated data.

For the second topic, I introduce molecular dynamics (MD) simulations and the method of energy representation to general anesthetic research. Using the MD simulations and the method of energy representation, I have calculated the free-energy change of inserting a halothane molecule into different depths of a hydrated dimyristoylphosphatidylcholine (DMPC) bilayer at pressures of 1 atm, 200 atm and 400 atm. It is found that halothane preferentially resides in the region between the headgroup and the lipid tails, between 10 ˚A and 15 ˚A from the centre of the membrane. It is also found that pressure has no detectable effect on the free-energy change of inserting a halothane from bulk water to DMPC, and does not change the regional preference of halothane, either.

Keywords: maximum likelihood, data analysis, energy representation, molecular dynamics simulations, halothane, DMPC, free-energy change

(5)

摘摘摘要要要

此

此此篇篇篇論論論文文文包包包含含含兩兩兩個個個部部部份份份：：：最最最大大大似似似然然然法法法以以以及及及全全全身身身麻麻麻醉醉醉藥藥藥（（（三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷）））的的的分分分子子子動動動力

力

力學學學模模模擬擬擬。。。第第

第一一一部部部份份份，，，介介介紹紹紹最最最大大大似似似然然然法法法在在在分分分析析析時時時間間間序序序列列列資資資料料料上上上的的的應應應用用用，，，並並並且且且以以以電電電腦腦腦模模模擬擬擬的的的資

資

資料料料加加加以以以測測測試試試。。。第

第

第二二二部部部份份份，，，介介介紹紹紹分分分子子子動動動力力力學學學模模模擬擬擬以以以及及及「「「交交交互互互作作作用用用能能能量量量泛泛泛函函函法法法」」」應應應用用用在在在全全全身身身麻麻麻醉醉醉藥

藥藥的的的研研研究究究。。。利利利用用用分分分子子子動動動力力力學學學模模模擬擬擬以以以及及及交交交互互互作作作用用用能能能量量量泛泛泛函函函法法法，，，我我我計計計算算算了了了在在在不不不同同同壓壓壓力力力下

下

下將將將三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷（（（Halothane）））溶溶溶入入入 dimyristoylphosphatidylcholine（（（DMPC）））水水水合

合

合物物物的的的自自自由由由能能能改改改變變變，，，壓壓壓力力力分分分別別別為為為 1 大大大氣氣氣壓壓壓、、、200 大大大氣氣氣壓壓壓以以以及及及 400 大大大氣氣氣壓壓壓。。。結結結果果果顯顯顯示

示

示三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷偏偏偏好好好待待待在在在 DMPC 的的的前前前頭頭頭原原原子子子群群群與與與碳碳碳鏈鏈鏈尾尾尾端端端之之之間間間的的的區區區域域域，，，介介介於於於距距距細

細細胞胞胞膜膜膜中中中央央央 10 埃埃埃至至至 15 埃埃埃的的的距距距離離離。。。另另另外外外也也也發發發現現現壓壓壓力力力對對對於於於三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷從從從純純純水水水溶溶溶入入入 DMPC 的的的自自自由由由能能能改改改變變變所所所造造造成成成的的的影影影響響響很很很小小小，，，且且且對對對於於於三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷的的的偏偏偏好好好區區區域域域沒沒沒有有有可可可觀

觀

觀的的的影影影響響響。。。關關

關鍵鍵鍵詞詞詞：：：最最最大大大似似似然然然法法法，，，資資資料料料分分分析析析，，，交交交互互互作作作用用用能能能量量量泛泛泛函函函法法法，，，分分分子子子動動動力力力學學學模模模擬擬擬，，，全全全身

身

身麻麻麻醉醉醉藥藥藥，，，三三三氟氟氟氯氯氯溴溴溴乙乙乙烷烷烷，，，自自自由由由能能能變變變化化化

(6)

Part I

Maximum Likelihood Method in

Time Series Data Analysis

(11)

Chapter 1 Introduction

When seeking the mechanism behind Nature, we often have to analyze a series of observations ordered in time. This is called a time series.

The data in a time series have an internal structure: the observables could be random, or there may be correlation between the observables. The structure of the time series data could reveal to us the underlying mechanism driving the processes which, in turn, give rise to the observables. If we already have a number of hypotheses about the mechanism, how do we know which one is more probable and what are the values of the parameters related to this hypothesis?

In this work, we have to propose candidate mechanisms, and apply statistical methods to see which mechanism would be in the best position to give the observations. We use statistical methods to achieve this.

In a seminal paper on the application of statistics in the natural sciences, R.A. Fisher divided the problems of statistics into three types [1]:

(1) Problems of Specification. These arise in the choice of the mathe-

(12)

matical form of the population (the term ‘population’ in this case refers to the distribution of the values of observables).

(2) Problems of Estimation. These involve the choice of methods of cal- culating from a sample statistical derivates, or as we shall call them statistics, which are designed to estimate the values of the parameters of the hypothetical population.

(3) Problems of Distribution. These include discussions of the distribution of statistics derived from samples, or in general any functions of quantities whose distribution is known.

The problem of generating candidate mechanisms or hypotheses is not done mathemat- ically; it relies on humans to suggest them. The problem of proposing mathematical forms for candidate mechanisms belongs to the first problem. Once the mathematical form has been determined, the values of the parameters are evaluated, and this is the problem of the second type, that of estimation. In this work, I focus mainly on problems of this type, and the data I aim to analyze is a time series.

A time series data with an internal structure can usually be described by a probability distribution function. For example, take a time series data of an ion channel which has an open and a closed state. In this case, the data are the time durations of the open state, or open time durations. Suppose an ion channel is closed at the beginning of the time series, and remains closed for the next second, then opens for 2 s, closes for 1 s and then opens for 3 s, we obtain two opening time durations, one of 2 s and the other of 3 s (Fig. 1.1). In my research, the time series is much longer and consists of a

(13)

large number of observations of open time durations. In the case of ion channels, it has been observed that the open time durations has a large variation. Its order ranges from several µs to several ms. However, for most ion channels, the probability of having a short open time duration is large, and it drops as the open time duration increases. It can be shown that the probability distribution of the open time durations follows an exponential decay.

Open

Closed

1S 1S

2S 3S

Figure 1.1: An example of time series data.

The analysis of ion channel data is, of course, not limited to quantifying the probability distribution of ion channel opening time duration. We could justifiably define the probability distribution of ion channel closed time durations. More sophisticated analysis would investigate the probability distribution of time duration of a sequential state of the channel being open and then closed. This is known as two-dimensional analysis, because if the data are plotted out, in addition to the time axis, there are two more axes, one for open time durations and the other for closed time durations.

Naturally, this kind of analysis can be extended to more complicated sequential states, leading to higher-dimensional analyses.

(14)

Chapter 2 Methods

When we have a candidate mechanism to account for a time series, we propose a probability distribution for these data. We then need to know how well the probability distribution generated from this candidate mechanism agrees with the observed probability distribution. In this process, we need to determine the numerical values for the parameters of the probability distribution.

2.1 Maximum Likelihood

The probability distribution we wish to know can be written as a probability density function (pdf) p of the observables, y₁, y₂, ..., y_n, with some parameters, θ₁, θ₂, ..., θ_m which describe this probability density function:

p(y₁, y₂, ..., y_n|θ₁, θ₂, ..., θ_m) ≡ p(ˆy|ˆθ) (2.1.1)

where ˆy is an n-dimensional vector representing one datum and ˆθ an m-dimensional vector representing a set of parameters. The line between ˆy and ˆθ means “ˆy given ˆθ”,

(15)

and denotes a conditional probability. In practice, however, we are given a number of data ˆy and we wish to find what the best values for the parameters ˆθ are.

So R.A. Fisher introduced a quantity named “likelihood” [1]:

The likelihood that any parameter (or set of parameters) should have any assigned value (or set of values) is proportional to the probability that if this were so, the totality of observations should be that observed.

Simply put, likelihood is a quantity to show how likely, in a relative sense, a set of parameters are responsible for a set of observed data. If the total number of data are k, and ˆyⁱ denotes the ith datum, we can define likelihood as:

L(ˆθ|{ˆyⁱ}) ≡ p({ˆyⁱ}|ˆθ)

= p(ˆy¹|ˆθ) × p(ˆy²|ˆθ) × ... × p(ˆy^k|ˆθ)

=

k

Y

i=1

p(ˆyⁱ|ˆθ) (2.1.2)

where {ˆyⁱ} denotes a set of data, i = 1, ..., k.

Given a set of observed data {ˆyⁱ}, and a postulated pdf p(ˆy|ˆθ), with undetermined parameters ˆθ, the most probable values of those parameters will maximize the likelihood L(ˆθ|{ˆyⁱ}). Hence we can find the most probable values of the parameters by maximizing its likelihood.

In practice, we often maximize the logarithm of the likelihood or log-likelihood, l(ˆθ|{ˆyⁱ}), instead of the likelihood because of the less computing time cost in doing addition than doing multiplication. The log-likelihood is defined as:

l(ˆθ|{ˆyⁱ}) ≡ log L(ˆθ|{ˆyⁱ}) =

k

X

i=1

log p(ˆyⁱ|ˆθ) (2.1.3)

(16)

2.2 Maximization of Functions

In this work the downhill simplex method [2, 3] was used to maximize the log-likelihood function.

The problem of maximization is actually equivalent to the problem of minimization.

By adding a negative sign, the maximum becomes minimum. To make the picture easier, minimization is discussed instead of maximization in this chapter.

If there are N undetermined parameters, θ1...θN, we can say they form an N - dimensional parameter space. Each point in the parameter space represents a set of parameters and in the log-likelihood case, there exist a corresponding log-likelihood with respect to that point. What we wish to find is a point with the maximum value of log-likelihood, or equivalently, the minimum value of negative log-likelihood.

The concept of the downhill simplex method is simple. Like rolling a ball down from a hill, this method ‘rolls’ a simplex in an N -dimensional parameter space, so that in the end the simplex will stop at a minimum. A simplex in an N -dimensional space is a geometric object with N + 1 points, or vertices. For example, a simplex in two dimensions is a triangle and in three dimensions a tetrahedron. To find the minimum value of the negative log-likelihood, we first give an initial simplex, or initial guess, in the N -dimensional parameter space, then let the simplex roll. For each computational step, a simplex will try to take one of the following four actions in order to make the highest (worst) point lower (better): (a) reflection, (b) reflection and expansion, (c) contraction, (d) multiple contraction. See Fig. 2.1 for the four possible actions in a two-dimensional space, in which case the simplex is a triangle, and Fig. 2.2 for the

(17)

algorithm flowchart.

The advantage of the downhill simplex method is that there is no need to evaluate the first derivatives of the function, only the values of the function itself is needed.

Therefore this method can be used regardless whether the function is differentiable. In situations where we do not know the ‘landscape’ of the function, a method which does not require the function to be differentiable is more suitable and ‘safer’.

Like many other minimization methods, the downhill simplex method also suffers from a local minimum problem. The ‘rolling’ simplex can be easily ‘trapped’ by a local minimum (Fig 2.3). Unfortunately there is no simple solution to the local minimum problem. We can only use some techniques to enhance our confidence in the minimum we found. For example, we can choose several different initial guesses and perform the minimization several times to see if the results are consistent, or we can draw a profile of log-likelihood with respect to each parameter to see if the minimum we found, at least in this particular scale and resolution, is really the global minimum.

2.3 Log Likelihood Ratio Test

In most cases, we cannot suggest a specific mechanism to account for the observed data.

Hence we do not have a complete pdf to describe the experimental data. All that we can do is to suggest that the hypothetical pdf has a certain mathematical form. But the pdf can be an addition of multiple components, say, an addition of three exponential terms. If we do not know how many components there are in the pdf, we do not know how many parameters we should use to evaluate the log-likelihood. Sometimes

(18)

(a)

(b)

(c)

(d)

high

low

reflection

reflection and expansion

contraction

multiple contraction

Figure 2.1: Four possible actions for a step in the downhill simplex method are shown.

At the beginning of the step, the initial simplex, in this case a triangle, is shown on top.

At the end of the step, the simplex can take any one of the four actions according to the log-likelihood change of the highest point, (a) a reflection away from the high point, (b) a reflection and expansion away from the high point, (c) an one-dimensional contraction from the high point, or (d) an overall contraction towards the low point.

(19)

Figure 2.2: The algorithm flowchart of the downhill simplex method. The terminating condition can be (1) whether the fractional change of the vector distance moved in this step is less than a pre-defined tolerance or (2) whether the fractional difference between the highest and the lowest log-likelihood is less than a pre-defined tolerance. The symbols (a) (b) (c) (d) denote the four actions.

(20)

A

B

C

θ¹ θ² θ³

− Log-likelihood

Parameter

Figure 2.3: A schematic plot of the log-likelihood function with only one parameter. Point B is the global minimum while points A and C are local minima. Three initial guesses are shown. When the initial guesses are θ1 and θ3, the searching will be trapped at A and C respectively. Only when the searching is starting from θ₂, the global minimum B can be found.

we can determine the number of components by visually inspecting the diagram of the experimental data, but most of the time it is not that simple. What we then need is a statistical approach: the log likelihood ratio test.

The log likelihood ratio test is based on the following theorem. Let l₁(ˆθ_n₁) and l₂(ˆθ_n₂) be two log-likelihood functions with n₁ and n₂ parameters, respectively, and n₂ > n₁. Define a quantity R, called the log likelihood ratio (logarithm of the ratio of the likelihoods)

R ≡ l₂− l₁ = log L₂− log L₁ = log L₂ L₁

. (2.3.1)

It can be shown [4] that:

If the most appropriate number of parameters is n1 and the number of data

(21)

is large, the quantity 2R will have a χ² distribution with n₂− n₁ degrees of freedom.

The χ² distribution is explained in the next paragraph. Note that R is defined only when L₂/L₁ > 1, i.e. R > 0. This is very reasonable since the more parameters we use to fit the data, the higher the likelihood can be. From the definition, we can see that R is a quantity describing the increase of the ‘goodness of fit’ when the number of parameters is increased.

Before proceeding further, the χ² distribution should be defined. A χ² distribution is actually a special case of Γ distribution with parameters α = ν/2 and β = 2. The cumulative distribution function (cdf) of a Γ distribution is

F_γ(x; α, β) = γ(α, x/β)

Γ(α) ≡ P (α, x/β) (2.3.2)

where Γ(α) is the gamma function, x ≥ 0 and α, β > 0

Γ(α) ≡ Z ∞

0

x^α−1e^−αdx , (2.3.3)

γ(α, x) is the incomplete gamma function ,

γ(α, x) ≡ Z x

0

x^0α−1e^−αdx⁰ (2.3.4)

and P (α, x) is the regularized incomplete gamma function

P (α, x) ≡ γ(α, x)

Γ(α) (2.3.5)

and ν, in the log likelihood case mentioned above, equals n₂ − n₁, which means ‘the degrees of freedom’. Therefore, by inserting the parameters α = ν/2 and β = 2 into

(22)

the cdf of a Γ distribution, Eq. (2.3.2), we have the cdf of a χ² distribution

F_χ²(χ²; ν) = P ν 2,χ²

2

(2.3.6)

By differentiating the cdf we obtain the pdf

p_χ²(χ²; ν) = (1/2)^k/2x^k/2−1e^−x/2

Γ(ν/2) (2.3.7)

Fig. 2.4 shows the pdf and cdf of a χ² distribution.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

0 1 2 3 4 5 6 7 8

ν = 1 ν = 2 ν = 3 ν = 4 ν = 5

pdf

0 0.2 0.4 0.6 0.8 1

0 1 2 3 4 5 6 7 8

ν = 1 ν = 2 ν = 3 ν = 4 ν = 5

cdf Figure 2.4: χ² distribution

With the value 2R and its χ² distribution, how do we use them to decide the most appropriate number of parameters ? First we need to ask the question: what does it mean by ‘the most appropriate’ ? Since we know in general the fitting will be better and better if we use more and more parameters, it is natural to say that when the number of parameters reaches the most appropriate one, any further increase of the number of parameters will have a large probability to make only an insignificant increase in the likelihood and hence a small value of 2R. An insignificant increase means there is a large probability that this amount of increase is only a result of chance. The value of that probability can be obtained from another function Q(α, x), also confusingly

(23)

named the regularized incomplete gamma function, or the regularized complementary incomplete gamma function for discrimination. Q(α, x) is defined as:

Q(α, x) ≡ Γ(α, x)

Γ(α) = 1 − P (α, x) (2.3.8)

where Γ(α, x) is the (complementary) incomplete gamma function

Γ(α, x) ≡ Z ∞

x

x^α−1e^−αdx = Γ(α) − γ(α, x) (2.3.9)

We can write Q(α, x) in the χ² case as

Q(α, x) = Q ν 2,χ²

2

≡ Q(χ²|ν) (2.3.10)

This so called χ² probability function, Q(χ²|ν), is the probability that an increase of the ‘goodness of fit’, 2R, due to the increase of the number of parameters, ν, is only a result of chance. So when Q is small, it means that the increase of the number of parameters is necessary to give a better description of the data. Usually the value of Q is also called P value. See Fig. 2.5 for pictures of P and Q.

In practice, we often require the acceptance value of Q to be smaller than 0.05, 0.01 or even 0.001. For example, when we increase the number of parameters from four to five and obtain the log likelihood ratio 2R₁, we have Q(2R₁|1) = Q 1

2, R₁

≡ Q₁. We again increase the number of parameters from five to six and obtain Q₂. If we set our standard to be 0.05, and Q1 < 0.05 while Q2 > 0.05, this means that the most appropriate number of parameters is five. However, if Q₂ is still less than 0.05 when the number of parameters is six, then we shall need to increase the number of parameters further.

(24)

x

0

P(α,x)

∞ 0 x

Q(α,x)

∞

Figure 2.5: P (α, x) and Q(α, x) are actually integrations of the pdf of χ² distribution.

The integration range of P is from 0 to x and that of Q is from x to ∞.

2.4 Generation of Non-Uniform Random Numbers

In order to test the maximum likelihood method with some simulated data, we must generate some random numbers with a certain pre-defined pdf, which is more complex than a simple uniform pdf.

Continuous uniform random (actually pseudo-random) numbers can be easily obtained by using the intrinsic random number generators of many programming lan- guages. It is not that easy, however, to obtain non-uniform random numbers directly.

We have to use some techniques to generate non-uniform random numbers out of uniform random numbers. There are two common methods for doing so, the inverse transform method [3] and the rejection method [3, 5].

2.4.1 Inverse Transform Method

The inverse transform method is based on the proposition that, given an invertible cumulative distribution function (cdf) F (x) and a uniform random number U with

(25)

range (0, 1), the random number X ≡ F⁻¹(U ) has a cdf F (x). It can be proved as follows. Define the cdf of X as F_X(x):

F_X(x) ≡ Prob(X < x)

= Prob(F⁻¹(U ) < x)

= Prob(F (F⁻¹(U )) < F (x))

= Prob(U < F (x))

≡ F (x) (2.4.1)

at the third equality, we have used a property of cdf that it is a monotonically increasing function.

With the knowledge above, we can generate a random number X with a cdf F (x) by the following procedure:

Step 1: Find the inverse function F⁻¹(u) = x.

Step 2: Generate a uniform random number U .

Step 3: X ≡ F⁻¹(U ) is the random number we wish to find.

For example, to obtain a random number X with pdf

f (x) = 1

λe⁻^x^λ , (2.4.2)

we first integrate the pdf to obtain its cdf

F (x) = Z x

0

f (x⁰)dx⁰ = 1 − e⁻^x^λ (2.4.3)

and find the inverse function

F⁻¹(u) = −λ ln(1 − u) , (2.4.4)

(26)

so we obtain

X = −λ ln(1 − U )

= −λ ln(U ) . (2.4.5)

Since U is a uniform random number with range (0, 1), 1 − U is also uniform on (0, 1).

We can therefore equivalently use 1 − U to save computing time.

2.4.2 Rejection Method

This method is first proposed by John von Neumann in 1951 [5]. Suppose we already have a method to generate a random number with a pdf g(x), and we wish to generate a random number X with a pdf f (x). We can first generate a random number Y with a pdf g(x) and accept this Y with a probability proportional to f (Y )/g(Y ). The accepted Y , defined to be X, will then have a pdf f (x).

The protocol is as follows:

Step 0: Find a constant c (the lower the better) such that f (x)

g(x) ≤ c for all x.

Step 1: Generate Y with pdf g(x).

Step 2: Generate a uniform random number U .

Step 3: If U ≤ f (Y )

cg(Y ), X = Y , otherwise return to Step 1.

Note that c can never be less than 1. We can see this by integrating the inequality of Step 0, f (x) ≤ c g(x), over all values of x.

The validity of this protocol can be proved as follows. Define the pdf of X as p_X(x):

(27)

p_X(x)dx = Prob{X → x + dx}

= Prob{Y → x + dx | Acceptance}

= Prob{Y → x + dx, Acceptance}

Prob{Acceptance}

= Prob

Y → x + dx, U ≤ f (x) cg(x)

Prob{Acceptance}

=

Prob {Y → x + dx} Prob

U ≤ f (x) cg(x)

Prob{Acceptance} by independence

=

{p_Y(x)dx} f (x) cg(x)

1/c

= g(x)dxf (x) g(x)

= f (x)dx (2.4.6)

where

Prob(Acceptance) = R f (x)dx R cg(x)dx = 1

c (2.4.7)

See Fig. 2.6 for some specific pictures about this method.

In order to generate the target random numbers quickly, the number of the rejection events should be as few as possible. Since the more the rejection events, the longer the time we need to collect a required number of random numbers. Therefore the efficiency of the rejection method is proportional to the probability of acceptance, 1/c. Hence we should always try to make the constant c be as close to unity as possible.

Compared with the inverse transform method, the rejection method may be slower since it has to generate two random numbers (often several times) to get only one

(28)

f(x)

g(x)

x’ x

f(x) cg(x)

x’ x

(a) (b)

Δ

Figure 2.6: (a) f (x) is the desired pdf, g(x) the pdf we can already get, x⁰ the point that makes ^{f (x)}_g(x) maximum, i.e. ^{f (x)}_g(x) ≤ ^{f (x}_g(x0⁰)⁾ ≡ c. Note there is always at least one cross point between f (x) and g(x) because the areas of f (x) and g(x) are both normalized to unity. (b) The multiplier c makes the curve cg(x) cover all the area under f (x). The Probn

U ≤ _cg(x)^{f (x)}o is equal to the ratio of the dark gray area f (x)∆, to the whole gray area cg(x)∆.

desired random number. But the rejection method is more versatile than the inversion method since the target pdf of the latter has to be an invertible function while that of the former does not.

(29)

Chapter 3 Simulation and Analysis

In this chapter, a demonstration is presented, in which the maximum likelihood method and the log likelihood ratio test were used to analyze the simulated time series data whose pdf was known.

3.1 Simulated Data

The simulated data used have a pdf with five exponential components. The pdf p(t) is:

p(t) =

5

X

i=1

a_ip_i(t) =

5

X

i=1

a_i 1 τ_ie⁻^τi^t

(3.1.1)

where a_i is the weight of ith component with a conditionP5

i=1a_i = 1, and τ_i is the ith characteristic time constant. The total number of independent parameters are therefore 5 × 2 − 1 = 9 with one dependent parameter a5 = 1 −P4

i=1ai.

The procedures of the generation of the simulated data are as follows. First, one of those five exponential components is chosen with a probability proportional to its weight, ai. The inverse transform method is then used to generate a random number

(30)

obeying the exponential pdf just chosen, _τ¹

ie⁻^τi^t . This whole process of choosing an exponential component and then generating a random number obeying that exponential pdf is repeated until the required number of random numbers has been generated.

These procedures, of course, can be easily extended to generate random numbers whose pdf have arbitrary number of exponential components. In this work the uniform random numbers, which were needed by the inverse transform method, were generated by the intrinsic subroutines, RANDOM SEED() and RANDOM NUMBER(), of Intel R

Fortran compiler.

The parameters of the simulated data used are listed in Table 3.1, and the histogram of the data is shown in Fig. 3.1a.

number of data points: 100000 a₁ = 0.023 τ₁ = 0.002 a₂ = 0.068 τ₂ = 0.04 a₃ = 0.227 τ₃ = 0.5 a₄ = 0.455 τ₄ = 0.8 a5 = 0.227 τ5 = 5

Table 3.1: The parameters of the simulated data are listed here. The values of the weights a_i’s are actually assigned in a relative manner, which before normalization and rounding are 0.1, 0.3, 1, 2, 1, respectively.

3.2 Logarithmic Histograms

Histograms with linear axes such as Fig. 3.1a are actually not so informative. Another more illuminative way to plot the histograms with a logarithmic time axis was first

(31)

(a) (b)

Figure 3.1: Two histograms of the same simulated data with different binning scales are shown. (a) Linear histogram. Events are binned with a constant bin width in linear scale.

(b) Logarithmic histogram. Events are binned with a constant bin width in logarithmic scale and the histogram is plotted with a logarithmic x-axis and a square-root y-axis. The pdf and the corresponding parameters of the simulated data are shown in Eq. (3.1.1) and Table 3.1, respectively.

proposed by Blatz and Magleby [6] and developed further by Sigworth and Sine [7].

In this work Sigworth and Sine’s method is used to plot the logarithmic histograms.

With this method, histograms are plotted with bins which have a constant width in the logarithmic scale. Note that this is not the same as simply doing a logarithmic transformation to the time axis of a linear histogram, because the transformation is done before the events are binned.

Here is the explanation of this method. Considering a pdf with n components:

p(t) =

n

X

i=1

a_i τi

e⁻^τi^t (3.2.1)

where the symbols are defined in the same way as Eq. (3.1.1). A logarithmic transformation to the time axis is:

x = log₁₀(t) (3.2.2)

Here base 10 was used for consistency. First we note that if t < t⁰, log₁₀(t) < log₁₀(t⁰).

(32)

Therefore the cdf becomes

F (t) ≡ Prob{t < t⁰}

= Prob{log₁₀(t) < log₁₀(t⁰)}

= Prob{x < x⁰)}

≡ F_log(x) (3.2.3)

where F_log(x) is the cdf with a logarithmic time axis. In other words, we found that the cdf’s of t and x are the same. Finally, the pdf of x, p_log(x) can be obtained by differentiating the cdf Flog(x) with respect to x:

p_log(x) ≡ dF_log(x) dx

= dF (t) d log₁₀(t)

= dt

d log₁₀(t) dF (t)

dt

=

n

X

i=1

t ln(10) p(t)

=

n

X

i=1

a_iln(10) τi

exp

ln t − t τi

=

n

X

i=1

a_iln 10 τ_i exp

ln 10^x− 10^x τ_i

(3.2.4)

The shape of p_log(x) when n = 1 is like a skewed-bell. If n > 1 the shape will be a superposition of each component’s shape (Fig 3.1b). The advantage of this kind of histograms is that the peak of each component’s shape indicates the corresponding characteristic time constant of that component. This can be seen by differentiating p_log(x) with respect to x and equating it to zero.

In addition to the transformation of the time axis, Sigworth and Sine also changed

(33)

the ordinate of the histograms from the linear scale to the square-root scale. It is done by simply taking the square-root of the events number after they are binned. This change of the ordinate makes the standard deviation for each bin equal. [7]

3.3 Data Analysis

After generating 100000 simulated time series data, the test analysis was performed without using the pdf as part of the input. It was tested how the proposed method could recover the underlying patterns of the simulated time series data.

Initially I assessed the logarithmic histogram of the simulated data to give a rough solution (Solution 1). The other three solutions (Solutions 2, 3 and 4) were obtained by simply adding more parameters with some deviation from the existing parameters.

Using these four rough solutions as the initial guesses of the maximum likelihood, four optimized solutions with respect to each number of components were obtained. The results are shown in Fig. 3.2 and Table 3.3. From Fig. 3.2 we see that the data were not well fitted with three components, but they seemed to be better fitted with four, five and six components. So which one was the most appropriate choice for describing the simulated data? With the help of the log likelihood ratio test, we can see from Table 3.2 that the most appropriate number of components should be five. When the number of components increased from five to six, the fit slightly improved. However, the probability of this improvement being due to chance became much larger. This means that the increase of the number of components from five to six was unnecessary.

In contrast, the other two increases (from three to four, and then from four to five) were

(34)

necessary because the probability of the improvement of the fit being due to chance was negligibly small. Hence this demonstration also showed the advantage of using the log likelihood ratio test: even if all the fitting histograms seem very good, the log likelihood ratio test can still give us quantitative indexes to help us make judgement.

It is found that the most appropriate number of components is five. The optimized values of the corresponding parameters are listed in Table 3.3, Solution 3. We can compare the original parameters with the parameters determined from the simulated data. In the following discussion, we define the parameter values we originally put into the simulation to be the ‘original values’, and we define the parameter values obtained from the simulated data using the maximum likelihood method the ‘recovered values’.

It can be seen that, for the first, second and fifth components, the difference between the original values and the recovered values are small (<5% of each other). However, for the third and fourth components, the difference between the original and recovered values can be larger than 20%. Note also that the characteristic time constants for the first, second and fifth components are very different from each other; they are orders of magnitude apart. The characteristic time constants for the third and fourth components are more or less the same. This makes it difficult to separate the third and fourth components. This is a general problem of data analysis, and is not an indication of any specific weakness of this method.

The other difficulty in using this method arises when the number of parameters is large, say 15 or 20. Under these conditions, the maximization of the log-likelihood will often become unstable. In other words, different initial guesses will easily give wildly different maximization points. However, the problem is not in the maximum

(35)

likelihood method itself, it is in the maximization. The maximization can be trapped in a local maximum. Future work includes investigating other maximization methods to see if they will make the maximum likelihood method more stable. The Monte Carlo method, for example, allows the searching to jump out of the local maximum and hence can avoid the local maximum problem.

components R P value 3 → 4 647.934 4.034 × 10⁻²⁸² 4 → 5 18.649 7.959 × 10⁻⁹ 5 → 6 0.150 8.603 × 10⁻¹

Table 3.2: The results of the log likelihood ratio test. The first column is the change of the number of components, the second column the log likelihood ratio and the third column the value of the χ² probability function Q(χ² = 2R|ν = 2). Note that the degrees of freedom ν are the number of the increased independent parameters, which are 2 for all three cases here.

(36)

Simulated Data

Solution 1

3 components Solution 2

4 components

Solution 3

5 components Solution 4

6 components

Figure 3.2: The logarithmic histogram of the simulated data is shown on top. The other four histograms are obtained by the maximum likelihood method with different solutions of number of components. In these five histograms, the red step-lines all denote the distribution of the events and the superimposed blue solid lines are the sum of the blue dashed lines.

In the top histogram the blue dashed lines denotes the five pdf components of the simulated data, while in the other four histograms the blue dashed lines are the pdf components obtained by the maximum likelihood method.

(37)

Simulateddata a1=0.023τ1=0.002a2=0.068τ2=0.04a3=0.227τ3=0.5a4=0.455τ4=0.8a∗5=0.227τ5=5

Solution1:3components

initialguessresultsa1=0.167τ1=0.01a1=0.063τ1=0.011a2=0.5τ2=1a2=0.692τ2=0.633a ∗3=0.333τ3=10a ∗3=0.245τ3=4.7

Log-likelihood=−118321.732 Solution2:4components

initialguessresultsa1=0.133τ1=0.01a1=0.024τ1=0.002a2=0.2τ2=0.1a2=0.074τ2=0.045a3=0.4τ3=1a3=0.671τ3=0.689a ∗4=0.267τ4=10a ∗4=0.231τ4=4.87

Log-likelihood=−117673.798

Solution3:5components

initialguessresultsa1=0.118τ1=0.01a1=0.024τ1=0.002a2=0.176τ2=0.1a2=0.065τ2=0.039a3=0.118τ3=0.2a3=0.285τ3=0.495a4=0.353τ4=1a4=0.408τ4=0.867a∗5=0.235τ5=10a∗5=0.218τ5=5.033

Log-likelihood=−117655.149 Solution4:6components

initialguessresultsa1=0.105τ1=0.01a1=0.024τ1=0.002a2=0.158τ2=0.1a2=0.065τ2=0.039a3=0.105τ3=0.2a3=0.300τ3=0.504a4=0.105τ4=1a4=0.392τ4=0.872a5=0.316τ5=5a5=0.216τ5=4.982a∗6=0.211τ6=10a∗6=0.003τ6=7.833

Log-likelihood=−117654.999

Table3.3:Theparametersofthesimulateddataareshowninthetoptableforcomparison.Theotherfourtablescontaintheinitialguessesandthecorrespondingmaximumlikelihoodresults. ∗Thesean’sarenotfreeparametersandareobtainedbytheequation:an=1− Pn−1i=1ai.

(38)

Chapter 4 Conclusion

In this work, the maximum likelihood method is described, which is a powerful tool for time series analysis. It can provide the user with numerical results to describe the observed events. However, experiments are still required to define the actual physical processes responsible for these events. Unfortunately, limitation of time does not allow me to apply this method on real experimental data. Future work would include using this method on real data to further study its performance.

(39)

(40)

Part II

Molecular Dynamics Simulations of a General Anesthetic inside a

Membrane

(41)

Chapter 5 Introduction

General anesthesia (GA) is a state of complete unconsciousness caused by drugs. It was first publicly demonstrated by Morton in Massachusetts General Hospital in 1846. The first general anesthetics used were chloroform and ether. However, the former was quite toxic and the latter was highly inflammable. Later, safer alternatives such as isoflurane and enflurane were developed, and they remain some of the most commonly used drugs in clinical medicine. Unlike regional or local anesthesia, the exact mechanism of GA is still an unsolved problem.

At the turn of the 19th and 20th centuries, Meyer and Overton independently proposed a hypothesis which described the correlation between the efficacy of an anesthetic and its solubility in lipids. This hypothesis, which states that the logarithm of the efficacy of a general anesthetic was proportional to the logarithm of its lipophilicity, was subsequently called the Meyer-Overton rule. This rule suggests a lipid-rich region of the body was involved in GA.

There was little progress in elucidating the mechanism of action of general anes-

(42)

thetics until the middle of the 20th century. Johnson and Flagler [8] placed tadpoles in a container and administered ethanol to them. The animals became anesthetized and fell to the bottom of the container at atmospheric pressure. When the pressure was raised to over 100 atm, they observed that GA was reversed, and the tadpoles started to swim. This was repeated on a number of living organisms and on a large variety of general anesthetics [9, 10, 11]. The only report of pressure reversal in humans appeared in 1979 [12].

Pressure reversal is not a universal phenomenon, as species variation has been observed [13]. For example, Paton and his co-workers discovered that in the common frog, Rana temporaria, in the presence of general anesthetics, activity increased as pressure increased. This is what we would expect from pressure reversal. However, in the fresh- water shrimp, Gammarus pulex, increased pressure reduced the swimming activity in the presence of general anesthetics.

J.R. Trudell et al. spin-labelled phosphatidylcholine, mixed them with water and organic solvents, sonicated the mixture and produced a translucent vesicle suspension.

Halothane was added to this suspension at concentrations of 49mmol, 147mmol or 490mmol per mole of lipid, whilst the concentration of methoxyfluorane was 58mmol, 174mmol or 580mmol per mole of lipid. Electron spin resonance (ESR) spectra were measured, and they showed that anisotropic motion of phosphatidylcholine within the phospholipid bilayer was increased. There was a concomitant decrease of the order parameter S⁰_n as the concentration of anesthetic increased [14]. On application of pressure up to 274 atm by increasing helium, a non-anesthetic gas at these pressures, in the container, these changes were reversed: S_n⁰ increased and the spectra shifted

(43)

back [15]. The mechanism of this change, however, remained unclear, but the most likely cause seemed to be restriction of motion caused by phospholipid molecules coming closer together.

The results of these experiments lead to two conclusions: the cell membrane is prob- ably involved in general anesthetic action, and ambient pressure affects general anesthetic action. To clarify the effect of pressure on the distribution of general anesthetics in the cell membrane, many scientists have performed experiments and simulations to determine the location of general anesthetics in the membrane.

K. Tu et al. [16] performed a 1.6-ns simulation of 64 DPPC molecules hydrated in 1792 water molecules, with 4 halothane molecules in this system. After equilibration, they found that halothane was preferentially located about 10 ˚A from the center of the membrane. The halothane molecules exhibited no orientational preference. Koubi et al. [17] performed a 2-ns simulation of 64 DPPC molecules hydrated in 1792 water molecules, but with 32 halothane molecules. After equilibration, they found that halothane was preferentially located about 10 ˚A from the center of the membrane. How- ever, the concentration of halothane used in these studies were several times in excess of the concentration of halothane used in clinical work.

Subsequently, Pickholz et al. [18] developed a coarse-grained model, and applied it to simulate hydrated DOPC at different halothane concentrations. Their results also showed that halothane was always preferentially located at about 10 ˚A from the center of the membrane. They also evaluated the potential of mean force of extracting a halothane from its equilibrium position into the solution. They found that this was largest when the halothane was just below the headgroup of the phospholipid.

(44)

However, none of them have explored the effect of pressure. In this work, we performed a series of molecular dynamics simulations on a model membrane, DMPC, and a model general anesthetic, halothane. The chemical structure of these two molecules are shown in Fig. 5.1(a) and Fig. 5.1(b). We evaluated the free energy change of inserting halothane into a membrane of fully hydrated DMPC at different depths, and at different pressures. The following diagrams show the molecules used in our study:

F

C

Br

Cl C F

F

H

(a) Halothane

O O

P N⁺

-

(b) Dimyristoylphosphatidylcholine

Figure 5.1: (a) Halothane, CF₃CHClBr, IUPAC name 1,1,1-trifluoro-2-chloro-2-bromo- ethane, is a nonflammable, halogenated, hydrocarbon anesthetic [19]. It is in liquid state and volatile at room temperature. (b) Dimyristoylphosphatidylcholine, a phospholipid which consists of polar head and two non-polar tails.

(45)

Chapter 6 Methods

Methods used in this work are explained in this chapter. We used the molecular dynamics (MD) simulation method to generate samples of the configurations of the solute (halothane) and solvent (hydrated DMPC bilayer) system. Then we apply the method of energy representation of solution theory (ERnST) to calculate the solvation free energy of halothane into the lipid bilayer or in bulk water. To carry out MD simulations, we used the program DL POLY version 2.15, developed by the Daresbury Laboratory, UK [20]. In the following sections, the methods of molecular dynamics (MD) simulations employed in the DL POLY program is firstly introduced. These include the Verlet integration scheme, the empirical force fields, the Nos´e-Hoover thermostats and barostats, and the Ewald summation method [21]. Although the Monte Carlo method is not used in this work, it is still introduced together with the MD simulations for completeness. Then, the introduction to the ERnST follows.

(46)

6.1 Molecular Dynamics Simulations

What is a simulation? In a simulation, the researcher invokes a system, and allows components of the system to interact according to defined rules. These days, all the required calculations are usually performed by a computer. By making these rules similar to how a real system would behave, the researcher can collect useful data on these artificial systems, and use them to gain insight into the real system under study. The results of the simulation can be validated by evaluating macroscopic parameters from the simulation, and comparing their values from experiments. Commonly used parameters including free energy changes, molecular diffusion coefficients and re-orientational coefficients.

There are many atomistic simulations methods, but they come under two main categories: molecular dynamics and Monte Carlo. Both methods are the same in that a large number of molecular configurations are generated, but the procedures of moving from one configuration to another are different.

In molecular dynamics, the position and velocity of each molecule are noted, and the force on each evaluated. Each step consists of using this information to calculate the position, velocity and force on each atom a very short time interval away, typically of the order of femtoseconds. This is repeated many many times, and one builds up a time series of molecular configurations. The whole trajectory describes the evolution of the molecular system in time. Observing these configurations would be similar to watching a film depicting molecular motion. The advantage of molecular dynamics is that one observes how a system changes in time, and one can also obtain velocity and

(47)

force data from the simulation. However, if the model used is such that the velocity and the force cannot be calculated, then molecular dynamics would fail. Monte Carlo becomes then the only method possible.

In Monte Carlo simulations, a random process determines how a configuration is changed to another. The advantage of Monte Carlo is that, under certain circumstances, it samples more configurations than molecular dynamics, so it is useful for exploring, e.g., the possible conformers of a molecule. Neither velocity nor force calculations are required, so many more models can be applied to this method than molecular dynamics.

One usually assumes that the system is ergodic. Hence, it does not matter what the starting position is, because one will always obtain the same result for the ensemble averages if a Monte Carlo simulation is performed for a sufficiently large number of steps.

Otherwise the main ingredients of the two methods are essentially the same. Both require a model describing how these atoms interact, sometimes called a potential or a force-field. Both require a method of advancing from one configuration to the other.

And, last but not least, both require good analysis methods to analyze the collection of configurations produced. Since performing a simulation and obtaining molecular configurations have become much less difficult with the rapid increase of computer program packages, the challenge is then to perform intelligent analyses by optimally using the collection of configurations (or, in the case of molecular dynamics, the trajectory), and thus to obtain useful scientific understanding.

The general use of simulations should not be simply to reproduce experimental findings. After all, if an observable can be obtained by experiments, there is little point

(48)

in obtaining that information using a more roundabout and potentially less reliable way.

Simulations are best used to perform ‘impossible’ experiments or to gain insight into a system. For example, if one wants to study the effect of size on the hydration of non- polar solutes, one could, of course, place inert gases of different sizes in water, and study their hydration pattern. However, by replacing one molecule with another, one is not merely changing the size, but also altering the electronic properties. Simulations allow us to circumvent this problem by allowing us to invent spheres whose only difference is the size, with everything else kept identical [22]. Another example is to study the solvation properties of simple solutes with different electric charges. Simulations allow scientists to keep other properties constant, whilst the charge and size of the particles are considered as dynamic variables using an extended Lagrangian [23]. The selectivity of micropores for ions of different sizes has been used as models to study the selectivity of ion channels [24]. Thus simulations can be used to isolate factors contributing to an effect, and give researchers a better understanding.

Simulations can be used to investigate events and mechanisms inaccessible by experiments. However, in this sort of simulations, there are still certain quantities which are measurable by experiments. The simulation values of those quantities should be compared with the experimental values. If they agree, there is a higher probability that the simulation potentials are correct. The scientist can then use the system to generate data which can only be obtained by simulations.

(49)

6.1.1 Development of the Monte Carlo method

The first simulation method invented was the Monte Carlo method. It was developed by Stanislaw Ulam and Nicholas Metropolis [25]. Ulam was convalescing from an illness when he started to ponder about the probabilities of winning a game of solitaire. He realized that, instead of going through the combinatorics, he could just lay out the cards many many times and observe the outcome. This was also the era when the first computers were being used, so he discussed the idea with John von Neumann and started planning actual calculations.

This work was expanded a few years later to calculate the equation of state of N hard discs in two dimensions. Metropolis et al. [26] then applied their own method to a system of 224 idential discs in a square with periodic boundary conditions, and calculated the area occupied by the discs. Periodic boundary conditions [27] mean that the squares are laid out in a ‘container’, so that a particle leaving the ‘container’ on the left side will re-enter it on the right side. The effect is to create an infinite space using a finite number of particles. Note that in the evaluation of interaction between atoms, there is a maximum distance, rcut. Beyond the distance rcut, even though two atoms could interact, their interaction is considered zero, to simplify calculations.

The work of Metropolis et al. [26] was extended by Wood and Parker [28] a few years later, in a paper which studied the equation of state of Lennard-Jones molecules in three dimensions. These are spherical molecules which interacted with each other according to the relation:

U_LJ(r) =

"

r r_o

⁻¹²

− 2 r r_o

⁻⁶#

, (6.1.1)

(50)

where is a constant, r_o the radius of the Lennard-Jones potential well, and r the distance between the two atoms. They chose a set of Lennard-Jones parameters to model argon, and set the temperature at 55^◦C. Using 32 or 108 molecules in a cubic box with periodic boundary conditions, they performed the simulation 31 times, each lasting from 27000 to 261000 configurations, at different densities. The authors discovered that their simulation results agreed well with some experimental data but not with others. They also observed a phase transition, but not at the pressure that it would occur under experimental conditions. This work shows that even a simple simulation system is capable of reproducing features of an experimental system qualitatively, and sometimes even quantitatively.

Wood and Parker [28] also carefully considered the problem of using periodic boundary conditions. They chose a truncation scheme called the minimum-image distance method, namely any pairwise interactions among the fundamental set of N molecules are included, but for each pair, only the smallest distance between any images of the two molecules are taken. This method of minimum-image distance was subsequently adopted in almost all simulations using periodic boundary conditions.

Wood and Jacobson [29] subsequently repeated some of the Monte Carlo simulations with longer chain length, and compared their results with the first molecular dynamics simulation of the same system, the work of Alder and Wainwright [30]. They found good agreement between the two methods.