Failure in Langevin approximation to Gillespie algorithm in single gene

Chapter 4 The Two-step of Single Gene Expression

4.2 Failure in Langevin approximation to Gillespie algorithm in single gene

In order to dissect the noise out of this stochastic process in single gene expression, we rewrite the scheme in Eq. (4.4) into Langevin form. According to Eq. (3.37), the corresponding Langevin form is

)].

For particle number of mRNA n_m, the second term inside the bracket describe the average increment dtc₁ and the corresponding fluctuation c₁dtN₁(0,1) for reaction channel R , which is the constant production from DNA to mRNA (Eq. (4.5)). ₁ Similarly, the third term inside the bracket describes the mRNA degradation channel

R .2 N₁(0,1) and N₂(0,1) are mutually independent unit normal random variable.

Similarly, for particle number of protein n , the second term describe the translation of _p mRNA into protein, and the third term describe protein degradation. Since dt must satisfies the two conditions discussed in section 3.3.2 to make Eq. (4.9) a good approximation, and it will be valid especially for the system with large number of molecules. Unfortunately, in real biological system, the copy number of mRNA is usually too low to make a good approximation. In the following figures, we simulate the two-step gene expression model with three different sizes of dt, using the same

Figure 4-5 Approximating Gillespie algorithm with different size of dt in Langevin.

(M denotes “mean” and F denotes “Fano factor,” respectively). Parameters used in simulation are listed in Table 4-1 of section 4.1.

0 2 4 6

Steady state distribution (mRNA)

Particle number (mRNA)

Occurrence

Steady state distribution (protein)

Particle number (protein)

Occurrence

Steady state distribution (mRNA)

Particle number (mRNA)

Occurrence

Steady state distribution (protein)

Particle number (protein)

Occurrence

Steady state distribution (mRNA)

Particle number (mRNA)

Occurrence

Steady state distribution (protein)

Particle number (protein)

Occurrence

Figure 4-6 Langevin trajectory for mRNA (green) with different size of dt,

compared to the stochastic trajectory of Gillespie (red) and deterministic ODE solution (dashed line). Parameters used in simulation are listed in Table 4-1 of section 4.1.

0 0.5 1 1.5 2

x 10⁴ 0

1 2 3 4

Sample trajectory (mRNA)

time(sec)

particle number (mRNA)

Langevin (dt = 1 sec) Gillespie

deterministic ODE

0 0.5 1 1.5 2

x 10⁴ 0

1 2 3 4

Sample trajectory (mRNA)

time(sec)

particle number (mRNA) Langevin (dt = 10 sec)

Gillespie

deterministic ODE

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

x 10⁴ 0

0.5 1 1.5 2 2.5 3

Sample trajectory (mRNA)

time(sec)

particle number (mRNA)

Langevin (dt = 100 sec) Gillespie

deterministic ODE

Figure 4-7 Langevin trajectories for protein with different size of dt, compared to the trajectory of Gillespie algorithm and deterministic ODE solution. Parameters used in simulation are listed in Table 4-1 of section 4.1.

The results of Figure 4-5 and Figure 4-6 show that most of errors result from the failure in approximating Poisson random variables with normal random variables at mRNA level. At steady state, the copy number of mRNA is small and most of cell contains no mRNA (0.2 at steady state of deterministic ODE as shown in Eq. (4.3)).

Under this situation, using a continuous normal random variable to approximate these rare events is obviously unwise. While stochasticity in Langevin is generated by the normal distributed noise term which takes value from a continuous real number space (Eq. (3.37)), the Gillespie algorithm is a discrete particle number based simulation.

Hence, it is inevitable that when Gillespie algorithm takes zero value but the Langevin is a small nonzero real number. As shown in Figure 4-6. This invalid approximation for

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

x 10⁴ 0

50 100 150 200 250 300

Sample trajectory (protein)

time(sec)

particle number (protein)

Langevin (dt = 1 sec) Langevin (dt = 10 sec) Langevin (dt = 100 sec) Gillespie

deterministic ODE

The error in mRNA level will pass to downstream protein level, which leads to an invalid approximation of protein distribution while the discreteness in protein level seems to be fine when approximated with continuous normal random variables (Figure 4-7). This is due to higher quantity in protein molecules (100 at steady state of deterministic ODE as shown in Eq. (4.3)). As we have discussed in section 3.3.2, higher particle number can lead to higher propensity function, which makes it easier to find a proper dt that is large enough for a Poisson random variable to be approximated by a normal random variable (Eq. (3.33)).

Although the approximation could be better if we raise the steady state level of mRNA, but it will be kind of contradictory to the simulation of noise behavior in biology since the noise is less important in a higher copy number system, not to mention mRNA copy number is actually very low in nature. Another way to solve the problem of discreteness in a rare event case is using a larger dt to compensate a small propensity function, but a large dt will be susceptible to the change of system state, and it will be against the leaping condition (Table 3-3), which requires dt to be small enough to make every reaction occurs within dt not to change the propensity function significantly. In order to inspect the combinational effect of higher copy number and larger dt , we scan for the parameters set with different sizes of k_m and different sizes of dt in corresponding Langevin equation while fixing the other parameters. For each combination (k_m, dt), the steady state values of 10,000 Langevin trajectories are collected and the statistics are computed to give the mean and Fano factors. Error rate is

defined as 100% 1

result analytical

result simulated

−

×



 



 . The result is demonstrated in the following

figures:

Figur n (upper pan )). The error

ctories as co

The errors o nation of pa nels) and in r are the diff ompared to

dt (sec)

of Langevin arameters (k n Fano facto fference betw or (lower pa

ween the sta results at ste

m (M/sec) ror in mean

6 8

km (M/sec) r in Fano facto

6 8

as compare ed for scann anels) for tra atistic value eady state (E

10 12

x 10^-3

10 12

x 10^-3

ed to analyti ning. Shown aditional La es of protein

Eq. (4.8) in ns from 10,0 n section 4.1

in q.

000 ).

The scanning result of Figure 4-8 shows different error levels under different )

(k_m dt . We found that although there exists some “safe region” (yellow strip) in which the error of Fano factor is around 0%, this Langevin form is still inapplicable if we consider a model of gene regulation where the k_m of a downstream gene is regulated and the value changes as time progresses. This means if we want to simulate a model of gene regulation with Langevin equation, the time step dt needs to be “reset”

each time the system proceed a unit time of dt in order to keep a well approximation to Gillespie algorithm. Furthermore, even if the Fano factor is right, it might be the consequence of wrong mean and average. For instance, if we take a look at

) 250

, M/s 10 6

(k_m = × ^-3 dt= s in yellow region, the actual Langevin simulation result is the following

Figure 4-9 Approximating Gillespie algorithm with (km = 0.006, dt = 250) in Langevin equation. (M denotes “mean” and F denotes “Fano factor,” respectively)

Since even k_m increase to 6×10^-3M/s compared to original 2×10^-3 M/s in Table 4-1, the analytical steady state copy number of mRNA is just 0.6 and it is still a

0 2 4 6 8

0 1000 2000 3000 4000 5000 6000

Steady state distribution (mRNA)

Particle number

Occurrence

[Gillespie]

M = 0.607 F = 0.9977 [Langevin]

(dt = 250 sec) M = 1.021 F = 1.428

0 500 1000 1500

0 500 1000 1500 2000

Steady state distribution (protein)

Particle number

Occurrence

[Gillespie]

M = 298.4 F = 20.12 [Langevin]

(dt = 250 sec) M = 508.9 F = 19.33

rare event case. Nevertheless, it is still possible to apply this Langevin approximation on different parameter set. According to single molecular level counting of mRNA in living cell [29], the number of mRNA molecular of MS2 coat protein in E. coli can be 5~10 which is 25~50 times of original parameters in Table 4-1 at steady state . In this case, a proper dt can be found for Langevin simulation. The next figure shows a valid approximation with parameters (km = 0.08, dt = 50), which is the case for high expression rate.

Figure 4-10 Particle number distribution in mRNA (left) and Protein (right) at a higher expression rate with parameters (km = 0.08, dt = 50), the figure shows that Langevin approximation can be valid in a higher expression case.

In general, while different gene has different level of steady state mRNA molecules, the copy number of mRNA is still low for most cases. Besides, each gene has “ON” and

“OFF” states. For gene in “OFF” state, the expression level will definitely lower than ON state hence has much fewer mRNA molecules. In simulation aspects, low copy number of mRNA is the main reason that Langevin equation fail to approximate Gillespie algorithm, and it is not practical to use an approach that we are not sure when it is valid. Hence, we need to find another approach to dissect the noise out of Gillespie

0 5 10 15 20 25

0 200 400 600 800 1000 1200 1400

Steady state distribution (mRNA)

Particle number

Occurrence

[Gillespie]

M = 7.981 F = 0.9973 [Langevin]

(dt = 50 sec) M = 7.984 F = 1.315

25000 3000 3500 4000 4500 5000 5500 200

400 600 800 1000 1200 1400 1600 1800

Steady state distribution (protein)

Particle number

Occurrence

[Gillespie]

M = 4000 F = 20.04 [Langevin]

(dt = 50 sec) M = 3997 F = 20.58

Chapter 5 The Burst Production Model of Single

在文檔中朗之萬方程式在蛋白質瞬間增量模型之應用 (頁 47-55)