Identification of yeast cell cycle
Transcription Factors (TFs) using
dynamic system model
Wei-Sheng Wu (吳謂勝)
Lab. of Computational Systems Biology Dept. of Electrical Engineering
National Cheng Kung University Taiwan
2010/10/07
Outline
• Biological background
• Methods
• Results
• Conclusions
About yeast
• Yeast is a uni-cellular organism.
• Cell size: 3–4 µm in diameter
• Yeast has been extensively studied by biologists, so many kinds of experimental data are available for solving biological problems computationally.
• This work tries to identify yeast cell cycle TFs using computational analyses.
Yeast cell cycle process
The cell cycle is the series of events that takes place in a cell leading to its division and duplication.
Proteins involved in yeast cell cycle
process (so-called cell cycle proteins)
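Regulation of Protein Synthesis
Gene → (transcription) → mRNA → (translation) → Protein
Transcriptional Regulation
[Figure: a gene with its regulatory region (TF binding sites) and coding region; transcription factors (TFs, activators or repressors) bind the TF binding sites and control transcription of the gene into mRNA]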
Gene expression
Transcription acts as an information processing device $f$ with inputs $x_1[t]$, $x_2[t]$ (the TFs) and output $y[t]$ (the target gene):
$$y[t] = f(x_1[t], x_2[t])$$
Complexity of studying transcriptional
regulation in yeast
• Yeast has ~200 TFs and ~6000 genes
• To understand the transcriptional regulation in yeast, we need to know
1. For each gene, which TFs regulate its transcription? (usually <10)
2. For each TF, which genes does it regulate? (one to several hundred)
This work aims to identify the TFs that regulate the transcription of the genes that produce cell cycle proteins. These TFs are called cell cycle TFs.
How to identify the TFs that regulate the
transcription of a target gene
Transcription of a target gene
~200 TFs in yeast
mRNA of the target gene
Step 1: Identify the TFs that bind the target gene
(usually <10) among all possible TFs (~200) in yeast
Step 1 reduces complexity:
$$y[t] = f(x_1[t], \ldots, x_{200}[t]) \quad (\sim 200 \text{ TFs}) \;\Rightarrow\; y[t] = f(x_1[t], x_2[t], x_3[t])$$
ChIP-chip experiment
ChIP-chip data
$M = [m_{ij}]_{6000 \times 200}$, a binary matrix (6000 genes × 200 TFs)
$m_{ij} = 1$: j-th TF can bind i-th gene
Transcription factor binding site (TFBS) analysis
TFBS data
$M = [m_{ij}]_{6000 \times 200}$, a binary matrix (6000 genes × 200 TFs)
$m_{ij} = 1$: j-th TF can bind i-th gene
TF mutant experiments
Have effect?
TF mutant data
$M = [m_{ij}]_{6000 \times 200}$, a binary matrix (6000 genes × 200 TFs; i-th row, j-th column)
$m_{ij} = 1$: j-th TF can bind i-th gene
Step 1: Construct TF-gene binding matrix
$B = [b_{ij}]_{6000 \times 200}$, a binary matrix (i-th row, j-th column)
$b_{ij} = 1$: j-th TF can bind i-th gene
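The slides do not state how the ChIP-chip, TFBS, and TF mutant matrices are merged into B. The sketch below only illustrates the data structure; the majority-vote rule, the function name `combine_evidence`, and the toy 4 × 3 matrices are all assumptions made for illustration and should not be read as the rule used in this work.

```python
import numpy as np

def combine_evidence(chip, tfbs, mutant, min_votes=2):
    """Combine three binary gene-by-TF evidence matrices into one binding matrix.

    ASSUMPTION: the majority-vote rule (min_votes) is hypothetical; the slides
    do not say how the three data sources are merged.
    """
    stacked = np.stack([chip, tfbs, mutant])   # shape (3, genes, TFs)
    votes = stacked.sum(axis=0)                # number of sources supporting each pair
    return (votes >= min_votes).astype(int)

# Toy example: 4 genes x 3 TFs instead of 6000 x 200.
chip   = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
tfbs   = np.array([[1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1]])
mutant = np.array([[0, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]])
B = combine_evidence(chip, tfbs, mutant)       # B[i, j] = 1: TF j can bind gene i
```

Row i of B then lists the candidate TFs for gene i, which is the reduced input set passed on to Step 2.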
• Step 2: Identify the TFs that regulate the target gene among the TFs that bind the target gene
[Figure: TFs that regulate the target gene → transcription of a target gene → mRNA of the target gene]
Binding information: $y[t] = f(x_1[t], x_2[t], x_3[t])$
Regulation information: $y[t] = f(x_1[t], x_2[t])$
Step 2: Refine the TF-gene binding matrix B into the TF-gene regulatory matrix C
$B = [b_{ij}]_{6000 \times 200}$ (i-th row, j-th column), $b_{ij} = 1$: j-th TF can bind i-th gene
$C = [c_{ij}]_{6000 \times 200}$ (i-th row, j-th column), $c_{ij} = 1$: j-th TF can regulate the transcription of i-th gene
DNA microarray experiments
Microarray time-series data
• Data values are in the log2 domain
• 6000 genes, 25 time points
$$x_i[t], \quad i = 1, \ldots, 6000, \quad t = 1, 2, \ldots, 25$$
Dynamic modelling of transcription
$$y[t] = f(x_1[t], x_2[t], x_3[t])$$
$$y[t+1] = (b_1 x_1[t] + b_2 x_2[t] + b_3 x_3[t] + k) - a \cdot y[t] + \varepsilon[t]$$
(production term - degradation term + noise term)
• If $|b_2|$ and $|b_3|$ are $\gg 0$ but $|b_1|$ is not, only TF2 and TF3 are kept as regulators of the target gene (TF3, TF2, TF1 → TF3, TF2).
• Row of the target gene in $B_{6000 \times 200}$ (columns TF3, TF2, TF1): [1 1 1] → row in $C_{6000 \times 200}$: [1 1 0]
Details of Methods
Dynamic modeling of gene expression
$$y[t+1] = (b_1 x_1[t] + b_2 x_2[t] + b_3 x_3[t] + k) - a \cdot y[t] + \varepsilon[t]$$
(production term - degradation term + noise term)
Microarray experiments can measure the mRNA time profiles of $x_1[t]$, $x_2[t]$, $x_3[t]$, and $y[t]$.
In general,
$$y[t+1] = \Big(k + \sum_{i=1}^{N} b_i \cdot x_i[t]\Big) - a \cdot y[t] + \varepsilon[t], \qquad \varepsilon[t] \sim N(0, \sigma^2)$$
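To make the roles of the production, degradation, and noise terms concrete, here is a minimal simulation sketch of the general model $y[t+1] = (k + \sum_i b_i x_i[t]) - a\,y[t] + \varepsilon[t]$. The function name `simulate_expression`, the sinusoidal TF profiles, and all parameter values are assumptions chosen for illustration; they are not taken from the talk.

```python
import numpy as np

def simulate_expression(x, b, a, k, sigma, y0, seed=0):
    """Simulate y[t+1] = (k + sum_i b_i * x_i[t]) - a * y[t] + eps[t].

    x: array of shape (T, N) holding N TF profiles; returns y of length T + 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    y = np.empty(x.shape[0] + 1)
    y[0] = y0
    for t in range(x.shape[0]):
        production = k + x[t] @ b                     # basal level + TF contributions
        degradation = a * y[t]                        # proportional to current level
        noise = rng.normal(0.0, sigma) if sigma > 0 else 0.0
        y[t + 1] = production - degradation + noise
    return y

# Hypothetical TF profiles: three sinusoids sampled at 25 time points.
t = np.arange(25)
x = np.column_stack([np.sin(0.5 * t), np.cos(0.5 * t), np.sin(0.25 * t)])
# b_1 = 0 mimics a TF that binds the gene but has no regulatory effect.
y = simulate_expression(x, b=[0.0, 0.8, -0.6], a=0.5, k=0.2, sigma=0.0, y0=0.0)
```

Setting `sigma` above zero adds the Gaussian noise term, which is what the later estimation and testing steps have to cope with.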
If we have the following M-point time series microarray data
$$y[t],\ x_1[t],\ x_2[t],\ \ldots,\ x_N[t] \quad \text{for } t = t_1, t_2, \ldots, t_M$$
then the model
$$y[t+1] = \Big(k + \sum_{i=1}^{N} b_i \cdot x_i[t]\Big) - a \cdot y[t] + \varepsilon[t]$$
can be rewritten as
$$y[t+1] = \begin{bmatrix} x_1[t] & \cdots & x_N[t] & -y[t] & 1 \end{bmatrix} \cdot \begin{bmatrix} b_1 \\ \vdots \\ b_N \\ a \\ k \end{bmatrix} + \varepsilon[t]$$
⇒
$$\begin{bmatrix} y[t_2] \\ y[t_3] \\ \vdots \\ y[t_M] \end{bmatrix} =
\begin{bmatrix}
x_1[t_1] & \cdots & x_N[t_1] & -y[t_1] & 1 \\
x_1[t_2] & \cdots & x_N[t_2] & -y[t_2] & 1 \\
\vdots & & \vdots & \vdots & \vdots \\
x_1[t_{M-1}] & \cdots & x_N[t_{M-1}] & -y[t_{M-1}] & 1
\end{bmatrix} \cdot
\begin{bmatrix} b_1 \\ \vdots \\ b_N \\ a \\ k \end{bmatrix} +
\begin{bmatrix} \varepsilon[t_1] \\ \varepsilon[t_2] \\ \vdots \\ \varepsilon[t_{M-1}] \end{bmatrix}$$
⇒
$$Y = \Phi \cdot \theta + e$$
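The stacking of the time series into $Y = \Phi \cdot \theta + e$, with $\theta = [b_1, \ldots, b_N, a, k]^T$, can be sketched as follows. The helper name `build_regression` and the tiny numeric example are assumptions for illustration only.

```python
import numpy as np

def build_regression(x, y):
    """Return (Y, Phi) for Y = Phi @ theta + e with theta = [b_1..b_N, a, k].

    x: shape (M, N), TF profiles at t_1..t_M; y: shape (M,), target gene profile.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = len(y)
    Y = y[1:]                                  # y[t_2], ..., y[t_M]
    Phi = np.column_stack([x[:-1],             # x_i[t_k] for k = 1..M-1
                           -y[:-1],            # -y[t_k], multiplies a
                           np.ones(M - 1)])    # constant column, multiplies k
    return Y, Phi

# Toy data: M = 3 time points, N = 2 TFs.
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([0.5, 1.5, 2.5])
Y, Phi = build_regression(x, y)
```

Each row of `Phi` pairs the TF levels and the target level at $t_k$ with the target level one step later, so M time points yield M - 1 equations in N + 2 unknowns.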
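Estimating the parameters of the model by ML method
Assume $\varepsilon[t_1], \ldots, \varepsilon[t_{M-1}]$ are i.i.d. $N(0, \sigma^2)$, i.e. $\Sigma = E\{ee^T\} = \sigma^2 I$. Then
$$p(e) = \frac{1}{(2\pi)^{(M-1)/2} (\det \Sigma)^{1/2}} \exp\left\{ -\frac{1}{2} e^T \Sigma^{-1} e \right\}$$
⇒
$$L(\theta, \sigma^2) = p(Y \mid \theta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{(M-1)/2}} \exp\left\{ -\frac{(Y - \Phi \cdot \theta)^T (Y - \Phi \cdot \theta)}{2\sigma^2} \right\}$$
⇒
$$\log L(\theta, \sigma^2) = -\frac{M-1}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{k=1}^{M-1} \left( y[t_{k+1}] - \phi[t_k] \cdot \theta \right)^2$$
where $\phi[t_k]$ is the k-th row of $\Phi$.
Setting the derivatives to zero,
$$\left. \frac{\partial \log L(\theta, \sigma^2)}{\partial \theta} \right|_{\theta = \hat\theta} = 0, \qquad \left. \frac{\partial \log L(\theta, \sigma^2)}{\partial \sigma^2} \right|_{\sigma^2 = \hat\sigma^2} = 0$$
⇒ The ML estimates of $\theta$ and $\sigma^2$ equal
$$\hat\theta = (\Phi^T \Phi)^{-1} \Phi^T Y = \begin{bmatrix} \hat b_1 & \cdots & \hat b_N & \hat a & \hat k \end{bmatrix}^T$$
$$\hat\sigma^2 = \frac{1}{M-1} \sum_{k=1}^{M-1} \left( y[t_{k+1}] - \phi[t_k] \cdot \hat\theta \right)^2 = \frac{1}{M-1} (Y - \Phi \cdot \hat\theta)^T (Y - \Phi \cdot \hat\theta)$$
The transcriptional regulatory mechanism of a target gene could be modeled by the following dynamic equation
$$\hat y[t+1] = \Big( \sum_{i=1}^{N} \hat b_i \cdot x_i[t] + \hat k \Big) - \hat a \cdot \hat y[t]$$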
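A minimal numerical sketch of the ML estimates $\hat\theta = (\Phi^T \Phi)^{-1} \Phi^T Y$ and $\hat\sigma^2 = (Y - \Phi\hat\theta)^T (Y - \Phi\hat\theta)/(M-1)$ is given below. The function name `ml_estimate` and the noise-free toy data are assumptions; `np.linalg.lstsq` is used in place of an explicit matrix inverse as a numerical convenience, not because the talk prescribes it.

```python
import numpy as np

def ml_estimate(Y, Phi):
    """ML estimates for Y = Phi @ theta + e with e ~ N(0, sigma^2 I)."""
    # For a full-column-rank Phi, lstsq returns the same solution as
    # (Phi^T Phi)^{-1} Phi^T Y without forming the inverse explicitly.
    theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    residual = Y - Phi @ theta_hat
    sigma2_hat = residual @ residual / len(Y)   # len(Y) = M - 1
    return theta_hat, sigma2_hat

# Noise-free toy data generated from known (hypothetical) parameters.
rng = np.random.default_rng(1)
x = rng.normal(size=(25, 2))                    # 2 TFs, 25 time points
b_true, a_true, k_true = np.array([0.7, -0.4]), 0.3, 0.1
y = np.zeros(25)
for t in range(24):
    y[t + 1] = k_true + x[t] @ b_true - a_true * y[t]

Phi = np.column_stack([x[:-1], -y[:-1], np.ones(24)])
theta_hat, sigma2_hat = ml_estimate(y[1:], Phi)  # theta_hat = [b_1, b_2, a, k]
```

Because the toy data contain no noise, the estimates should reproduce the generating parameters and the residual variance should be numerically zero.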
TF $i$ is said to have a significant regulatory effect on the target gene's transcription if $b_i$ is statistically different from zero ($b_i \neq 0$).
The test statistic for testing the null hypothesis $H_0: b_i = 0$ is
$$t_i = \frac{\hat b_i}{s \sqrt{u_{ii}}} \sim t \text{ distribution with } df = (M-1) - (N+2)$$
where $u_{ii}$ is the i-th diagonal element of $(\Phi^T \Phi)^{-1}$ and
$$s^2 = \frac{(Y - \Phi \cdot \hat\theta)^T (Y - \Phi \cdot \hat\theta)}{(M-1) - (N+2)}$$
is an unbiased estimator of $\sigma^2$.
Reject $H_0$ if $|t_i| > t_{\alpha/2}$, where $\alpha$ is the type I error rate.
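The test of $H_0: b_i = 0$ with statistic $t_i = \hat b_i / (s \sqrt{u_{ii}})$ and $df = (M-1) - (N+2)$ can be sketched as below. Rejecting when $|t_i| > t_{\alpha/2}$ is equivalent to a two-sided p-value below $\alpha$, which is what the code checks. The function names, the default $\alpha = 0.05$, the numerical integration of the t density (used only to avoid an extra dependency), and the simulated data are all assumptions for illustration.

```python
import math
import numpy as np

def t_two_sided_pvalue(t_stat, df, steps=20000):
    """Two-sided p-value of Student's t via trapezoidal integration of the pdf."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    u = np.linspace(0.0, abs(t_stat), steps + 1)
    pdf = np.exp(log_c - (df + 1) / 2 * np.log1p(u * u / df))
    area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(u))   # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def significant_tfs(Y, Phi, n_tfs, alpha=0.05):
    """Return (t statistics, p-values, reject flags) for H0: b_i = 0, i = 1..n_tfs."""
    theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    resid = Y - Phi @ theta_hat
    df = Phi.shape[0] - Phi.shape[1]             # (M - 1) - (N + 2)
    s = math.sqrt(resid @ resid / df)            # s^2 is unbiased for sigma^2
    u_diag = np.diag(np.linalg.inv(Phi.T @ Phi))
    t = theta_hat[:n_tfs] / (s * np.sqrt(u_diag[:n_tfs]))
    p = np.array([t_two_sided_pvalue(ti, df) for ti in t])
    return t, p, p < alpha

# Hypothetical data: TF1 has no effect (b_1 = 0), TF2 and TF3 have strong effects.
rng = np.random.default_rng(0)
x = rng.normal(size=(25, 3))
y = np.zeros(25)
for k in range(24):
    y[k + 1] = 0.1 + x[k] @ np.array([0.0, 1.0, -1.0]) - 0.3 * y[k] + rng.normal(0, 0.1)

Phi = np.column_stack([x[:-1], -y[:-1], np.ones(24)])
t, p, reject = significant_tfs(y[1:], Phi, n_tfs=3)
```

In the language of Step 2, a TF whose flag in `reject` is true would keep its 1 in the regulatory matrix C, while a bound TF whose coefficient is not significant would be set to 0.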
Step 2: Refine the TF-gene binding matrix B into the TF-gene regulatory matrix C
$B = [b_{ij}]_{6000 \times 200}$ (i-th row, j-th column), $b_{ij} = 1$: j-th TF can bind i-th gene
$C = [c_{ij}]_{6000 \times 200}$ (i-th row, j-th column), $c_{ij} = 1$: j-th TF can regulate the transcription of i-th gene
Identify yeast cell cycle TFs
$C_{6000 \times 200}$ (6000 genes × 200 TFs): the 1s in a TF's column of $C$ mark the genes regulated by that TF.
The procedure for checking
whether TF j is a cell cycle TF
• S: the set of cell cycle-regulated genes
• G: the set of genes that are regulated by TF j
• $T = S \cap G$: the set of cell cycle-regulated genes that are also regulated by TF j
• F: the set of all genes in the yeast genome
• The p-value for rejecting the null hypothesis ($H_0$: $T$ is observed by chance) is calculated as
$$p = P(x \geq |T|) = \sum_{x \geq |T|} \frac{\binom{|S|}{x} \binom{|F| - |S|}{|G| - x}}{\binom{|F|}{|G|}}$$
If the p-value is < 0.05, then this TF is regarded as a cell cycle TF.
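The hypergeometric tail probability $p = P(x \geq |T|)$, with $|F|$ genes in total, $|S|$ cell cycle-regulated genes, and $|G|$ genes regulated by TF j, can be computed exactly with integer binomial coefficients. The function name `hypergeom_pvalue` and the gene counts in the example are hypothetical and are not taken from the results of this work.

```python
from math import comb

def hypergeom_pvalue(n_F, n_S, n_G, n_T):
    """P(x >= n_T): chance of seeing at least n_T cell cycle genes among the
    n_G targets of a TF, if targets were drawn at random from n_F genes of
    which n_S are cell cycle-regulated."""
    upper = min(n_S, n_G)                        # overlap cannot exceed either set
    total = comb(n_F, n_G)
    tail = sum(comb(n_S, x) * comb(n_F - n_S, n_G - x)
               for x in range(n_T, upper + 1))
    return tail / total

# Hypothetical counts, for illustration only.
p = hypergeom_pvalue(n_F=6000, n_S=800, n_G=100, n_T=30)
is_cell_cycle_tf = p < 0.05
```

With these made-up counts about 13 overlapping genes would be expected by chance, so an overlap of 30 gives a p-value far below 0.05.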
17 cell cycle TFs are identified
• 12 known cell cycle TFs according to the MIPS database, including the nine well-known major cell cycle TFs (Ace2, Fkh1, Fkh2, Mbp1, Mcm1, Ndd1, Swi4, Swi5, and Swi6), plus Cin5, Cst6, and Stb1.
• Five predicted novel cell cycle TFs (Ash1, Rlm1, Ste12, Stp1, and Tec1)
• Ash1, Rlm1, Ste12, and Tec1 have also been predicted as cell cycle TFs in previous computational studies.
Protein-protein interaction TF-gene binding
Compare with existing methods
• Jaccard similarity score: TP/(TP+FP+FN)
• TP: true positives
• FP: false positives
• FN: false negatives
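The Jaccard similarity score TP/(TP+FP+FN) between a predicted set and the known set of cell cycle TFs reduces to a few set operations. The function name `jaccard` and the toy sets below are assumptions for illustration; they are not the sets compared in this talk.

```python
def jaccard(predicted, known):
    """Jaccard similarity TP / (TP + FP + FN) between two sets of TF names."""
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)     # predicted and known
    fp = len(predicted - known)     # predicted but not known
    fn = len(known - predicted)     # known but not predicted
    denom = tp + fp + fn            # size of the union of the two sets
    return tp / denom if denom else 0.0

# Hypothetical toy sets.
known = {"Swi4", "Swi6", "Mbp1", "Fkh2"}
predicted = {"Swi4", "Swi6", "Ste12"}
score = jaccard(predicted, known)   # TP = 2, FP = 1, FN = 2
```

A score of 1 means the predicted and known sets coincide; a score of 0 means they share no TF.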
[Figure: known cell cycle TFs compared with the cell cycle TFs predicted by Method 1 and by Method 2]