
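Attending the 2010 IEEE Workshop on Signal Processing Systems in San Francisco, USA, and visiting the laboratory of Academician Wen-Hsiung Li (李文雄), Department of Ecology and Evolution, University of Chicago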


Academic year: 2021


(1)

Identification of yeast cell cycle transcription factors (TFs) using a dynamic system model

Wei-Sheng Wu (吳謂勝)

Lab. of Computational Systems Biology Dept. of Electrical Engineering

National Cheng Kung University Taiwan

2010/10/07

(2)

Outline

• Biological background

• Methods

• Results

• Conclusions

(3)

About yeast

• Yeast is a uni-cellular organism.

• Cell size: 3–4 µm in diameter

• Yeast has been extensively studied by biologists, so many kinds of experimental data are available for solving biological problems computationally.

• This work tries to identify yeast cell cycle TFs using computational analyses.

(4)

Yeast cell cycle process

The cell cycle is the series of events that takes place in a cell leading to its division and duplication.

(5)

Proteins involved in the yeast cell cycle process (so-called cell cycle proteins)

(6)

Regulation of Protein Synthesis

[Figure: protein synthesis proceeds through transcription and then translation]

(7)

Transcriptional Regulation

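[Figure: a gene consists of a regulatory region, which carries the TF binding sites, and a coding region. Transcription factors (TFs), acting as activators or repressors, bind the TF binding sites and control transcription of the gene into mRNA (gene expression).]

Transcription can be viewed as an information processing device $f$ with inputs $x_1[t]$, $x_2[t]$ and output $y[t]$:

$y[t] = f(x_1[t], x_2[t])$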

(8)

Complexity of studying transcriptional regulation in yeast

Yeast has ~200 TFs and ~6000 genes.

To understand transcriptional regulation in yeast, we need to know:

1. For each gene, which TFs regulate its transcription? (usually <10)

2. For each TF, which genes does it regulate? (one to several hundred)

This work aims to identify the TFs that regulate the transcription of the genes that produce cell cycle proteins. These TFs are called cell cycle TFs.

(9)

How to identify the TFs that regulate the transcription of a target gene

Step 1: Identify the TFs that bind the target gene (usually <10) among all possible TFs (~200) in yeast

[Figure: ~200 TFs in yeast → transcription of a target gene → mRNA of the target gene]

Reduce complexity:

$y[t] = f(x_1[t], \ldots, x_{200}[t]) \;\rightarrow\; y[t] = f(x_1[t], x_2[t], x_3[t])$

(10)

ChIP-chip experiment

(11)

ChIP-chip data

$M = [m_{ij}]_{6000 \times 200}$, with entries 0 or 1 (rows: 6000 genes; columns: 200 TFs)

$m_{ij} = 1$: the j-th TF can bind the i-th gene

(12)

Transcription factor binding site (TFBS) analysis

(13)

TFBS data

$M = [m_{ij}]_{6000 \times 200}$, with entries 0 or 1 (rows: 6000 genes; columns: 200 TFs)

$m_{ij} = 1$: the j-th TF can bind the i-th gene

(14)

TF mutant experiments

Have effect?

(15)

TF mutant data

$M = [m_{ij}]_{6000 \times 200}$, with entries 0 or 1 (rows: 6000 genes; columns: 200 TFs; $m_{ij}$ is the entry in the i-th row and j-th column)

$m_{ij} = 1$: the j-th TF can bind the i-th gene

(16)

Step 1: Construct the TF-gene binding matrix

$B = [b_{ij}]_{6000 \times 200}$, with entries 0 or 1 ($b_{ij}$ is the entry in the i-th row and j-th column)

$b_{ij} = 1$: the j-th TF can bind the i-th gene
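The two questions a binary TF-gene matrix answers (which TFs bind a given gene, and which genes a given TF binds) reduce to reading a row or a column. The following is a minimal sketch on a toy 4 x 3 matrix; the gene and TF labels and the matrix entries are hypothetical placeholders, not data from this work.

```python
# Toy TF-gene binding matrix B: rows are genes, columns are TFs.
# B[i][j] == 1 means the j-th TF can bind the i-th gene.
# All labels and entries below are hypothetical placeholders.
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
tfs = ["TF_1", "TF_2", "TF_3"]
B = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]


def tfs_binding_gene(B, i):
    """Column indices j with B[i][j] == 1 (TFs that bind gene i)."""
    return [j for j, b in enumerate(B[i]) if b == 1]


def genes_bound_by_tf(B, j):
    """Row indices i with B[i][j] == 1 (genes bound by TF j)."""
    return [i for i, row in enumerate(B) if row[j] == 1]


# TFs that bind gene_2 (row index 1)
print([tfs[j] for j in tfs_binding_gene(B, 1)])
# Genes bound by TF_3 (column index 2)
print([genes[i] for i in genes_bound_by_tf(B, 2)])
```

The same row and column lookups apply unchanged to the regulatory matrix C.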

(17)

• Step 2: Identify the TFs that regulate the target gene among the TFs that bind the target gene

[Figure: TFs that regulate the target gene → transcription of a target gene → mRNA of the target gene]

Step 2 turns binding information into regulation information:

Binding information: $y[t] = f(x_1[t], x_2[t], x_3[t])$

Regulation information: $y[t] = f(x_1[t], x_2[t])$

(18)

Step 2: Refine the TF-gene binding matrix B into the TF-gene regulatory matrix C

$B = [b_{ij}]_{6000 \times 200}$, with entries 0 or 1 ($b_{ij}$ is the entry in the i-th row and j-th column)

$b_{ij} = 1$: the j-th TF can bind the i-th gene

$C = [c_{ij}]_{6000 \times 200}$, with entries 0 or 1 ($c_{ij}$ is the entry in the i-th row and j-th column)

$c_{ij} = 1$: the j-th TF can regulate the transcription of the i-th gene

(19)

DNA microarray experiments

(20)

Microarray time-series data

• Data values are in the log2 domain

• 6000 genes, 25 time points

$x_i[t], \quad i = 1, \ldots, 6000, \quad t = 1, 2, \ldots, 25$

(21)

Dynamic modelling of transcription

$y[t] = f(x_1[t], x_2[t], x_3[t])$

$y[t+1] = (b_1 \cdot x_1[t] + b_2 \cdot x_2[t] + b_3 \cdot x_3[t] + k) - a \cdot y[t] + \varepsilon[t]$

(production term – degradation term + noise term)

If $|b_2|$ and $|b_3|$ are $\gg 0$ but $|b_1|$ is not, only TF2 and TF3 are kept as regulators of the target gene:

Row of $B_{6000 \times 200}$ for the target gene (TF3, TF2, TF1): 1 1 1

Row of $C_{6000 \times 200}$ for the target gene (TF3, TF2, TF1): 1 1 0

(22)

Details of Methods

Dynamic modeling of gene expression

$y[t+1] = (b_1 \cdot x_1[t] + b_2 \cdot x_2[t] + b_3 \cdot x_3[t] + k) - a \cdot y[t] + \varepsilon[t]$

(production term – degradation term + noise term)

Microarray experiments can measure the mRNA time profiles of $x_1[t]$, $x_2[t]$, $x_3[t]$, and $y[t]$.

In general,

$y[t+1] = \left( k + \sum_{i=1}^{N} b_i \cdot x_i[t] \right) - a \cdot y[t] + \varepsilon[t], \qquad \varepsilon[t] \sim N(0, \sigma^2)$
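To make the roles of the production, degradation, and noise terms concrete, the general model y[t+1] = (k + sum_i b_i x_i[t]) - a y[t] + eps[t] can be iterated forward in time. This is a minimal sketch, not the authors' code; the function name `simulate`, the toy TF profiles, and the parameter values are assumptions chosen for illustration.

```python
import random


def simulate(x, b, a, k, y0, sigma=0.0, seed=0):
    """Iterate y[t+1] = (sum_i b[i]*x[i][t] + k) - a*y[t] + eps[t].

    x     : list of N TF profiles, each a list of length M
    b     : list of N regulatory coefficients
    a     : degradation coefficient
    k     : basal production level
    y0    : initial value y[0]
    sigma : standard deviation of eps[t]; 0 gives the noise-free path
    """
    rng = random.Random(seed)
    M = len(x[0])
    y = [y0]
    for t in range(M - 1):
        production = sum(bi * xi[t] for bi, xi in zip(b, x)) + k
        degradation = a * y[t]
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        y.append(production - degradation + noise)
    return y


# Hypothetical toy input: two TFs, four time points, no noise.
x_toy = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
y_toy = simulate(x_toy, b=[2.0, -1.0], a=0.5, k=1.0, y0=0.0)
print(y_toy)  # [0.0, 3.0, -1.5, 3.75]
```

A positive coefficient acts like an activator and a negative one like a repressor, which is why the sign of each b_i is left unconstrained in this sketch.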

(23)

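If we have the following M-point time series microarray data

$y[t], x_1[t], x_2[t], \ldots, x_N[t]$ for $t = t_1, t_2, \ldots, t_M$

then the model

$y[t+1] = \left( k + \sum_{i=1}^{N} b_i \cdot x_i[t] \right) - a \cdot y[t] + \varepsilon[t]$

can be written as

$y[t+1] = \begin{bmatrix} x_1[t] & \cdots & x_N[t] & -y[t] & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_N \\ a \\ k \end{bmatrix} + \varepsilon[t]$

Stacking the equations for all time points gives

$\begin{bmatrix} y[t_2] \\ y[t_3] \\ \vdots \\ y[t_M] \end{bmatrix} = \begin{bmatrix} x_1[t_1] & x_2[t_1] & \cdots & x_N[t_1] & -y[t_1] & 1 \\ x_1[t_2] & x_2[t_2] & \cdots & x_N[t_2] & -y[t_2] & 1 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ x_1[t_{M-1}] & x_2[t_{M-1}] & \cdots & x_N[t_{M-1}] & -y[t_{M-1}] & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \\ a \\ k \end{bmatrix} + \begin{bmatrix} \varepsilon[t_1] \\ \varepsilon[t_2] \\ \vdots \\ \varepsilon[t_{M-1}] \end{bmatrix}$

that is,

$Y = \Phi \cdot \theta + e$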
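The regression form Y = Phi * theta + e can be assembled directly from the measured profiles: each time point t contributes one row [x_1[t], ..., x_N[t], -y[t], 1] paired with the response y[t+1]. The sketch below assumes the parameter order theta = [b_1, ..., b_N, a, k]; the function name `build_regression` and the toy numbers are hypothetical.

```python
def build_regression(x, y):
    """Return (Y, Phi) for the stacked model Y = Phi * theta + e.

    x : list of N TF profiles, each a list of length M
    y : target-gene profile, a list of length M
    Row t of Phi is [x_1[t], ..., x_N[t], -y[t], 1], so that
    theta = [b_1, ..., b_N, a, k]. Both Y and Phi have M-1 rows.
    """
    M = len(y)
    Phi = [[xi[t] for xi in x] + [-y[t], 1.0] for t in range(M - 1)]
    Y = [y[t + 1] for t in range(M - 1)]
    return Y, Phi


# Hypothetical toy profiles: N = 2 TFs, M = 3 time points.
x_toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_toy = [7.0, 8.0, 9.0]
Y, Phi = build_regression(x_toy, y_toy)
for row, target in zip(Phi, Y):
    print(row, "->", target)
```

Putting -y[t] rather than y[t] in the regressor keeps the fitted parameter equal to the degradation coefficient a itself, not its negative.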

(24)

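Estimating the parameters of the model by the ML method

Assume $\varepsilon[t_1], \ldots, \varepsilon[t_{M-1}]$ are i.i.d. $N(0, \sigma^2)$, i.e.

$\Sigma = E\{ e e^T \} = \sigma^2 I$

$p(e) = \dfrac{1}{(2\pi)^{(M-1)/2} (\det \Sigma)^{1/2}} \exp\left( -\dfrac{1}{2} e^T \Sigma^{-1} e \right)$

$L(\theta, \sigma^2) = p(\theta, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{(M-1)/2}} \exp\left( -\dfrac{(Y - \Phi\theta)^T (Y - \Phi\theta)}{2\sigma^2} \right)$

$\log L(\theta, \sigma^2) = -\dfrac{M-1}{2} \log(2\pi\sigma^2) - \dfrac{1}{2\sigma^2} \sum_{k=1}^{M-1} \left( y[t_{k+1}] - \phi[t_k] \, \theta \right)^2$

where $\phi[t_k]$ denotes the k-th row of $\Phi$.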

(25)

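The ML estimates of $\theta$ and $\sigma^2$ satisfy

$\left. \dfrac{\partial \log L(\theta, \sigma^2)}{\partial \theta} \right|_{\theta = \hat{\theta},\, \sigma^2 = \hat{\sigma}^2} = 0, \qquad \left. \dfrac{\partial \log L(\theta, \sigma^2)}{\partial \sigma^2} \right|_{\theta = \hat{\theta},\, \sigma^2 = \hat{\sigma}^2} = 0$

which gives

$\hat{\theta} = (\Phi^T \Phi)^{-1} \Phi^T Y = [\hat{b}_1 \; \hat{b}_2 \; \cdots \; \hat{b}_N \; \hat{a} \; \hat{k}]^T$

$\hat{\sigma}^2 = \dfrac{1}{M-1} \sum_{k=1}^{M-1} \left( y[t_{k+1}] - \phi[t_k] \, \hat{\theta} \right)^2 = \dfrac{1}{M-1} (Y - \Phi \hat{\theta})^T (Y - \Phi \hat{\theta})$

The transcriptional regulatory mechanism of a target gene could be modeled by the following dynamic equation

$\hat{y}[t+1] = \left( \sum_{i=1}^{N} \hat{b}_i \cdot x_i[t] + \hat{k} \right) - \hat{a} \cdot \hat{y}[t]$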
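Under the Gaussian noise assumption, the ML estimate of theta coincides with the ordinary least-squares solution, so it can be computed with a standard solver. The sketch below uses NumPy's `numpy.linalg.lstsq`, which solves the same problem as the closed form (Phi^T Phi)^{-1} Phi^T Y without forming the inverse; the function name `ml_estimate` and the synthetic one-TF data are assumptions for illustration, not the data or code of this work.

```python
import numpy as np


def ml_estimate(Phi, Y):
    """Return (theta_hat, sigma2_hat) for Y = Phi @ theta + e, e ~ N(0, sigma^2 I)."""
    Phi = np.asarray(Phi, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Least squares gives the same theta as (Phi^T Phi)^{-1} Phi^T Y
    # when Phi has full column rank.
    theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    residual = Y - Phi @ theta_hat
    # ML variance estimate: divide by the number of rows, M - 1.
    sigma2_hat = float(residual @ residual) / len(Y)
    return theta_hat, sigma2_hat


# Synthetic check (hypothetical values): one TF, noise-free data
# generated from known parameters b, a, k.
b_true, a_true, k_true = 1.5, 0.3, 0.2
x = np.array([0.0, 1.0, 0.5, -1.0, 2.0, 0.3, -0.7, 1.2])
y = np.zeros_like(x)
for t in range(len(x) - 1):
    y[t + 1] = (b_true * x[t] + k_true) - a_true * y[t]

# Rows [x[t], -y[t], 1] match theta = [b, a, k].
Phi = np.column_stack([x[:-1], -y[:-1], np.ones(len(x) - 1)])
theta_hat, sigma2_hat = ml_estimate(Phi, y[1:])
print(theta_hat, sigma2_hat)
```

On noise-free data the estimate should recover the generating parameters up to floating-point error, which is a quick sanity check before fitting real profiles.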

(26)

TF $i$ is said to have a significant regulatory effect on the target gene's transcription if $b_i$ is statistically different from 0.

The test statistic for testing the null hypothesis $H_0: b_i = 0$ is

$t_i = \dfrac{\hat{b}_i}{s \sqrt{u_{ii}}} \sim t$ distribution with $df = (M-1) - (N+2)$

where $u_{ii}$ is the i-th diagonal element of $(\Phi^T \Phi)^{-1}$ and

$s^2 = \dfrac{(Y - \Phi \hat{\theta})^T (Y - \Phi \hat{\theta})}{(M-1) - (N+2)}$

is an unbiased estimator of $\sigma^2$.

Reject $H_0$ if $|t_i| > t_{\alpha/2}$, where $\alpha$ is the type I error.
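The t-test for each coefficient can be sketched with NumPy alone, leaving the critical value t_{alpha/2} as an input supplied by the caller (for example from a t table). The function names, the simulated two-TF data, and the parameter values below are assumptions for illustration; the critical value 2.086 is the standard two-sided 5% table value for 20 degrees of freedom.

```python
import numpy as np


def t_statistics(Phi, Y):
    """Return (t, df) with t[i] = theta_hat[i] / (s * sqrt(u_ii)).

    Phi has M-1 rows and N+2 columns, so df = (M-1) - (N+2).
    The first N entries of t correspond to b_1, ..., b_N.
    """
    Phi = np.asarray(Phi, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_rows, n_params = Phi.shape
    df = n_rows - n_params
    if df <= 0:
        raise ValueError("need more time points than parameters")
    U = np.linalg.inv(Phi.T @ Phi)          # u_ii = U[i, i]
    theta_hat = U @ Phi.T @ Y
    residual = Y - Phi @ theta_hat
    s = np.sqrt(residual @ residual / df)   # s^2 is unbiased for sigma^2
    return theta_hat / (s * np.sqrt(np.diag(U))), df


def significant(t, t_crit):
    """Two-sided decision: reject H0 where |t_i| > t_crit."""
    return np.abs(t) > t_crit


# Hypothetical simulation: TF1 has a real effect (b_1 = 2), TF2 has none.
rng = np.random.default_rng(0)
M = 25
x1, x2 = rng.standard_normal(M), rng.standard_normal(M)
y = np.zeros(M)
for step in range(M - 1):
    y[step + 1] = (2.0 * x1[step] + 0.0 * x2[step] + 0.1) - 0.5 * y[step] \
        + 0.1 * rng.standard_normal()

Phi = np.column_stack([x1[:-1], x2[:-1], -y[:-1], np.ones(M - 1)])
t, df = t_statistics(Phi, y[1:])
print(df, t[:2], significant(t[:2], t_crit=2.086))
```

With 25 time points and two TFs the test has 20 degrees of freedom, so only a handful of candidate TFs per gene can be tested at once, which is one reason Step 1 narrows the candidates first.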

(27)

Step 2: Refine the TF-gene binding matrix B into the TF-gene regulatory matrix C

$B = [b_{ij}]_{6000 \times 200}$, with entries 0 or 1 ($b_{ij}$ is the entry in the i-th row and j-th column)

$b_{ij} = 1$: the j-th TF can bind the i-th gene

$C = [c_{ij}]_{6000 \times 200}$, with entries 0 or 1 ($c_{ij}$ is the entry in the i-th row and j-th column)

$c_{ij} = 1$: the j-th TF can regulate the transcription of the i-th gene

(28)

Identify yeast cell cycle TFs

[Figure: the TF-gene regulatory matrix $C_{6000 \times 200}$ (rows: 6000 genes; columns: 200 TFs); the 1s in a TF's column mark the genes it regulates]

(29)

The procedure for checking whether TF j is a cell cycle TF

S: the set of cell cycle-regulated genes

G: the set of genes that are regulated by TF j

$T = S \cap G$: the set of cell cycle-regulated genes that are also regulated by TF j

F: the set of all genes in the yeast genome

The p-value for rejecting the null hypothesis ($H_0$: T is observed by chance) is calculated as

$p = P(x \ge |T|) = \sum_{x \ge |T|} \dfrac{\binom{|S|}{x} \binom{|F| - |S|}{|G| - x}}{\binom{|F|}{|G|}}$

If the p-value < 0.05, then this TF is regarded as a cell cycle TF.
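The p-value above is the upper tail of a hypergeometric distribution and can be evaluated exactly with integer binomial coefficients. This is a minimal sketch using only the Python standard library; the function name `hypergeom_pvalue` and the set sizes in the example are hypothetical, not values from this work.

```python
from math import comb


def hypergeom_pvalue(n_F, n_S, n_G, n_T):
    """P(x >= n_T) for a hypergeometric draw.

    n_F : number of genes in the genome (|F|)
    n_S : number of cell cycle-regulated genes (|S|)
    n_G : number of genes regulated by the TF (|G|)
    n_T : observed overlap (|T| = |S intersect G|)
    """
    upper = min(n_S, n_G)          # the overlap cannot exceed either set
    total = comb(n_F, n_G)
    tail = sum(
        comb(n_S, x) * comb(n_F - n_S, n_G - x)
        for x in range(n_T, upper + 1)
    )
    return tail / total


# Hypothetical small example: 10 genes, 4 cell cycle-regulated,
# a TF regulating 3 genes, 2 of which are cell cycle-regulated.
p_small = hypergeom_pvalue(10, 4, 3, 2)
print(p_small)                     # (6*6 + 4*1) / 120 = 1/3

# Hypothetical genome-scale sizes, to show the call pattern only.
p_large = hypergeom_pvalue(6000, 800, 100, 40)
print(p_large < 0.05)
```

Because Python integers are arbitrary precision, the sum is exact before the final division, which avoids the underflow that floating-point factorials would cause at genome scale.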

(30)

17 cell cycle TFs are identified

• 12 known cell cycle TFs according to the MIPS database, including the nine well-known major cell cycle TFs (Ace2, Fkh1, Fkh2, Mbp1, Mcm1, Ndd1, Swi4, Swi5, and Swi6), and Cin5, Cst6, and Stb1.

• Five predicted novel cell cycle TFs (Ash1, Rlm1, Ste12, Stp1 and Tec1)

(31)

• Ash1, Rlm1, Ste12 and Tec1 have also been predicted as cell cycle TFs in previous computational studies.

[Figure legend: protein-protein interaction; TF-gene binding]

(32)

Comparison with existing methods

• Jaccard similarity score: TP/(TP+FP+FN)

• TP: true positives

• FP: false positives

• FN: false negatives

[Figure: known cell cycle TFs compared with the cell cycle TFs predicted by Method 1 and by Method 2]
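The Jaccard similarity score TP/(TP+FP+FN) is the size of the intersection of the predicted and known sets divided by the size of their union. A minimal sketch follows; the function name `jaccard` and the placeholder TF labels are hypothetical and do not reproduce any result from this comparison.

```python
def jaccard(predicted, known):
    """Jaccard similarity TP / (TP + FP + FN) between two collections."""
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)    # predicted and known
    fp = len(predicted - known)    # predicted but not known
    fn = len(known - predicted)    # known but not predicted
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Hypothetical placeholder labels, not real predictions.
predicted_tfs = {"TF_A", "TF_B", "TF_C"}
known_tfs = {"TF_B", "TF_C", "TF_D", "TF_E"}
score = jaccard(predicted_tfs, known_tfs)
print(score)                       # 2 / (2 + 1 + 2) = 0.4
```

The score ignores true negatives, so it is not inflated by the many TFs that are neither predicted nor known to be cell cycle TFs.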

(33)

Conclusions

• By combining ChIP-chip, TFBS, and TF mutant data, we constructed a TF-gene binding matrix.

• By applying a dynamic system model to microarray data, we constructed a TF-gene regulatory matrix.

(34)

• Using hypothesis testing, we identified 17 cell cycle TFs.

• Most (12/17) of our findings are consistent with published experimental results.

• We provided partial evidence showing the biological relevance of our five novel findings.

• Our performance is better than that of four previous methods.

(35)

Thank you for your attention!

Any questions?
