
(1)

2004/12/13 Maximum-Density Segment @ EE.NTU

An optimal algorithm for identifying a maximum-density segment

Hsueh-I Lu (呂學一), Institute of Information Science, Academia Sinica

http://www.iis.sinica.edu.tw/~hil/

Microsoft Office XP is needed to see all the animations

(2)

What do algorithm people do?

Inventing efficient recipes to solve combinatorial problems

(3)

A famous combinatorial problem

The Factorization Problem

Input: a number N

Output:

– “yes” if N is a prime number;

– a factorization of N if N is not a prime number.

– For example, N = 323264989793317.
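To make the input/output contract of the Factorization Problem concrete, here is a minimal sketch using trial division. The function name `factorize` is hypothetical, and trial division takes time exponential in the number of bits of N, so it is not an "efficient recipe" in the sense of these slides.

```python
def factorize(n: int):
    """Return "yes" if n is prime, otherwise the list of prime factors of n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    remaining = n
    d = 2
    # Trial division: only divisors up to sqrt(remaining) need to be tried.
    while d * d <= remaining:
        while remaining % d == 0:
            factors.append(d)
            remaining //= d
        d += 1
    if remaining > 1:
        factors.append(remaining)  # whatever is left is itself prime
    # A single factor means n had no proper divisor, i.e. n is prime.
    return "yes" if len(factors) == 1 else factors


if __name__ == "__main__":
    print(factorize(97))
    print(factorize(91))
```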

(4)

OPEN QUESTION

Is there an efficient recipe for the

(5)

Why Factorization?

The security of many encryption schemes is based upon the assumption that the

(6)
(7)

RSA factorization challenges

Challenge Number    Prize ($US)
RSA-576             $10,000
RSA-640             $20,000
RSA-704             $30,000
RSA-896             $75,000
RSA-1024            $100,000
RSA-1536            $150,000

(8)

US$10,000 –– RSA-576

1881988129206079638386972394616504

3980716356337941382700763356422988

8597152346654853190606065047430453

1738801130339671619969232120573403

1879550656996213051687593076502570

59

(9)

RSA-576 factored on December 3, 2003

3980750864240649373971255005503864

9119906436234252670840638518957594

6388957261768583317

4727721461074353025362230719730482

2463291469530209711645985217113052

0711256363590397527

At the same time, Adi Shamir gave two

(10)

US$20,000 –– RSA-640

3107418240490043721350750035888567

9300373460228427275457201619488232

0644051808150455634682967172328678

2437916272838033415471073108501919

5485290073377248227835257423864540

14691736602477652346609

(11)

US$200,000 –– RSA-2048

25195908475657893494027183240048398571429282126204

03202777713783604366202070759555626401852588078440

69182906412495150821892985591491761845028084891200

72844992687392807287776735971418347270261896375014

97182469116507761337985909570009733045974880842840

17974291006424586918171951187461215151726546322822

16869987549182422433637259085141865462043576798423

38718477444792073993423658482382428119816381501067

48104516603773060562016196762561338441436038339044

14952634432190114657544454178424020924616515723350

77870774981712577246796292638635637328991215483143

81678998850404453640235273819513786365643912120103

97122822120720357

(12)

Short of cash?

(13)


RSA 2003 (April ’03)

(14)

2002 Turing Award (June ’03)

(15)

The awarded paper

Only 7 pages.

– “A Method for Obtaining Digital Signatures and Public Key Cryptosystems”, Communications of the ACM 21, 120-126,

(16)

“PRIMES is in P”

Agrawal, Kayal, and Saxena

(17)

PRIMES is in P

The PRIMES problem:

– Input: a number N.

– Output:

“yes” if N is a prime number.

“no” if N is not a prime number.

Only 9 pages!

Running time is O(n^12), where n is the

(18)

NEW YORK TIMES, Aug. 8, 2002

Previous algorithmic results that caught the attention of the New York Times

– 1984, Karmarkar’s algorithm for solving linear programs.

– 1979, Khachian’s algorithm for solving linear programs.

(19)

The latest version (v.3) of AKS’s paper

The running time is now improved from

(20)

What do algorithm people do?

Looking for important/interesting combinatorial problems

Coming up with efficient recipes to solve

(21)

Bioinformatics

(22)
(23)

Finding a DNA segment with Max GC-density in linear time

WABI → J. Comput. Sys. Sci.

ESA → SIAM J. Computing

(24)

DNA Sequences

[Chargaff and Vischer, 1949]

– DNA consisting of A, G, T, C

Adenine ( 腺嘌呤 )

Guanine ( 鳥糞嘌呤 )

Cytosine ( 胞嘧啶 )

Thymine ( 胸腺嘧啶 )

(25)

[Vischer, Zamenhof, Chargaff, 1949]

– Evidence against the widely believed %A = %G = %T = %C.

(26)

Erwin Chargaff, 1905-

Observing

– %A ~ %T

– %G ~ %C

“A comparison of the molar proportions reveals certain

(27)

Double Helix

[Watson and Crick, Nature, April 25, 1953]

– Biologist (age 23, fresh Ph.D.) + Physicist (age 35, still a Ph.D. student)

(28)

1962 Nobel Prize in Physiology or Medicine

(29)

DNA’s picture

[Alexander Rich, 1973]

– Structural biologist at MIT.

(30)

Celebrating 50 years of Double Helix (April 25, 1953 – 2003)

(31)

Francis Crick 1916-2004

Passed away on July 28, 2004

(32)

Maurice Wilkins 1916-2004

(33)

GC-content

Non-uniformity of nucleotide composition

– 25% - 75% in genomes of all organisms

– 40% - 50% in typical mammalian genomes

– 30% - 60% in human chromosomes

(34)

GC content

GC-content is positively correlated with

– gene length,

– gene density,

– patterns of codon usage,

(35)

The Problem

Input:

– an n-bit string S,

– an integer L.

Output:

– a substring S[i, j] of S with maximum density over all substrings of S with at least L bits.

(36)

Example

S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1

A maximum-density substring for each L (in brackets):

L = 1, [1] 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1

L = 2, 1 0 0 [1 1] 0 0 1 0 1 1 0 1 0 0 1

L = 3, 1 0 0 1 1 0 0 [1 0 1 1] 0 1 0 0 1
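As a reference point for the problem statement, the following is a minimal brute-force sketch that tries every substring with at least L bits. The function name `max_density_brute_force` is hypothetical; it runs in O(n^2) time and is meant only to pin down the expected answers on the example string S, not to represent any of the algorithms discussed later.

```python
from fractions import Fraction


def max_density_brute_force(bits, L):
    """Return (density, i, j) for a max-density S[i, j] with j - i + 1 >= L.

    Indices are 1-based and inclusive, as on the slides. Assumes L >= 1.
    Returns None when no substring has at least L bits.
    """
    n = len(bits)
    best = None
    for i in range(1, n + 1):
        # Number of ones in the first L-1 bits starting at position i.
        total = sum(bits[i - 1:i - 1 + L - 1])
        for j in range(i + L - 1, n + 1):
            total += bits[j - 1]  # extend the window to end at position j
            density = Fraction(total, j - i + 1)
            if best is None or density > best[0]:
                best = (density, i, j)
    return best


S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    for L in (1, 2, 3):
        print(L, max_density_brute_force(S, L))
```

Exact fractions are used so that ties between densities are compared without floating-point error.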

(37)

Density of each segment in O(1) time

prefix-sum(i) = S[1]+S[2]+…+S[i],

– all n prefix sums are computable in O(n) time.

sum(i, j) = prefix-sum(j) – prefix-sum(i-1)

density(i, j) = sum(i, j) / (j-i+1)
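The three formulas translate almost directly into code. In this small sketch the helper names `prefix_sums` and `density` are hypothetical, and `prefix[0] = 0` is added as a sentinel so that `prefix-sum(i-1)` is defined for i = 1.

```python
from fractions import Fraction
from itertools import accumulate


def prefix_sums(bits):
    """prefix[i] = S[1] + ... + S[i], with prefix[0] = 0; takes O(n) time."""
    return [0] + list(accumulate(bits))


def density(prefix, i, j):
    """Density of S[i, j] (1-based, inclusive) in O(1) time."""
    return Fraction(prefix[j] - prefix[i - 1], j - i + 1)


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    prefix = prefix_sums(S)
    print(density(prefix, 8, 11))
```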

(38)

Good partners

Finding the best ending position g(i) for each i = 1, 2, …, n.

[Figure: a segment of length at least L starting at i and ending at g(i), maximizing avg[i, g(i)].]

(39)

Previous Work

[Huang, CABIOS ’94] O(nL) time.

Key observation: no need to examine substrings longer than 2L.

[Figure: positions i+L and g(i), with two spans of length L marked.]
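The key observation can be justified in one step: a substring with 2L or more bits splits into two pieces of at least L bits each, and its density is a weighted average of theirs, so one piece is at least as dense. A minimal sketch that exploits this by examining only lengths L through 2L-1 follows; the function name is hypothetical and the code illustrates the observation under these assumptions rather than reproducing Huang's implementation.

```python
from fractions import Fraction
from itertools import accumulate


def max_density_short_windows(bits, L):
    """Return (density, i, j), 1-based inclusive, in O(nL) time. Assumes L >= 1.

    Only substrings with L to 2L-1 bits are examined; longer ones can be
    split into two feasible pieces, one of which is at least as dense.
    """
    n = len(bits)
    prefix = [0] + list(accumulate(bits))
    best = None
    for i in range(1, n + 1):
        last = min(n, i + 2 * L - 2)  # end position for length 2L-1
        for j in range(i + L - 1, last + 1):
            d = Fraction(prefix[j] - prefix[i - 1], j - i + 1)
            if best is None or d > best[0]:
                best = (d, i, j)
    return best


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    print(max_density_short_windows(S, 3))
```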

(40)

Recent Progress

[Lin, Jiang, Chao, Journal of Computer and System Sciences (JCSS), 2002] O(n log L) time.

– Techniques: Right-skew decomposition.

(41)

Our results

(42)

Reviewing Lin, Jiang, and Chao’s Algorithm

(43)

Right-Skew Substring

S[i, j] is right-skew if for each k = i, …, j-1

– density[i, k] ≤ density[k+1, j].

S = 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1

Right-skew substrings of S (in brackets):

1 [0 0 1] 1 0 0 1 0 1 1 0 1 0 0 1

1 0 [0 1 1] 0 0 1 0 1 1 0 1 0 0 1

1 0 0 1 1 0 [0 1 0 1 1] 0 1 0 0 1
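The definition can be checked directly by testing every split point k. In this sketch the helper names `seg_density` and `is_right_skew` are hypothetical, and the segments tested on S are S[2, 4], S[3, 5] and S[7, 11], which is how the highlighted substrings in the slide's examples appear to read.

```python
from fractions import Fraction


def seg_density(bits, i, j):
    """Density of S[i, j], 1-based and inclusive."""
    return Fraction(sum(bits[i - 1:j]), j - i + 1)


def is_right_skew(bits, i, j):
    """True if density[i, k] <= density[k+1, j] for every k = i, ..., j-1."""
    return all(
        seg_density(bits, i, k) <= seg_density(bits, k + 1, j)
        for k in range(i, j)
    )


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    print(is_right_skew(S, 2, 4), is_right_skew(S, 1, 3))
```

A single-bit segment has no split point, so it is right-skew by definition.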

(44)

Right-Skew Decomposition

Partition S into substrings S_1, S_2, …, S_k such that

– each S_i is a right-skew substring of S

– density(S_1) > density(S_2) > … > density(S_k)

(45)

An example

1 1 | 0 1 1 | 0 1 0 1 1 | 0 0 1

1 > 2/3 > 3/5 > 1/3
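A right-skew decomposition can be computed in O(n) time by scanning from right to left and merging a segment with its right neighbour whenever the neighbour is at least as dense. The sketch below is a common stack-based formulation; the function name is hypothetical, and the test string is the slide's example read as the 13-bit string 1 1 0 1 1 0 1 0 1 1 0 0 1, which is an assumption about the garbled layout.

```python
def right_skew_decomposition(bits):
    """Return the decomposition as a list of (start, end), 1-based inclusive."""
    stack = []  # (start, end, ones); the leftmost segment is on top
    for i in range(len(bits), 0, -1):
        start, end, ones = i, i, bits[i - 1]
        while stack:
            r_start, r_end, r_ones = stack[-1]
            cur_len = end - start + 1
            r_len = r_end - r_start + 1
            # density(current) <= density(right), compared without division
            if ones * r_len <= r_ones * cur_len:
                stack.pop()
                end, ones = r_end, ones + r_ones  # merge with right neighbour
            else:
                break
        stack.append((start, end, ones))
    return [(s, e) for s, e, _ in reversed(stack)]


if __name__ == "__main__":
    example = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
    print(right_skew_decomposition(example))
```

Each position is pushed once and popped at most once, which gives the linear bound. Merging a right-skew segment with a right-skew neighbour of no smaller density yields a right-skew segment, so every piece on the stack stays right-skew.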

(46)

Why RS-decomposition?

1. It suffices to search for g(i) among the boundaries of RS-decomposition of S[i, n].

2. The boundaries’ “potential” of being a

(47)

Illustration

[Figure: positions i+L and g(i), with spans of length L marked.]

(48)

Preprocessing steps

1. RS-decomposition of S[i, n] for each i.

2. Jumping table that enables binary search

(49)

First preprocessing: All RS-decompositions

The RS-decomposition of each S[i, n]

– Linear time for each i = 1, …, n.

All n RS-decompositions

– [Lin et al.] O(n^2) time → O(n) time.

(50)

Key: nested structures

(51)

Second Preprocessing: Jumping Table

Why?

– Need a table that enables jumping over 2^k right-skew components in O(1) time for each k.

[Lin et al.] O(n^2) time → O(n log L) time.

(52)

LJC’s Algorithm

Three main steps:

1. All RS-decompositions in O(n) time.

2. Jumping table in O(n log L) time.

3. For each i = 1, 2, …, n: binary searching g(i) in O(log L) time.

(53)


(54)

A new important observation

i < j < g(j) < g(i) implies density(i, g(i)) is no more than density(j, g(j))

(55)


(56)

Searching for all g(i) in linear time
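For readers who want something runnable, the sketch below reaches linear time (counting arithmetic operations) for the lower-bound-only problem with a different, widely known technique: a lower convex hull over the points (k, prefix-sum(k)). It is not the search for g(i) described on these slides, and the function name `max_density_linear` is hypothetical. It assumes L ≥ 1 and returns None when L exceeds n.

```python
from collections import deque
from fractions import Fraction
from itertools import accumulate


def max_density_linear(bits, L):
    """Return (density, i, j), 1-based inclusive, of a max-density S[i, j]
    with at least L bits, using O(n) hull operations."""
    n = len(bits)
    prefix = [0] + list(accumulate(bits))

    def slope(a, b):
        # Density of S[a+1, b], seen as the slope between two prefix points.
        return Fraction(prefix[b] - prefix[a], b - a)

    hull = deque()  # candidate start points, kept as a lower convex chain
    best = None
    for j in range(L, n + 1):
        new = j - L  # newest start point that is feasible for end j
        while len(hull) >= 2 and slope(hull[-2], hull[-1]) >= slope(hull[-1], new):
            hull.pop()  # hull[-1] lies on or above the chain, never needed
        hull.append(new)
        # Drop front points that cannot beat their successor for this j;
        # they cannot improve the best density for any later j either.
        while len(hull) >= 2 and slope(hull[0], j) <= slope(hull[1], j):
            hull.popleft()
        d = slope(hull[0], j)
        if best is None or d > best[0]:
            best = (d, hull[0] + 1, j)
    return best


if __name__ == "__main__":
    S = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
    print(max_density_linear(S, 3))
```

Every index enters and leaves the deque at most once, which is where the linear bound comes from. The returned segment is one maximizer; when several segments tie, a brute-force search may report a different one with the same density.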

(57)

We still need RS-decomp.

(58)

A generalized version

Input:

– an n-bit string S

– an integer L

– an integer U

Output: a substring S[i, j] of S with maximum

(59)

Difficulty

(60)

Idea: Partition the input into blocks of length U-L

For each index i, g(i) can only be in two consecutive blocks.

[Figure: the input divided into consecutive blocks of length U-L, with index i marked.]

(61)

Consider two cases separately

[Figure: index i with spans of length L and L+d marked.]

(62)

Case 1

Taking care of all indices i with the same left block together.

Just like no U is specified.

(63)

Case 2

Taking care of all indices i with the same left block together.

(64)

Need RS-decompositions for all prefixes of each block

(65)

ESA/SICOMP result

An Optimal Algorithm for the Maximum-Density Segment Problem, with Kai-min Chung, in Proceedings of the 11th Annual European Symposium on Algorithms, Budapest, Hungary,

(66)

Kai-min’s idea

For a segment

– i k j

For a feasible segment

[Figure labels: “lowest density prefix of”; “max-density segment got”.]

(67)

Our algorithm

For each possible j, we find a “good” candidate S(i_j, j) and look for the max-density segment over all S(i_j, j)

[Figure: positions i_j - 1 and j, with a removable prefix and a span of length L marked.]

(68)

The features of our new algorithm

No need for the clever but somewhat complicated right-skew decomposition.

As a result, our algorithm can process the

(69)

Some thoughts

Almost all algorithmic results start with very simple observations.

Experience and skills (in analysis) are certainly crucial, but being creative and diligent is even more important in doing good (algorithmic) research.

(70)

Would you like to join the algorithmic adventure?

(71)

Ads

vol 1

vol 2

one-way functions

pseudo-randomness

zero-knowledge proofs

encryption schemes

digital signatures

cryptographic protocols

(72)

Today’s slides can be found at
