From Evolutionary Biology to Interior-Point Methods

(1)

Paul Tseng

Mathematics, University of Washington Seattle

WCOM, UBC Kelowna October 27, 2007

Joint work with Immanuel Bomze and Werner Schachinger (Univ. Vienna)

Abstract

This is a talk given at WCOM Kelowna, October 2007.

(2)

Talk Outline

• Affine-Scaling Method for Linearly Constrained Optimization

• Replicator Dynamics in Evolutionary Biology

• 1st-Order Interior-Point Methods for Linearly Constrained Optimization

    ◦ Convergence & Convergence Rate

    ◦ Numerical Tests

• Conclusions & Open Questions

(3)

Linearly Constrained Smooth Optimization

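(P)   $\max_x \; f(x) \quad \text{s.t.} \quad Ax = b,\ x \ge 0$

$f : \mathbb{R}^n \to \mathbb{R}$ is continuously diff., $A \in \mathbb{R}^{m \times n}$ of rank $m$, $b \in \mathbb{R}^m$

Seek a stationary pt: a feasible $x$ ($Ax = b$, $x \ge 0$) with $x \perp \nabla f(x) - A^\top \lambda \le 0$ for some $\lambda \in \mathbb{R}^m$.

Primal Nondegeneracy: For any feasible $x$, the columns of $A$ corresponding to $\{j \mid x_j \ne 0\}$ have rank $m$.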

(4)

Affine-Scaling Method for (P)

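Given $Ax^0 = b$, $x^0 > 0$, generate for $t = 0, 1, \ldots$

$$d^t = \arg\max_d \left\{ \nabla f(x^t)^\top d \;\middle|\; Ad = 0,\ \|(X^t)^{-1} d\|_2 \le 1 \right\}, \qquad x^{t+1} = x^t + \alpha^t d^t > 0,$$

with $X^t = \mathrm{diag}(x^t)$, $\alpha^t > 0$ suitably chosen. [Dikin ’67, ’72]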

(5)

The AS method is simple, fairly efficient in practice, but difficult to analyze.

Barnes, Bonnans, Dikin, Gonzaga, Mascarenhas, Monma, Monteiro, Roos, Saigal, J. Sun, T, Tsuchiya, Vanderbei, Ye, ...

• When $f$ is linear, $\{x^t\}$ and $\{f(x^t)\}$ converge linearly [Luo, T ’92];
  the limit $\bar x$ attains the maximum if $\alpha^t$ is not “too large” [Tsuchiya ’91, Tsuchiya & Muramatsu ’95];
  $\bar x$ may fail to attain the maximum if $\alpha^t$ is “too large” [Mascarenhas ’97, Terlaky & Tsuchiya ’99].

• When $f$ is concave or convex and assuming primal nondeg, every cluster pt of $\{x^t\}$ is a stationary pt of (P). [Gonzaga & Carlos ’90, Monteiro & Wang ’98]

What about general f? And convergence rate?

(6)

Replicator Dynamics in Evolutionary Biology

$x^t_j$ = fraction of $j$th genotype in pop. at time $t$, $j = 1, \ldots, n$.
Initially, $x^0_j > 0$ and $\sum_j x^0_j = 1$. $x^t = (x^t_j)_{j=1}^n$ evolves according to

$$x^{t+1} = \frac{X^t Q x^t}{(x^t)^\top Q x^t}, \qquad t = 0, 1, \ldots$$

with adaptation coefficients $Q_{ii} > 0$ and $Q_{ij} \ge 0$ for all $i, j$. [..., Haldane ’32, ..., Moran ’62, ...]

• $\{x^t\}$ converges and its limit $\bar x$ satisfies $\bar x_j\,((Q\bar x)_j - \bar\lambda) = 0$ for all $j$, with $\bar\lambda = \max_j (Q\bar x)_j$; convergence rate is linear iff $(Q\bar x)_j < \bar\lambda$ whenever $\bar x_j = 0$; otherwise $\|\bar x - x^t\| = O(1/\sqrt{t})$. [Lyubich et al. ’80]
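The replicator iteration above can be sketched in a few lines of plain Python. This is only an illustration: the matrix `Q_EXAMPLE`, the starting point `X0_EXAMPLE`, and the function names are hypothetical and do not come from the talk.

```python
def replicator_step(x, Q):
    """Return x_next with x_next[j] = x[j] * (Qx)[j] / (x^T Q x)."""
    n = len(x)
    g = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]  # g = Q x
    mean_fitness = sum(x[i] * g[i] for i in range(n))              # x^T Q x
    return [x[i] * g[i] / mean_fitness for i in range(n)]


def run_replicator(x0, Q, num_steps):
    """Iterate the replicator map num_steps times starting from x0."""
    x = list(x0)
    for _ in range(num_steps):
        x = replicator_step(x, Q)
    return x


# Hypothetical data, chosen only for illustration.
Q_EXAMPLE = [[2.0, 1.0, 1.0],
             [1.0, 2.0, 1.0],
             [1.0, 1.0, 1.0]]
X0_EXAMPLE = [0.2, 0.3, 0.5]
x_final = run_replicator(X0_EXAMPLE, Q_EXAMPLE, 200)
```

Each step rescales genotype $j$ by its fitness relative to the population mean, so the iterates stay positive and keep summing to 1.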

(7)

Connecting RD with AS

Rewrite RD iteration as

$$x^{t+1} - x^t = \frac{X^t\,[\,g^t - e\,(x^t)^\top g^t\,]}{(x^t)^\top g^t}$$

with $g^t = Qx^t$, $e = (1, \ldots, 1)^\top$.

Thus

$$x^{t+1} = x^t + \alpha^t d^t \quad \text{with} \quad d^t = X^t r(x^t),$$

where $\alpha^t = 1/(x^t)^\top g^t$ and

$$r(x) = \nabla f(x) - e\,x^\top \nabla f(x).$$
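A quick numerical check of this rewriting, as a sketch: it takes $\nabla f(x) = Qx$ (as $g^t = Qx^t$ suggests) and compares the original RD update with the step $x + \alpha\, X r(x)$. The data and function names are hypothetical.

```python
def mat_vec(Q, x):
    """Return the product Q x for a square list-of-lists matrix Q."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in Q]


def rd_update(x, g):
    """Original replicator update: x_j * g_j / (x^T g)."""
    xg = sum(xj * gj for xj, gj in zip(x, g))
    return [xj * gj / xg for xj, gj in zip(x, g)]


def scaled_gradient_update(x, g):
    """Rewritten update: x + alpha * X r(x), with alpha = 1 / (x^T g)."""
    xg = sum(xj * gj for xj, gj in zip(x, g))
    r = [gj - xg for gj in g]               # r(x) = g - e * (x^T g)
    d = [xj * rj for xj, rj in zip(x, r)]   # d = X r(x)
    alpha = 1.0 / xg
    return [xj + alpha * dj for xj, dj in zip(x, d)]


# Hypothetical data; x must be positive and sum to 1.
Q_CHECK = [[2.0, 1.0, 1.0],
           [1.0, 2.0, 1.0],
           [1.0, 1.0, 1.0]]
x_check = [0.2, 0.3, 0.5]
g_check = mat_vec(Q_CHECK, x_check)
gap = max(abs(a - b) for a, b in zip(rd_update(x_check, g_check),
                                     scaled_gradient_update(x_check, g_check)))
```

The two updates agree up to rounding, which is the identity the slide uses to read RD as a scaled gradient step.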

(8)

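Solving for $d^t$ in the AS iteration when $A = e^\top$ yields

$$d^t \propto (X^t)^2 \left[ g^t - e\,\frac{e^\top (X^t)^2 g^t}{\|x^t\|_2^2} \right]$$

with $g^t = \nabla f(x^t)$, $e = (1, \ldots, 1)^\top$.

Thus

$$x^{t+1} = x^t + \alpha^t d^t \quad \text{with} \quad d^t = (X^t)^2 r(x^t),$$

where $\alpha^t > 0$ and

$$r(x) = \nabla f(x) - e\,\frac{e^\top X^2 \nabla f(x)}{\|x\|_2^2}.$$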
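The step from the AS subproblem to this closed form is not spelled out on the slide. One way to fill it in, as a sketch that assumes the projected vector below is nonzero:

```latex
\begin{align*}
&\text{Substitute } u = (X^t)^{-1} d \text{ in the AS subproblem with } A = e^\top: \\
&\qquad \max_u \ (X^t g^t)^\top u \quad \text{s.t.} \quad (x^t)^\top u = 0,\ \ \|u\|_2 \le 1,
   \qquad \text{since } e^\top d = e^\top X^t u = (x^t)^\top u. \\
&\text{The maximizer is the normalized projection of } X^t g^t
   \text{ onto } \{u \mid (x^t)^\top u = 0\}: \\
&\qquad u \propto X^t g^t - x^t\,\frac{(x^t)^\top X^t g^t}{\|x^t\|_2^2}. \\
&\text{Using } x^t = X^t e \text{ and } (x^t)^\top X^t g^t = e^\top (X^t)^2 g^t: \\
&\qquad d^t = X^t u \propto (X^t)^2 \left[ g^t - e\,\frac{e^\top (X^t)^2 g^t}{\|x^t\|_2^2} \right].
\end{align*}
```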

(9)

1st-Order Interior-Point Methods for (P)

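Given $Ax^0 = b$, $x^0 > 0$, generate for $t = 0, 1, \ldots$

$$x^{t+1} = x^t + \alpha^t d^t \quad \text{with} \quad d^t = (X^t)^{2\gamma}\, r_\gamma(x^t),$$

where $\gamma > 0$,

$$r_\gamma(x) = \nabla f(x) - A^\top (A X^{2\gamma} A^\top)^{-1} A X^{2\gamma} \nabla f(x),$$

and $\alpha^t$ is the largest $\alpha \in \{\alpha^t_0 \beta^k\}_{k=0,1,\ldots}$ satisfying

$$f(x^t + \alpha d^t) \ge f(x^t) + \sigma \alpha\, (g^t)^\top d^t \qquad \text{(Armijo-type inexact LS)}$$

where $0 < \beta, \sigma < 1$, $g^t = \nabla f(x^t)$, and

$$0 < \alpha^t_0 < \begin{cases} \infty & \text{if } d^t \ge 0; \\[4pt] \dfrac{-1}{\min_j\, d^t_j / x^t_j} & \text{else.} \end{cases}$$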
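A minimal sketch of one such iteration in plain Python, restricted to the simplex case $A = e^\top$, $b = 1$, where $r_\gamma(x)$ reduces to $\nabla f(x)$ minus a weighted average of its entries. The function names, the cap `alpha_cap` on $\alpha^t_0$, the $10^{-20}$ cutoff inside the line search, and the test objective are illustrative assumptions, not the authors' implementation.

```python
def r_gamma_simplex(x, g, gamma):
    """For A = e^T, return (r_gamma(x), w) with weights w_j = x_j^(2*gamma)."""
    w = [xj ** (2.0 * gamma) for xj in x]
    lam = sum(wj * gj for wj, gj in zip(w, g)) / sum(w)  # (A X^2g A^T)^-1 A X^2g g
    return [gj - lam for gj in g], w


def ip_step(f, grad, x, gamma, beta=0.5, sigma=0.1, alpha_cap=1.0):
    """One 1st-order interior-point step with Armijo backtracking."""
    g = grad(x)
    r, w = r_gamma_simplex(x, g, gamma)
    d = [wj * rj for wj, rj in zip(w, r)]            # d = X^(2*gamma) r_gamma(x)
    slope = sum(gj * dj for gj, dj in zip(g, d))     # g^T d, nonnegative
    min_ratio = min(dj / xj for dj, xj in zip(d, x))
    # alpha_0 stays strictly below the step to the boundary, -1 / min_j(d_j / x_j).
    alpha = alpha_cap if min_ratio >= 0.0 else min(alpha_cap, -0.95 / min_ratio)
    fx = f(x)
    while alpha >= 1e-20:                            # assumed cutoff for the sketch
        x_new = [xj + alpha * dj for xj, dj in zip(x, d)]
        if f(x_new) >= fx + sigma * alpha * slope:   # Armijo ascent condition
            return x_new
        alpha *= beta
    return list(x)  # no acceptable step found; keep the current point


def maximize_on_simplex(f, grad, x0, gamma, num_iters):
    """Run num_iters steps of ip_step from the interior point x0."""
    x = list(x0)
    for _ in range(num_iters):
        x = ip_step(f, grad, x, gamma)
    return x


# Hypothetical concave test objective: f(x) = -0.5 * ||x - C||^2.
C = [0.7, 0.5, -0.2]


def f_example(x):
    return -0.5 * sum((xj - cj) ** 2 for xj, cj in zip(x, C))


def grad_example(x):
    return [cj - xj for xj, cj in zip(x, C)]


x_star = maximize_on_simplex(f_example, grad_example, [1 / 3, 1 / 3, 1 / 3], 0.5, 200)
```

For this test objective the maximizer over the simplex is the Euclidean projection of `C` onto it, namely (0.6, 0.4, 0), so `x_star` should approach that point from the interior.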

(10)

$\gamma = 1/2 \implies$ RD

$\gamma = 1 \implies$ AS

This method is simple and suited to large problems ($n \ge 10000$).

Convergence of $\{x^t\}$? Convergence rate of $\{f(x^t)\}$, $\{x^t\}$?

Choosing $\gamma$?

(11)

Convergence Results

Assume $\{x \text{ feasible} \mid f(x) \ge f(x^0)\}$ is bounded. Then $x^t > 0$, $\{f(x^t)\} \uparrow$, and $\{x^t\}$, $\{d^t\}$ are bounded.

(a) Assume primal nondeg, $f$ is concave or convex, and we choose $\inf_t \alpha^t_0 > 0$, $\sup_t \alpha^t_0 < \infty$. Then every cluster pt of $\{x^t\}$ is a stationary pt of (P).

(b) Assume $f$ is quadratic and we choose $\inf_t \alpha^t_0 > 0$. Then

$$\upsilon - f(x^t) = O\!\left(\frac{1}{t^{1/\max\{\gamma,\, 2\gamma - 1\}}}\right), \quad \text{with } \upsilon = \lim_{t \to \infty} f(x^t).$$

Assume in addition we choose $\gamma < 1$. Then $\{x^t\}$ converges and its limit $\bar x$ satisfies

$$\|\bar x - x^t\| = O\!\left(\frac{1}{t^{\frac{1-\gamma}{2\gamma}}}\right).$$

Under primal nondeg, $\bar x$ is a stationary pt of (P). Moreover, if $\gamma \le \frac12$ and $\bar x - r_\gamma(\bar x) > 0$, then $\{f(x^t)\}$ converges Q-linearly and $\{\|\bar x - x^t\|\}$ converges R-linearly. If instead $\sup_t \alpha^t_0 < \infty$, $\gamma \ge \frac12$ and $\bar x - r_\gamma(\bar x) \not> 0$, then $\{\|\bar x - x^t\|\}$ cannot converge linearly.

• Thus, $\gamma < 1$ seems preferable.

(12)

Numerical Tests

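• Implement the 1st-order IP method in Matlab. For the Armijo LS, use $\beta = .5$, $\sigma = .1$,

$$\alpha^t_0 = \min\left\{ 0.95\,\alpha^t_{\rm feas},\ \max\left\{ 10^{-5},\ \frac{\alpha^{t-1}}{\beta^2} \right\} \right\}, \qquad \alpha^t_{\rm feas} = \frac{-1}{\min_j\,(d^t_j / x^t_j)},$$

with $\alpha^{-1} = \infty$.

• Numerical tests on

$$\max_x \; f(x) \quad \text{s.t.} \quad e^\top x = 1,\ x \ge 0$$

with $-f$ from the Moré-Garbow-Hillstrom set (least square), and $n = 1000$.

• Initial $x^0 = e/n$. Terminate when resid $:= \|\min\{x^t, -r_\gamma(x^t)\}\| \le$ tol.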
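The step-size rule and the stopping residual above can be sketched as small helpers. This is a plain-Python illustration rather than the Matlab code used for the tests; the function names are hypothetical, and the 2-norm in `residual` is an assumption because the slide does not say which norm is used.

```python
import math


def feasible_step(x, d):
    """alpha_feas = -1 / min_j(d_j / x_j); infinite when d >= 0."""
    min_ratio = min(dj / xj for dj, xj in zip(d, x))
    return math.inf if min_ratio >= 0.0 else -1.0 / min_ratio


def initial_step(x, d, alpha_prev, beta=0.5):
    """alpha_0 = min{0.95 * alpha_feas, max{1e-5, alpha_prev / beta^2}}.

    Pass alpha_prev = math.inf at t = 0, matching alpha^{-1} = infinity.
    The result is infinite if both alpha_prev and alpha_feas are infinite.
    """
    return min(0.95 * feasible_step(x, d), max(1e-5, alpha_prev / beta ** 2))


def residual(x, r):
    """|| min{x, -r} || componentwise; the 2-norm is an assumption here."""
    return math.sqrt(sum(min(xj, -rj) ** 2 for xj, rj in zip(x, r)))


# Hypothetical inputs for illustration.
alpha_first = initial_step([0.5, 0.5], [0.1, -0.1], math.inf)
alpha_later = initial_step([0.5, 0.5], [0.1, -0.1], 0.5)
resid_at_vertex = residual([0.5, 0.0], [0.0, -0.3])
```

Dividing the previous step by $\beta^2$ lets the trial step grow again after a backtracking phase, while the factor 0.95 keeps the new iterate strictly inside $x > 0$.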

(13)

f(x)   γ     #iter     #f-eval     cpu (sec)   obj             resid
----   ---   -------   ---------   ---------   -------------   --------
BAL    .8    †8        84          0.02        9.98998·10⁸     1.4·10⁻⁶
       1     7         8           0.03        9.98998·10⁸     3.6·10⁻⁸
       1.2   8         9           0.02        9.98998·10⁸     3.7·10⁻⁷
BT     .8    9146      27409       20.68       999.031         9.9·10⁻⁴
       1     17559     52665       23.31       999.055         9.9·10⁻⁴
       1.2   527757    1.58·10⁶    1192.55     999.081         9.9·10⁻⁴
DBV    .8    99        299         0.42        4.9·10⁻⁸        9.8·10⁻⁵
       1     146       440         0.52        4.5·10⁻⁸        9.8·10⁻⁵
       1.2   240       722         1.02        4.0·10⁻⁸        9.9·10⁻⁵
EPS    .8    424       1269        1.89        1.3·10⁻⁶        9.9·10⁻⁴
       1     987       2958        3.52        3.9·10⁻⁶        9.9·10⁻⁴
       1.2   1963      5887        8.76        8.0·10⁻⁶        9.4·10⁻⁴
ER     .8    5         6           0.01        498.002         5.2·10⁻⁷
       1     7         8           0.03        498.002         2.6·10⁻⁷
       1.2   10        11          0.03        498.002         3.7·10⁻⁷
LR1    .8    20        21          0.04        3.32834·10⁸     9.5·10⁻⁷
       1     19        20          0.03        3.32839·10⁸     3.5·10⁻⁷
       1.2   20        21          0.03        3.33481·10⁸     9.9·10⁻⁷
VD     .8    22        74          0.04        6.22504·10²²    2.4·10⁻⁹
       1     19        46          0.04        6.22504·10²²    2.6·10⁻⁸
       1.2   18        50          0.05        6.22504·10²²    5.7·10⁻⁹

Table 1: Performance of 1st-order IP method.

† Quit because the Armijo ascent condition was not met when α < 10⁻²⁰.

(14)

Conclusions & Open questions

1. Theory and practice suggest $\gamma < 1$ is preferable to $\gamma \ge 1$.

2. The method and its analysis readily extend to $0 \le x \le u$ by replacing $X^t$ with $\min\{X^t, U - X^t\}$.

3. Convergence of $\{x^t\}$ when $\gamma \ge 1$ or when $f$ is not quadratic?

4. Linear convergence of $\{f(x^t)\}$ and $\{x^t\}$ when $\gamma > \frac12$ or when $f$ is not quadratic?

5. Convergence of $\{x^t\}$ for 2nd-order AS method?

(15)

Convergence Proof Ideas

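(a) $f(x^{t+1}) - f(x^t) \ge \sigma \alpha^t \|\eta^t\|^2$ with $\eta^t = (X^t)^\gamma r_\gamma(x^t)$.

(b) If $f$ is quadratic, use linearity of the KT condition to show

$$\Delta^t := \upsilon - f(x^t) = O\!\left( \|\eta^t\|^{\min\left\{ \frac{2}{1+\gamma},\, \frac{1}{\gamma} \right\}} \right).$$

(c) If $f$ is quadratic and $\gamma < 1$, then

$$\begin{aligned}
\|x^{t+1} - x^t\| = \alpha^t \|(X^t)^\gamma \eta^t\| = O(\|\eta^t\|)
 &= O\!\left( \frac{\|\eta^t\|^2}{\|\eta^t\|} \right)
  = O\!\left( \frac{\Delta^t - \Delta^{t+1}}{(\Delta^t)^{\frac{1+\gamma}{2}}} \right) \\
 &= O\!\left( \int_{\Delta^{t+1}}^{\Delta^t} s^{-\frac{1+\gamma}{2}}\, ds \right)
  = O\!\left( (\Delta^t)^{\frac{1-\gamma}{2}} - (\Delta^{t+1})^{\frac{1-\gamma}{2}} \right).
\end{aligned}$$

(d) Under primal nondegeneracy, $r_\gamma$ is continuous on the feasible set.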
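How (a) and (b) combine into the sublinear rate for $\{f(x^t)\}$ can be sketched as follows. This is one possible argument, not taken from the slides; it reads the exponent in (b) as $2/(1+\gamma)$ for $\gamma \le 1$, and it assumes $\Delta^t > 0$ and a uniform lower bound $\alpha^t \ge \alpha_{\min} > 0$ on the accepted step sizes. The constants $C$, $c$ and $\alpha_{\min}$ are introduced here for illustration.

```latex
\begin{align*}
&\text{From (b) with } \gamma \le 1: \quad
  \Delta^t \le C\,\|\eta^t\|^{\frac{2}{1+\gamma}}
  \ \Longrightarrow\ \|\eta^t\|^2 \ge (\Delta^t / C)^{1+\gamma}. \\
&\text{From (a):} \quad
  \Delta^t - \Delta^{t+1} = f(x^{t+1}) - f(x^t)
  \ \ge\ \sigma\,\alpha_{\min}\,\|\eta^t\|^2
  \ \ge\ c\,(\Delta^t)^{1+\gamma},
  \qquad c := \sigma\,\alpha_{\min}\,C^{-(1+\gamma)}. \\
&\text{Since } s \mapsto s^{-\gamma} \text{ is convex on } s > 0: \quad
  (\Delta^{t+1})^{-\gamma} - (\Delta^t)^{-\gamma}
  \ \ge\ \gamma\,(\Delta^t)^{-\gamma-1}\,(\Delta^t - \Delta^{t+1})
  \ \ge\ \gamma c. \\
&\text{Summing over } t: \quad
  (\Delta^t)^{-\gamma} \ge \gamma c\, t
  \ \Longrightarrow\ \Delta^t \le (\gamma c\, t)^{-1/\gamma} = O\!\left( 1 / t^{1/\gamma} \right).
\end{align*}
```

For $\gamma \le 1$ this matches the exponent $1/\max\{\gamma, 2\gamma - 1\} = 1/\gamma$ in the convergence result for quadratic $f$.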