Methods
Paul Tseng
Mathematics, University of Washington Seattle
WCOM, UBC Kelowna October 27, 2007
Joint work with Immanuel Bomze and Werner Schachinger (Univ. Vienna) Abstract
This is a talk given at WCOM Kelowna, October 2007.
Talk Outline
• Affine-Scaling Method for Linearly Constrained Optimization
• Replicator Dynamics in Evolutionary Biology
• 1st-Order Interior-Point Methods for Linearly Constrained Optimization
? Convergence & Convergence Rate
? Numerical Tests
• Conclusions & Open Questions
Linearly Constrained Smooth Optimization
(P) max
x f (x) s.t. Ax = b, x ≥ 0
f : <n → < is continuously diff., A ∈ <m×n of rank m, b ∈ <m
Seek a stationary pt: a feasible x (Ax = b, x ≥ 0) with x ⊥ ∇f (x) − A>λ ≤ 0 for some λ ∈ <m.
Primal Nondegeneracy: For any feasible x, the columns of A corresponding to {j | xj 6= 0} have rank m.
Affine-Scaling Method for (P)
Given Ax0 = b, x0 > 0, generate for t = 0, 1, ...
dt = arg max
d
∇f (xt)>d | Ad = 0, k(Xt)−1dk2 ≤ 1 xt+1 = xt + αtdt > 0
with Xt = diag(xt), αt > 0 suitably chosen Dikin ’67, ’72
The AS method is simple, fairly efficient in practice, but difficult to analyze.
Barnes, Bonnans, Dikin, Gonzaga, Mascarenhas, Monma, Monteiro, Roos, Saigal, J. Sun, T, Tsuchiya, Vanderbei, Ye, ...
• When f is linear, {xt} and {f (xt)} converge linearly; Luo, T ’92
the limit x¯ attains maximum if αt is not “too large”; Tsuchiya ’91, Tsuchiya & Muramatsu ’95
¯
x may fail to attain maximum if αt is “too large”. Mascarenhas ’97, Terlaky & Tsuchiya ’99
• When f is concave or convex and assuming primal nondeg, every cluster pt of {xt} is a stationary pt of (P). Gonzaga & Carlos ’90, Monteiro & Wang ’98
What about general f? And convergence rate?
Replicator Dynamics in Evolutionary Biology
xtj = fraction of jth genotype in pop. at time t, j = 1, ..., n Initially, x0j > 0 and P
j x0j = 1. xt = (xtj)nj=1 evolves according to
xt+1 = XtQxt
(xt)>Qxt, t = 0, 1, ...
with adaptation coefficients Qii > 0 and Qij ≥ 0 for all i, j ..., Haldane ’32, ..., Moran ’62, ...
• {xt} converges and its limit x¯ satisfies x¯j((Q¯x)j − ¯λ) = 0 for all j, with λ = max¯ j(Q¯x)j; convergence rate is linear iff (Q¯x)j < ¯λ whenever x¯j = 0;
otherwise k¯x − xtk = O(1/√
t). Lyubich et al. ’80
Connecting RD with AS
Rewrite RD iteration as
xt+1 − xt = Xt[gt − e (xt)>gt] (xt)>gt
with gt = Qxt, e = (1, ..., 1)>.
Thus
xt+1 = xt + αtdt with dt = Xtr(xt) where αt = 1/(xt)>gt and
r(x) = ∇f (x) − e x>∇f (x)
Solve for dt in AS iteration when A = e> yields dt ∝ (Xt)2
gt − e e>(Xt)2gt kxtk22
with gt = ∇f (xt), e = (1, ..., 1)>.
Thus
xt+1 = xt + αtdt with dt = (Xt)2r(xt) where αt > 0 and
r(x) = ∇f (x) − e e>X2∇f (x) kxk2
1st-Order Interior-Point Methods for (P)
Given Ax0 = b, x0 > 0, generate for t = 0, 1, ...
xt+1 = xt + αtdt with dt = (Xt)2γrγ(xt) where γ > 0,
rγ(x) = ∇f (x) − A>(AX2γA>)−1AX2γ∇f (x) and αt is the largest α ∈ {αt0(β)k}k=0,1,... satisfying
f (xt + αdt) ≥ f (xt) + σα(gt)>dt, Armijo-type inexact LS
where 0 < β, σ < 1, gt = ∇f (xt), and
0 < αt0 < ( ∞ if dt ≥ 0;
−1
minj dtj/xtj else.
γ = 1/2 =⇒ RD
γ = 1 =⇒ AS
This method is simple, suited for large problems (n ≥ 10000).
Convergence of {xt}? Convergence rate of {f (xt)}, {xt}?
Choosing γ?
Convergence Results
: Assume {x feasible | f (x) ≥ f (x0)} is bounded. Then xt > 0, {f (xt)} ↑, and {xt}, {dt} are bounded.(a) Assume primal nondeg, f is concave or convex, and we choose
inft αt0 > 0, supt αt0 < ∞. Then every cluster pt of {xt} is a stationary pt of (P).
(b) Assume f is quadratic and we choose inft αt0 > 0. Then υ − f (xt) = O
1/t1/ max{γ,2γ−1} , with υ = lim
t→∞f (xt).
Assume in addition we choose γ < 1. Then {xt} converges and its limit x¯ satisfies
k¯x − xtk = O 1/t
1−γ 2γ
.
Under primal nondeg, x¯ is a stationary pt of (P). Moreover, if γ ≤ 12 and
¯
x − rγ(¯x) > 0, then {f (xt)} converges Q-linearly and {k¯x − xtk} converges R-linearly. If instead supt αt0 < ∞, γ ≥ 12 and x − r¯ γ(¯x) 6> 0, then {k¯x − xtk}
cannot converge linearly.
• Thus, γ < 1 seems preferrable.
Numerical Tests
:• Implement 1st-order IP method in Matlab. For Armijo LS, use β = .5, σ = .1, αt0 = min
0.95αtfeas, max
10−5, αt−1 β2
, αtfeas = −1
minj(dtj/xtj), with α−1 = ∞.
• Numerical tests on
maxx f (x) s.t. e>x = 1, x ≥ 0
with −f from Mor ´e-Garbow-Hillstrom set (least square), and n = 1000.
• Initial x0 = e/n. Terminate when resid := k min{xt, −rγ(xt)}k ≤ tol.
f (x) γ #iter #f-eval cpu (sec) obj resid
BAL .8 †8 84 0.02 9.98998·108 1.4·10−6
1 7 8 0.03 9.98998·108 3.6·10−8
1.2 8 9 0.02 9.98998·108 3.7·10−7
BT .8 9146 27409 20.68 999.031 9.9·10−4
1 17559 52665 23.31 999.055 9.9·10−4
1.2 527757 1.58·106 1192.55 999.081 9.9·10−4
DBV .8 99 299 0.42 4.9·10−8 9.8·10−5
1 146 440 0.52 4.5·10−8 9.8·10−5
.2 240 722 1.02 4.0·10−8 9.9·10−5
EPS .8 424 1269 1.89 1.3·10−6 9.9·10−4
1 987 2958 3.52 3.9·10−6 9.9·10−4
1.2 1963 5887 8.76 8.0·10−6 9.4·10−4
ER .8 5 6 0.01 498.002 5.2·10−7
1 7 8 0.03 498.002 2.6·10−7
1.2 10 11 0.03 498.002 3.7·10−7
LR1 .8 20 21 0.04 3.32834·108 9.5·10−7
1 19 20 0.03 3.32839·108 3.5·10−7
1.2 20 21 0.03 3.33481·108 9.9·10−7
VD .8 22 74 0.04 6.22504·1022 2.4·10−9
1 19 46 0.04 6.22504·1022 2.6·10−8
1.2 18 50 0.05 6.22504·1022 5.7·10−9
Table 1: Performance of 1st-order IP method.
† Quit due to Armijo ascent condition not met when α < 10−20
Conclusions & Open questions
1. Theory and practice suggest γ < 1 is preferrable to γ ≥ 1.
2. The method and its analysis readily extends to 0 ≤ x ≤ u by replacing Xt with min{Xt, U − Xt}.
3. Convergence of {xt} when γ ≥ 1 or when f is not quadratic?
4. Linear convergence of {f (xt)} and {xt} when γ > 12 or when f is not quadratic?
5. Convergence of {xt} for 2nd-order AS method?
Convergence Proof Ideas
(a) f (xt+1) − f (xt) ≥ σαtkηtk2 with ηt = (Xt)γr(xt).
(b) If f is quadratic, use linearity of KT condition to show
∆t := υ − f (xt) = O
kηtkmin{1+γ2 ,1γ} .
(c) If f is quadratic and γ < 1, then
kxt+1 − xtk = αtk(Xt)γηtk = O(kηtk) = O kηtk2 kηtk
= O ∆t − ∆t+1 (∆t)1+γ2
!
= O
Z ∆t
∆t+1
t−1+γ2 dt
!
= O
(∆t)1−γ2 − (∆t+1)1−γ2
(d) Under primal nondegeneracy, rγ is continuous on the feasible set.