A PATH-FOLLOWING ALGORITHM FOR LINEAR PROGRAMMING USING QUADRATIC AND LOGARITHMIC PENALTY FUNCTIONS*

PAUL TSENG†

* Received by the editors April 16, 1990; accepted for publication (in revised form) June 11, 1992. This research was supported by the U.S. Army Research Office, contract DAAL03-86-K-0171, and was conducted while the author was with the Laboratory for Information and Decision Systems and the Center for Intelligent Control Systems, Massachusetts Institute of Technology, Cambridge.

† Department of Mathematics, GN-50, University of Washington, Seattle, Washington 98195 (tseng@math.washington.edu).

Abstract. Motivated by a recent work of Setiono, a path-following algorithm for linear programming using both logarithmic and quadratic penalty functions is proposed. In the algorithm, a logarithmic and a quadratic penalty is placed on, respectively, the nonnegativity constraints and an arbitrary subset of the equality constraints; Newton's method is applied to solve the penalized problem, and after each Newton step the penalty parameters are decreased. This algorithm maintains neither primal nor dual feasibility and does not require a Phase I. It is shown that if the initial iterate is chosen appropriately and the penalty parameters are decreased to zero in a particular way, then the algorithm is linearly convergent. Numerical results are also presented suggesting that the algorithm may be competitive with interior point algorithms in practice, requiring typically between 30 and 45 iterations to accurately solve each Netlib problem tested.

Key words. linear program, path-following, Newton step, penalty function

AMS subject classifications. 49, 90
1. Introduction. Since the pioneering work of Karmarkar [Kar84], much interest has focused on solving linear programs using interior point algorithms. These interior point algorithms may be classified roughly as either (i) projective-scaling (or potential reduction), (ii) affine-scaling, or (iii) path-following. We will not attempt to review the literature on this subject, which is vast (see, for example, [Meg89], [Tod89] for surveys). Our interest is in algorithms of the path-following type, of the sort discussed in [GaZ81]. These interior point algorithms typically penalize the nonnegativity constraints by a logarithmic function and use Newton's method to solve the penalized problem, with the penalty parameters decreased after each Newton step (see, for example, [Gon89], [GMSTW86], [KMY89], [MoA87], [Ren88], [Tse89]). One disadvantage of interior point algorithms is the need for an initial interior feasible solution. A common technique for handling this is to add an artificial column (see [AKRV89], [BDDW89], [GMSTW86], [Lus90], [MMS89], [MSSPB89], [MoM87]), but this itself has disadvantages. For example, the cost of the artificial column must be estimated, and some type of rank-1 updating is needed to solve each least squares problem, which can significantly increase the solution time and degrade the numerical accuracy of the solutions.

Recently, Setiono [Set89] proposed an interesting algorithm that combines features of a path-following algorithm with those of the method of multipliers [HaB70], [Hes69], [Pow69] (also see [Roc76], [Ber82]). This algorithm does not require a feasible solution to start and is comparable to interior point algorithms both in terms of work per iteration and, according to the numerical results reported in [Set89], in terms of the total number of iterations. To describe the basic idea in Setiono's algorithm, consider
a linear program in the standard dual form

    minimize  −b^T p
    subject to  t + A^T p = c,  t ≥ 0,

where A is some matrix and b and c are vectors of appropriate dimension. Let us attach a Lagrange multiplier vector x to the constraints t + A^T p = c and apply the method of multipliers to the above linear program. This produces the following iterations

    x^{k+1} = x^k + (1/ε^k)(t^k + A^T p^k − c),   k = 1, 2, ...,

where {ε^k} is a sequence of monotonically decreasing positive scalars and (t^k, p^k) is some (inexact) solution of the augmented Lagrangian subproblem

(1.1)    minimize  −b^T p + (x^k)^T (t + A^T p − c) + (1/(2ε^k)) ‖t + A^T p − c‖²
         subject to  t ≥ 0.

(An advantage of the above multiplier iterations is that they do not need a feasible solution to start.) A key issue associated with the above multiplier iterations concerns the efficient generation of an inexact solution (t^k, p^k) of the convex quadratic program (1.1) for each k. (Note that as ε^k decreases, the objective function of (1.1) becomes progressively more ill-conditioned.) Setiono's algorithm may be viewed as the method of multipliers in which (t^k, p^k) is generated according to the following scheme, reminiscent of the path-following idea: Add a logarithmic penalty function −γ^k Σ_{j=1}^m ln(t_j) to the objective of (1.1), where γ^k is some positive scalar monotonically decreasing with k, and apply a single Newton step, starting from (t^{k−1}, p^{k−1}), to the resulting problem. (If the t^k thus obtained lies outside the positive orthant, it is moved back towards t^{k−1} until it becomes positive.¹)
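To make the scheme concrete, the following Python sketch carries out one such multiplier iteration: a single damped Newton step on the barrier-augmented subproblem (1.1), followed by the multiplier update. It is a minimal sketch under our own choices (variable names, dense linear algebra, and the assumption that A has full row rank); it is not Setiono's implementation.

    import numpy as np

    def multiplier_iteration(A, b, c, t, p, x, eps, gamma):
        """One schematic iteration of the method of multipliers in which the
        subproblem (1.1), augmented with the barrier -gamma*sum(ln t_j), is
        approximately solved by a single damped Newton step (cf. footnote 1)."""
        m = t.size
        r = t + A.T @ p - c                      # residual of t + A'p = c
        # Gradient of -b'p + x'(t + A'p - c) + (1/(2*eps))||r||^2 - gamma*sum(ln t)
        g_t = x + r / eps - gamma / t
        g_p = -b + A @ (x + r / eps)
        # Hessian blocks of the same function (positive definite if A has full row rank)
        H_tt = np.diag(1.0 / eps + gamma / t**2)
        H_tp = A.T / eps
        H_pp = (A @ A.T) / eps
        H = np.block([[H_tt, H_tp], [H_tp.T, H_pp]])
        step = np.linalg.solve(H, -np.concatenate([g_t, g_p]))
        dt, dp = step[:m], step[m:]
        # Damp the step so that t stays strictly positive (the .98 of footnote 1).
        lam = 1.0
        neg = dt < 0
        if neg.any():
            lam = min(1.0, 0.98 * np.min(-t[neg] / dt[neg]))
        t_new, p_new = t + lam * dt, p + lam * dp
        # Multiplier update of the method of multipliers.
        x_new = x + (t_new + A.T @ p_new - c) / eps
        return t_new, p_new, x_new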
In this paper, inspired by the preceding work of Setiono, we study an algorithm that also adds to the objective a quadratic penalty on the equality constraints and a logarithmic penalty on the nonnegativity constraints, and then solves the penalized problem using Newton's method, with the penalty parameters decreased after each Newton step. Unlike Setiono's algorithm, our algorithm does not use the multiplier vector x (so it may be viewed as a pure penalty method) and allows any subset of the equality constraints to be penalized. We show that if the problem is primal nondegenerate and the iterates start near the optimal solution of the initial penalized problem, then the penalty parameters can be decreased at the rate of a geometric progression and the iterates converge linearly. To the best of our knowledge, this is the first (global) linear convergence result for a noninterior point path-following algorithm. We also present numerical results indicating that the algorithm may potentially be competitive with interior point algorithms in practice. We remark that penalty methods that use either the quadratic or the logarithmic penalty function have been well studied (see, for example, [Ber82], [FiM68], [Fri57], [JIO78], [Man84], [WBD88]), but very little is known about penalty methods that use both types of penalty functions (called mixed interior point-exterior point algorithms in [FiM68]).

¹ More precisely, t^k is given by the formula t^k = t^{k−1} + .98 λ^k Δt^k, where Δt^k is the Newton direction (projected onto the space of t) and λ^k is the largest λ ∈ (0, 1] for which t^{k−1} + λΔt^k is nonnegative. (The choice of .98 is arbitrary; any number between 0 and 1 would do.)
Upon completion of this paper, the author learned from Professor O. L. Mangasarian that the algorithm discussed here is essentially the IDLN (Interior Dual Least Norm) algorithm described in the recent Ph.D. thesis of Setiono [Set90] (also see the report [Set91]), with minor differences in the choice of the penalty parameters. The same reference includes an (asymptotic) linear rate convergence analysis and extensive numerical results showing that the IDLN algorithm outperforms the popular simplex code MINOS by a factor of 2 or more on the 63 Netlib problems tested. In short, Setiono's independent work provides further evidence of the practical efficiency of the mixed interior point-exterior point solution approach. Finally, we note that while this paper was under review, other noninterior point methods relating to that studied here have been proposed. One method, brought to our attention by one of the referees, is a certain augmented Lagrangian algorithm for stochastic programming (see [MuR90]); another method is a primal-dual exterior point algorithm for linear programming (see [KMM91]). However, neither of these methods has been shown to possess the nice theoretical/numerical properties enjoyed by the algorithm studied here. For example, no convergence result is given for the method in [MuR90], and no numerical or rate of convergence result is given for the method in [KMM91].
This paper proceeds as follows: In §2 we describe the basic algorithm; in §3 we analyze its convergence; and in §4 we recount our numerical experience with it. In §5 we discuss extensions of this work.

In our notation, every vector is a column vector in some k-dimensional real space ℜ^k, and superscript T denotes transpose. For any vector x, we denote by x_j the jth coordinate of x, by Diag(x) the diagonal matrix whose jth diagonal entry is x_j, and by ‖x‖₁, ‖x‖, ‖x‖_∞ the L1-norm, the L2-norm, and the L∞-norm of x, respectively. For any matrix A, we denote by A_j the jth column of A. We also denote by e the vector of 1's (whose dimension will be clear from the context) and denote by ln(·) the natural logarithm function.
2. Algorithm description. Let A be an n × m matrix, B be an l × m matrix, b be an n-vector, c be an m-vector, and d be an l-vector. Consider the following linear program associated with A, B, b, c, and d:

(D)    minimize  −b^T p
       subject to  t + A^T p = c,  Bt = d,  t ≥ 0,

which we call the dual problem. The dual problem may be viewed as a standard linear program in t, in which we arbitrarily partition the equality constraints into two subsets and express one subset in the generator form t + A^T p = c. (To see this, consider the problem of minimizing a^T t subject to t ∈ R ∩ S, t ≥ 0, where a is an m-vector and R and S are affine sets in ℜ^m. We can always express R = { t | t = c − A^T p for some p } and S = { t | Bt = d } for some matrices A and B and some vectors c and d. Using t = c − A^T p to eliminate t from the objective function, the problem is now in the form (D).) The constraints t + A^T p = c can be thought of as the complicating constraints which, if removed, would make (D) much easier to solve. The advantages of splitting the constraints in this manner will be explained at the end of §2. Finally, we note that the form in which we write the equality constraints is unimportant, and the above form is adopted for notational convenience only.
By attaching Lagrange multiplier vectors x and y to the constraints c − A^T p = t and Bt = d, respectively, we obtain the following dual of (D):

(P)    minimize  c^T x + d^T y
       subject to  Ax = b,  x + B^T y ≥ 0,

which we call the primal problem.

We make the following blanket assumptions, which are standard for interior point algorithms, regarding (D) and (P).

Assumption A.
(a) { (x, y) | Ax = b, x + B^T y > 0 } is nonempty.
(b) { (t, p) | Bt = d, t > 0, t + A^T p = c } is nonempty.
(c) A has full row rank.

It is well known that, under parts (a) and (b) of Assumption A, both (D) and (P) have nonempty bounded optimal solution sets.

Consider the dual problem (D). Suppose that we place a quadratic penalty on the constraints t + A^T p = c with a penalty parameter 1/ε (ε > 0) and we place a logarithmic penalty on the constraints t ≥ 0 with a penalty parameter γ > 0. This gives the following approximation to (D):

(D_{ε,γ})    minimize  f_{ε,γ}(t, p)
             subject to  Bt = d,  t > 0,

where f_{ε,γ} : (0, ∞)^m × ℜ^n → ℜ is the penalized objective function given by

(2.1)    f_{ε,γ}(t, p) = (1/2)‖c − t − A^T p‖² − εγ Σ_{j=1}^m ln(t_j) − εb^T p   ∀ t > 0, ∀ p.

The penalized problem (D_{ε,γ}) has the advantage that its objective function f_{ε,γ} is twice differentiable and the Hessian ∇²f_{ε,γ} is positive definite everywhere (via Assumption A(c)). Since the feasible set of (D_{ε,γ}) is nonempty (by Assumption A(b)) and its intersection with any level set of f_{ε,γ} is bounded (by Assumption A(a)), it is readily seen that (D_{ε,γ}) has an optimal solution which, by the strict convexity of f_{ε,γ}, is unique.
Note 1. We can use penalty functions other than the quadratic and the logarithmic. For example, we can use a cubic in place of the quadratic and t_j ln(t_j) in place of −ln(t_j). The quadratic and the logarithmic function, however, have nice properties (such as the second derivative of the logarithmic function equals minus the square of its first derivative) which make global convergence analysis possible.

It is well known (see [Roc70]) that (t, p) is the optimal solution of (D_{ε,γ}) if and only if it satisfies, together with some u ∈ ℜ^l, the Kuhn-Tucker conditions

(2.2)    t > 0,  Bt = d,  ∇f_{ε,γ}(t, p) + ( B^T u ; 0 ) = 0.

Straightforward calculation using (2.1) finds that

(2.3)    ∇f_{ε,γ}(t, p) = ( t + A^T p − c − εγ T^{-1} e ; A(t + A^T p − c) − εb ),

(2.4)    ∇²f_{ε,γ}(t, p) = ( I + εγ T^{-2}   A^T ; A   A A^T ),

where T = Diag(t). The above formulas will be used extensively in the subsequent analysis. Note that ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain.

It is not difficult to show that, as ε and γ tend to zero, the optimal solution of (D_{ε,γ}) approaches the optimal solution set of (D) (see Lemma 3.1). This suggests the following algorithm for solving (D): At each iteration, we are given ε, γ and a (t, p) which is an approximate solution of (D_{ε,γ}); we apply a Newton step to (D_{ε,γ}) at (t, p) to generate a new (t, p) and then we decrease ε and γ. In other words, we consider a sequence of penalized problems (D_{ε^k,γ^k}), k = 1, 2, ..., with ε^k ↓ 0 and γ^k ↓ 0, and we use a Newton step to follow the optimal solution of one penalized problem to that of the next. We now formally state this algorithm, which we call the QLPPF (short for Quadratic-Logarithmic Penalty Path-Following) algorithm.

QLPPF Algorithm

Iteration 0. Choose ε^1 > 0 and γ^1 > 0. Choose (t^1, p^1) ∈ (0, ∞)^m × ℜ^n with Bt^1 = d.

Iteration k. Given (t^k, p^k) ∈ (0, ∞)^m × ℜ^n with Bt^k = d, compute (Δt^k, Δp^k, u^k) to be a solution of

(2.5)    ∇²f_{ε^k,γ^k}(t^k, p^k) ( Δt^k ; Δp^k ) + ∇f_{ε^k,γ^k}(t^k, p^k) + ( B^T u^k ; 0 ) = 0,   BΔt^k = 0,

and set

(2.6)    t^{k+1} = t^k + Δt^k,   p^{k+1} = p^k + Δp^k,
(2.7)    γ^{k+1} = α^k γ^k,   ε^{k+1} = α^k ε^k,

where α^k is some scalar in (0, 1).
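For concreteness, the following Python sketch assembles the gradient (2.3) and Hessian (2.4) and solves the direction-finding system (2.5) as one KKT system with the constraint BΔt = 0. It is illustrative only (our names, dense linear algebra, a direct solve instead of the factorization of §4), and it assumes A and B have full row rank so that the KKT matrix is nonsingular.

    import numpy as np

    def qlppf_iteration(A, B, b, c, t, p, eps, gamma, alpha):
        """One schematic QLPPF iteration: Newton system (2.5), full step (2.6),
        penalty update (2.7)."""
        n, m = A.shape
        l = B.shape[0]
        r = t + A.T @ p - c                          # residual of t + A'p = c
        # Gradient (2.3) and Hessian blocks (2.4) of f_{eps,gamma}
        g_t = r - eps * gamma / t
        g_p = A @ r - eps * b
        H_tt = np.eye(m) + eps * gamma * np.diag(1.0 / t**2)
        # KKT system: Newton step on (t, p) subject to B*dt = 0, with multiplier u
        K = np.block([
            [H_tt, A.T,              B.T],
            [A,    A @ A.T,          np.zeros((n, l))],
            [B,    np.zeros((l, n)), np.zeros((l, l))],
        ])
        rhs = np.concatenate([-g_t, -g_p, np.zeros(l)])
        sol = np.linalg.solve(K, rhs)
        dt, dp = sol[:m], sol[m:m + n]
        # Full Newton step (2.6) and penalty decrease (2.7); near the solution
        # trajectory the theory guarantees t + dt > 0 (in practice see Section 4.2).
        return t + dt, p + dp, alpha * eps, alpha * gamma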
Note 2. As we noted earlier, the QLPPF Algorithm is closely linked to the algorithms proposed by Setiono. In particular, it can be seen from (2.3), (2.4) that, in the special case where B is the zero matrix, the direction finding problem (2.5) is identical to that in the IDLN algorithm of Setiono (see [Set91, Eq. (16)]) and differs from that in the IDPP algorithm of Setiono (see [Set89, Eq. (6)]) by only an order ε^k term on the right-hand side (which tends to zero as ε^k tends to zero).

Note 3. A unique feature of the QLPPF Algorithm lies in its handling of the two sets of constraints t + A^T p = c and Bt = d, whereby quadratic penalties are placed only on the first set while the second set is maintained to be satisfied at all iterations. This feature has the advantage that it enables the QLPPF Algorithm to provide a unified framework for interior point methods and exterior point methods. As an example, if we put all the equality constraints into Bt = d (and correspondingly set A = I and c = 0), then it can be seen that the QLPPF Algorithm with γ^k = 1 for all k reduces to the well-known primal path-following algorithm for maximizing b^T t subject to the constraints Bt = d, t ≥ 0 (see [Gon89], [Tse89], [Ye87]). If we put all the equality constraints into t + A^T p = c (and correspondingly set B = 0 and d = 0), then, as was noted above, the QLPPF Algorithm reduces to the IDLN algorithm for maximizing b^T p subject to the constraints t + A^T p = c, t ≥ 0. For other choices of constraint splitting, we obtain algorithms somewhere in between. We can then envision choosing a constraint splitting so the corresponding QLPPF Algorithm is in some sense most efficient (e.g., fastest convergence) for the given problem. Alternatively, we may choose a constraint splitting so as to ensure that certain "critical" equality constraints are satisfied exactly at all iterations (by putting these constraints into Bt = d).
3. Global convergence. In this section, we show that if (D) is in some sense primal nondegenerate and if (t^1, p^1) is "close" to the optimal solution of (D_{ε^1,γ^1}) in the QLPPF Algorithm, then, by decreasing the ε^k's and γ^k's at an appropriate rate, the iterates {(t^k, p^k)} generated by the QLPPF Algorithm approach the optimal solution set of (D) (see Theorem 3.4). Because the Hessian ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain, the proof of this result is quite involved and relies critically on finding a suitable Lyapunov (i.e., merit) function to monitor the progress of the algorithm.

For any ε > 0 and γ > 0, let ρ_{ε,γ} : (0, ∞)^m × ℜ^n × ℜ^l → [0, ∞) be the function given by

(3.1)    ρ_{ε,γ}(t, p, u) = max{ (1/(εγ)) ‖T(c − t − A^T p − B^T u) + εγe‖, (1/√(εγ)) ‖A(c − t − A^T p) + εb‖ }   ∀ t > 0, ∀ p, ∀ u,

where T = Diag(t). From (2.2) and (2.3) we see that (t, p) is a solution of (D_{ε,γ}) if and only if (t, p) satisfies Bt = d, t > 0 and ρ_{ε,γ}(t, p, u) = 0 for some u. Hence ρ_{ε,γ} acts as a Lyapunov function which measures how far (t, p) is from solving (D_{ε,γ}). This notion is made precise in the following lemma.
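The Lyapunov function is cheap to evaluate; a minimal Python sketch (our names, dense algebra, and the scaling factors of (3.1) as reconstructed above) is the following.

    import numpy as np

    def rho(A, B, b, c, t, p, u, eps, gamma):
        """Lyapunov function rho_{eps,gamma}(t, p, u) of (3.1): how far (t, p)
        is from solving the penalized problem (D_{eps,gamma})."""
        res1 = t * (c - t - A.T @ p - B.T @ u) + eps * gamma * np.ones(t.size)
        res2 = A @ (c - t - A.T @ p) + eps * b
        return max(np.linalg.norm(res1) / (eps * gamma),
                   np.linalg.norm(res2) / np.sqrt(eps * gamma))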
For any ε > 0, let

(3.2)    Y_ε = { (t, p) | A(c − t − A^T p) = −εb, Bt = d, t > 0 }.

The following lemma shows that any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β, for some u, is within an order of ε + (1 + β)γ of being an optimal solution of (D).

LEMMA 3.1. Fix any ε > 0, γ > 0 and β ∈ (0, 1]. For any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β for some u, the vector (x, y) given by

(3.3)    x = (t + A^T p − c)/ε,   y = u/ε

is feasible for (P) and, together with (t, p), satisfies

    c^T x + d^T y ≤ v* + η*ε + m(1 + β)γ,
    b^T p ≥ v* − m(1 + β)γ,
    ‖t + A^T p − c‖² ≤ 2(η*(ε)² + m(1 + β)εγ),

where v* denotes the optimal cost of (P) and η* = min{ ‖x*‖²/2 | (x*, y*) is an optimal solution of (P) }.
Proof. Since (t, p) is in Y_ε, it follows from the definition of Y_ε (see (3.2)) and (3.3) that Ax = b. Since ρ_{ε,γ}(t, p, u) ≤ β, we have from the definition of ρ_{ε,γ} (see (3.1)) and (3.3) that ‖T(−x − B^T y) + γe‖ ≤ γβ, where T = Diag(t). Thus

(3.4)    γ(1 − β)e ≤ T(x + B^T y) ≤ γ(1 + β)e.

Since t > 0 and β ≤ 1, the first inequality in (3.4) yields x + B^T y ≥ γ(1 − β)T^{-1}e ≥ 0. Hence (x, y) is feasible for (P).

Let

    d_ε(τ, π) = b^T π − (1/(2ε))‖τ + A^T π − c‖²   ∀(τ, π),
    p_ε(ξ, ζ) = (ε/2)‖ξ‖² + c^T ξ + d^T ζ   ∀(ξ, ζ).

Straightforward algebra using (3.3) yields p_ε(x, y) = d_ε(t, p) + t^T(x + B^T y). Also, it follows from strong duality for a convex quadratic program and its dual that

    min{ p_ε(ξ, ζ) | Aξ = b, ξ + B^T ζ ≥ 0 } = max{ d_ε(τ, π) | Bτ = d, τ ≥ 0 }.

Thus, by letting (x*, y*) be any optimal solution of (P) with ‖x*‖²/2 = η*, we obtain from the above two relations and c^T x* + d^T y* = v* that

    (ε/2)‖x‖² + c^T x + d^T y = p_ε(x, y) = d_ε(t, p) + t^T(x + B^T y) ≤ p_ε(x*, y*) + t^T(x + B^T y) = εη* + v* + t^T(x + B^T y).

Similarly, by letting (t*, p*) be any optimal solution of (D), we obtain from the same two relations and the facts b^T p* = v* and t* + A^T p* = c that

    b^T p − (1/(2ε))‖t + A^T p − c‖² = d_ε(t, p) = p_ε(x, y) − t^T(x + B^T y) ≥ d_ε(t*, p*) − t^T(x + B^T y) = v* − t^T(x + B^T y).

Since 0 ≤ t^T(x + B^T y) ≤ m(1 + β)γ (cf. (3.4)), the above two relations yield

    c^T x + d^T y ≤ v* + εη* + m(1 + β)γ,   b^T p ≥ v* − m(1 + β)γ.

Finally, since (x, y) is feasible for (P) so that c^T x + d^T y ≥ v*, the first of these two relations also yields

    (ε/2)‖x‖² + v* ≤ v* + εη* + m(1 + β)γ.

Canceling v* from both sides and using (3.3) completes the proof. □

Since we are dealing with linear programs, Lemma 3.1 implies that, as ε ↓ 0 and γ ↓ 0, the
(x, y) given by (3.3) approaches the optimal solution set of (P) and (t, p) approaches the optimal solution set of (D). In fact, it suffices to decrease ε and γ as far as a fixed scalar constant times 2^{−L}, where L is the size of the problem encoding in binary (defined as, say, in [Kar84]), at which time an optimal solution of (P) and of (D) can be recovered by using the techniques described in, for example, [Kar84] and [PaS82].

For each λ > 0, let θ_λ : (0, ∞)^m → [0, ∞) be the function given by

(3.5)    θ_λ(t) = ( ‖E + ED^{1/2}A^T F A D^{1/2}E‖ + ‖ED^{1/2}A^T F‖ )²   ∀ t > 0,

where

(3.6)    D = (I + λT^{-2})^{-1},
(3.7)    E = I − D^{1/2}B^T [BDB^T]^{-1} BD^{1/2},
(3.8)    F = [ A(I − D^{1/2}ED^{1/2})A^T ]^{-1},

and T = Diag(t). (F is well defined because ‖E‖ ≤ 1 (E is a projection matrix) and ‖D‖ < 1, so that I − D^{1/2}ED^{1/2} is positive definite. We also use the assumption that A has full row rank.) The quantity θ_λ(t) estimates the norm squared of a certain projection-like operator depending on A and t, and it will be used extensively in our analysis. In general, θ_λ(t) is rather cumbersome to evaluate, but, as we shall see, it suffices for our analysis to bound θ_λ(t) from above (see Lemma 3.3(b)).
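Although the algorithm itself never needs θ_λ(t), the quantity can be evaluated directly from (3.5)-(3.8) by dense linear algebra, as in the following illustrative sketch (it additionally assumes B has full row rank so that BDB^T is invertible).

    import numpy as np

    def theta(A, B, t, lam):
        """Evaluate theta_lambda(t) of (3.5)-(3.8); purely illustrative."""
        m = t.size
        D = np.diag(1.0 / (1.0 + lam / t**2))                          # (3.6)
        Dh = np.sqrt(D)                                                 # D^{1/2}
        BDh = B @ Dh
        E = np.eye(m) - BDh.T @ np.linalg.solve(B @ D @ B.T, BDh)       # (3.7)
        F = np.linalg.inv(A @ (np.eye(m) - Dh @ E @ Dh) @ A.T)          # (3.8)
        M1 = E + E @ Dh @ A.T @ F @ A @ Dh @ E
        M2 = E @ Dh @ A.T @ F
        return (np.linalg.norm(M1, 2) + np.linalg.norm(M2, 2)) ** 2     # (3.5)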
3.1. Analyzing a Newton step. In this subsection we prove a key lemma that states that if (t̄, p̄) is "close" to the optimal solution of (D_{ε̄,γ̄}), then the (t, p) generated by applying one Newton step to (D_{ε̄,γ̄}) at (t̄, p̄) is close to the optimal solution of (D_{ε,γ}) for some ε < ε̄ and some γ < γ̄. The notion of "closeness" is measured by the Lyapunov function ρ_{ε,γ}, and the proof of the lemma is based on the ideas used in [Tse89, §2].

LEMMA 3.2. For any ε̄ > 0, any γ̄ > 0 and any (t̄, p̄, ū) ∈ (0, ∞)^m × ℜ^n × ℜ^l with Bt̄ = d, let (t, p, u) be given by

(3.9)    t = t̄ + Δt,
(3.10)   p = p̄ + Δp,

where u and (Δt, Δp) together solve the following system of linear equations

(3.11)    ∇²f_{ε̄,γ̄}(t̄, p̄) ( Δt ; Δp ) + ∇f_{ε̄,γ̄}(t̄, p̄) + ( B^T u ; 0 ) = 0,   BΔt = 0.

Suppose that ρ_{ε̄,γ̄}(t̄, p̄, ū) ≤ β for some β < min{1, 1/θ_{ε̄γ̄}(t̄)}. Then the following hold:
(a) (t, p) ∈ Y_ε̄.
(b) For any α satisfying

(3.12)    max{ √( (θ_{ε̄γ̄}(t̄)(β)² + √m)/(β + √m) ), 1/(1 + √(γ̄/ε̄) β/‖b‖) } ≤ α < 1,

we have ρ_{αε̄,αγ̄}(t, p, u) ≤ β.

Proof. Let

(3.13)    ξ = T̄(c − t̄ − A^T p̄ − B^T ū) + ε̄γ̄e,
(3.14)    δ = A(c − t̄ − A^T p̄) + ε̄b,

where T̄ = Diag(t̄).
Then, by (3.1),

(3.15)    ‖ξ‖ ≤ ε̄γ̄β,   ‖δ‖ ≤ √(ε̄γ̄) β.

By using (2.3), (2.4) and (3.13), (3.14), we write (3.11) equivalently as

    D^{-1}Δt + A^TΔp + B^T(u − ū) = T̄^{-1}ξ,   AΔt + AA^TΔp = δ,   BΔt = 0,

where for convenience we let D = (I + ε̄γ̄T̄^{-2})^{-1}. Solving for Δt gives

    Δt = D^{1/2}(E − ED^{1/2}A^T F A D^{1/2}E)D^{1/2}T̄^{-1}ξ + D^{1/2}ED^{1/2}A^T Fδ,

where we let E = I − D^{1/2}B^T[BDB^T]^{-1}BD^{1/2} and F = [A(I − D^{1/2}ED^{1/2})A^T]^{-1}. (F is well defined by the same reasoning that the matrix given by (3.8) is well defined.) Then, we can bound T̄^{-1}Δt as follows:

    ‖T̄^{-1}Δt‖ ≤ ‖T̄^{-1}D^{1/2}‖²‖E − ED^{1/2}A^T F A D^{1/2}E‖‖ξ‖ + ‖T̄^{-1}D^{1/2}‖‖ED^{1/2}A^T F‖‖δ‖.

Since T̄^{-2}D is diagonal and each of its diagonal entries is less than 1/(ε̄γ̄), we obtain that ‖T̄^{-2}D‖ < 1/(ε̄γ̄) and hence

(3.16)    ‖T̄^{-1}Δt‖ ≤ ‖E + ED^{1/2}A^T F A D^{1/2}E‖‖ξ‖/(ε̄γ̄) + ‖ED^{1/2}A^T F‖‖δ‖/√(ε̄γ̄) ≤ √(θ_{ε̄γ̄}(t̄)) β,

where the last inequality follows from (3.15) and the definition of T̄, D, E, F, and θ_{ε̄γ̄}(t̄) (see (3.5)-(3.8)).

Now, by using (2.3) and (2.4), we can write (3.11) equivalently as BΔt = 0 and

(3.17)    (I + ε̄γ̄T̄^{-2})Δt + A^TΔp + B^T u = c − t̄ − A^T p̄ + ε̄γ̄T̄^{-1}e,
(3.18)    0 = A(c − t̄ − Δt − A^T p̄ − A^TΔp) + ε̄b,

so from (3.9) and (3.10) we obtain

    T(c − t − A^T p − B^T u) + ε̄γ̄e = (T̄ + ΔT)(c − t̄ − Δt − A^T p̄ − A^TΔp − B^T u) + ε̄γ̄e
        = (T̄ + ΔT)(ε̄γ̄T̄^{-2}Δt − ε̄γ̄T̄^{-1}e) + ε̄γ̄e
        = ε̄γ̄ ΔT T̄^{-2}Δt,

where T = Diag(t), ΔT = Diag(Δt), and the second equality follows from (3.17). This implies

(3.19)    ‖T(c − t − A^T p − B^T u) + ε̄γ̄e‖ ≤ ε̄γ̄‖T̄^{-1}Δt‖² ≤ ε̄γ̄ θ_{ε̄γ̄}(t̄)(β)²,

where the last inequality follows from (3.16).

(a) We have from (3.16) and the hypothesis β < min{1, 1/θ_{ε̄γ̄}(t̄)} that ‖T̄^{-1}Δt‖ ≤ √(θ_{ε̄γ̄}(t̄)) β < 1, so (3.9) and t̄ > 0 yield t = t̄ + Δt > 0. Also, Bt̄ = d together with BΔt = 0 (see (3.11)) and (3.9) yields Bt = d, and (3.18) together with (3.9), (3.10) yields 0 = A(c − t − A^T p) + ε̄b. Hence (t, p) ∈ Y_ε̄.

(b) Fix any α satisfying (3.12) and let ε = αε̄, γ = αγ̄. (Note that because θ_{ε̄γ̄}(t̄)β < 1, the left-hand quantity in (3.12) is strictly less than 1, so such an α exists.) Let

    r = T(c − t − A^T p − B^T u) + εγe.

Then the triangle inequality and (3.19) imply

    ‖r‖/(εγ) ≤ ‖T(c − t − A^T p − B^T u) + ε̄γ̄e‖/(εγ) + (1 − (α)²)√m/(α)²
             ≤ θ_{ε̄γ̄}(t̄)(β)²/(α)² + (1/(α)² − 1)√m,

which together with the fact (see (3.12)) (θ_{ε̄γ̄}(t̄)(β)² + √m)/(β + √m) ≤ (α)² yields

(3.20)    ‖r‖/(εγ) ≤ β.

Let s = A(c − t − A^T p) + εb. By using (3.9), (3.10), and (3.18), we have

    s = A(c − t̄ − Δt − A^T p̄ − A^TΔp) + αε̄b = (α − 1)ε̄b,

which together with the fact (see (3.12)) 1/(1 + √(γ̄/ε̄)β/‖b‖) ≤ α yields

    ‖s‖/√(εγ) = (1/α − 1)‖b‖√(ε̄/γ̄) ≤ β.

This together with (3.20) and the definition of ρ_{ε,γ}(t, p, u) (see (3.1)) proves our claim. □
3.2. Bounds. Lemma 3.2 shows that if the rate α at which the penalty parameters ε and γ are decreased is not too small (see (3.12)), then a single Newton step suffices to keep the current iterate close to the optimal solution of the penalized problem (D_{ε,γ}). Thus, in order to establish the (linear) convergence of the QLPPF Algorithm, it suffices to bound α away from 1 which, according to (3.12), amounts to bounding θ_{εγ}(t) by some quantity independent of t and ε, γ. It is not difficult to see that such a bound does not exist for arbitrary t. Fortunately, we need to consider only those t that, together with some p, are close to the optimal solution of (D_{ε,γ}), in which case, as we show below, such a bound does exist (provided that a certain primal nondegeneracy assumption also holds). The proof of this is somewhat intricate: For ε and γ large, we argue by showing that t cannot be too large, i.e., is of the order ε + γ + 1 (see Lemma 3.3(a)) and, for ε and γ small, we argue by showing that, under the primal nondegeneracy assumption, the columns of A corresponding to those components of t which are small (i.e., of the order γ) are of rank n.

LEMMA 3.3. (a) For all ε > 0, all γ > 0, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have ‖t‖ ≤ M₁(ε + γ + 1), where M₁ > 0 is some scalar depending on A, B, b, c, and d only.

(b) Suppose that (P) is primal nondegenerate in the sense that, for every optimal solution (x*, y*) of (P), those columns of A corresponding to the positive components of x* + B^T y* have rank n. Then, for all ε > 0, all γ > 0, all λ ≥ εγ/2, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have

(3.21)    θ_λ(t) ≤ ψ(ε/γ),   where we define ψ(w) = M₂(√w + 1/√w)⁴   ∀ w > 0,

with M₂ ≥ 1 some scalar depending on A, B, b, c, and d only.
Proof. (a) The proof is by contradiction. Suppose the contrary, so that there exists a sequence {(t^k, p^k, u^k, ε^k, γ^k)} such that ε^k > 0 and γ^k > 0 and

(3.22)    (t^k, p^k) ∈ Y_{ε^k},   ρ_{ε^k,γ^k}(t^k, p^k, u^k) ≤ 1   ∀k,
(3.23)    ‖t^k‖/(ε^k + γ^k + 1) → ∞.

By passing into a subsequence if necessary we will assume that (t^k, p^k)/‖(t^k, p^k)‖ converges to some limit point, say (t^∞, p^∞) (so (t^∞, p^∞) ≠ 0). We have from (3.22) (also using (3.1) and (3.2)) that Bt^k = d, t^k > 0 ∀k, and from Lemma 3.1 that

    b^T p^k ≥ v* − 2mγ^k,   ‖t^k + A^T p^k − c‖ ≤ √( 2(η*(ε^k)² + 2mε^kγ^k) )   ∀k.

Upon dividing both sides of the above four relations by ‖(t^k, p^k)‖ and letting k → ∞, we obtain from (3.23) that

    Bt^∞ = 0,   t^∞ ≥ 0,   b^T p^∞ ≥ 0,   ‖t^∞ + A^T p^∞‖ = 0.

Then, any optimal solution to (D) would remain optimal when moved along the direction (t^∞, p^∞), contradicting the boundedness of the optimal solution set of (D) (via Assumption A(a)).

(b) Fix any ε > 0, γ > 0, λ ≥ εγ/2, and any (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u. Let T = Diag(t) and let D, E, F be given by, respectively, (3.6), (3.7), and (3.8). Then, F^{-1} = A(I − D^{1/2}ED^{1/2})A^T, ‖E‖ ≤ 1 and D = (I + λT^{-2})^{-1}. From the definition of θ_λ(t) (see (3.5)) we then obtain

(3.24)    θ_λ(t) ≤ ( ‖E‖ + ‖E‖²‖D^{1/2}‖²‖A‖‖F‖ + ‖E‖‖D^{1/2}‖‖A‖‖F‖ )² < ( 1 + 2‖A‖‖F‖ )²,

where the strict inequality follows from the facts ‖D‖ < 1, ‖E‖ ≤ 1. Now we bound ‖F‖. We have

(3.25)    z^T(F^{-1})z = z^T A(I − D^{1/2}ED^{1/2})A^T z ≥ z^T A(I − D)A^T z = Σ_{j=1}^m λ(A_j^T z)²/((t_j)² + λ) ≥ λ Σ_{j=1}^m (A_j^T z)²/((‖t‖_∞)² + λ) ≥ λσ‖z‖²/((‖t‖_∞)² + λ)   ∀z,

where the first inequality follows from ‖E‖ ≤ 1 and σ > 0 denotes the smallest eigenvalue of AA^T (σ > 0 because A has full row rank). Hence, λ ≥ εγ/2 yields

(3.26)    ‖F‖ ≤ ((‖t‖_∞)² + λ)/(λσ) ≤ 2(‖t‖_∞)²/(σεγ) + 1/σ.

For ε and γ near zero, we give below a different bound on ‖F‖. By the primal nondegeneracy assumption, there exists a constant δ > 0 depending on A, B, b, c, and d only such that if (x, y) is any feasible solution of (P) with c^T x + d^T y ≤ v* + δ, then the columns { A_j | j ∈ {1,...,m} with x_j + B_j^T y ≥ δ } have rank n. Consider the case in which η*ε + 2mγ ≤ δ, where η* is defined as in Lemma 3.1. Since (t, p) ∈ Y_ε and ρ_{ε,γ}(t, p, u) ≤ 1, we have from Lemma 3.1 that the columns { A_j | j ∈ {1,...,m} with (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ } have rank n. Also, we have T(c − t − A^T p − B^T u) ≥ −2εγe (cf. (3.1) and (3.2)), so that (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ implies t_j ≤ 2γ/δ. Hence, we obtain from (3.25) that

    z^T(F^{-1})z ≥ λ Σ_{j: t_j ≤ 2γ/δ} (A_j^T z)²/((t_j)² + λ) ≥ λ(δ)²σ'‖z‖²/((2γ)² + λ(δ)²)   ∀z,

where the last inequality follows from the fact that those A_j for which t_j ≤ 2γ/δ have rank n, and σ' is some positive scalar depending on A only. Since λ ≥ εγ/2, we then have

    ‖F‖ ≤ ((2γ)² + λ(δ)²)/(λ(δ)²σ') ≤ 8γ/(ε(δ)²σ') + 1/σ' = 8/(w(δ)²σ') + 1/σ',

where for convenience we let w = ε/γ. Now, consider the remaining case where η*ε + 2mγ > δ. Then ε + γ ≥ δ/(η* + 2m), so part (a) and (3.26) together yield

    ‖F‖ ≤ 2(M₁)²(ε + γ + 1)²/(σεγ) + 1/σ ≤ 2(M₁)²(1 + (η* + 2m)/δ)²(ε + γ)²/(σεγ) + 1/σ = 2(M₁)²(1 + (η* + 2m)/δ)²(√w + 1/√w)²/σ + 1/σ.

Combining the above two cases, we conclude that

    ‖F‖ ≤ { K(1 + 1/w)        if η*ε + 2mγ ≤ δ;
            K(√w + 1/√w)²     otherwise,

for some positive scalar K depending on A, B, b, c, and d only. Combining the above bound with (3.24), we conclude that θ_λ(t) ≤ M₂(√w + 1/√w)⁴ = ψ(ε/γ) for some scalar M₂ ≥ 1 depending on A, B, b, c, and d only. □
3.3. Main convergence result. By combining Lemmas 3.1 to 3.3, we obtain the following global convergence result for the QLPPF Algorithm.

THEOREM 3.4. Suppose that (P) is primal nondegenerate in the sense of Lemma 3.3(b) and let ψ(·) be given by (3.21). If in the QLPPF Algorithm the starting point (t^1, p^1), together with some u^1 and some scalar β, satisfies

(3.27)    t^1 > 0,   Bt^1 = d,
(3.28)    θ_{ε^1γ^1}(t^1) ≤ ψ(ε^1/γ^1),
(3.29)    ρ_{ε^1,γ^1}(t^1, p^1, u^1) ≤ β,
(3.30)    0 < β < 1/ψ(ε^1/γ^1),

and if we choose

(3.31)    α^k = max{ √( (θ_{ε^kγ^k}(t^k)(β)² + √m)/(β + √m) ), 1/(1 + √(γ^1/ε^1) β/‖b‖) }   ∀k,

then ε^k → 0, γ^k → 0 linearly, and {((t^k + A^T p^k − c)/ε^k, u^k/ε^k)}, {(t^k, p^k)} approach the optimal solution set of, respectively, (P) and (D).
Proof. First note from ψ(w) ≥ 1 for all w > 0 (see (3.21) and M₂ ≥ 1) and (3.30) that β < 1. Also note from (2.7) that

(3.32)    ε^k/γ^k = ε^1/γ^1   ∀k.

We claim that

(3.33)    (t^k, p^k) ∈ Y_{ε^{k−1}},   ρ_{ε^{k−1},γ^{k−1}}(t^k, p^k, u^k) ≤ β,   ρ_{ε^k,γ^k}(t^k, p^k, u^k) ≤ β,

for all k ≥ 2. It is easily seen by using (3.27)-(3.31) and (2.5)-(2.7) and Lemma 3.2 that (3.33) holds for k = 2. Suppose that (3.33) holds for all k ≤ h, for some h ≥ 2. Then, (t^h, p^h) ∈ Y_{ε^{h−1}} and ρ_{ε^{h−1},γ^{h−1}}(t^h, p^h, u^h) ≤ β < 1. Since we also have from (2.7) that ε^h = α^{h−1}ε^{h−1} and γ^h = α^{h−1}γ^{h−1} and from (3.31) that (α^{h−1})² ≥ √m/(β + √m) > 1/2, we can apply Lemma 3.3(b) to conclude that

(3.34)    θ_{ε^hγ^h}(t^h) ≤ ψ(ε^{h−1}/γ^{h−1}) = ψ(ε^1/γ^1),

where the equality follows from (3.32). Then, by (3.30), β < 1/θ_{ε^hγ^h}(t^h). Since (3.33) holds for k = h, we also have ρ_{ε^h,γ^h}(t^h, p^h, u^h) ≤ β, so Lemma 3.2 together with (2.5)-(2.7) and (3.31), (3.32) yields

    (t^{h+1}, p^{h+1}) ∈ Y_{ε^h},   ρ_{ε^h,γ^h}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β,   ρ_{ε^{h+1},γ^{h+1}}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β.

Hence (3.33) holds for k = h + 1.

Since (3.33) holds for all k ≥ 2, we see that (3.34) holds for all h ≥ 2. Then, by (3.30), θ_{ε^hγ^h}(t^h)β is less than 1 and bounded away from 1 for all h ≥ 2, so that, by (3.31), α^k is less than 1 and bounded away from 1 for all k. Hence ε^k → 0, γ^k → 0 at the rate of a geometric progression. The remainder of the proof follows from (3.33) and Lemma 3.1. □
Note that instead of θ_{ε^kγ^k}(t^k) we can use, for example, the upper bound ψ(ε^1/γ^1) in the formula (3.31), and linear convergence would be preserved. However, this bound is typically loose and difficult to compute. There is also the issue of finding ε^1, γ^1, β, (t^1, p^1) and u^1 satisfying (3.27)-(3.30), which we will address in §3.4.
3.4. Algorithm initialization. By Theorem 3.4, if the primal nondegeneracy assumption therein holds and if we can find ε, γ, (t, p) and u satisfying

(3.35)    t > 0,   Bt = d,
(3.36)    θ_{εγ}(t) ≤ ψ(ε/γ),
(3.37)    ρ_{ε,γ}(t, p, u) < 1/ψ(ε/γ),

then we can set β = ρ_{ε,γ}(t, p, u) (assuming ρ_{ε,γ}(t, p, u) ≠ 0) and start the QLPPF Algorithm with ε^1 = ε, γ^1 = γ, (t^1, p^1) = (t, p), and we would obtain linear convergence. But how do we find such ε, γ, (t, p), and u?

One obvious way is to fix any ε > 0, any γ > 0, and then solve the penalized problem (D_{ε,γ}). The solution (t, p) obtained satisfies

    t > 0,   Bt = d,   ∇f_{ε,γ}(t, p) + ( B^T u ; 0 ) = 0

for some u (see (2.2)) so, by (2.3) and (3.1), (3.2), we have ρ_{ε,γ}(t, p, u) = 0 and (t, p) ∈ Y_ε. Hence (3.35) and (3.37) hold and, by Lemma 3.3(b), θ_{εγ}(t) ≤ ψ(ε/γ), so (3.36) also holds. (Of course (D_{ε,γ}) need not be solved exactly.) To solve the problem (D_{ε,γ}), we can use any method for convex differentiable minimization (e.g., gradient descent, coordinate descent), and we would typically want ε small and γ large so that (D_{ε,γ}) is well conditioned.

Suppose that there holds Be = 0 and Ae = b. (This holds, for example, when B is the zero matrix (which corresponds to the case when all equality constraints are penalized) and a change of variable x' = (X̄)^{-1}x, where x̄ is any interior feasible solution of (P) (i.e., Ax̄ = b, x̄ > 0) and X̄ = Diag(x̄), has been made in (P).) Then we can find a usable ε, γ, (t, p), u immediately: Fix any ε > 2ψ(1)(‖c‖ + ‖B^T(BB^T)^{-1}d‖) and let γ = ε. Also let w = B^T(BB^T)^{-1}d, p = (AA^T)^{-1}A(c − w), t = εe + w, and u = −(BB^T)^{-1}d. Then Bw = d, A(c − A^T p) = Aw, At = εb + Aw, and B^T u = εe − t, so that

    Bt = Bw = d,
    A(c − t − A^T p) = −εb,
    T(c − t − A^T p − B^T u) + εγe = (εI + W)(c − A^T p − εe) + (ε)²e = (εI + W)(c − A^T p) − εw,

where T = Diag(t) and W = Diag(w). Also, from ψ(1) ≥ 1 and our choice of ε we have ε > ‖w‖_∞, so t > 0. Hence, by (3.2), we have (t, p) ∈ Y_ε and, by (3.1) and ε = γ, we have

(3.38)    ρ_{ε,γ}(t, p, u) = ‖(εI + W)(c − A^T p) − εw‖/(εγ)
        ≤ ‖c − A^T p − w‖/γ + ‖W‖‖c − A^T p‖/(εγ)
        = ‖(I − A^T(AA^T)^{-1}A)(c − w)‖/γ + ‖W‖‖(I − A^T(AA^T)^{-1}A)c + A^T(AA^T)^{-1}Aw‖/(εγ)
        ≤ ‖c − w‖/γ + ‖w‖_∞(‖c‖ + ‖w‖)/(εγ),

where the last inequality follows from the triangle inequality and the nonexpansive property of projection matrices. From our choice of ε we see that (‖c‖ + ‖w‖)/ε < .5/ψ(1), so the right-hand side of (3.38) is bounded above by .5/ψ(1) + (.5/ψ(1))² ≤ 1/ψ(1) = 1/ψ(ε/γ), where the inequality follows from ψ(1) ≥ 1 and the equality follows from γ = ε. Hence (3.35) and (3.37) hold. Also, since (t, p) ∈ Y_ε, then (3.37) together with Lemma 3.3(b) shows that (3.36) holds.
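The closed-form starting point just described is easy to code. The sketch below is illustrative only: it assumes Be = 0 and Ae = b, that B has full row rank whenever it is nonzero, and that ε has already been chosen large enough (ψ(1) involves the constant M₂ of Lemma 3.3 and is not evaluated here).

    import numpy as np

    def warm_start(A, B, c, d, eps):
        """Starting point of Section 3.4: w = B'(BB')^{-1}d, p = (AA')^{-1}A(c-w),
        t = eps*e + w, u = -(BB')^{-1}d, gamma = eps."""
        m = A.shape[1]
        if B.shape[0] > 0:
            u = -np.linalg.solve(B @ B.T, d)     # u = -(BB')^{-1} d
            w = -B.T @ u                         # w = B'(BB')^{-1} d, so Bw = d
        else:
            u = np.zeros(0)
            w = np.zeros(m)
        p = np.linalg.solve(A @ A.T, A @ (c - w))
        t = eps * np.ones(m) + w                 # t > 0 provided eps > ||w||_inf
        gamma = eps
        return t, p, u, gamma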
4. Numerical results. To study the performance of the QLPPF Algorithm in practice, we have implemented the algorithm to solve the special case of (D) in which B is the zero matrix, i.e., (P) is of the form

(4.1)    minimize  c^T x   subject to  Ax = b,  x ≥ 0.

(This corresponds to penalizing all equality constraints in the corresponding dual problem.) Below we describe our implementation and present our very preliminary numerical experience.

1. Initialization. In our implementation, we set for all problems

    ε^1 = γ^1 = 10^4 ‖c‖₁/m,

and set (somewhat arbitrarily)

    p^1 = 0,   t^1 = max{ e, c/2 },

where "max" is taken componentwise. (Note that since B is the zero matrix, t^1 can be set to any positive vector.) We have chosen t^1 so as to minimize ‖(T^1)^{-1}(c − t^1 − A^T p^1)‖ subject to t^1 ≥ e. Also, care must be exercised in choosing ε^1 and γ^1: if their values are set too low, then the QLPPF Algorithm may fail to converge; if their values are set too high, then the QLPPF Algorithm may require many iterations to converge. (Notice that we set ε^1 and γ^1 directly proportional to the average cost ‖c‖₁/m so their values scale with c.)
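In code, the initialization just described amounts to the following (the constants follow the text as recovered; names and dense storage are illustrative only).

    import numpy as np

    def initialize(A, b, c):
        """Starting values used in the reported implementation (B = 0 case)."""
        m = c.size
        eps1 = gamma1 = 1e4 * np.linalg.norm(c, 1) / m   # eps^1 = gamma^1 = 10^4 ||c||_1 / m
        p1 = np.zeros(A.shape[0])                        # p^1 = 0
        t1 = np.maximum(np.ones(m), c / 2.0)             # t^1 = max{e, c/2} componentwise
        return eps1, gamma1, t1, p1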
2. Steplength selection. To ensure that the t^k's remain inside the positive orthant, we employ a backtracking scheme similar to that used by Setiono: whenever t^k + Δt^k is outside the positive orthant, we replace the formula for t^{k+1} in (2.6) by

(4.2)    t^{k+1} = t^k + .98 λ^k Δt^k,

where

(4.3)    λ^k = min{ t_j^k/(−Δt_j^k) | Δt_j^k < 0 }.

However, this raises a difficulty, namely, for λ^k much smaller than 1, the vector (.98λ^kΔt^k, Δp^k) may be far from the Newton direction (Δt^k, Δp^k) and, as a consequence, the iterates may fail to converge. To remedy this, we replace (analogous to (4.2)) the formula for p^{k+1} in (2.6) by p^{k+1} = p^k + .98λ^kΔp^k whenever nonconvergence is detected. (The parameter value .98 is chosen somewhat arbitrarily, but it works well in our tests.)
The proper choice of the α^k's is very important for the QLPPF Algorithm: if the α^k's are too near 1 (so the penalty parameters decrease slowly), then the algorithm would converge slowly; if the α^k's are too near 0 (so the penalty parameters decrease rapidly), then the algorithm may fail to converge. In our implementation we adjust the α^k's dynamically according to the following rule:

    α^k = { max{.3, .95α^{k−1}}   if λ^k = 1;
            .6α^{k−1}             if λ^k ≤ .2;
            α^{k−1}               otherwise,

for all k ≥ 2, with α^1 set to .5. The rationale for this adjustment rule is that, if λ^k = 1, then the current iterate is closely following the solution trajectory (so we can decrease the penalty parameters at a faster rate and still retain convergence) and, if λ^k ≤ .2, then the current iterate is unable to follow the solution trajectory (so we must decrease the penalty parameters at a slower rate).
3. Termination. To avoid numerical problems, we stop decreasing the penalty parameters ε and γ when they reach some prespecified tolerances ε_min and γ_min, respectively. In our tests we set

    ε_min = 10^{-16} ‖c‖₁/m,   γ_min = 10^{-11} ‖c‖₁/m.

We terminate the QLPPF Algorithm when the relative duality gap and the violation of primal and dual feasibility are small. More specifically, we terminate whenever the current iterate, denoted by (t, p), satisfies

(4.4)    ‖X(A^T p − c)‖₁ ≤ 10^{-7},
(4.5)    max{ ‖Ax − b‖_∞, ‖[−x]⁺‖_∞ } ≤ 10^{-7},
(4.6)    ‖[A^T p − c]⁺‖_∞ ≤ 10^{-7},

where x = (t + A^T p − c)/ε (see Lemma 3.1), X = Diag(x), and [·]⁺ denotes the orthogonal projection onto the nonnegative orthant. Only for three of our test problems could the above termination criterion not be met (owing to violation of (4.4) and (4.6)), in which case the algorithm is terminated whenever primal feasibility (4.5) is met and |c^T x − v*|/|v*| is less than 5·10^{-7}, where v* denotes the optimal cost of (4.1).
4. Solving for the direction. The most expensive computation at each iteration of the QLPPF Algorithm lies in solving the system of linear equations (2.5). This can be seen to entail solving a single linear system of the form

(4.7)    AQA^T w = z,

for w, where z is some n-vector and Q is some m × m diagonal matrix whose jth diagonal entry is

(4.8)    εγ/((t_j)² + εγ),

with ε > 0, γ > 0, and t some positive m-vector. (Linear systems of the form (4.7) also arise in interior point algorithms, but (4.7) has the nice property that the condition number of Q can be controlled by adjusting the penalty parameters ε and γ.) In our implementation, (4.7) is solved using YSMP, a sparse matrix package for symmetric positive semidefinite systems developed at Yale University (see [EGSS79], [EGSS82]) and a precursor to the commercial package SMPAK (Scientific Computing Associates, 1985). YSMP comprises a set of Fortran routines implementing the Cholesky decomposition scheme and, as a preprocessor, the minimum-degree ordering algorithm (see, e.g., [GeL81]). In our implementation, the minimum-degree ordering routine ORDER is called first to obtain a permutation of the rows and columns of the matrix AA^T so that fill-in is reduced during factorization. Then, AA^T is symbolically factored using the routine SSF. (SSF is called only once since the nonzero pattern of AQA^T does not change with
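The following sketch shows the linear algebra of (4.7)-(4.8) with dense Cholesky standing in for YSMP's sparse factorization. The diagonal of Q is written as εγ/((t_j)² + εγ), the form obtained here by eliminating Δt from (2.5) when B = 0; the sketch assumes A has full row rank so that AQA^T is positive definite.

    import numpy as np

    def solve_normal_equations(A, t, eps, gamma, z):
        """Solve A Q A' w = z of (4.7), with Q the diagonal matrix (4.8)."""
        q = eps * gamma / (t**2 + eps * gamma)   # diagonal of Q, entries in (0, 1]
        M = (A * q) @ A.T                        # A Q A' (scale columns of A by q)
        L = np.linalg.cholesky(M)                # dense stand-in for sparse Cholesky
        y = np.linalg.solve(L, z)                # forward substitution
        return np.linalg.solve(L.T, y)           # back substitution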