A PATH-FOLLOWING ALGORITHM FOR LINEAR PROGRAMMING USING QUADRATIC AND LOGARITHMIC PENALTY FUNCTIONS*

PAUL TSENG†

Abstract. Motivated by a recent work of Setiono, a path-following algorithm for linear programming using both logarithmic and quadratic penalty functions is proposed. In the algorithm, a logarithmic and a quadratic penalty is placed on, respectively, the nonnegativity constraints and an arbitrary subset of the equality constraints; Newton's method is applied to solve the penalized problem, and after each Newton step the penalty parameters are decreased. This algorithm maintains neither primal nor dual feasibility and does not require a Phase I. It is shown that if the initial iterate is chosen appropriately and the penalty parameters are decreased to zero in a particular way, then the algorithm is linearly convergent. Numerical results are also presented suggesting that the algorithm may be competitive with interior point algorithms in practice, requiring typically between 30 and 45 iterations to accurately solve each Netlib problem tested.

Key words. linear program, path-following, Newton step, penalty function

AMS subject classifications. 49, 90

1. Introduction. Since the pioneering work of Karmarkar [Kar84], much interest has focused on solving linear programs using interior point algorithms. These interior point algorithms may be classified roughly as either (i) projective-scaling (or potential reduction), (ii) affine-scaling, or (iii) path-following. We will not attempt to review the literature on this subject, which is vast (see, for example, [Meg89], [Tod89] for surveys).

Our interest is in algorithms of the path-following type, of the sort discussed in [GaZ81]. These interior point algorithms typically penalize the nonnegativity constraints by a logarithmic function and use Newton's method to solve the penalized problem, with the penalty parameters decreased after each Newton step (see, for example, [Gon89], [GMSTW86], [KMY89], [MoA87], [Ren88], [Tse89]). One disadvantage of interior point algorithms is the need for an initial interior feasible solution. A common technique for handling this is to add an artificial column (see [AKRV89], [BDDW89], [GMSTW86], [Lus90], [MMS89], [MSSPB89], [MoM87]), but this itself has disadvantages. For example, the cost of the artificial column must be estimated, and some type of rank-1 updating is needed to solve each least squares problem, which can significantly increase the solution time and degrade the numerical accuracy of the solutions.

Recently, Setiono [Set89] proposed an interesting algorithm that combines features of a path-following algorithm with those of the method of multipliers [HaB70], [Hes69], [Pow69] (also see [Roc76], [Ber82]). This algorithm does not require a feasible solution to start and is comparable to interior point algorithms both in terms of work per iteration and, according to the numerical results reported in [Set89], in terms of the total number of iterations. To describe the basic idea in Setiono's algorithm, consider

* Received by the editors April 16, 1990; accepted for publication (in revised form) June 11, 1992. This research was supported by the U.S. Army Research Office, contract DAAL03-86-K-0171, and was conducted while the author was with the Laboratory for Information and Decision Systems and the Center for Intelligent Control Systems, Massachusetts Institute of Technology, Cambridge.

† Department of Mathematics, GN-50, University of Washington, Seattle, Washington 98195 (tseng@math.washington.edu).

a linear program in the standard dual form

    minimize   −b^T p
    subject to t + A^T p = c,  t ≥ 0,

where A is some matrix and b and c are vectors of appropriate dimension. Let us attach a Lagrange multiplier vector x to the constraints t + A^T p = c and apply the method of multipliers to the above linear program. This produces the following iterations:

    x^{k+1} = x^k + (1/ε^k)(t^k + A^T p^k − c),   k = 1, 2, ...,

where {ε^k} is a sequence of monotonically decreasing positive scalars and (t^k, p^k) is some (inexact) solution of the augmented Lagrangian subproblem

(1.1)  minimize   −b^T p + (x^k)^T (t + A^T p − c) + (1/(2ε^k)) ||t + A^T p − c||²
       subject to t ≥ 0.

(An advantage of the above multiplier iterations is that they do not need a feasible solution to start.) A key issue associated with the above multiplier iterations concerns the efficient generation of an inexact solution (t^k, p^k) of the convex quadratic program (1.1) for each k. (Note that as ε^k decreases, the objective function of (1.1) becomes progressively more ill-conditioned.)

Setiono's algorithm may be viewed as the method of multipliers in which (t^k, p^k) is generated according to the following scheme, reminiscent of the path-following idea: add a logarithmic penalty function −γ^k Σ_{j=1}^m ln(t_j) to the objective of (1.1), where γ^k is some positive scalar monotonically decreasing with k, and apply a single Newton step, starting from (t^{k−1}, p^{k−1}), to the resulting problem. (If the t^k thus obtained lies outside the positive orthant, it is moved back towards t^{k−1} until it becomes positive.¹)
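The stepback just described admits a compact implementation. The following is our own minimal sketch (the function name and variable names are ours, and the .98 factor follows the footnote formula; this is not Setiono's code):

```python
import numpy as np

def stepback(t, dt, frac=0.98):
    """Return t + frac*lam*dt, where lam is the largest step in (0, 1]
    for which t + lam*dt stays nonnegative (cf. the footnote formula)."""
    neg = dt < 0
    # each decreasing component t_j hits zero at step t_j / (-dt_j)
    lam = 1.0 if not neg.any() else min(1.0, float(np.min(-t[neg] / dt[neg])))
    return t + frac * lam * dt

t_new = stepback(np.array([1.0, 2.0]), np.array([-2.0, 1.0]))
# the first component would reach zero at lam = 0.5, so the damped
# step keeps the iterate strictly positive
```

Because frac < 1, the new iterate stays strictly inside the positive orthant whenever t does, which is what the logarithmic penalty requires.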

In this paper, inspired by the preceding work of Setiono, we study an algorithm that also adds to the objective a quadratic penalty on the equality constraints and a logarithmic penalty on the nonnegativity constraints, and then solves the penalized problem using Newton's method, with the penalty parameters decreased after each Newton step. Unlike Setiono's algorithm, our algorithm does not use the multiplier vector x (so it may be viewed as a pure penalty method) and allows any subset of the equality constraints to be penalized. We show that if the problem is primal nondegenerate and the iterates start near the optimal solution of the initial penalized problem, then the penalty parameters can be decreased at the rate of a geometric progression and the iterates converge linearly. To the best of our knowledge, this is the first (global) linear convergence result for a noninterior point path-following algorithm. We also present numerical results indicating that the algorithm may potentially be competitive with interior point algorithms in practice.

We remark that penalty methods that use either the quadratic or the logarithmic penalty function have been well studied (see, for example, [Ber82], [FiM68], [Fri57], [JIO78], [Man84], [WBD88]), but very little is known about penalty methods that use both types of penalty functions (called mixed interior point-exterior point algorithms in [FiM68]).

¹ More precisely, t^k is given by the formula t^k = t^{k−1} + .98 λ^k Δt^k, where Δt^k is the Newton direction (projected onto the space of t) and λ^k is the largest λ ∈ (0, 1] for which t^{k−1} + λ Δt^k is nonnegative. (The choice of .98 is arbitrary; any number between 0 and 1 would do.)

Upon completion of this paper, the author learned from Professor O. L. Mangasarian that the algorithm discussed here is essentially the IDLN (Interior Dual Least Norm) algorithm described in the recent Ph.D. thesis of Setiono [Set90] (also see the report [Set91]), with minor differences in the choice of the penalty parameters. The same reference includes an (asymptotic) linear rate of convergence analysis and extensive numerical results showing that the IDLN algorithm outperforms the popular simplex code MINOS by a factor of 2 or more on the 63 Netlib problems tested. In short, Setiono's independent work provides further evidence of the practical efficiency of the mixed interior point-exterior point solution approach. Finally, we note that while this paper was under review, other noninterior point methods relating to that studied here have been proposed. One method, brought to our attention by one of the referees, is a certain augmented Lagrangian algorithm for stochastic programming (see [MuR90]); another method is a primal-dual exterior point algorithm for linear programming (see [KMM91]). However, neither of these methods has been shown to possess the nice theoretical/numerical properties enjoyed by the algorithm studied here. For example, no convergence result is given for the method in [MuR90], and no numerical or rate of convergence result is given for the method in [KMM91].

This paper proceeds as follows: In §2 we describe the basic algorithm; in §3 we analyze its convergence; and in §4 we recount our numerical experience with it. In §5 we discuss extensions of this work.

In our notation, every vector is a column vector in some k-dimensional real space ℜ^k, and superscript T denotes transpose. For any vector x, we denote by x_j the jth coordinate of x, by Diag(x) the diagonal matrix whose jth diagonal entry is x_j, and by ||x||_1, ||x||, ||x||_∞ the L1-norm, the L2-norm, and the L∞-norm of x, respectively. For any matrix A, we denote by A_j the jth column of A. We also denote by e the vector of 1's (whose dimension will be clear from the context) and denote by ln(·) the natural logarithm function.

2. Algorithm description. Let A be an n × m matrix, B be an l × m matrix, b be an n-vector, c be an m-vector, and d be an l-vector. Consider the following linear program associated with A, B, b, c, and d:

(D)    minimize   −b^T p
       subject to t + A^T p = c,  Bt = d,  t ≥ 0,

which we call the dual problem. The dual problem may be viewed as a standard linear program in t, in which we arbitrarily partition the equality constraints into two subsets and express one subset in the generator form t + A^T p = c.

(To see this, consider the problem of minimizing a^T t subject to t ∈ R ∩ S, t ≥ 0, where a is an m-vector and R and S are affine sets in ℜ^m. We can always express R = { t : t = c − A^T p for some p } and S = { t : Bt = d } for some matrices A and B and some vectors c and d. Using t = c − A^T p to eliminate t from the objective function, the problem is now in the form (D).) The constraints t + A^T p = c can be thought of as the complicating constraints which, if removed, would make (D) much easier to solve. The advantages of splitting the constraints in this manner will be explained at the end of §2. Finally, we note that the form in which we write the equality constraints is unimportant, and the above form is adopted for notational convenience only.

By attaching Lagrange multiplier vectors x and y to the constraints c − A^T p = t and Bt = d, respectively, we obtain the following dual of (D):

(P)    minimize   c^T x + d^T y
       subject to Ax = b,  x + B^T y ≥ 0,

which we call the primal problem.

We make the following blanket assumptions, which are standard for interior point algorithms, regarding (D) and (P).

Assumption A.
(a) { (x, y) : Ax = b, x + B^T y > 0 } is nonempty.
(b) { (t, p) : Bt = d, t > 0, t + A^T p = c } is nonempty.
(c) A has full row rank.

It is well known that, under parts (a) and (b) of Assumption A, both (D) and (P) have nonempty bounded optimal solution sets.
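The duality between (D) and (P) can be checked numerically: for any pair of feasible points, the gap c^T x + d^T y − b^T p equals t^T(x + B^T y) and is therefore nonnegative. The sketch below is our own illustration with randomly generated data (not from the paper); it builds one feasible point for each problem and verifies this identity:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, l = 2, 4, 1
A = rng.standard_normal((n, m))
B = np.ones((l, m))

# Dual-feasible (t, p): pick t >= 0 and p, then define c and d so that
# t + A^T p = c and B t = d hold by construction.
t = rng.random(m)
p = rng.standard_normal(n)
c = t + A.T @ p
d = B @ t

# Primal-feasible (x, y): pick x >= 0 and y = 0 (so x + B^T y >= 0),
# then define b := A x so that A x = b holds.
x = rng.random(m)
y = np.zeros(l)
b = A @ x

gap = (c @ x + d @ y) - b @ p   # weak duality gap
```

Substituting c = t + A^T p and d = Bt shows gap = t^T(x + B^T y) ≥ 0, the inequality used repeatedly in the convergence analysis of §3.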

Consider the dual problem (D). Suppose that we place a quadratic penalty on the constraints t + A^T p = c with a penalty parameter 1/ε (ε > 0) and we place a logarithmic penalty on the constraints t ≥ 0 with a penalty parameter γ > 0. This gives the following approximation to (D):

(D_{ε,γ})    minimize   f_{ε,γ}(t, p)
             subject to Bt = d,  t > 0,

where f_{ε,γ} : (0, ∞)^m × ℜ^n → ℜ is the penalized objective function given by

(2.1)  f_{ε,γ}(t, p) = (1/2)||c − t − A^T p||² − εγ Σ_{j=1}^m ln(t_j) − ε b^T p   ∀t > 0, ∀p.

The penalized problem (D_{ε,γ}) has the advantage that its objective function f_{ε,γ} is twice differentiable and the Hessian ∇²f_{ε,γ} is positive definite everywhere (via Assumption A(c)). Since the feasible set of (D_{ε,γ}) is nonempty (by Assumption A(b)) and its intersection with any level set of f_{ε,γ} is bounded (by Assumption A(a)), it is readily seen that (D_{ε,γ}) has an optimal solution which, by the strict convexity of f_{ε,γ}, is unique.

Note 1. We can use penalty functions other than the quadratic and the logarithmic. For example, we can use a cubic in place of the quadratic and t_j ln(t_j) in place of −ln(t_j). The quadratic and the logarithmic function, however, have nice properties (such as the second derivative of the logarithmic function equals minus the square of its first derivative) which make global convergence analysis possible.

It is well known (see [Roc70]) that (t, p) is the optimal solution of (D_{ε,γ}) if and only if it satisfies, together with some u ∈ ℜ^l, the Kuhn-Tucker conditions

(2.2)  t > 0,  Bt = d,  ∇f_{ε,γ}(t, p) + (B^T u, 0) = 0.

Straightforward calculation using (2.1) finds that

(2.3)  ∇f_{ε,γ}(t, p) = ( t + A^T p − c − εγ T^{−1} e ,  A(t + A^T p − c) − εb ),

(2.4)  ∇²f_{ε,γ}(t, p) = [ I + εγ T^{−2}   A^T
                           A               A A^T ],

where T = Diag(t). The above formulas will be used extensively in the subsequent analysis. Note that ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain. It is not difficult to show that, as ε and γ tend to zero, the optimal solution of (D_{ε,γ}) approaches the optimal solution set of (D) (see Lemma 3.1).
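Formulas (2.1), (2.3), and (2.4) transcribe directly into code. The sketch below is our own illustration on hypothetical random data; it checks the gradient formula (2.3) against central finite differences and the positive definiteness of the Hessian (2.4):

```python
import numpy as np

def f(t, p, A, b, c, eps, gam):
    # (2.1): quadratic penalty, logarithmic barrier, and the -eps*b^T p term
    r = c - t - A.T @ p
    return 0.5 * r @ r - eps * gam * np.sum(np.log(t)) - eps * b @ p

def grad(t, p, A, b, c, eps, gam):
    # (2.3): gradient stacked as (t-block, p-block)
    r = t + A.T @ p - c
    return np.concatenate([r - eps * gam / t, A @ r - eps * b])

def hess(t, A, eps, gam):
    # (2.4): [[I + eps*gam*T^{-2}, A^T], [A, A A^T]]; note it does not depend on p
    m = len(t)
    return np.block([[np.eye(m) + np.diag(eps * gam / t**2), A.T],
                     [A, A @ A.T]])

rng = np.random.default_rng(0)
n, m = 2, 4
A = rng.standard_normal((n, m)); b = rng.standard_normal(n)
c = rng.standard_normal(m)
t = rng.random(m) + 0.5; p = rng.standard_normal(n)
eps = gam = 0.1

z = np.concatenate([t, p])
g = grad(t, p, A, b, c, eps, gam)
h = 1e-6
g_fd = np.array([(f(*np.split(z + h*e, [m]), A, b, c, eps, gam)
                - f(*np.split(z - h*e, [m]), A, b, c, eps, gam)) / (2*h)
               for e in np.eye(m + n)])
```

Since a full-row-rank A makes the quadratic form ||u + A^T v||² + εγ||T^{−1}u||² vanish only at (u, v) = 0, the assembled Hessian is positive definite for t > 0, in line with the remark above.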

This suggests the following algorithm for solving (D). At each iteration, we are given ε, γ and a (t, p) which is an approximate solution of (D_{ε,γ}); we apply a Newton step to (D_{ε,γ}) at (t, p) to generate a new (t, p), and then we decrease ε and γ. In other words, we consider a sequence of penalized problems (D_{ε^k,γ^k}), k = 1, 2, ..., with ε^k ↓ 0 and γ^k ↓ 0, and we use a Newton step to follow the optimal solution of one penalized problem to that of the next. We now formally state this algorithm, which we call the QLPPF (short for Quadratic-Logarithmic Penalty Path-Following) algorithm.

QLPPF Algorithm

Iteration 0. Choose ε^1 > 0 and γ^1 > 0. Choose (t^1, p^1) ∈ (0, ∞)^m × ℜ^n with Bt^1 = d.

Iteration k. Given (t^k, p^k) ∈ (0, ∞)^m × ℜ^n with Bt^k = d, compute (Δt^k, Δp^k, u^k) to be a solution of

(2.5)  ∇²f_{ε^k,γ^k}(t^k, p^k) (Δt^k, Δp^k) + ∇f_{ε^k,γ^k}(t^k, p^k) + (B^T u^k, 0) = 0,   B Δt^k = 0,

and set

(2.6)  t^{k+1} = t^k + Δt^k,   p^{k+1} = p^k + Δp^k,

(2.7)  γ^{k+1} = α^k γ^k,   ε^{k+1} = α^k ε^k,

where α^k is some scalar in (0, 1).
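One iteration of the QLPPF Algorithm thus amounts to solving the linear KKT system (2.5) and then shrinking both penalty parameters by α^k as in (2.6) and (2.7). The following is our own minimal dense-linear-algebra sketch on small hypothetical data (a serious implementation would exploit sparsity):

```python
import numpy as np

def qlppf_step(t, p, A, B, b, c, d, eps, gam, alpha):
    """One QLPPF iteration: solve (2.5) for (dt, dp, u) with B dt = 0,
    update (t, p) by (2.6), and shrink eps, gam by (2.7)."""
    m, n, l = len(t), len(p), B.shape[0]
    r = t + A.T @ p - c
    g = np.concatenate([r - eps * gam / t, A @ r - eps * b])     # (2.3)
    H = np.block([[np.eye(m) + np.diag(eps * gam / t**2), A.T],
                  [A, A @ A.T]])                                 # (2.4)
    Bbar = np.hstack([B, np.zeros((l, n))])   # B acts on the t-block only
    K = np.block([[H, Bbar.T], [Bbar, np.zeros((l, l))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(l)]))
    dt, dp = sol[:m], sol[m:m + n]
    return t + dt, p + dp, alpha * eps, alpha * gam

rng = np.random.default_rng(1)
n, m, l = 2, 3, 1
A = rng.standard_normal((n, m)); b = rng.standard_normal(n)
c = rng.standard_normal(m); B = np.ones((l, m))
t0 = np.ones(m); d = B @ t0; p0 = np.zeros(n)
t1, p1, eps1, gam1 = qlppf_step(t0, p0, A, B, b, c, d, 1.0, 1.0, 0.9)
```

The constraint B Δt^k = 0 is enforced through the multiplier block, so B t^{k+1} = d holds exactly at every iteration; whether t^{k+1} stays positive depends on the rate α^k, which is what the analysis in §3 controls.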

Note 2. As we noted earlier, the QLPPF Algorithm is closely linked to the algorithms proposed by Setiono. In particular, it can be seen from (2.3), (2.4) that, in the special case where B is the zero matrix, the direction finding problem (2.5) is identical to that in the IDLN algorithm of Setiono (see [Set91, Eq. (16)]) and differs from that in the IDPP algorithm of Setiono (see [Set89, Eq. (6)]) by only an order ε^k term on the right-hand side (which tends to zero as ε^k tends to zero).

Note 3. A unique feature of the QLPPF Algorithm lies in its handling of the two sets of constraints t + A^T p = c and Bt = d, whereby quadratic penalties are placed only on the first set while the second set is maintained to be satisfied at all iterations. This feature has the advantage that it enables the QLPPF Algorithm to provide a unified framework for interior point methods and exterior point methods. As an example, if we put all the equality constraints into Bt = d (and correspondingly set A = I and c = 0), then it can be seen that the QLPPF Algorithm with γ^k = 1 for all k reduces to the well-known primal path-following algorithm for maximizing b^T t subject to the constraints Bt = d, t ≥ 0 (see [Gon89], [Tse89], [Ye87]). If we put all the equality constraints into t + A^T p = c (and correspondingly set B = 0 and d = 0), then, as was noted above, the QLPPF Algorithm reduces to the IDLN algorithm for maximizing b^T p subject to the constraints t + A^T p = c, t ≥ 0. For other choices of constraint splitting, we obtain algorithms somewhere in between. We can then envision choosing a constraint splitting so that the corresponding QLPPF Algorithm is in some sense most efficient (e.g., fastest convergence) for the given problem. Alternatively, we may choose a constraint splitting so as to ensure that certain "critical" equality constraints are satisfied exactly at all iterations (by putting these constraints into Bt = d).

3. Global convergence. In this section, we show that if (D) is in some sense primal nondegenerate and if (t^1, p^1) is "close" to the optimal solution of (D_{ε^1,γ^1}) in the QLPPF Algorithm, then, by decreasing the ε^k's and γ^k's at an appropriate rate, the iterates {(t^k, p^k)} generated by the QLPPF Algorithm approach the optimal solution set of (D) (see Theorem 3.4). Because the Hessian ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain, the proof of this result is quite involved and relies critically on finding a suitable Lyapunov (i.e., merit) function to monitor the progress of the algorithm.

For any ε > 0 and γ > 0, let ρ_{ε,γ} : (0, ∞)^m × ℜ^n × ℜ^l → [0, ∞) be the function given by

(3.1)  ρ_{ε,γ}(t, p, u) = max{ ||T(c − t − A^T p − B^T u) + εγe|| / (εγ),  ||A(c − t − A^T p) + εb|| / √(εγ) }   ∀t > 0, ∀p, ∀u,

where T = Diag(t). From (2.2) and (2.3) we see that (t, p) is a solution of (D_{ε,γ}) if and only if (t, p) satisfies Bt = d, t > 0 and ρ_{ε,γ}(t, p, u) = 0 for some u. Hence ρ_{ε,γ} acts as a Lyapunov function which measures how far (t, p) is from solving (D_{ε,γ}). This notion is made precise in the following lemma.
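The function ρ_{ε,γ} transcribes directly from (3.1) as reconstructed here (the normalizations of the two residuals follow our reading of the formula, so treat the scale factors as an assumption). As a sanity check, for a one-dimensional instance with no Bt = d constraints, a point satisfying the optimality conditions of (D_{ε,γ}) gives ρ = 0:

```python
import numpy as np

def rho(t, p, u, A, B, b, c, eps, gam):
    """Lyapunov function (3.1): zero exactly at the solution of (D_{eps,gam})."""
    res1 = t * (c - t - A.T @ p - B.T @ u) + eps * gam   # T(c - t - A^T p - B^T u) + eps*gam*e
    res2 = A @ (c - t - A.T @ p) + eps * b
    return max(np.linalg.norm(res1) / (eps * gam),
               np.linalg.norm(res2) / np.sqrt(eps * gam))

# m = n = 1, B empty (l = 0), eps = gam = 1: the optimality conditions
# t + p - c = eps*b and t + p - c = eps*gam/t are met by t = 1, p = 2
# when A = [1], b = [1], c = [2]
A = np.array([[1.0]]); B = np.zeros((0, 1))
b = np.array([1.0]); c = np.array([2.0])
val = rho(np.array([1.0]), np.array([2.0]), np.zeros(0), A, B, b, c, 1.0, 1.0)
```

Perturbing t away from the solution makes the second residual nonzero, so ρ becomes positive, as a merit function should.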

For any ε > 0, let

(3.2)  Y_ε = { (t, p) : A(c − t − A^T p) = −εb,  Bt = d,  t > 0 }.

The following lemma shows that any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β, for some u, is within an order of ε + (1 + β)γ of being an optimal solution of (D).

LEMMA 3.1. Fix any ε > 0, γ > 0 and β ∈ (0, 1]. For any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β for some u, the vector (x, y) given by

(3.3)  x = (t + A^T p − c)/ε,   y = u/ε

is feasible for (P) and, together with (t, p), satisfies

c^T x + d^T y ≤ v* + η* ε + m(1 + β)γ,
b^T p ≥ v* − m(1 + β)γ,
||t + A^T p − c||² ≤ 2(η* (ε)² + m(1 + β)εγ),

where v* denotes the optimal cost of (P) and η* = min{ ||x*||²/2 : (x*, y*) is an optimal solution of (P) }.

Proof. Since (t, p) is in Y_ε, it follows from the definition of Y_ε (see (3.2)) and (3.3) that Ax = b. Since ρ_{ε,γ}(t, p, u) ≤ β, we have from the definition of ρ_{ε,γ} (see (3.1)) and (3.3) that ||T(−x − B^T y) + γe|| ≤ βγ, where T = Diag(t). Thus

(3.4)  γ(1 − β)e ≤ T(x + B^T y) ≤ γ(1 + β)e.

Since t > 0 and β ≤ 1, the first inequality in (3.4) yields x + B^T y ≥ γ(1 − β)T^{−1}e ≥ 0. Hence (x, y) is feasible for (P). Let

d_ε(τ, π) = b^T π − (1/(2ε))||τ + A^T π − c||²   ∀(τ, π),
p_ε(ξ, ψ) = (ε/2)||ξ||² + c^T ξ + d^T ψ   ∀(ξ, ψ).

Straightforward algebra using (3.3) yields p_ε(x, y) = d_ε(t, p) + t^T(x + B^T y). Also, it follows from strong duality for a convex quadratic program and its dual that

min{ p_ε(ξ, ψ) : Aξ = b, ξ + B^T ψ ≥ 0 } = max{ d_ε(τ, π) : Bτ = d, τ ≥ 0 }.

Thus, by letting (x*, y*) be any optimal solution of (P) with ||x*||²/2 = η*, we obtain from the above two relations and c^T x* + d^T y* = v* that

(ε/2)||x||² + c^T x + d^T y = p_ε(x, y) = d_ε(t, p) + t^T(x + B^T y) ≤ p_ε(x*, y*) + t^T(x + B^T y) = εη* + v* + t^T(x + B^T y).

Similarly, by letting (t*, p*) be any optimal solution of (D), we obtain from the same two relations and the facts b^T p* = v* and t* + A^T p* = c that

b^T p − (1/(2ε))||t + A^T p − c||² = d_ε(t, p) = p_ε(x, y) − t^T(x + B^T y) ≥ d_ε(t*, p*) − t^T(x + B^T y) = v* − t^T(x + B^T y).

Since 0 ≤ t^T(x + B^T y) ≤ m(1 + β)γ (cf. (3.4)), the above two relations yield

c^T x + d^T y ≤ v* + εη* + m(1 + β)γ,   b^T p ≥ v* − m(1 + β)γ.

Finally, since (x, y) is feasible for (P) so that c^T x + d^T y ≥ v*, the first of these two relations also yields

(ε/2)||x||² + v* ≤ v* + εη* + m(1 + β)γ.

Canceling v* from both sides and using (3.3) completes the proof. □

Since we are dealing with linear programs, Lemma 3.1 implies that, as ε ↓ 0 and γ ↓ 0, the (x, y) given by (3.3) approaches the optimal solution set of (P) and (t, p) approaches the optimal solution set of (D). In fact, it suffices to decrease ε and γ as far as 2^{−κL}, where κ is some scalar constant and L is the size of the problem encoding in binary (defined as, say, in [Kar84]), at which time an optimal solution of (P) and of (D) can be recovered by using the techniques described in, for example, [Kar84] and [PaS82].

For each λ > 0, let θ_λ : (0, ∞)^m → [0, ∞) be the function given by

(3.5)  θ_λ(t) = ( ||E + E D^{1/2} A^T F A D^{1/2} E|| + ||E D^{1/2} A^T F|| )²   ∀t > 0,

where

(3.6)  D = (I + λ T^{−2})^{−1},
(3.7)  E = I − D^{1/2} B^T [B D B^T]^{−1} B D^{1/2},
(3.8)  F = [A(I − D^{1/2} E D^{1/2}) A^T]^{−1},

and T = Diag(t). (F is well defined because ||E|| ≤ 1 (E is a projection matrix) and ||D|| < 1, so that I − D^{1/2} E D^{1/2} is positive definite. We also use the assumption that A has full row rank.) The quantity θ_λ(t) estimates the norm squared of a certain projection-like operator depending on A and t, and it will be used extensively in our analysis. In general, θ_λ(t) is rather cumbersome to evaluate, but, as we shall see, it suffices for our analysis to bound θ_λ(t) from above (see Lemma 3.3(b)).

3.1. Analyzing a Newton step. In this subsection we prove a key lemma that states that if (t̄, p̄) is "close" to the optimal solution of (D_{ε̄,γ̄}), then the (t, p) generated by applying one Newton step to (D_{ε̄,γ̄}) at (t̄, p̄) is close to the optimal solution of (D_{ε,γ}) for some ε < ε̄ and some γ < γ̄. The notion of "closeness" is measured by the Lyapunov function ρ_{ε,γ}, and the proof of the lemma is based on the ideas used in [Tse89, §2].

LEMMA 3.2. For any ε̄ > 0, any γ̄ > 0, and any (t̄, p̄, ū) ∈ (0, ∞)^m × ℜ^n × ℜ^l with Bt̄ = d, let (t, p, u) be given by

(3.9)  t = t̄ + Δt,
(3.10)  p = p̄ + Δp,

where u and (Δt, Δp) together solve the following system of linear equations:

(3.11)  ∇²f_{ε̄,γ̄}(t̄, p̄) (Δt, Δp) + ∇f_{ε̄,γ̄}(t̄, p̄) + (B^T u, 0) = 0,   B Δt = 0.

Suppose that ρ_{ε̄,γ̄}(t̄, p̄, ū) ≤ β for some β < min{1, 1/θ_{ε̄γ̄}(t̄)}. Then the following hold:

(a) (t, p) ∈ Y_{ε̄}.

(b) For any α satisfying

(3.12)  max{ √( (θ_{ε̄γ̄}(t̄) β² + √m) / (β + √m) ),  1/(1 + β √(γ̄/ε̄) / ||b||) } ≤ α < 1,

we have ρ_{αε̄,αγ̄}(t, p, u) ≤ β.

Proof. Let

(3.13)  r̄ = T̄(c − t̄ − A^T p̄ − B^T ū) + ε̄γ̄ e,
(3.14)  s̄ = A(c − t̄ − A^T p̄) + ε̄ b,

where T̄ = Diag(t̄). Then, by (3.1),

(3.15)  ||r̄|| ≤ β ε̄γ̄,   ||s̄|| ≤ β √(ε̄γ̄).

By using (2.3), (2.4) and (3.13), (3.14), we write (3.11) equivalently as

D̄^{−1} Δt + A^T Δp + B^T(u − ū) = T̄^{−1} r̄,
A Δt + A A^T Δp = s̄,
B Δt = 0,

where for convenience we let D̄ = (I + ε̄γ̄ T̄^{−2})^{−1}. Solving for Δt gives

Δt = D̄^{1/2}(Ē − Ē D̄^{1/2} A^T F̄ A D̄^{1/2} Ē) D̄^{1/2} T̄^{−1} r̄ − D̄^{1/2} Ē D̄^{1/2} A^T F̄ s̄,

where we let Ē = I − D̄^{1/2} B^T [B D̄ B^T]^{−1} B D̄^{1/2} and F̄ = [A(I − D̄^{1/2} Ē D̄^{1/2}) A^T]^{−1}. (F̄ is well defined by the same reasoning that the matrix given by (3.8) is well defined.) Then, we can bound T̄^{−1} Δt as follows:

||T̄^{−1} Δt|| ≤ ||T̄^{−1} D̄^{1/2}||² ||Ē + Ē D̄^{1/2} A^T F̄ A D̄^{1/2} Ē|| ||r̄|| + ||T̄^{−1} D̄^{1/2}|| ||Ē D̄^{1/2} A^T F̄|| ||s̄||.

Since T̄^{−2} D̄ is diagonal and each of its diagonal entries is less than 1/(ε̄γ̄), we obtain ||T̄^{−2} D̄|| < 1/(ε̄γ̄) and hence

(3.16)  ||T̄^{−1} Δt|| ≤ ||Ē + Ē D̄^{1/2} A^T F̄ A D̄^{1/2} Ē|| ||r̄||/(ε̄γ̄) + ||Ē D̄^{1/2} A^T F̄|| ||s̄||/√(ε̄γ̄) ≤ √(θ_{ε̄γ̄}(t̄)) β,

where the last inequality follows from (3.15) and the definition of T̄, D̄, Ē, F̄, and θ_{ε̄γ̄}(t̄) (see (3.5)-(3.8)). Now, by using (2.3) and (2.4), we can write (3.11) equivalently as B Δt = 0 and

(3.17)  ε̄γ̄ T̄^{−2} Δt = (c − t − A^T p − B^T u) + ε̄γ̄ T̄^{−1} e,
(3.18)  0 = A(c − t̄ − Δt − A^T p̄ − A^T Δp) + ε̄ b,

so from (3.9) and (3.10) we obtain

T(c − t − A^T p − B^T u) + ε̄γ̄ e
  = (T̄ + ΔT̄)(c − t − A^T p − B^T u) + ε̄γ̄ e
  = ε̄γ̄ T̄^{−1} Δt + ΔT̄(ε̄γ̄ T̄^{−2} Δt − ε̄γ̄ T̄^{−1} e)
  = ε̄γ̄ ΔT̄ T̄^{−2} Δt,

where T = Diag(t), ΔT̄ = Diag(Δt), and the second and the last equality follow from (3.17) and the identity ΔT̄ T̄^{−1} e = T̄^{−1} Δt. This implies

(3.19)  ||T(c − t − A^T p − B^T u) + ε̄γ̄ e|| = ε̄γ̄ ||ΔT̄ T̄^{−2} Δt|| ≤ ε̄γ̄ ||T̄^{−1} Δt||² ≤ ε̄γ̄ θ_{ε̄γ̄}(t̄) β²,

where the last inequality follows from (3.16).

(a) We have from (3.16) and the hypothesis β < min{1, 1/θ_{ε̄γ̄}(t̄)} that ||T̄^{−1} Δt|| ≤ √(θ_{ε̄γ̄}(t̄)) β < 1, so (3.9) and t̄ > 0 yield t = t̄ + Δt > 0. Also, Bt̄ = d together with B Δt = 0 (see (3.11)) and (3.9) yields Bt = d, and (3.18) together with (3.9), (3.10) yields 0 = A(c − t − A^T p) + ε̄ b. Hence (t, p) ∈ Y_{ε̄}.

(b) Fix any α satisfying (3.12) and let γ = αγ̄, ε = αε̄. (Note that because θ_{ε̄γ̄}(t̄) β < 1, the left-hand quantity in (3.12) is strictly less than 1, so such an α exists.) Let r = T(c − t − A^T p − B^T u) + εγ e. Then the triangle inequality and (3.19) imply

||r||/(εγ) ≤ ||T(c − t − A^T p − B^T u) + ε̄γ̄ e||/(εγ) + (1 − (α)²) √m/(α)²
          ≤ θ_{ε̄γ̄}(t̄) β²/(α)² + (1/(α)² − 1) √m,

which together with the fact (see (3.12)) (θ_{ε̄γ̄}(t̄) β² + √m)/(β + √m) ≤ (α)² yields

(3.20)  ||r||/(εγ) ≤ β.

Let s = A(c − t − A^T p) + εb. By using (3.9), (3.10), and (3.18), we have

s = A(c − t̄ − Δt − A^T p̄ − A^T Δp) + αε̄ b = (α − 1) ε̄ b,

which together with the fact (see (3.12)) 1/(1 + β √(γ̄/ε̄)/||b||) ≤ α yields

||s||/√(εγ) = (1/α − 1) ||b|| √(ε̄/γ̄) ≤ β.

This together with (3.20) and the definition of ρ_{ε,γ}(t, p, u) (see (3.1)) proves our claim. □

3.2. Bounds. Lemma 3.2 shows that if the rate α at which the penalty parameters ε and γ are decreased is not too small (see (3.12)), then a single Newton step suffices to keep the current iterate close to the optimal solution of the penalized problem (D_{ε,γ}). Thus, in order to establish the (linear) convergence of the QLPPF Algorithm, it suffices to bound α away from 1 which, according to (3.12), amounts to bounding θ_{εγ}(t) by some quantity independent of t and ε, γ. It is not difficult to see that such a bound does not exist for arbitrary t. Fortunately, we need to consider only those t that, together with some p, are close to the optimal solution of (D_{ε,γ}), in which case, as we show below, such a bound does exist (provided that a certain primal nondegeneracy assumption also holds). The proof of this is somewhat intricate: For ε and γ large, we argue by showing that t cannot be too large, i.e., of the order ε + γ + 1 (see Lemma 3.3(a)) and, for ε and γ small, we argue by showing that, under the primal nondegeneracy assumption, the columns of A corresponding to those components of t which are small (i.e., of the order γ) are of rank n.

LEMMA 3.3. (a) For all ε > 0, all γ > 0, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have

||t|| ≤ M₁(ε + γ + 1),

where M₁ > 0 is some scalar depending on A, B, b, c, and d only.

(b) Suppose that (P) is primal nondegenerate in the sense that, for every optimal solution (x*, y*) of (P), those columns of A corresponding to the positive components of x* + B^T y* have rank n. Then, for all ε > 0, all γ > 0, all λ ≥ εγ/2, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have

θ_λ(t) ≤ ψ(ε/γ),

where we define

(3.21)  ψ(w) = M₂(√w + 1/√w)⁴   ∀w > 0,

with M₂ ≥ 1 some scalar depending on A, B, b, c, and d only.

Proof. (a) The proof is by contradiction. Suppose the contrary, so that there exists a sequence {(t^k, p^k, u^k, ε^k, γ^k)} such that ε^k > 0 and γ^k > 0 and

(3.22)  (t^k, p^k) ∈ Y_{ε^k},  ρ_{ε^k,γ^k}(t^k, p^k, u^k) ≤ 1   ∀k,
(3.23)  ||t^k||/(ε^k + γ^k + 1) → ∞.

By passing into a subsequence if necessary, we will assume that (t^k, p^k)/||(t^k, p^k)|| converges to some limit point, say (t^∞, p^∞) (so (t^∞, p^∞) ≠ 0). We have from (3.22) (also using (3.1) and (3.2)) that

Bt^k = d,  t^k > 0   ∀k,

and from Lemma 3.1 that

b^T p^k ≥ v* − 2mγ^k,   ||t^k + A^T p^k − c|| ≤ √(2(η*(ε^k)² + 2mε^kγ^k))   ∀k.

Upon dividing both sides of the above four relations by ||(t^k, p^k)|| and letting k → ∞, we obtain from (3.23) that

Bt^∞ = 0,  t^∞ ≥ 0,  b^T p^∞ ≥ 0,  ||t^∞ + A^T p^∞|| = 0.

Then, any optimal solution to (D) would remain optimal when moved along the direction (t^∞, p^∞), contradicting the boundedness of the optimal solution set of (D) (via Assumption A(a)).

(b) Fix any ε > 0, γ > 0, λ ≥ εγ/2, and any (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u. Let T = Diag(t) and let D, E, F be given by, respectively, (3.6), (3.7), and (3.8). Then F^{−1} = A(I − D^{1/2} E D^{1/2}) A^T, ||E|| ≤ 1, and D = (I + λ T^{−2})^{−1}. From the definition of θ_λ(t) (see (3.5)) we then obtain

(3.24)  θ_λ(t) = ( ||E + E D^{1/2} A^T F A D^{1/2} E|| + ||E D^{1/2} A^T F|| )²
        ≤ ( ||E|| + ||E||² ||D^{1/2}||² ||A||² ||F|| + ||E|| ||D^{1/2}|| ||A|| ||F|| )²
        < ( 1 + ||A||² ||F|| + ||A|| ||F|| )²,

where the strict inequality follows from the facts ||D|| < 1, ||E|| ≤ 1. Now we bound ||F||. We have

(3.25)  z^T (F^{−1}) z = z^T A(I − D^{1/2} E D^{1/2}) A^T z ≥ z^T A(I − D) A^T z = Σ_{j=1}^m λ(A_j^T z)²/((t_j)² + λ) ≥ λ Σ_{j=1}^m (A_j^T z)²/(||t||_∞² + λ) ≥ λσ||z||²/(||t||_∞² + λ)   ∀z,

where the first inequality follows from ||E|| ≤ 1 and σ > 0 denotes the smallest eigenvalue of A A^T. (σ > 0 because A has full row rank.) Hence, λ ≥ εγ/2 yields

(3.26)  ||F|| ≤ (||t||_∞² + λ)/(λσ) ≤ (2||t||_∞²/(εγ) + 1)/σ.

For ε and γ near zero, we give below a different bound on ||F||. By the primal nondegeneracy assumption, there exists a constant δ > 0 depending on A, B, b, c, and d only such that if (x, y) is any feasible solution of (P) with c^T x + d^T y ≤ v* + δ, then the columns { A_j : j ∈ {1, ..., m} with x_j + B_j^T y ≥ δ } have rank n. Consider the case in which η*ε + 2mγ < δ, where η* is defined as in Lemma 3.1. Since (t, p) ∈ Y_ε and ρ_{ε,γ}(t, p, u) ≤ 1, we have from Lemma 3.1 that the columns { A_j : j ∈ {1, ..., m} with (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ } have rank n. Also, ρ_{ε,γ}(t, p, u) ≤ 1 gives t_j(t_j + A_j^T p + B_j^T u − c_j) ≤ 2εγ for all j (cf. (3.1)), so that (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ implies t_j ≤ 2γ/δ. Hence, we obtain from (3.25) that

z^T (F^{−1}) z ≥ Σ_{j=1}^m λ(A_j^T z)²/((t_j)² + λ)
             ≥ λ Σ_{j : t_j ≤ 2γ/δ} (A_j^T z)²/((2γ/δ)² + λ)
             ≥ λσ'||z||²/((2γ/δ)² + λ)   ∀z,

where the last inequality follows from the fact that those A_j for which t_j ≤ 2γ/δ have rank n, and σ' is some positive scalar depending on A only. Since λ ≥ εγ/2, this yields

||F|| ≤ ((2γ/δ)² + λ)/(λσ') ≤ (8γ/(δ²ε) + 1)/σ'.

Now, consider the remaining case where η*ε + 2mγ ≥ δ. Then max{ε, γ} ≥ δ/(2 max{η*, 2m}), so 1/√(εγ) ≤ (2 max{η*, 2m}/δ)(√(ε/γ) + √(γ/ε)), and part (a) and (3.26) together yield

||F|| ≤ (2(M₁)²(ε + γ + 1)²/(εγ) + 1)/σ = (2(M₁)²(√(ε/γ) + √(γ/ε) + 1/√(εγ))² + 1)/σ.

Combining the above two cases, we conclude that

||F|| ≤ K(√w + 1/√w)²,  where w = ε/γ,

for some positive scalar K depending on A, B, b, c, and d only. Combining the above bound with (3.24), we conclude that θ_λ(t) ≤ M₂(√w + 1/√w)⁴ for some scalar M₂ ≥ 1 depending on A, B, b, c, and d only. □

3.3. Main convergence result. By combining Lemmas 3.1 to 3.3, we obtain the following global convergence result for the QLPPF Algorithm.

THEOREM 3.4. Suppose that (P) is primal nondegenerate in the sense of Lemma 3.3(b) and let ψ(·) be given by (3.21). If, in the QLPPF Algorithm, the initial iterate (t^1, p^1), together with some vector u^1, satisfies

(3.27)  t^1 > 0,  Bt^1 = d,
(3.28)  θ_{ε^1γ^1}(t^1) ≤ ψ(ε^1/γ^1),
(3.29)  ρ_{ε^1,γ^1}(t^1, p^1, u^1) ≤ β

for some scalar β with

(3.30)  0 < β < 1/ψ(ε^1/γ^1),

and if we choose

(3.31)  α^k = max{ √( (θ_{ε^kγ^k}(t^k)(β)² + √m)/(β + √m) ),  1/(1 + β √(γ^1/ε^1)/||b||) }   ∀k,

then ε^k → 0, γ^k → 0 linearly, and {((t^k + A^T p^k − c)/ε^k, u^k/ε^k)}, {(t^k, p^k)} approach the optimal solution set of, respectively, (P) and (D).

Proof. First note from ψ(w) > 1 for all w > 0 (see (3.21) and M₂ ≥ 1) and (3.30) that β < 1. Also note from (2.7) that

(3.32)  ε^k/γ^k = ε^1/γ^1   ∀k.

We claim that

(3.33)  (t^k, p^k) ∈ Y_{ε^{k−1}},  ρ_{ε^{k−1},γ^{k−1}}(t^k, p^k, u^k) ≤ β,  ρ_{ε^k,γ^k}(t^k, p^k, u^k) ≤ β

for all k ≥ 2. It is easily seen by using (3.27)-(3.31) and (2.5)-(2.7) and Lemma 3.2 that (3.33) holds for k = 2. Suppose that (3.33) holds for all k ≤ h, for some h ≥ 2. Then, (t^h, p^h) ∈ Y_{ε^{h−1}} and ρ_{ε^{h−1},γ^{h−1}}(t^h, p^h, u^h) ≤ β < 1. Since we also have from (2.7) that ε^h = α^{h−1}ε^{h−1} and γ^h = α^{h−1}γ^{h−1} and from (3.31) that (α^{h−1})² ≥ √m/(1 + √m) > 1/2, we can apply Lemma 3.3(b) to conclude that

(3.34)  θ_{ε^hγ^h}(t^h) ≤ ψ(ε^{h−1}/γ^{h−1}) = ψ(ε^1/γ^1),

where the equality follows from (3.32). Then, by (3.30), β < 1/θ_{ε^hγ^h}(t^h). Since (3.33) holds for k = h, we also have ρ_{ε^h,γ^h}(t^h, p^h, u^h) ≤ β, so Lemma 3.2 together with (2.5)-(2.7) and (3.31), (3.32) yields

(t^{h+1}, p^{h+1}) ∈ Y_{ε^h},  ρ_{ε^h,γ^h}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β,  ρ_{ε^{h+1},γ^{h+1}}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β.

Hence, (3.33) holds for k = h + 1. Since (3.33) holds for all k ≥ 2, we see that (3.34) holds for all h ≥ 2. Then, by (3.30), θ_{ε^hγ^h}(t^h)β is less than 1 and bounded away from 1 for all h ≥ 2, so that, by (3.31), α^k is less than 1 and bounded away from 1 for all k. Hence ε^k → 0, γ^k → 0 at the rate of a geometric progression. The remainder of the proof follows from (3.33) and Lemma 3.1. □

Note that instead of θ_{ε^kγ^k}(t^k) we can use, for example, the upper bound ψ(ε^1/γ^1) in the formula (3.31), and linear convergence would be preserved. However, this bound is typically loose and difficult to compute. There is also the issue of finding ε^1, γ^1, β, (t^1, p^1) and u^1 satisfying (3.27)-(3.30), which we will address in §3.4.

3.4. Algorithm initialization. By Theorem 3.4, if the primal nondegeneracy assumption therein holds and if we can find ε, γ, (t, p) and u satisfying

(3.35)  t > 0,  Bt = d,
(3.36)  θ_{εγ}(t) ≤ ψ(ε/γ),
(3.37)  ρ_{ε,γ}(t, p, u) < 1/ψ(ε/γ),

then we can set β = ρ_{ε,γ}(t, p, u) (assuming ρ_{ε,γ}(t, p, u) ≠ 0) and start the QLPPF Algorithm with ε^1 = ε, γ^1 = γ, (t^1, p^1) = (t, p), and we would obtain linear convergence. But how do we find such ε, γ, (t, p), and u?

(14)

One obvious way is to fix any ε > 0, any γ > 0, and then solve the penalized problem (P_{ε,γ}). The solution (t, p) obtained satisfies t > 0, Bt = d, ∇f_{ε,γ}(t, p) + B^T u = 0 for some u (see (2.2)), so, by (2.3) and (3.1), (3.2), we have ρ_{ε,γ}(t, p, u) = 0 and (t, p) ∈ F. Hence (3.35) and (3.37) hold and, by Lemma 3.3(b), θ_{ε/γ}(t) ≤ 1/φ(ε/γ), so (3.36) also holds. (Of course (P_{ε,γ}) need not be solved exactly.) To solve the problem (P_{ε,γ}), we can use any method for convex differentiable minimization (e.g., gradient descent, coordinate descent), and we would typically want ε small and γ large so that (P_{ε,γ}) is well conditioned.
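As an illustration, here is a minimal gradient-descent sketch for the case where B is the zero matrix. The objective used, f_{ε,γ}(t, p) = -b^T p - ε Σ_j log t_j + ||t + A^T p - c||²/(2γ), is our assumption, inferred from the relation x = (t + A^T p - c)/γ of Lemma 3.1 quoted later in the text; it is not taken from the paper's definition (2.1), and the step size and iteration count are arbitrary.

```python
import numpy as np

def solve_penalized(A, b, c, eps, gam, step=0.01, iters=20000):
    """Gradient descent on the assumed penalized dual objective
    f(t,p) = -b'p - eps*sum(log t) + ||t + A'p - c||^2 / (2*gam),
    with steps damped so that t stays in the positive orthant."""
    n, m = A.shape
    t, p = np.ones(m), np.zeros(n)
    for _ in range(iters):
        x = (t + A.T @ p - c) / gam      # Lemma 3.1 relation
        dt = -step * (x - eps / t)       # -grad_t f
        dp = -step * (A @ x - b)         # -grad_p f
        lam = 1.0
        neg = dt < 0
        if neg.any():                    # ratio test: keep t > 0
            lam = min(1.0, 0.98 * np.min(-t[neg] / dt[neg]))
        t, p = t + lam * dt, p + lam * dp
    return t, p

# Tiny instance: minimize x1 + x2  s.t.  x1 + x2 = 1, x >= 0.
A, b, c = np.array([[1., 1.]]), np.array([1.]), np.array([1., 1.])
t, p = solve_penalized(A, b, c, eps=0.1, gam=1.0)
x = (t + A.T @ p - c) / 1.0
```

At the penalized minimizer the stationarity conditions give t_j x_j = ε and Ax = b, so for this symmetric instance x ≈ (.5, .5).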

Suppose that there holds Be = 0 and Ae = b. (This holds, for example, when B is the zero matrix (which corresponds to the case when all equality constraints are penalized) and a change of variable x' = (X̄)^{-1} x, where x̄ is any interior feasible solution of (P) (i.e., Ax̄ = b, x̄ > 0) and X̄ = Diag(x̄), has been made in (P).) Then we can find a usable ε, γ, (t, p), u immediately: Fix any

ε > 2φ(1)(||c|| + ||B^T(BB^T)^{-1}d||)

and let γ = ε. Also let

w = B^T(BB^T)^{-1}d,  p = (AA^T)^{-1}A(c - w),  t = εe + w,  u = -(BB^T)^{-1}d.

Then Bw = d, A(c - A^T p) = Aw, At = εb + Aw and B^T u = εe - t, so that

Bt = Bw = d,  A(c - t - A^T p) = -εb,

T(c - t - A^T p - B^T u) + εγe = (εI + W)(c - εe - A^T p) + ε²e = (εI + W)(c - A^T p) - εw,

where T = Diag(t) and W = Diag(w). Also, from φ(1) ≥ 1 and our choice of ε we have ε > ||w||, so t > 0. Hence, by (3.2), we have (t, p) ∈ F and, by (3.1) and ε = γ, we have

(3.38)  ρ_{ε,γ}(t, p, u) = ||(εI + W)(c - A^T p) - εw||/(εγ)
        ≤ ||c - A^T p - w||/ε + ||W|| ||c - A^T p||/ε²
        = ||(I - A^T(AA^T)^{-1}A)(c - w)||/ε + ||W|| ||(I - A^T(AA^T)^{-1}A)c + A^T(AA^T)^{-1}Aw||/ε²
        ≤ ||c - w||/ε + ||w||(||c|| + ||w||)/ε²,

where the last inequality follows from the triangle inequality and the nonexpansive property of projection matrices. From our choice of ε we see that (||c|| + ||w||)/ε < .5/φ(1), so the right-hand side of (3.38) is bounded above by

.5/φ(1) + (.5/φ(1))² ≤ 1/φ(1) = 1/φ(ε/γ),

where the inequality follows from φ(1) ≥ 1 and the equality follows from γ = ε. Hence (3.35) and (3.37) hold. Also, since (t, p) ∈ F, (3.37) together with Lemma 3.3(b) shows that (3.36) holds.
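The construction above is easy to check numerically. The sketch below builds random data with Be = 0 and Ae = b, forms w, p, t, u as prescribed, and verifies Bt = d, t > 0, and that the residual T(c - t - A^T p - B^T u) + εγe equals (εI + W)(c - A^T p) - εw, together with the final bound of (3.38). The problem sizes are arbitrary, and φ(1) is replaced by the placeholder value 1 (only φ(1) ≥ 1 is used).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 3, 2, 6                       # arbitrary small sizes
e = np.ones(m)
M = rng.standard_normal((k, m))
B = M - np.outer(M @ e, e) / m          # rows sum to zero, so Be = 0
A = rng.standard_normal((n, m))
b = A @ e                               # so Ae = b
c = rng.standard_normal(m)
d = rng.standard_normal(k)

w = B.T @ np.linalg.solve(B @ B.T, d)
eps = 2.0 * (np.linalg.norm(c) + np.linalg.norm(w)) + 1.0   # phi(1) := 1
gam = eps                               # gamma = epsilon
p = np.linalg.solve(A @ A.T, A @ (c - w))
t = eps * e + w
u = -np.linalg.solve(B @ B.T, d)

# Residual identity: T(c - t - A'p - B'u) + eps*gam*e = (eps*I + W)(c - A'p) - eps*w
lhs = t * (c - t - A.T @ p - B.T @ u) + eps * gam * e
rhs = (eps * np.eye(m) + np.diag(w)) @ (c - A.T @ p) - eps * w
rho = np.linalg.norm(rhs) / (eps * gam)
bound = np.linalg.norm(c - w) / eps + \
        np.linalg.norm(w) * (np.linalg.norm(c) + np.linalg.norm(w)) / eps**2
```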

4. Numerical results. To study the performance of the QLPPF Algorithm in practice, we have implemented the algorithm to solve the special case of (P) in which B is the zero matrix, i.e., (P) is of the form

(4.1)  minimize c^T x  subject to  Ax = b, x ≥ 0.

(This corresponds to penalizing all equality constraints in the corresponding dual problem.) Below we describe our implementation and present our very preliminary numerical experience.

1. Initialization. In our implementation, we set for all problems

ε^1 = γ^1 = 10^4 ||c||_1/m,

and set (somewhat arbitrarily) p^1 = 0, t^1 = max{e, c/2}, where "max" is taken componentwise. (Note that since B is the zero matrix, t^1 can be set to any positive vector.) We have chosen t^1 so as to minimize ||(T^1)^{-1}(c - t^1 - A^T p^1)|| subject to t^1 ≥ e. Also, care must be exercised in choosing ε^1 and γ^1: if their values are set too low, then the QLPPF Algorithm may fail to converge; if their values are set too high, then the QLPPF Algorithm may require many iterations to converge. (Notice that we set ε^1 and γ^1 directly proportional to the average cost ||c||_1/m so that their values scale with c.)
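A sketch of this initialization (as reconstructed here; the scaling constant 10^4 and the componentwise formula for t^1 are taken from the text above, and the function name is ours):

```python
import numpy as np

def qlppf_init(A, c):
    """Starting point: eps1 = gam1 = 1e4 * ||c||_1 / m, p1 = 0,
    t1 = max{e, c/2} componentwise (valid since B = 0 allows any t1 > 0)."""
    n, m = A.shape
    eps1 = 1e4 * np.linalg.norm(c, 1) / m
    p1 = np.zeros(n)
    t1 = np.maximum(np.ones(m), c / 2.0)
    return eps1, eps1, t1, p1            # (eps1, gam1, t1, p1)

eps1, gam1, t1, p1 = qlppf_init(np.array([[1., 1., 1.]]),
                                np.array([3., 0.5, -1.]))
```

Here ||c||_1 = 4.5 and m = 3, so the starting penalty parameters are 10^4 · 1.5 = 15000.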

2. Steplength selection. To ensure that the t^k's remain inside the positive orthant, we employ a backtracking scheme similar to that used by Setiono: whenever t^k + Δt^k is outside the positive orthant, we replace the formula for t^{k+1} in (2.6) by

(4.2)  t^{k+1} = t^k + .98 λ^k Δt^k,

where

(4.3)  λ^k = min_{j: Δt_j^k < 0} { -t_j^k / Δt_j^k }.

However, this raises a difficulty, namely, for λ^k much smaller than 1, the vector (.98 λ^k Δt^k, Δp^k) may be far from the Newton direction (Δt^k, Δp^k) and, as a consequence, the iterates may fail to converge. To remedy this, we replace (analogously to (4.2)) the formula for p^{k+1} in (2.6) by

p^{k+1} = p^k + .98 λ^k Δp^k,

whenever nonconvergence is detected. (The parameter value .98 is chosen somewhat arbitrarily, but it works well in our tests.)
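The damped update (4.2)-(4.3) can be sketched as follows; the damp_p switch corresponds to also damping the p-update when nonconvergence is detected, and the function name is ours:

```python
import numpy as np

def damped_step(t, dt, p, dp, damp_p=False):
    """Apply the Newton step, damping by .98 times the ratio-test value
    whenever t + dt would leave the positive orthant, as in (4.2)-(4.3)."""
    if (t + dt > 0).all():
        return t + dt, p + dp
    neg = dt < 0
    lam = 0.98 * np.min(-t[neg] / dt[neg])   # .98 * lambda^k
    return t + lam * dt, (p + lam * dp) if damp_p else (p + dp)

# Component 1 would cross zero (1 - 2 < 0), so the t-step is damped.
t_new, p_new = damped_step(np.array([1.0, 1.0]), np.array([-2.0, 0.5]),
                           np.array([0.0]), np.array([1.0]))
```

The ratio test gives λ = 1/2, so t moves by .98 · .5 of the step and lands just inside the orthant, while p takes the full step.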

The proper choice of the α^k's is very important for the QLPPF Algorithm: if the α^k's are too near 1 (so the penalty parameters decrease slowly), then the algorithm would converge slowly; if the α^k's are too near 0 (so the penalty parameters decrease rapidly), then the algorithm may fail to converge. In our implementation we adjust the α^k's dynamically according to the following rule:

α^k = max{.3, .95 α^{k-1}}   if λ^k = 1;
      .6                     if λ^k ≤ .2;
      α^{k-1}                otherwise,

for all k ≥ 2, with α^1 set to .5. The rationale for this adjustment rule is that, if λ^k = 1, then the current iterate is closely following the solution trajectory (so we can decrease the penalty parameters at a faster rate and still retain convergence) and, if λ^k ≤ .2, then the current iterate is unable to follow the solution trajectory (so we must decrease the penalty parameters at a slower rate).
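A sketch of this adjustment rule follows; the value .6 for the middle case is our reading of the partially garbled original and should be treated as an assumption, as should the function name:

```python
def next_alpha(alpha_prev, lam):
    """Dynamic choice of alpha^k from alpha^{k-1} and the damping
    factor lambda^k of the current step (k >= 2; alpha^1 = .5)."""
    if lam == 1.0:                           # full Newton step taken:
        return max(0.3, 0.95 * alpha_prev)   # decrease penalties faster
    if lam <= 0.2:                           # step heavily damped:
        return 0.6                           # decrease penalties more slowly
    return alpha_prev
```

Note the floor of .3 keeps the reduction from becoming too aggressive even after many undamped steps.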

3. Termination. To avoid numerical problems, we stop decreasing the penalty parameters ε and γ when they reach some prespecified tolerances ε_min and γ_min, respectively. In our tests we set

ε_min = 10^{-16} ||c||_1/m,  γ_min = 10^{-11} ||c||_1/m.

We terminate the QLPPF Algorithm when the relative duality gap and the violation of primal and dual feasibility are small. More specifically, we terminate whenever the current iterate, denoted by (t, p), satisfies

(4.4)  ||X(A^T p - c)||_1 ≤ 10^{-7},
(4.5)  max{||Ax - b||, ||[-x]^+||} ≤ 10^{-7},
(4.6)  ||[A^T p - c]^+|| ≤ 10^{-7},

where x = (t + A^T p - c)/γ (see Lemma 3.1), X = Diag(x), and [·]^+ denotes the orthogonal projection onto the nonnegative orthant. Only for three of our test problems could the above termination criterion not be met (owing to violation of (4.4) and (4.6)), in which case the algorithm is terminated whenever primal feasibility (4.5) is met and |c^T x - v*|/|v*| is less than 5·10^{-7}, where v* denotes the optimal cost of (4.1).
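The termination test (4.4)-(4.6) can be sketched as below. The choice of norms on the individual residuals is not fully legible in the original, so the l1 norm in (4.4) and the max norms in (4.5)-(4.6) are assumptions, as is the function name:

```python
import numpy as np

def qlppf_terminated(A, b, c, t, p, gam, tol=1e-7):
    """Check (4.4)-(4.6): complementarity, primal feasibility and
    dual feasibility, with x recovered from (t, p) via Lemma 3.1."""
    x = (t + A.T @ p - c) / gam
    comp = np.linalg.norm(x * (A.T @ p - c), 1)                   # (4.4)
    prim = max(np.abs(A @ x - b).max(), np.maximum(-x, 0).max())  # (4.5)
    dual = np.maximum(A.T @ p - c, 0).max()                       # (4.6)
    return comp <= tol and prim <= tol and dual <= tol

# Exactly optimal point for: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0,
# with x* = (1, 0), p* = 1, and t chosen so that x = (t + A'p - c)/gam.
A, b, c = np.array([[1., 1.]]), np.array([1.]), np.array([1., 2.])
gam = 1e-8
t = gam * np.array([1., 0.]) + (c - A.T @ np.array([1.]))
ok = qlppf_terminated(A, b, c, t, np.array([1.]), gam)
```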

4. Solving for the direction. The most expensive computation at each iteration of the QLPPF Algorithm lies in solving the system of linear equations (2.5). This can be seen to entail solving a single linear system of the form

(4.7)  AQA^T w = z,

for w, where z is some n-vector and Q is some m × m diagonal matrix whose jth diagonal entry is

(4.8)  1/((t_j)²/ε + γ),

with ε > 0, γ > 0, and t some positive m-vector. (Linear systems of the form (4.7) also arise in interior point algorithms, but (4.7) has the nice property that the condition number of Q can be controlled by adjusting the penalty parameters ε and γ.)
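A dense stand-in for this computation is sketched below; the sparse Cholesky of YSMP is replaced by numpy's dense Cholesky, and the diagonal formula 1/((t_j)²/ε + γ) is our reading of the partially garbled (4.8):

```python
import numpy as np

def solve_normal_system(A, t, eps, gam, z):
    """Form AQA' with Q = diag(1/(t_j^2/eps + gam)) and solve AQA'w = z
    by Cholesky factorization (dense here; YSMP does this sparsely)."""
    q = 1.0 / (t**2 / eps + gam)     # each q_j lies in (0, 1/gam)
    AQAt = (A * q) @ A.T             # A @ diag(q) @ A', without forming Q
    L = np.linalg.cholesky(AQAt)     # numeric factorization
    return np.linalg.solve(L.T, np.linalg.solve(L, z))

A = np.array([[1., 0., 1.], [0., 1., 1.]])
t = np.array([1., 2., 1.])
z = np.array([1., 1.])
w = solve_normal_system(A, t, 1.0, 1.0, z)
```

Because each q_j is bounded above by 1/γ, the factorization stays numerically benign even as some t_j → 0, which is the conditioning property noted above.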

In our implementation, (4.7) is solved using YSMP, a sparse matrix package for symmetric positive semidefinite systems developed at Yale University (see [EGSS79], [EGSS82]) and a precursor to the commercial package SMPAK (Scientific Computing Associates, 1985). YSMP comprises a set of Fortran routines implementing the Cholesky decomposition scheme and, as a preprocessor, the minimum-degree ordering algorithm (see, e.g., [GeL81]). In our implementation, the minimum-degree ordering routine ORDER is called first to obtain a permutation of the rows and columns of the matrix AA^T so that fill-in is reduced during factorization. Then, AA^T is symbolically factored using the routine SSF. (SSF is called only once since the nonzero pattern of AQA^T does not change with Q.) At each iteration, the matrix AQA^T is numerically factored by
