A PATH-FOLLOWING ALGORITHM FOR LINEAR PROGRAMMING USING QUADRATIC AND LOGARITHMIC PENALTY FUNCTIONS*

PAUL TSENG†

Abstract. Motivated by a recent work of Setiono, a path-following algorithm for linear programming using both logarithmic and quadratic penalty functions is proposed. In the algorithm, a logarithmic and a quadratic penalty is placed on, respectively, the nonnegativity constraints and an arbitrary subset of the equality constraints; Newton's method is applied to solve the penalized problem, and after each Newton step the penalty parameters are decreased. This algorithm maintains neither primal nor dual feasibility and does not require a Phase I. It is shown that if the initial iterate is chosen appropriately and the penalty parameters are decreased to zero in a particular way, then the algorithm is linearly convergent. Numerical results are also presented suggesting that the algorithm may be competitive with interior point algorithms in practice, requiring typically between 30-45 iterations to accurately solve each Netlib problem tested.

Key words. linear program, path-following, Newton step, penalty function

AMS subject classifications. 49, 90

1. Introduction. Since the pioneering work of Karmarkar [Kar84], much interest has focused on solving linear programs using interior point algorithms. These interior point algorithms may be classified roughly as either (i) projective-scaling (or potential reduction), (ii) affine-scaling, or (iii) path-following. We will not attempt to review the literature on this subject, which is vast (see for example [Meg89], [Tod89] for surveys). Our interest is in algorithms of the path-following type, of the sort discussed in [GaZ81]. These interior point algorithms typically penalize the nonnegativity constraints by a logarithmic function and use Newton's method to solve the penalized problem, with the penalty parameters decreased after each Newton step (see, for example, [Gon89], [GMSTW86], [KMY89], [MoA87], [Ren88], [Tse89]).

One disadvantage of interior point algorithms is the need for an initial interior feasible solution. A common technique for handling this is to add an artificial column (see [AKRV89], [BDDW89], [GMSTW86], [Lus90], [MMS89], [MSSPB89], [MoM87]), but this itself has disadvantages. For example, the cost of the artificial column must be estimated, and some type of rank-1 updating is needed to solve each least square problem, which can significantly increase the solution time and degrade the numerical accuracy of the solutions.

Recently, Setiono [Set89] proposed an interesting algorithm that combines features of a path-following algorithm with those of the method of multipliers [HaB70], [Hes69], [Pow69] (also see [Roc76], [Ber82]). This algorithm does not require a feasible solution to start and is comparable to interior point algorithms both in terms of work per iteration and, according to the numerical results reported in [Set89], in terms of the total number of iterations. To describe the basic idea in Setiono's algorithm, consider

* Received by the editors April 16, 1990; accepted for publication (in revised form) June 11, 1992. This research was supported by the U.S. Army Research Office, contract DAAL03-86-K-0171, and was conducted while the author was with the Laboratory for Information and Decision Systems and the Center for Intelligent Control Systems, Massachusetts Institute of Technology, Cambridge.

† Department of Mathematics, GN-50, University of Washington, Seattle, Washington 98195 (tseng@math.washington.edu).


a linear program in the standard dual form

minimize  −b^T p
subject to  t + A^T p = c,  t ≥ 0,

where A is some matrix and b and c are vectors of appropriate dimension. Let us attach a Lagrange multiplier vector x to the constraints t + A^T p = c and apply the method of multipliers to the above linear program. This produces the following iterations:

x^{k+1} = x^k + (1/ε^k)(t^k + A^T p^k − c),  k = 1, 2, …,

where {ε^k} is a sequence of monotonically decreasing positive scalars and (t^k, p^k) is some (inexact) solution of the augmented Lagrangian subproblem

(1.1)  minimize  −b^T p + (x^k)^T (t + A^T p − c) + (1/(2ε^k)) ‖t + A^T p − c‖²
       subject to  t ≥ 0.

(An advantage of the above multiplier iterations is that they do not need a feasible solution to start.) A key issue associated with the above multiplier iterations concerns the efficient generation of an inexact solution (t^k, p^k) of the convex quadratic program (1.1) for each k. (Note that as ε^k decreases, the objective function of (1.1) becomes progressively more ill-conditioned.)

Setiono's algorithm may be viewed as the method of multipliers in which (t^k, p^k) is generated according to the following scheme, reminiscent of the path-following idea: Add a logarithmic penalty function −γ^k Σ_{j=1}^m ln(t_j) to the objective of (1.1), where γ^k is some positive scalar monotonically decreasing with k, and apply a single Newton step, starting from (t^{k−1}, p^{k−1}), to the resulting problem. (If the t^k thus obtained lies outside the positive orthant, it is moved back towards t^{k−1} until it becomes positive.¹)

In this paper, inspired by the preceding work of Setiono, we study an algorithm that also adds to the objective a quadratic penalty on the equality constraints and a logarithmic penalty on the nonnegativity constraints, and then solves the penalized problem using Newton's method, with the penalty parameters decreased after each Newton step. Unlike Setiono's algorithm, our algorithm does not use the multiplier vector x (so it may be viewed as a pure penalty method) and allows any subset of the equality constraints to be penalized. We show that if the problem is primal nondegenerate and the iterates start near the optimal solution of the initial penalized problem, then the penalty parameters can be decreased at the rate of a geometric progression and the iterates converge linearly. To the best of our knowledge, this is the first (global) linear convergence result for a noninterior point path-following algorithm. We also present numerical results indicating that the algorithm may potentially be competitive with interior point algorithms in practice.

We remark that penalty methods that use either the quadratic or the logarithmic penalty function have been well studied (see, for example, [Ber82], [FiM68], [Fri57], [JIO78], [Man84],

¹ More precisely, t^k is given by the formula t^k = t^{k−1} + .98 λ^k Δt^k, where Δt^k is the Newton direction (projected onto the space of t) and λ^k is the largest λ ∈ (0, 1] for which t^{k−1} + λΔt^k is nonnegative. (The choice of .98 is arbitrary; any number between 0 and 1 would do.)

[WBD88]), but very little is known about penalty methods that use both types of penalty functions (called mixed interior point-exterior point algorithms in [FiM68]).

Upon completion of this paper, the author learned from Professor O. L. Mangasarian that the algorithm discussed here is essentially the IDLN (Interior Dual Least Norm) algorithm described in the recent Ph.D. thesis of Setiono [Set90] (also see the report [Set91]), with minor differences in the choice of the penalty parameters. The same reference includes an (asymptotic) linear rate convergence analysis and extensive numerical results showing that the IDLN algorithm outperforms the popular simplex code MINOS by a factor of 2 or more on the 63 Netlib problems tested. In short, Setiono's independent work provides further evidence of the practical efficiency of the mixed interior point-exterior point solution approach. Finally, we note that while this paper was under review, other noninterior point methods relating to that studied here have been proposed. One method, brought to our attention by one of the referees, is a certain augmented Lagrangian algorithm for stochastic programming (see [MuR90]); another method is a primal-dual exterior point algorithm for linear programming (see [KMM91]). However, neither of these methods has been shown to possess the nice theoretical/numerical properties enjoyed by the algorithm studied here. For example, no convergence result is given for the method in [MuR90] and no numerical or rate of convergence result is given for the method in [KMM91].

This paper proceeds as follows: In §2 we describe the basic algorithm; in §3 we analyze its convergence; and in §4 we recount our numerical experience with it. In §5 we discuss extensions of this work.

In our notation, every vector is a column vector in some k-dimensional real space ℜ^k, and superscript T denotes transpose. For any vector x, we denote by x_j the jth coordinate of x, by Diag(x) the diagonal matrix whose jth diagonal entry is x_j, and by ‖x‖₁, ‖x‖, ‖x‖∞ the L₁-norm, the L₂-norm, and the L∞-norm of x, respectively. For any matrix A, we denote by A_j the jth column of A. We also denote by e the vector of 1's (whose dimension will be clear from the context) and denote by ln(·) the natural logarithm function.

2. Algorithm description. Let A be an n × m matrix, B be an l × m matrix, b be an n-vector, c be an m-vector, and d be an l-vector. Consider the following linear program associated with A, B, b, c, and d:

(D)  minimize  −b^T p
     subject to  t + A^T p = c,  Bt = d,  t ≥ 0,

which we call the dual problem. The dual problem may be viewed as a standard linear program in t, in which we arbitrarily partition the equality constraints into two subsets and express one subset in the generator form t + A^T p = c. (To see this, consider the problem of minimizing a^T t subject to t ∈ R ∩ S, t ≥ 0, where a is an m-vector and R and S are affine sets in ℜ^m. We can always express R = { t | t = c − A^T p for some p } and S = { t | Bt = d } for some matrices A and B and some vectors c and d. Using t = c − A^T p to eliminate t from the objective function, the problem is now in the form (D).) The constraints t + A^T p = c can be thought of as the complicating constraints which, if removed, would make (D) much easier to solve. The advantages of splitting the constraints in this manner will be explained at the end of §2. Finally, we note that the form in which we write the equality constraints is unimportant, and the above form is adopted for notational convenience only.

By attaching Lagrange multiplier vectors x and y to the constraints c − A^T p = t and Bt = d, respectively, we obtain the following dual of (D):

(P)  minimize  c^T x + d^T y
     subject to  Ax = b,  x + B^T y ≥ 0,

which we call the primal problem.

We make the following blanket assumptions, which are standard for interior point algorithms, regarding (D) and (P).

Assumption A.
(a) { (x, y) | Ax = b, x + B^T y > 0 } is nonempty.
(b) { (t, p) | Bt = d, t > 0, t + A^T p = c } is nonempty.
(c) A has full row rank.

It is well known that, under parts (a) and (b) of Assumption A, both (D) and (P) have nonempty bounded optimal solution sets.

Consider the dual problem (D). Suppose that we place a quadratic penalty on the constraints t + A^T p = c with a penalty parameter 1/ε (ε > 0) and we place a logarithmic penalty on the constraints t ≥ 0 with a penalty parameter γ > 0. This gives the following approximation to (D):

(D_{ε,γ})  minimize  f_{ε,γ}(t, p)
           subject to  Bt = d,  t > 0,

where f_{ε,γ}: (0, ∞)^m × ℜ^n → ℜ is the penalized objective function given by

(2.1)  f_{ε,γ}(t, p) = (1/2)‖c − t − A^T p‖² − εγ Σ_{j=1}^m ln(t_j) − ε b^T p  ∀t > 0, ∀p.

The penalized problem (D_{ε,γ}) has the advantage that its objective function f_{ε,γ} is twice differentiable and the Hessian ∇²f_{ε,γ} is positive definite everywhere (via Assumption A(c)). Since the feasible set of (D_{ε,γ}) is nonempty (by Assumption A(b)) and its intersection with any level set of f_{ε,γ} is bounded (by Assumption A(a)), it is readily seen that (D_{ε,γ}) has an optimal solution which, by the strict convexity of f_{ε,γ}, is unique.

Note 1. We can use penalty functions other than the quadratic and the logarithmic. For example, we can use a cubic in place of the quadratic and t_j ln(t_j) in place of −ln(t_j). The quadratic and the logarithmic function, however, have nice properties (such as the fact that the second derivative of the logarithmic function equals minus the square of its first derivative) which make global convergence analysis possible.

It is well known (see [Roc70]) that (t, p) is the optimal solution of (D_{ε,γ}) if and only if it satisfies, together with some u ∈ ℜ^l, the Kuhn-Tucker conditions

(2.2)  t > 0,  Bt = d,  ∇f_{ε,γ}(t, p) + (B^T u; 0) = 0.

Straightforward calculation using (2.1) finds that

(2.3)  ∇f_{ε,γ}(t, p) = ( t + A^T p − c − εγ T^{−1} e ; A(t + A^T p − c) − ε b ),

(2.4)  ∇²f_{ε,γ}(t, p) = [ I + εγ T^{−2}   A^T
                           A              A A^T ],

where T = Diag(t). The above formulas will be used extensively in the subsequent analysis. Note that ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain.
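The formulas (2.1), (2.3), and (2.4) are simple to implement. The sketch below is an illustration only, not part of the paper; the function names and the use of dense numpy arrays are our own assumptions. It evaluates f_{ε,γ} together with its gradient and Hessian, so that the closed-form derivatives can be checked against finite differences.

```python
import numpy as np

def f(t, p, A, b, c, eps, gam):
    # penalized objective (2.1); requires t > 0 componentwise
    r = c - t - A.T @ p
    return 0.5 * r @ r - eps * gam * np.log(t).sum() - eps * b @ p

def grad_hess(t, p, A, b, c, eps, gam):
    # gradient (2.3) and Hessian (2.4) of f_{eps,gam}
    m = t.size
    r = t + A.T @ p - c                        # residual of t + A^T p = c
    g_t = r - eps * gam / t                    # t-block: r - eps*gam*T^{-1}e
    g_p = A @ r - eps * b                      # p-block
    H = np.block([[np.eye(m) + eps * gam * np.diag(1.0 / t**2), A.T],
                  [A, A @ A.T]])
    return np.concatenate([g_t, g_p]), H
```

Note that H is symmetric and, under Assumption A(c), positive definite for any t > 0, consistent with the discussion above.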

It is not difficult to show that, as ε and γ tend to zero, the optimal solution of (D_{ε,γ}) approaches the optimal solution set of (D) (see Lemma 3.1). This suggests the following algorithm for solving (D). At each iteration, we are given ε, γ and a (t, p) which is an approximate solution of (D_{ε,γ}); we apply a Newton step to (D_{ε,γ}) at (t, p) to generate a new (t, p) and then we decrease ε and γ. In other words, we consider a sequence of penalized problems (D_{ε^k,γ^k}), k = 1, 2, …, with ε^k ↓ 0 and γ^k ↓ 0, and we use a Newton step to follow the optimal solution of one penalized problem to that of the next. We now formally state this algorithm, which we call the QLPPF (short for Quadratic-Logarithmic Penalty Path-Following) algorithm.

QLPPF Algorithm

Iteration 0. Choose ε¹ > 0 and γ¹ > 0. Choose (t¹, p¹) ∈ (0, ∞)^m × ℜ^n with Bt¹ = d.

Iteration k. Given (t^k, p^k) ∈ (0, ∞)^m × ℜ^n with Bt^k = d, compute (Δt^k, Δp^k, u^k) to be a solution of

(2.5)  ∇²f_{ε^k,γ^k}(t^k, p^k)(Δt^k; Δp^k) + ∇f_{ε^k,γ^k}(t^k, p^k) + (B^T u^k; 0) = 0,  BΔt^k = 0,

and set

(2.6)  t^{k+1} = t^k + Δt^k,  p^{k+1} = p^k + Δp^k,
(2.7)  γ^{k+1} = α^k γ^k,  ε^{k+1} = α^k ε^k,

where α^k is some scalar in (0, 1).
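For illustration, one iteration of the algorithm can be sketched as follows. This is our own hedged sketch, assuming numpy and dense data; a serious implementation would exploit sparsity and the block structure of (2.5) rather than factorizing the full KKT matrix.

```python
import numpy as np

def qlppf_step(t, p, A, B, b, c, d, eps, gam, alpha):
    """One QLPPF iteration: solve (2.5) for (dt, dp, u), then update via (2.6)-(2.7).

    The linearly constrained Newton system is solved as a KKT system
        [ H        [B 0]^T ] [dt;dp]   [ -grad f ]
        [ [B 0]       0    ] [  u  ] = [    0    ]
    by direct factorization (a sketch only).
    """
    m, n, l = t.size, p.size, d.size
    r = t + A.T @ p - c
    g = np.concatenate([r - eps * gam / t, A @ r - eps * b])    # gradient (2.3)
    H = np.block([[np.eye(m) + eps * gam * np.diag(1.0 / t**2), A.T],
                  [A, A @ A.T]])                                # Hessian (2.4)
    Bbar = np.hstack([B, np.zeros((l, n))])                     # B acts on the t-block only
    K = np.block([[H, Bbar.T], [Bbar, np.zeros((l, l))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(l)]))
    dt, dp = sol[:m], sol[m:m + n]
    return t + dt, p + dp, alpha * eps, alpha * gam
```

Here BΔt = 0 is enforced by augmenting the Newton system with the multiplier u, so Bt^{k+1} = d is preserved exactly, as the algorithm requires; the sketch omits any safeguard keeping t^{k+1} > 0.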

Note 2. As we noted earlier, the QLPPF Algorithm is closely linked to the algorithms proposed by Setiono. In particular, it can be seen from (2.3), (2.4) that, in the special case where B is the zero matrix, the direction finding problem (2.5) is identical to that in the IDLN algorithm of Setiono (see [Set91, Eq. (16)]) and differs from that in the IDPP algorithm of Setiono (see [Set89, Eq. (6)]) by only an order ε^k term on the right-hand side (which tends to zero as ε^k tends to zero).

Note 3. A unique feature of the QLPPF Algorithm lies in its handling of the two sets of constraints t + A^T p = c and Bt = d, whereby quadratic penalties are placed only on the first set while the second set is maintained to be satisfied at all iterations. This feature has the advantage that it enables the QLPPF Algorithm to provide a unified framework for interior point methods and exterior point methods. As an example, if we put all the equality constraints into Bt = d (and correspondingly set A = I and c = 0), then it can be seen that the QLPPF Algorithm with γ^k = 1 for all k reduces to the well-known primal path-following algorithm for maximizing b^T t subject to the constraints Bt = d, t ≥ 0 (see [Gon89], [Tse89], [Ye87]). If we put all the equality constraints into t + A^T p = c (and correspondingly set B = 0 and d = 0), then, as was noted above, the QLPPF Algorithm reduces to the IDLN algorithm for maximizing b^T p subject to the constraints t + A^T p = c, t ≥ 0. For other choices of constraint splitting, we obtain algorithms somewhere in between. We can then envision choosing a constraint splitting so the corresponding QLPPF Algorithm is in some sense most efficient (e.g., fastest convergence) for the given problem. Alternatively, we may choose a constraint splitting so as to ensure that certain "critical" equality constraints are satisfied exactly at all iterations (by putting these constraints into Bt = d).

3. Global convergence. In this section, we show that if (D) is in some sense primal nondegenerate and if (t¹, p¹) is "close" to the optimal solution of (D_{ε¹,γ¹}) in the QLPPF Algorithm, then, by decreasing the ε^k's and γ^k's at an appropriate rate, the iterates {(t^k, p^k)} generated by the QLPPF Algorithm approach the optimal solution set of (D) (see Theorem 3.4). Because the Hessian ∇²f_{ε,γ} is ill-conditioned at the boundary of its domain, the proof of this result is quite involved and relies critically on finding a suitable Lyapunov (i.e., merit) function to monitor the progress of the algorithm. For any ε > 0 and γ > 0, let ρ_{ε,γ}: (0, ∞)^m × ℜ^n × ℜ^l → [0, ∞) be the function given by

(3.1)  ρ_{ε,γ}(t, p, u) = max{ (1/(εγ))‖T(c − t − A^T p − B^T u) + εγ e‖, (1/√(εγ))‖A(c − t − A^T p) + ε b‖ }  ∀t > 0, ∀p, ∀u,

where T = Diag(t). From (2.2) and (2.3) we see that (t, p) is a solution of (D_{ε,γ}) if and only if (t, p) satisfies Bt = d, t > 0 and ρ_{ε,γ}(t, p, u) = 0 for some u. Hence ρ_{ε,γ} acts as a Lyapunov function which measures how far (t, p) is from solving (D_{ε,γ}). This notion is made precise in the following lemma.
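As an illustration of (3.1), the function ρ_{ε,γ} is straightforward to evaluate numerically (a hedged sketch with numpy; the names are ours, not the paper's). In the toy check below we take A = I and B empty, in which case the minimizer of the penalized objective can be written in closed form and ρ vanishes there.

```python
import numpy as np

def rho(t, p, u, A, B, b, c, eps, gam):
    # Lyapunov function (3.1): rho = 0 iff (t, p) solves the penalized problem
    res = c - t - A.T @ p - B.T @ u
    term1 = np.linalg.norm(t * res + eps * gam) / (eps * gam)       # +eps*gam adds eps*gam*e
    term2 = np.linalg.norm(A @ (c - t - A.T @ p) + eps * b) / np.sqrt(eps * gam)
    return max(term1, term2)
```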

For any ε > 0, let

(3.2)  Y_ε = { (t, p) | A(c − t − A^T p) = −ε b, Bt = d, t > 0 }.

The following lemma shows that any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β, for some u, is within an order of ε + (1 + β)γ of being an optimal solution of (D).

LEMMA 3.1. Fix any ε > 0, γ > 0 and β ∈ (0, 1]. For any (t, p) ∈ Y_ε that satisfies ρ_{ε,γ}(t, p, u) ≤ β for some u, the vector (x, y) given by

(3.3)  x = (t + A^T p − c)/ε,  y = u/ε

is feasible for (P) and, together with (t, p), satisfies

c^T x + d^T y ≤ v* + η* ε + m(1 + β)γ,
b^T p ≥ v* − m(1 + β)γ,
‖t + A^T p − c‖² ≤ 2(η*(ε)² + m(1 + β)εγ),

where v* denotes the optimal cost of (P) and η* = min{ ‖x*‖²/2 | (x*, y*) is an optimal solution of (P) }.

Proof. Since (t, p) is in Y_ε, it follows from the definition of Y_ε (see (3.2)) and (3.3) that Ax = b. Since ρ_{ε,γ}(t, p, u) ≤ β, we have from the definition of ρ_{ε,γ} (see (3.1)) and (3.3) that ‖T(−x − B^T y) + γe‖ ≤ βγ, where T = Diag(t). Thus

(3.4)  γ(1 − β)e ≤ T(x + B^T y) ≤ γ(1 + β)e.

Since t > 0 and β ≤ 1, the first inequality in (3.4) yields x + B^T y ≥ γ(1 − β)T^{−1}e ≥ 0. Hence, (x, y) is feasible for (P). Let

d_ε(τ, π) = b^T π − (1/(2ε))‖τ + A^T π − c‖²  ∀(τ, π),
p_ε(ξ, ψ) = (ε/2)‖ξ‖² + c^T ξ + d^T ψ  ∀(ξ, ψ).

Straightforward algebra using (3.3) yields

p_ε(x, y) = d_ε(t, p) + t^T (x + B^T y).

Also, it follows from strong duality for a convex quadratic program and its dual that

min_{(ξ,ψ)} { p_ε(ξ, ψ) | Aξ = b, ξ + B^T ψ ≥ 0 } = max_{(τ,π)} { d_ε(τ, π) | Bτ = d, τ ≥ 0 }.

Thus, by letting (x*, y*) be any optimal solution of (P) with ‖x*‖²/2 = η*, we obtain from the above two relations and c^T x* + d^T y* = v* that

(ε/2)‖x‖² + c^T x + d^T y = p_ε(x, y)
 = d_ε(t, p) + t^T (x + B^T y)
 ≤ p_ε(x*, y*) + t^T (x + B^T y)
 = εη* + v* + t^T (x + B^T y).

Similarly, by letting (t*, p*) be any optimal solution of (D), we obtain from the same two relations and the facts b^T p* = v* and t* + A^T p* = c that

b^T p − (1/(2ε))‖t + A^T p − c‖² = d_ε(t, p)
 = p_ε(x, y) − t^T (x + B^T y)
 ≥ d_ε(t*, p*) − t^T (x + B^T y)
 = v* − t^T (x + B^T y).

Since 0 ≤ t^T (x + B^T y) ≤ m(1 + β)γ (cf. (3.4)), the above two relations yield

c^T x + d^T y ≤ v* + εη* + m(1 + β)γ,  b^T p ≥ v* − m(1 + β)γ.

Finally, since (x, y) is feasible for (P) so that c^T x + d^T y ≥ v*, the first of these two relations also yields

(ε/2)‖x‖² + v* ≤ v* + εη* + m(1 + β)γ.

Canceling v* from both sides and using (3.3) completes the proof. □

Since we are dealing with linear programs, Lemma 3.1 implies that, as ε ↓ 0 and γ ↓ 0, the (x, y) given by (3.3) approaches the optimal solution set of (P) and (t, p) approaches the optimal solution set of (D). In fact, it suffices to decrease ε and γ as far as 2^{−κL}, where κ is some scalar constant and L is the size of the problem encoding in binary (defined as, say, in [Kar84]), at which time an optimal solution of (P) and of (D) can be recovered by using the techniques described in, for example, [Kar84] and [PaS82].

For each λ > 0, let θ_λ: (0, ∞)^m → [0, ∞) be the function given by

(3.5)  θ_λ(t) = (‖E + E D^{1/2} A^T F A D^{1/2} E‖ + ‖E D^{1/2} A^T F‖)²  ∀t > 0,

where

(3.6)  D = (I + λT^{−2})^{−1},
(3.7)  E = I − D^{1/2} B^T [B D B^T]^{−1} B D^{1/2},
(3.8)  F = [A(I − D^{1/2} E D^{1/2}) A^T]^{−1},

and T = Diag(t). (F is well defined because ‖E‖ ≤ 1 (E is a projection matrix) and ‖D‖ < 1, so that I − D^{1/2}ED^{1/2} is positive definite. We also use the assumption that A has full row rank.) The quantity θ_λ(t) estimates the norm squared of a certain projection-like operator depending on A and t, and it will be used extensively in our analysis. In general, θ_λ(t) is rather cumbersome to evaluate, but, as we shall see, it suffices for our analysis to bound θ_λ(t) from above (see Lemma 3.3(b)).
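Although θ_λ(t) is, as noted, cumbersome to evaluate, it can be assembled directly from (3.5)-(3.8) for small dense instances. The sketch below is a numerical illustration under our own naming assumptions, not part of the paper's analysis.

```python
import numpy as np

def theta(t, lam, A, B):
    # the quantity theta_lambda(t) of (3.5), assembled from (3.6)-(3.8)
    T2 = t**2
    D = np.diag(T2 / (T2 + lam))                # (I + lam*T^{-2})^{-1}
    Dh = np.sqrt(D)                             # D^{1/2} (diagonal, entrywise sqrt)
    E = np.eye(t.size) - Dh @ B.T @ np.linalg.solve(B @ D @ B.T, B @ Dh)
    F = np.linalg.inv(A @ (np.eye(t.size) - Dh @ E @ Dh) @ A.T)
    G = E @ Dh @ A.T @ F                        # E D^{1/2} A^T F
    return (np.linalg.norm(E + G @ A @ Dh @ E, 2) + np.linalg.norm(G, 2)) ** 2
```

One sanity check: D depends on t and λ only through the ratios t_j²/(t_j² + λ), so scaling t by τ and λ by τ² leaves θ_λ(t) unchanged.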

3.1. Analyzing a Newton step. In this subsection we prove a key lemma that states that if (t̄, p̄) is "close" to the optimal solution of (D_{ε̄,γ̄}), then the (t, p) generated by applying one Newton step to (D_{ε̄,γ̄}) at (t̄, p̄) is close to the optimal solution of (D_{ε,γ}) for some ε < ε̄ and some γ < γ̄. The notion of "closeness" is measured by the Lyapunov function ρ_{ε,γ}, and the proof of the lemma is based on the ideas used in [Tse89, §2].

LEMMA 3.2. For any ε̄ > 0, any γ̄ > 0 and any (t̄, p̄, ū) ∈ (0, ∞)^m × ℜ^n × ℜ^l with Bt̄ = d, let (t, p, u) be given by

(3.9)  t = t̄ + Δt,
(3.10) p = p̄ + Δp,

where u and (Δt, Δp) together solve the following system of linear equations:

(3.11)  ∇²f_{ε̄,γ̄}(t̄, p̄)(Δt; Δp) + ∇f_{ε̄,γ̄}(t̄, p̄) + (B^T u; 0) = 0,  BΔt = 0.

Suppose that ρ_{ε̄,γ̄}(t̄, p̄, ū) ≤ β̄ for some β̄ < min{1, 1/θ_{ε̄γ̄}(t̄)}. Then the following hold:

(a) (t, p) ∈ Y_{ε̄}.
(b) For any α satisfying

(3.12)  max{ (θ_{ε̄γ̄}(t̄)(β̄)² + √m)/(β̄ + √m),  1/(1 + β̄√(γ̄/ε̄)/‖b‖) } ≤ (α)² < 1,

we have ρ_{αε̄,αγ̄}(t, p, u) ≤ β̄.

Proof. Let

(3.13)  r̄ = T̄(c − t̄ − A^T p̄ − B^T ū) + ε̄γ̄ e,
(3.14)  s̄ = A(c − t̄ − A^T p̄) + ε̄ b,

where T̄ = Diag(t̄). Then, by (3.1),

(3.15)  ‖r̄‖ ≤ β̄ ε̄γ̄,  ‖s̄‖ ≤ β̄ √(ε̄γ̄).

By using (2.3), (2.4) and (3.13), (3.14), we write (3.11) equivalently as

D^{−1}Δt + A^T Δp + B^T(u − ū) = T̄^{−1} r̄,
A Δt + A A^T Δp = s̄,
B Δt = 0,

where for convenience we let D = (I + ε̄γ̄ T̄^{−2})^{−1}. Solving for Δt gives

Δt = D^{1/2}(E + E D^{1/2} A^T F A D^{1/2} E) D^{1/2} T̄^{−1} r̄ − D^{1/2} E D^{1/2} A^T F s̄,

where we let E = I − D^{1/2}B^T[B D B^T]^{−1}B D^{1/2} and F = [A(I − D^{1/2}E D^{1/2})A^T]^{−1}. (F is well defined by the same reasoning that the matrix given by (3.8) is well defined.) Then, we can bound T̄^{−1}Δt as follows:

‖T̄^{−1}Δt‖ ≤ ‖T̄^{−1}D^{1/2}‖² ‖E + E D^{1/2} A^T F A D^{1/2} E‖ ‖r̄‖ + ‖T̄^{−1}D^{1/2}‖ ‖E D^{1/2} A^T F‖ ‖s̄‖.

Since T̄^{−2}D is diagonal and each of its diagonal entries is less than 1/(ε̄γ̄), we obtain that ‖T̄^{−2}D‖ < 1/(ε̄γ̄) and hence

(3.16)  ‖T̄^{−1}Δt‖ ≤ ‖E + E D^{1/2} A^T F A D^{1/2} E‖ ‖r̄‖/(ε̄γ̄) + ‖E D^{1/2} A^T F‖ ‖s̄‖/√(ε̄γ̄) ≤ √(θ_{ε̄γ̄}(t̄)) β̄,

where the last inequality follows from (3.15) and the definition of T̄, D, E, F, and θ_{ε̄γ̄}(t̄) (see (3.5)-(3.8)). Now, by using (2.3) and (2.4), we can write (3.11) equivalently as BΔt = 0 and

(3.17)  ε̄γ̄ T̄^{−2} Δt − ε̄γ̄ T̄^{−1} e = c − t̄ − Δt − A^T p̄ − A^T Δp − B^T u,
(3.18)  0 = A(c − t̄ − Δt − A^T p̄ − A^T Δp) + ε̄ b,

so from (3.9) and (3.10) we obtain

T(c − t − A^T p − B^T u) + ε̄γ̄ e
 = (T̄ + ΔT)(c − t̄ − Δt − A^T p̄ − A^T Δp − B^T u) + ε̄γ̄ e
 = (T̄ + ΔT)(ε̄γ̄ T̄^{−2} Δt − ε̄γ̄ T̄^{−1} e) + ε̄γ̄ e
 = ε̄γ̄ T̄^{−1} Δt + ε̄γ̄ ΔT T̄^{−2} Δt − ε̄γ̄ ΔT T̄^{−1} e
 = ε̄γ̄ ΔT T̄^{−2} Δt,

where T = Diag(t), ΔT = Diag(Δt), and the second equality follows from (3.17). This implies

(3.19)  ‖T(c − t − A^T p − B^T u) + ε̄γ̄ e‖ ≤ ε̄γ̄ ‖T̄^{−1}Δt‖² ≤ ε̄γ̄ θ_{ε̄γ̄}(t̄)(β̄)²,

where the last inequality follows from (3.16).

(a) We have from (3.16) and the hypothesis β̄ < min{1, 1/θ_{ε̄γ̄}(t̄)} that ‖T̄^{−1}Δt‖ ≤ √(θ_{ε̄γ̄}(t̄)) β̄ < 1, so (3.9) and t̄ > 0 yield t = t̄ + Δt > 0. Also, Bt̄ = d together with BΔt = 0 (see (3.11)) and (3.9) yields Bt = d, and (3.18) together with (3.9), (3.10) yields 0 = A(c − t − A^T p) + ε̄ b. Hence (t, p) ∈ Y_{ε̄}.

(b) Fix any α satisfying (3.12) and let γ = αγ̄, ε = αε̄. (Note that because θ_{ε̄γ̄}(t̄)β̄ < 1, the left-hand quantity in (3.12) is strictly less than 1, so such an α exists.) Let

r = T(c − t − A^T p − B^T u) + εγ e.

Then the triangle inequality and (3.19) imply

‖r‖/(εγ) ≤ ‖T(c − t − A^T p − B^T u) + ε̄γ̄ e‖/(εγ) + (1 − (α)²) ε̄γ̄ √m/(εγ)
 ≤ θ_{ε̄γ̄}(t̄)(β̄)²/(α)² + (1/(α)² − 1)√m,

which together with the fact (see (3.12)) (θ_{ε̄γ̄}(t̄)(β̄)² + √m)/(β̄ + √m) ≤ (α)² yields

(3.20)  ‖r‖/(εγ) ≤ β̄.

Let s = A(c − t − A^T p) + ε b. By using (3.9), (3.10), and (3.18), we have

s = A(c − t̄ − Δt − A^T p̄ − A^T Δp) + α ε̄ b = (α − 1) ε̄ b,

which together with the fact (see (3.12)) 1/(1 + β̄√(γ̄/ε̄)/‖b‖) ≤ (α)² < 1 yields

‖s‖/√(εγ) = (1/α − 1)‖b‖√(ε̄/γ̄) ≤ β̄.

This together with (3.20) and the definition of ρ_{ε,γ}(t, p, u) (see (3.1)) proves our claim. □

3.2. Bounds. Lemma 3.2 shows that if the rate α at which the penalty parameters ε and γ are decreased is not too small (see (3.12)), then a single Newton step suffices to keep the current iterate close to the optimal solution of the penalized problem (D_{ε,γ}). Thus, in order to establish the (linear) convergence of the QLPPF Algorithm, it suffices to bound α away from 1 which, according to (3.12), amounts to bounding θ_{εγ}(t) by some quantity independent of t and ε, γ. It is not difficult to see that such a bound does not exist for arbitrary t. Fortunately, we need to consider only those t that, together with some p, are close to the optimal solution of (D_{ε,γ}), in which case, as we show below, such a bound does exist (provided that a certain primal nondegeneracy assumption also holds). The proof of this is somewhat intricate: For ε and γ large, we argue by showing that t cannot be too large, i.e., t is of the order ε + γ + 1 (see Lemma 3.3(a)) and, for ε and γ small, we argue by showing that, under the primal nondegeneracy assumption, the columns of A corresponding to those components of t which are small (i.e., of the order γ) are of rank n.

LEMMA 3.3. (a) For all ε > 0, all γ > 0, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have

‖t‖ ≤ M₁(ε + γ + 1),

where M₁ > 0 is some scalar depending on A, B, b, c, and d only.

(b) Suppose that (D) is primal nondegenerate in the sense that, for every optimal solution (x*, y*) of (P), those columns of A corresponding to the positive components of x* + B^T y* have rank n. Then, for all ε > 0, all γ > 0, all λ ≥ εγ/2, and all (t, p) ∈ Y_ε satisfying ρ_{ε,γ}(t, p, u) ≤ 1 for some u, we have

θ_λ(t) ≤ σ(εγ),

where we define

(3.21)  σ(w) = M₂(√w + 1/√w)⁴  ∀w > 0,

with M₂ ≥ 1 some scalar depending on A, B, b, c, and d only.

Proof. (a) The proof is by contradiction.

Suppose the contrary, so that there exists a sequence {(t^k, p^k, u^k, ε^k, γ^k)} such that ε^k > 0 and γ^k > 0 and

(3.22)  (t^k, p^k) ∈ Y_{ε^k},  ρ_{ε^k,γ^k}(t^k, p^k, u^k) ≤ 1  ∀k,
(3.23)  ‖t^k‖/(ε^k + γ^k + 1) → ∞.

By passing into a subsequence if necessary we will assume that (t^k, p^k)/‖(t^k, p^k)‖ converges to some limit point, say (t^∞, p^∞) (so (t^∞, p^∞) ≠ 0). We have from (3.22) (also using (3.1) and (3.2)) that

Bt^k = d,  t^k > 0  ∀k,

and from Lemma 3.1 that

b^T p^k ≥ v* − 2mγ^k,  ‖t^k + A^T p^k − c‖ ≤ √(2(η*(ε^k)² + 2mε^kγ^k))  ∀k.

Upon dividing both sides of the above four relations by ‖(t^k, p^k)‖ and letting k → ∞, we obtain from (3.23) that

Bt^∞ = 0,  t^∞ ≥ 0,  b^T p^∞ ≥ 0,  ‖t^∞ + A^T p^∞‖ = 0.

Then, any optimal solution to (D) would remain optimal when moved along the direction (t^∞, p^∞), contradicting the boundedness of the optimal solution set of (D) (via Assumption A(a)).

### (b)

^{Fix}any

### >

0,### - ^{>}

^{0,}

^{A >_} ^{e-/2,}

^{and}

^{any}

^{(t,p)}

^{E}

^{Y}

^{satisfying}

^{p,(t,p,} ^{u)} ^{<_}

^{1}

for some u. Let T--

### Diag(t)

andlet### D, E,

F be given by, respectively,### (3.6), (3.7),

and

### (3.8).

Then, F^{-1}

### A(I- D1/2ED1/2)AT, I]E[[ ^{_<}

^{1}

^{and}

^{D} (I + AT-2) ^{-1.}

From thedefinitionof

### 0h (t) (see (3.5))

^{we}

^{then}

^{obtain}

### (3.24)

### O(t) (lie + ED1/2ATFAD1/2E[I + IIED1/UATFII)

### _< ([[El[

^{/}

### []E[[2[[D/[12[[A[I[[F[[

^{/}

### I[E[[[ID/2I[[IAll[[F[[)2

wherethestrict inequality follows from thefacts

### I[D]l <

1,### IIEl[

^{1.}

Now we bound ||F||. We have

    z^T F^{-1} z = z^T A(I − D^{1/2}ED^{1/2})A^T z
                 ≥ z^T A(I − D)A^T z
                 = λ Σ_{j=1}^m (A_j^T z)² / ((t_j)² + λ)
(3.25)           ≥ λ Σ_{j=1}^m (A_j^T z)² / (||t||² + λ)
                 ≥ λα ||z||² / (||t||² + λ)  ∀z,

where the first inequality follows from ||E|| ≤ 1 and α > 0 denotes the smallest eigenvalue of AA^T (α > 0 because A has full row rank). Hence, λ ≥ εγ/2 yields

(3.26)    ||F|| ≤ (||t||² + λ)/(λα) ≤ 2||t||²/(εγα) + 1/α.

For ε and γ near zero, we give below a different bound on ||F||. By the primal nondegeneracy assumption, there exists a constant δ > 0 depending on A, B, b, c, and d only such that if (x, y) is any feasible solution of (P) with c^T x + d^T y ≤ v* + δ, then the columns { A_j | j ∈ {1, ..., m} with x_j + B_j^T y ≥ δ } have rank n. Consider first the case in which η*ε + 2mγ ≤ δ, where η* is defined as in Lemma 3.1. Since (t, p) ∈ Y_γ and p_{ε,γ}(t, p, u) ≤ 1, we have from Lemma 3.1 that the columns { A_j | j ∈ {1, ..., m} with (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ } have rank n. Since (t, p) ∈ Y_γ and p_{ε,γ}(t, p, u) ≤ 1, we also have T(c − t − A^T p − B^T u) ≥ −2εγe (cf. (3.1) and (3.2)), so that (t_j + A_j^T p + B_j^T u − c_j)/ε ≥ δ implies t_j ≤ 2γ/δ. Hence, we obtain from (3.25) that

    z^T F^{-1} z ≥ λ Σ_{j=1}^m (A_j^T z)² / ((t_j)² + λ)
                 ≥ λ Σ_{j: t_j ≤ 2γ/δ} (A_j^T z)² / ((2γ/δ)² + λ)
                 ≥ λ α' ||z||² / ((2γ/δ)² + λ)  ∀z,

where the last inequality follows from the fact that those A_j for which t_j ≤ 2γ/δ have rank n, and α' is some positive scalar depending on A only. Since λ ≥ εγ/2, we then have

    ||F|| ≤ ((2γ/δ)² + λ)/(λα') ≤ 8γ/(εδ²α') + 1/α' = (8ω/δ² + 1)/α',

where for convenience we let ω = γ/ε. Now, consider the remaining case, where η*ε + 2mγ > δ. Then γ > δ/(η*/ω + 2m), so part (a) and (3.26) together yield

    ||F|| ≤ 2(M1)²(ε + γ + 1)²/(εγα) + 1/α
          ≤ (2(M1)²/α)(√ω + 1/√ω + √ω(η*/ω + 2m)/δ)² + 1/α.

Combining the above two cases, we conclude that

    ||F|| ≤ K(ω + 1/ω)   if η*ε + 2mγ ≤ δ,
    ||F|| ≤ K(ω + 1/ω)²  otherwise,

for some positive scalar K depending on A, B, b, c, and d only. Combining the above bound with (3.24), we conclude that

    θ_λ(t) ≤ M2(ω + 1/ω)⁴

for some scalar M2 ≥ 1 depending on A, B, b, c, and d only. □

3.3. Main convergence result.

By combining Lemmas 3.1 to 3.3, we obtain the following global convergence result for the QLPPF Algorithm.

THEOREM 3.4. Suppose that (P) is primal nondegenerate in the sense of Lemma 3.3(b) and that ψ is given by (3.21). If in the QLPPF Algorithm the initial iterate (t^1, p^1), together with some u^1, satisfies (3.27)–(3.30) for some scalar β, where

(3.27)    t^1 > 0,  Bt^1 = d,
(3.28)    θ_{ε^1γ^1/2}(t^1) ≤ ψ(ε^1/γ^1),
(3.29)    p_{ε^1,γ^1}(t^1, p^1, u^1) ≤ β,
(3.30)    0 < β < 1/ψ(ε^1/γ^1),

and if we choose

(3.31)    α^k = max{ √((θ_{ε^kγ^k/2}(t^k)β² + √β)/(β + √β)),  1/(1 + √(ε^1γ^1)/||b||) }  ∀k,

then ε^k → 0, γ^k → 0 linearly, and {((t^k + A^T p^k − c)/ε^k, u^k/ε^k)}, {(t^k, p^k)} approach the optimal solution set of, respectively, (P) and (D).

Proof. First note from ψ(w) > 1 for all w > 0 (see (3.21) and M2 ≥ 1) and (3.30) that β < 1. Also note from (2.7) that

(3.32)    ε^k/γ^k = ε^1/γ^1  ∀k.

We claim that

(3.33)    (t^k, p^k) ∈ Y_{γ^{k−1}},  p_{ε^{k−1},γ^{k−1}}(t^k, p^k, u^k) ≤ β,  p_{ε^k,γ^k}(t^k, p^k, u^k) ≤ β,

for all k ≥ 2. It is easily seen by using (3.27)–(3.31) and (2.5)–(2.7) and Lemma 3.2 that (3.33) holds for k = 2. Suppose that (3.33) holds for all k ≤ h, for some h ≥ 2. Then, (t^h, p^h) ∈ Y_{γ^{h−1}} and p_{ε^{h−1},γ^{h−1}}(t^h, p^h, u^h) ≤ β < 1. Since we also have from (2.7) that ε^h = α^{h−1}ε^{h−1} and γ^h = α^{h−1}γ^{h−1}, and from (3.31) that (α^{h−1})² ≥ 1/(1 + √β) > 1/2, we can apply Lemma 3.3(b) to conclude that

(3.34)    θ_{ε^hγ^h/2}(t^h) ≤ ψ(ε^{h−1}/γ^{h−1}) = ψ(ε^1/γ^1),

where the equality follows from (3.32). Then, by (3.30), β < 1/θ_{ε^hγ^h/2}(t^h). Since (3.33) holds for k = h, we also have p_{ε^h,γ^h}(t^h, p^h, u^h) ≤ β, so Lemma 3.2 together with (2.5)–(2.7) and (3.31), (3.32) yields

    (t^{h+1}, p^{h+1}) ∈ Y_{γ^h},  p_{ε^h,γ^h}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β,  p_{ε^{h+1},γ^{h+1}}(t^{h+1}, p^{h+1}, u^{h+1}) ≤ β.

Hence, (3.33) holds for k = h + 1. Since (3.33) holds for all k ≥ 2, we see that (3.34) holds for all h ≥ 2. Then, by (3.30), θ_{ε^hγ^h/2}(t^h)β is less than 1 and bounded away from 1 for all h ≥ 2, so that, by (3.31), α^k is less than 1 and bounded away from 1 for all k. Hence ε^k → 0, γ^k → 0 at the rate of a geometric progression. The remainder of the proof follows from (3.33) and Lemma 3.1. □

Note that instead of θ_{ε^kγ^k/2}(t^k) we can use, for example, the upper bound ψ(ε^1/γ^1) in the formula (3.31), and linear convergence would be preserved. However, this bound is typically loose and difficult to compute. There is also the issue of finding ε^1, γ^1, β, (t^1, p^1) and u^1 satisfying (3.27)–(3.30), which we will address in §3.4.
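For intuition, the linear convergence asserted by Theorem 3.4 is an ordinary geometric decrease: ε^{k+1} = α^k ε^k and γ^{k+1} = α^k γ^k (cf. (2.7)) with the α^k bounded away from 1, while the ratio ε^k/γ^k stays fixed as in (3.32). A minimal numeric sketch, using a constant, made-up ratio α = .9:

```python
# Geometric decrease of the penalty parameters; alpha = .9 is an
# illustrative constant, not a value prescribed by the algorithm.
eps, gamma, alpha = 1.0, 1.0, 0.9
ratios = []
for _ in range(50):
    eps, gamma = alpha * eps, alpha * gamma
    ratios.append(eps / gamma)

assert eps == gamma and eps < 1e-2                 # 0.9**50 is about 5.2e-3
assert all(abs(r - 1.0) < 1e-12 for r in ratios)   # eps/gamma constant, cf. (3.32)
```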

3.4. Algorithm initialization. By Theorem 3.4, if the primal nondegeneracy assumption therein holds and if we can find ε, γ, (t, p) and u satisfying

(3.35)    t > 0,  Bt = d,
(3.36)    θ_{εγ/2}(t) ≤ ψ(ε/γ),
(3.37)    p_{ε,γ}(t, p, u) < 1/ψ(ε/γ),

then we can set β = p_{ε,γ}(t, p, u) (assuming p_{ε,γ}(t, p, u) ≠ 0) and start the QLPPF Algorithm with ε, γ, (t, p), and we would obtain linear convergence. But how do we find such ε, γ, (t, p), and u? One obvious way is to fix any ε > 0, any γ > 0, and then solve the penalized problem (P_{ε,γ}). The solution (t, p) obtained satisfies

    t > 0,  Bt = d,  ∇f_{ε,γ}(t, p) + (B^T u, 0) = 0

for some u (see (2.2)), so, by (2.3) and (3.1), (3.2), we have p_{ε,γ}(t, p, u) = 0 and (t, p) ∈ Y_γ. Hence (3.35) and (3.37) hold and, by Lemma 3.3(b), θ_{εγ/2}(t) ≤ ψ(ε/γ), so (3.36) also holds. (Of course (P_{ε,γ}) need not be solved exactly.) To solve the problem (P_{ε,γ}), we can use any method for convex differentiable minimization (e.g., gradient descent, coordinate descent), and we would typically want ε small and γ large so that (P_{ε,γ}) is well conditioned.

Suppose that there holds Be = 0 and Ae = b. (This holds, for example, when B is the zero matrix (which corresponds to the case when all equality constraints are penalized) and a change of variable x' = (X̄)^{-1}x, where x̄ is any interior feasible solution of (P) (i.e., Ax̄ = b, x̄ > 0) and X̄ = Diag(x̄), has been made in (P).) Then we can find a usable ε, γ, (t, p), u immediately: Fix any ε > 2ψ(1)(||c|| + ||B^T(BB^T)^{-1}d||) and let γ = ε. Also let w = B^T(BB^T)^{-1}d, p = (AA^T)^{-1}A(c − w), t = εe + w, and u = −(BB^T)^{-1}d. Then Bw = d, A(c − A^T p) = Aw, At = εb + Aw, and B^T u = εe − t, so that

    Bt = Bw = d,
    A(c − t − A^T p) = −εb,
    T(c − t − A^T p − B^T u) + ε²e = (εI + W)(c − A^T p) − ε²e − εw + ε²e = (εI + W)(c − A^T p) − εw,

where T = Diag(t) and W = Diag(w). Also, from ψ(1) ≥ 1 and our choice of ε we have ε > ||w||, so t > 0. Hence, by (3.2), we have (t, p) ∈ Y_γ and, by (3.1) and ε = γ, we have

(3.38)    p_{ε,γ}(t, p, u) = ||(εI + W)(c − A^T p) − εw||/ε²
              ≤ ||c − A^T p − w||/ε + ||W|| ||c − A^T p||/ε²
              = ||(I − A^T(AA^T)^{-1}A)(c − w)||/ε + ||W|| ||(I − A^T(AA^T)^{-1}A)c + A^T(AA^T)^{-1}Aw||/ε²
              ≤ ||c − w||/ε + ||w||(||c|| + ||w||)/ε²,

where the last inequality follows from the triangle inequality and the nonexpansive property of projection matrices. From our choice of ε we see that (||c|| + ||w||)/ε < .5/ψ(1), so the right-hand side of (3.38) is bounded above by .5/ψ(1) + (.5/ψ(1))² ≤ 1/ψ(1) = 1/ψ(ε/γ), where the inequality follows from ψ(1) ≥ 1 and the equality follows from γ = ε. Hence (3.35) and (3.37) hold. Also, since (t, p) ∈ Y_γ, (3.37) together with Lemma 3.3(b) shows that (3.36) holds.
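A dense NumPy sketch of this closed-form starting point may help. The data A, B, c, d below are small made-up stand-ins chosen so that Be = 0 and Ae = b hold, and ψ(1) (which depends on constants from (3.21) that cannot be computed here) is replaced by the conservative placeholder value 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 6                              # illustrative sizes only

A = rng.standard_normal((n, m))          # generically of full row rank
B = np.zeros((2, m))
B[0, 0], B[0, 1] = 1.0, -1.0             # rows of B sum to zero, so Be = 0
B[1, 2], B[1, 3] = 1.0, -1.0
e = np.ones(m)
b = A @ e                                # choose b so that Ae = b
c = rng.standard_normal(m)
d = B @ rng.standard_normal(m)           # any d in the range of B

psi1 = 1.0                               # placeholder for psi(1) >= 1
w = B.T @ np.linalg.solve(B @ B.T, d)
eps = 2 * psi1 * (np.linalg.norm(c) + np.linalg.norm(w)) + 1.0
gamma = eps
p = np.linalg.solve(A @ A.T, A @ (c - w))
t = eps * e + w
u = -np.linalg.solve(B @ B.T, d)

# The identities claimed in the text:
assert np.allclose(B @ t, d)              # Bt = d
assert np.allclose(B.T @ u, eps * e - t)  # B^T u = eps*e - t (= -w)
assert np.allclose(A @ t, eps * b + A @ w)
assert (t > 0).all()                      # eps > ||w|| forces t > 0
```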

4. Numerical results. To study the performance of the QLPPF Algorithm in practice, we have implemented the algorithm to solve the special case of (P) and (D) in which B is the zero matrix, i.e., (P) is of the form

(4.1)    minimize c^T x  subject to Ax = b, x ≥ 0.

(This corresponds to penalizing all equality constraints in the corresponding dual problem.) Below we describe our implementation and present our very preliminary numerical experience.

1. Initialization. In our implementation, we set for all problems

    ε^1 = γ^1 = 10^4 ||c||_1/m,

and set (somewhat arbitrarily) p^1 = 0, t^1 = max{ε^1 e, c/2}, where "max" is taken componentwise. (Note that since B is the zero matrix, t^1 can be set to any positive vector.) We have chosen t^1 so as to minimize ||(T^1)^{-1}(c − t^1 − A^T p^1)|| subject to t^1 ≥ ε^1 e. Also, care must be exercised in choosing ε^1 and γ^1: if their values are set too low, then the QLPPF Algorithm may fail to converge; if their values are set too high, then the QLPPF Algorithm may require many iterations to converge. (Notice that we set ε^1 and γ^1 directly proportional to the average cost ||c||_1/m so that their values scale with c.)
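These defaults translate directly into code; a small NumPy sketch (the matrix A and vectors b, c are toy data for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])          # toy data, not from the paper
b = np.array([1.0, 1.0])
c = np.array([2.0, -1.0, 4.0])
m = A.shape[1]

eps1 = gamma1 = 1e4 * np.linalg.norm(c, 1) / m  # penalty parameters scale with c
p1 = np.zeros(A.shape[0])
t1 = np.maximum(eps1 * np.ones(m), c / 2)       # componentwise max keeps t1 > 0

assert eps1 == gamma1 and eps1 > 0
assert (t1 >= eps1).all() and (t1 > 0).all()
```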

2. Steplength selection. To ensure that the t^k's remain inside the positive orthant, we employ a backtracking scheme similar to that used by Setiono: whenever t^k + Δt^k is outside the positive orthant, we replace the formula for t^{k+1} in (2.6) by

(4.2)    t^{k+1} = t^k + .98λ^k Δt^k,

where

(4.3)    λ^k = min_{j: Δt_j^k < 0} { −t_j^k / Δt_j^k }

(and we take λ^k = 1 whenever t^k + Δt^k lies inside the positive orthant). However, this raises a difficulty, namely, for λ^k much smaller than 1, the vector (.98λ^k Δt^k, Δp^k) may be far from the Newton direction (Δt^k, Δp^k) and, as a consequence, the iterates may fail to converge. To remedy this, we replace (analogous to (4.2)) the formula for p^{k+1} in (2.6) by

    p^{k+1} = p^k + .98λ^k Δp^k

whenever nonconvergence is detected. (The parameter value .98 is chosen somewhat arbitrarily, but it works well in our tests.)
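The ratio test (4.3) and the damped update (4.2) can be sketched as follows; `dt` and `dp` stand for the Newton direction (Δt^k, Δp^k), and damping p by the same factor reflects the remedy described above:

```python
import numpy as np

def damped_step(t, p, dt, dp, shrink=0.98):
    """Full Newton step if it stays in the positive orthant; otherwise
    the .98-damped step of (4.2), applied to both t and p."""
    if (t + dt > 0).all():
        return t + dt, p + dp, 1.0
    neg = dt < 0
    lam = np.min(-t[neg] / dt[neg])  # largest lam with t + lam*dt >= 0, cf. (4.3)
    return t + shrink * lam * dt, p + shrink * lam * dp, lam

t, p = np.array([1.0, 2.0]), np.array([0.0])
t2, p2, lam = damped_step(t, p, dt=np.array([-2.0, 1.0]), dp=np.array([1.0]))
assert lam == 0.5          # the ratio test fires on the first component
assert (t2 > 0).all()      # the damped iterate stays strictly positive
```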

The proper choice of the α^k's is very important for the QLPPF Algorithm: if the α^k's are too near 1 (so the penalty parameters decrease slowly), then the algorithm would converge slowly; if the α^k's are too near 0 (so the penalty parameters decrease rapidly), then the algorithm may fail to converge. In our implementation we adjust the α^k's dynamically according to the following rule:

    α^k = max{.3, .95α^{k−1}}  if λ^k = 1;
    α^k = .6                   if λ^k ≤ .2;
    α^k = α^{k−1}              otherwise,    ∀k ≥ 2,

with α^1 set to .5. The rationale for this adjustment rule is that, if λ^k = 1, then the current iterate is closely following the solution trajectory (so we can decrease the penalty parameters at a faster rate and still retain convergence) and, if λ^k ≤ .2, then the current iterate is unable to follow the solution trajectory (so we must decrease the penalty parameters at a slower rate).
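Read as code, the adjustment is a one-line state update per iteration. The sketch below follows the rule as reconstructed above (the case values .3, .95, .6, and .2 are only as legible in our copy of the source, so treat them as indicative):

```python
def next_alpha(alpha_prev, lam):
    """Update the penalty-decrease ratio alpha^k from the last steplength lam."""
    if lam == 1.0:                            # full Newton step: trajectory followed,
        return max(0.3, 0.95 * alpha_prev)    # so shrink the penalties faster
    if lam <= 0.2:                            # heavily damped step: back off
        return 0.6
    return alpha_prev                         # otherwise keep the current rate

alpha = 0.5                                   # alpha^1 = .5 as in the text
alpha = next_alpha(alpha, 1.0)
assert abs(alpha - 0.475) < 1e-12
assert next_alpha(alpha, 0.1) == 0.6
```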

3. Termination. To avoid numerical problems, we stop decreasing the penalty parameters ε and γ when they reach some prespecified tolerances ε_min and γ_min, respectively. In our tests we set

    ε_min = 10^{-16} ||c||_1/m,  γ_min = 10^{-1} ||c||_1/m.

We terminate the QLPPF Algorithm when the relative duality gap and the violation of primal and dual feasibility are small. More specifically, we terminate whenever the current iterate, denoted by (t, p), satisfies

(4.4)    ||X(A^T p − c)||_1 ≤ 10^{-7},
(4.5)    max{ ||Ax − b||, ||[−x]^+|| } ≤ 10^{-7},
(4.6)    ||[A^T p − c]^+|| ≤ 10^{-7},

where x = (t + A^T p − c)/ε (see Lemma 3.1), X = Diag(x), and [·]^+ denotes the orthogonal projection onto the nonnegative orthant. Only for three of our test problems could the above termination criterion not be met (owing to violation of (4.4) and (4.6)), in which case the algorithm is terminated whenever primal feasibility (4.5) is met and |c^T x − v*|/|v*| is less than 5·10^{-7}, where v* denotes the optimal cost of (4.1).
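The three tests translate directly into a termination check; a sketch (the norms in (4.5) and (4.6) are taken as Euclidean here, an assumption, since the source does not pin them down):

```python
import numpy as np

def terminated(A, b, c, t, p, eps, tol=1e-7):
    """Stopping tests (4.4)-(4.6) at the iterate (t, p)."""
    x = (t + A.T @ p - c) / eps                  # primal estimate, cf. Lemma 3.1
    gap = np.linalg.norm(x * (A.T @ p - c), 1)   # (4.4): duality gap
    primal = max(np.linalg.norm(A @ x - b),      # (4.5): primal feasibility, x >= 0
                 np.linalg.norm(np.maximum(-x, 0.0)))
    dual = np.linalg.norm(np.maximum(A.T @ p - c, 0.0))  # (4.6): dual feasibility
    return gap <= tol and primal <= tol and dual <= tol

# An exactly optimal point of a tiny LP passes all three tests.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
p = np.array([1.0])                              # dual optimal: A^T p - c = [0, -1]
eps = 1e-3
t = c - A.T @ p + eps * np.array([1.0, 0.0])     # chosen so that x = (1, 0)
assert terminated(A, b, c, t, p, eps)
```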

4. Solving for the direction. The most expensive computation at each iteration of the QLPPF Algorithm lies in solving the system of linear equations (2.5). This can be seen to entail solving a single linear system of the form

(4.7)    AQA^T w = z

for w, where z is some n-vector and Q is some m × m diagonal matrix whose jth diagonal entry is given by (4.8), with ε > 0, γ > 0, and t some positive m-vector. (Linear systems of the form (4.7) also arise in interior point algorithms, but (4.7) has the nice property that the condition number of Q can be controlled by adjusting the penalty parameters ε and γ.)
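In SciPy the same solve can be sketched as follows; `splu` stands in for YSMP's sparse Cholesky routines, and the diagonal of Q is a generic well-conditioned placeholder (the actual entries, given by (4.8), depend on ε, γ, and t):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
n, m = 4, 9
A = sp.random(n, m, density=0.5, random_state=1, format="csr")
A = A + sp.eye(n, m)                    # nudge A toward full row rank

q = 1.0 / (1.0 + rng.random(m))         # placeholder diagonal of Q, entries in (1/2, 1]
Q = sp.diags(q)
z = rng.standard_normal(n)

K = (A @ Q @ A.T).tocsc()               # form AQA^T; its sparsity pattern is fixed across k
w = spla.splu(K).solve(z)               # factor and solve (4.7) for the direction w

assert np.allclose(K @ w, z)
```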

In our implementation, (4.7) is solved using YSMP, a sparse matrix package for symmetric positive semidefinite systems developed at Yale University (see [EGSS79], [EGSS82]) and a precursor to the commercial package SMPAK (Scientific Computing Associates, 1985). YSMP comprises a set of Fortran routines implementing the Cholesky decomposition scheme and, as a preprocessor, the minimum-degree ordering algorithm (see, e.g., [GeL81]). In our implementation, the minimum-degree ordering routine ORDER is called first to obtain a permutation of the rows and columns of the matrix AA^T so that fill-in is reduced during factorization. Then, AA^T is symbolically factored using the routine SSF. (SSF is called only once since the nonzero pattern of AQA^T does not change with Q.) At each iteration, the matrix AQA^T is numerically factored by