Semidefinite programming - Generalized inequality constraints

Convex optimization problems

4.6 Generalized inequality constraints

4.6.2 Semidefinite programming

When $K$ is $\mathbf{S}^k_+$, the cone of positive semidefinite $k\times k$ matrices, the associated conic form problem is called a semidefinite program (SDP), and has the form

$$
\begin{array}{ll}
\text{minimize} & c^Tx\\
\text{subject to} & x_1F_1 + \cdots + x_nF_n + G \preceq 0\\
& Ax = b,
\end{array}
\qquad (4.50)
$$

where $G, F_1, \ldots, F_n \in \mathbf{S}^k$, and $A \in \mathbf{R}^{p\times n}$. The inequality here is a linear matrix inequality (LMI) (see example 2.10).

If the matrices $G, F_1, \ldots, F_n$ are all diagonal, then the LMI in (4.50) is equivalent to a set of $k$ linear inequalities (one for each diagonal entry), and the SDP (4.50) reduces to a linear program.
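To make the LMI concrete, the following Python sketch (assuming NumPy; the helper names `lmi_value`, `satisfies_lmi`, `satisfies_linear` and the data are hypothetical) tests $x_1F_1 + \cdots + x_nF_n + G \preceq 0$ through the largest eigenvalue of the left-hand side, and checks on diagonal data that this agrees with the $k$ scalar linear inequalities.

```python
import numpy as np

def lmi_value(x, F, G):
    """Evaluate F(x) = x_1 F_1 + ... + x_n F_n + G (all symmetric k x k)."""
    return sum(xi * Fi for xi, Fi in zip(x, F)) + G

def satisfies_lmi(x, F, G, tol=1e-9):
    """F(x) <= 0 in the semidefinite order iff its largest eigenvalue is <= 0."""
    return bool(np.linalg.eigvalsh(lmi_value(x, F, G)).max() <= tol)

def satisfies_linear(x, F, G, tol=1e-9):
    """Diagonal case: row r reads sum_i x_i (F_i)_rr + G_rr <= 0."""
    a = np.array([np.diag(Fi) for Fi in F]).T   # k x n coefficient matrix
    return bool(np.all(a @ x + np.diag(G) <= tol))

# Hypothetical diagonal data with k = 3, n = 2.
F = [np.diag([1.0, 0.0, -1.0]), np.diag([0.0, 1.0, 1.0])]
G = np.diag([-1.0, -2.0, -0.5])

for x in [np.array([0.5, 1.0]), np.array([2.0, 0.0]), np.array([0.0, 3.0])]:
    print(x, satisfies_lmi(x, F, G), satisfies_linear(x, F, G))
```

For non-diagonal data only `satisfies_lmi` applies; the eigenvalue test is the general membership check for the cone.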

Standard and inequality form semidefinite programs

Following the analogy to LP, a standard form SDP has linear equality constraints, and a (matrix) nonnegativity constraint on the variable $X \in \mathbf{S}^n$:

$$
\begin{array}{ll}
\text{minimize} & \mathbf{tr}(CX)\\
\text{subject to} & \mathbf{tr}(A_iX) = b_i, \quad i = 1, \ldots, p\\
& X \succeq 0,
\end{array}
\qquad (4.51)
$$

where $C, A_1, \ldots, A_p \in \mathbf{S}^n$. (Recall that $\mathbf{tr}(CX) = \sum_{i,j=1}^n C_{ij}X_{ij}$ is the form of a general real-valued linear function on $\mathbf{S}^n$.) This form should be compared to the standard form linear program (4.28). In LP and SDP standard forms, we minimize a linear function of the variable, subject to $p$ linear equality constraints on the variable, and a nonnegativity constraint on the variable.
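A quick numerical check of the trace identity, plus a feasibility test for (4.51), may help. This is a sketch assuming NumPy; `sym`, `standard_form_feasible`, and the data are hypothetical.

```python
import numpy as np

def sym(M):
    """Symmetrize a square matrix."""
    return (M + M.T) / 2

rng = np.random.default_rng(0)
n = 4
C, X = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))

# For symmetric C and X, tr(CX) equals the entrywise sum sum_{i,j} C_ij X_ij.
print(np.isclose(np.trace(C @ X), np.sum(C * X)))

def standard_form_feasible(X, A, b, tol=1e-8):
    """Check tr(A_i X) = b_i for i = 1..p and X >= 0 (positive semidefinite)."""
    eq_ok = all(abs(np.trace(Ai @ X) - bi) <= tol for Ai, bi in zip(A, b))
    psd_ok = np.linalg.eigvalsh(X).min() >= -tol
    return bool(eq_ok and psd_ok)

# Hypothetical single constraint tr(X) = 2 on 2 x 2 matrices.
print(standard_form_feasible(np.eye(2), [np.eye(2)], [2.0]))
print(standard_form_feasible(np.diag([3.0, -1.0]), [np.eye(2)], [2.0]))
```

The second candidate meets the equality constraint but is not positive semidefinite, so it is rejected by the matrix nonnegativity constraint alone.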

An inequality form SDP, analogous to an inequality form LP (4.29), has no equality constraints, and one LMI:

$$
\begin{array}{ll}
\text{minimize} & c^Tx\\
\text{subject to} & x_1A_1 + \cdots + x_nA_n \preceq B,
\end{array}
$$

with variable $x \in \mathbf{R}^n$, and parameters $B, A_1, \ldots, A_n \in \mathbf{S}^k$, $c \in \mathbf{R}^n$.

Multiple LMIs and linear inequalities

It is common to refer to a problem with linear objective, linear equality and inequality constraints, and several LMI constraints, i.e.,

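$$
\begin{array}{ll}
\text{minimize} & c^Tx\\
\text{subject to} & F^{(i)}(x) = x_1F_1^{(i)} + \cdots + x_nF_n^{(i)} + G^{(i)} \preceq 0, \quad i = 1, \ldots, K\\
& Gx \preceq h, \quad Ax = b,
\end{array}
$$

as an SDP as well. Such problems are readily transformed to an SDP, by forming a large block diagonal LMI from the individual LMIs and linear inequalities (a block diagonal matrix is negative semidefinite if and only if each of its diagonal blocks is):

$$
\begin{array}{ll}
\text{minimize} & c^Tx\\
\text{subject to} & \mathbf{diag}(Gx - h, F^{(1)}(x), \ldots, F^{(K)}(x)) \preceq 0\\
& Ax = b.
\end{array}
$$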
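The block diagonal construction can be sketched as follows, assuming NumPy. The helpers `block_diag` and `stacked_lmi` and all data are hypothetical; the point is that the spectrum of the stacked matrix is the union of the block spectra, so one LMI encodes all the original constraints.

```python
import numpy as np

def block_diag(*blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    size = sum(B.shape[0] for B in blocks)
    out = np.zeros((size, size))
    i = 0
    for B in blocks:
        k = B.shape[0]
        out[i:i + k, i:i + k] = B
        i += k
    return out

def stacked_lmi(x, G_lin, h, lmis):
    """Return (diag(Gx - h, F^(1)(x), ..., F^(K)(x)), list of blocks).

    lmis is a list of (F_list, G0) pairs describing each F^(i)(x).
    """
    blocks = [np.diag(G_lin @ x - h)]
    for F_list, G0 in lmis:
        blocks.append(sum(xi * Fi for xi, Fi in zip(x, F_list)) + G0)
    return block_diag(*blocks), blocks

# Hypothetical data: two linear inequalities and one 2 x 2 LMI in x in R^2.
x = np.array([1.0, -1.0])
G_lin, h = np.eye(2), np.array([2.0, 0.0])
lmis = [([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])], -np.eye(2))]

M, blocks = stacked_lmi(x, G_lin, h, lmis)
eig_M = np.sort(np.linalg.eigvalsh(M))
eig_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(B) for B in blocks]))
print(np.allclose(eig_M, eig_blocks), eig_M.max() <= 1e-9)
```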

4.6.3 Examples

Second-order cone programming

The SOCP (4.36) can be expressed as a conic form problem

$$
\begin{array}{ll}
\text{minimize} & c^Tx\\
\text{subject to} & -(A_ix + b_i,\ c_i^Tx + d_i) \preceq_{K_i} 0, \quad i = 1, \ldots, m\\
& Fx = g,
\end{array}
$$

in which

$$
K_i = \{(y, t) \in \mathbf{R}^{n_i+1} \mid \|y\|_2 \le t\},
$$

i.e., the second-order cone in $\mathbf{R}^{n_i+1}$. This explains the name second-order cone program for the optimization problem (4.36).
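Since $u \preceq_K v$ means $v - u \in K$, the constraint $-(A_ix + b_i, c_i^Tx + d_i) \preceq_{K_i} 0$ just says $(A_ix + b_i, c_i^Tx + d_i) \in K_i$. A minimal NumPy sketch of that membership test (function names and data are hypothetical):

```python
import numpy as np

def in_soc(y, t, tol=1e-12):
    """Membership in the second-order cone K = {(y, t) : ||y||_2 <= t}."""
    return bool(np.linalg.norm(y) <= t + tol)

def socp_constraint_ok(x, A, b, c, d):
    """-(A x + b, c^T x + d) <=_K 0 means (A x + b, c^T x + d) lies in K."""
    return in_soc(A @ x + b, c @ x + d)

# Hypothetical constraint ||x||_2 <= 1 (A = I, b = 0, c = 0, d = 1).
A, b, c, d = np.eye(2), np.zeros(2), np.zeros(2), 1.0
print(socp_constraint_ok(np.array([0.6, 0.8]), A, b, c, d))   # on the boundary
print(socp_constraint_ok(np.array([1.0, 1.0]), A, b, c, d))   # outside the cone
```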

Matrix norm minimization

Let $A(x) = A_0 + x_1A_1 + \cdots + x_nA_n$, where $A_i \in \mathbf{R}^{p\times q}$. We consider the unconstrained problem

$$
\text{minimize} \quad \|A(x)\|_2,
$$

where $\|\cdot\|_2$ denotes the spectral norm (maximum singular value), and $x \in \mathbf{R}^n$ is the variable. This is a convex problem since $\|A(x)\|_2$ is a convex function of $x$ (a norm composed with an affine function).

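Using the fact that $\|A\|_2 \le s$ if and only if $A^TA \preceq s^2I$ (and $s \ge 0$), we can express the problem in the form

$$
\begin{array}{ll}
\text{minimize} & s\\
\text{subject to} & A(x)^TA(x) \preceq sI,
\end{array}
$$

with variables $x$ and $s$ (here $s$ bounds the squared norm, so its optimal value is the square of the optimal value of the original problem). Since the function $A(x)^TA(x) - sI$ is matrix convex in $(x, s)$, this is a convex optimization problem with a single $q\times q$ matrix inequality constraint.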

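We can also formulate the problem using a single linear matrix inequality of size $(p+q)\times(p+q)$, using the fact that

$$
A^TA \preceq t^2I \ \ (\text{and } t \ge 0) \quad \Longleftrightarrow \quad \begin{bmatrix} tI & A\\ A^T & tI \end{bmatrix} \succeq 0
$$

(see §A.5.5). This results in the SDP

$$
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & \begin{bmatrix} tI & A(x)\\ A(x)^T & tI \end{bmatrix} \succeq 0
\end{array}
$$

in the variables $x$ and $t$.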
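The Schur complement equivalence can be checked numerically. In this NumPy sketch (`norm_lmi_block` and the random matrix standing in for $A(x)$ at a fixed $x$ are hypothetical), the block matrix has eigenvalues $t \pm \sigma_i(A)$ together with $t$, so its smallest eigenvalue is $t - \|A\|_2$, and it is positive semidefinite exactly when $t \ge \|A\|_2$.

```python
import numpy as np

def norm_lmi_block(A, t):
    """The (p+q) x (p+q) matrix [[t I, A], [A^T, t I]] from the SDP above."""
    p, q = A.shape
    return np.block([[t * np.eye(p), A], [A.T, t * np.eye(q)]])

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))      # hypothetical value of A(x) at some fixed x
sigma_max = np.linalg.norm(A, 2)     # spectral norm ||A||_2

for t in [0.9 * sigma_max, sigma_max, 1.1 * sigma_max]:
    lam_min = np.linalg.eigvalsh(norm_lmi_block(A, t)).min()
    # smallest eigenvalue is t - ||A||_2: negative below the norm, >= 0 above it
    print(f"t / ||A||_2 = {t / sigma_max:.1f}, smallest eigenvalue = {lam_min:+.2e}")
```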

Moment problems

Let $t$ be a random variable in $\mathbf{R}$. The expected values $\mathbf{E}\,t^k$ (assuming they exist) are called the (power) moments of the distribution of $t$. The following classical results give a characterization of a moment sequence.

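If there is a probability distribution on $\mathbf{R}$ such that $x_k = \mathbf{E}\,t^k$, $k = 0, \ldots, 2n$, then $x_0 = 1$ and

$$
H(x_0, \ldots, x_{2n}) = \begin{bmatrix}
x_0 & x_1 & x_2 & \cdots & x_n\\
x_1 & x_2 & x_3 & \cdots & x_{n+1}\\
\vdots & \vdots & \vdots & & \vdots\\
x_n & x_{n+1} & x_{n+2} & \cdots & x_{2n}
\end{bmatrix} \succeq 0. \qquad (4.52)
$$

(The matrix $H$ is called the Hankel matrix associated with $x_0, \ldots, x_{2n}$.) This is easy to see: for any $y = (y_0, \ldots, y_n) \in \mathbf{R}^{n+1}$,

$$
y^TH(x_0, \ldots, x_{2n})\,y = \sum_{i,j=0}^n y_iy_j\,\mathbf{E}\,t^{i+j} = \mathbf{E}\,(y_0 + y_1t + \cdots + y_nt^n)^2 \ge 0.
$$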

The following partial converse is less obvious: If $x_0 = 1$ and $H(x) \succ 0$, then there exists a probability distribution on $\mathbf{R}$ such that $x_i = \mathbf{E}\,t^i$, $i = 0, \ldots, 2n$. (For a proof, see exercise 2.37.) Now suppose that $x_0 = 1$, and $H(x) \succeq 0$ (but possibly $H(x) \not\succ 0$), i.e., the linear matrix inequality (4.52) holds, but possibly not strictly.

In this case, there is a sequence of distributions on $\mathbf{R}$ whose moments converge to $x$. In summary: the condition that $x_0, \ldots, x_{2n}$ be the moments of some distribution on $\mathbf{R}$ (or the limit of the moments of a sequence of distributions) can be expressed as the linear matrix inequality (4.52) in the variable $x$, together with the linear equality $x_0 = 1$. Using this fact, we can cast some interesting problems involving moments as SDPs.
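The Hankel test can be tried on concrete numbers. In this NumPy sketch the `hankel` helper, the three-point distribution, and the "bad" sequence are hypothetical. Moments of a genuine distribution give a positive semidefinite $H$ (here even positive definite, since the support has $n+1 = 3$ points), while a sequence with $x_2 < x_1^2$, which would imply a negative variance, fails.

```python
import numpy as np

def hankel(x):
    """Hankel matrix H(x_0, ..., x_{2n}) with entries H[i, j] = x_{i+j}."""
    n = (len(x) - 1) // 2
    return np.array([[x[i + j] for j in range(n + 1)] for i in range(n + 1)])

# Moments x_0, ..., x_4 (n = 2) of a hypothetical three-point distribution.
points = np.array([-1.0, 0.5, 2.0])
probs = np.array([0.2, 0.5, 0.3])
x = np.array([np.sum(probs * points**k) for k in range(5)])
print(x[0], np.linalg.eigvalsh(hankel(x)).min())     # x_0 = 1, H(x) is PSD

# Not a moment sequence: x_2 = 0.25 < x_1^2 = 1 would mean a negative variance.
bad = np.array([1.0, 1.0, 0.25, 0.0, 1.0])
print(np.linalg.eigvalsh(hankel(bad)).min())         # negative: H(bad) is not PSD
```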

Suppose $t$ is a random variable on $\mathbf{R}$. We do not know its distribution, but we do know some bounds on the moments, i.e.,

$$
\underline{\mu}_k \le \mathbf{E}\,t^k \le \overline{\mu}_k, \quad k = 1, \ldots, 2n
$$

(which includes, as a special case, knowing exact values of some of the moments).

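Let $p(t) = c_0 + c_1t + \cdots + c_{2n}t^{2n}$ be a given polynomial in $t$. The expected value of $p(t)$ is linear in the moments:

$$
\mathbf{E}\,p(t) = \sum_{i=0}^{2n} c_i\,\mathbf{E}\,t^i = \sum_{i=0}^{2n} c_ix_i.
$$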
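The linearity of $\mathbf{E}\,p(t)$ in the moments is what makes the objective of the SDP linear. A small NumPy check on an empirical distribution (the polynomial coefficients and samples are hypothetical):

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

# Hypothetical polynomial p(t) = 1 - 2t + 0.5t^2 + t^4: coefficients c_0, ..., c_4 (n = 2).
c = np.array([1.0, -2.0, 0.5, 0.0, 1.0])

# Any distribution works; here the empirical distribution of some samples.
rng = np.random.default_rng(2)
samples = rng.standard_normal(10_000)

moments = np.array([np.mean(samples**k) for k in range(len(c))])  # x_k = E t^k
expected_p = np.mean(npoly.polyval(samples, c))                  # E p(t) computed directly
linear_in_moments = c @ moments                                  # sum_i c_i x_i
print(np.isclose(expected_p, linear_in_moments))
```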

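We can compute upper and lower bounds for $\mathbf{E}\,p(t)$,

$$
\begin{array}{ll}
\text{minimize (maximize)} & \mathbf{E}\,p(t)\\
\text{subject to} & \underline{\mu}_k \le \mathbf{E}\,t^k \le \overline{\mu}_k, \quad k = 1, \ldots, 2n,
\end{array}
$$

over all probability distributions that satisfy the given moment bounds, by solving the SDP

$$
\begin{array}{ll}
\text{minimize (maximize)} & c_0 + c_1x_1 + \cdots + c_{2n}x_{2n}\\
\text{subject to} & \underline{\mu}_k \le x_k \le \overline{\mu}_k, \quad k = 1, \ldots, 2n\\
& H(1, x_1, \ldots, x_{2n}) \succeq 0
\end{array}
$$

with variables $x_1, \ldots, x_{2n}$ (the moment $x_0 = 1$ is fixed, so the constant $c_0$ only shifts the optimal value). This gives bounds on $\mathbf{E}\,p(t)$, over all probability distributions that satisfy the known moment constraints. The bounds are sharp in the sense that there exists a sequence of distributions, whose moments satisfy the given moment bounds, for which $\mathbf{E}\,p(t)$ converges to the upper and lower bounds found by these SDPs.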

Bounding portfolio risk with incomplete covariance information

We consider once again the setup for the classical Markowitz portfolio problem (see page 155). We have a portfolio of $n$ assets or stocks, with $x_i$ denoting the amount of asset $i$ that is held over some investment period, and $p_i$ denoting the relative price change of asset $i$ over the period. The change in total value of the portfolio is $p^Tx$. The price change vector $p$ is modeled as a random vector, with mean and covariance

$$
\bar{p} = \mathbf{E}\,p, \qquad \Sigma = \mathbf{E}(p - \bar{p})(p - \bar{p})^T.
$$

The change in value of the portfolio is therefore a random variable with mean $\bar{p}^Tx$ and standard deviation $\sigma = (x^T\Sigma x)^{1/2}$. The risk of a large loss, i.e., a change in portfolio value that is substantially below its expected value, is directly related to the standard deviation $\sigma$, and increases with it. For this reason the standard deviation $\sigma$ (or the variance $\sigma^2$) is used as a measure of the risk associated with the portfolio.

In the classical portfolio optimization problem, the portfolio $x$ is the optimization variable, and we minimize the risk subject to a minimum mean return and other constraints. The price change statistics $\bar{p}$ and $\Sigma$ are known problem parameters. In the risk bounding problem considered here, we turn the problem around: we assume the portfolio $x$ is known, but only partial information is available about the covariance matrix $\Sigma$. We might have, for example, an upper and lower bound on each entry:

$$
L_{ij} \le \Sigma_{ij} \le U_{ij}, \quad i, j = 1, \ldots, n,
$$

where L and U are given. We now pose the question: what is the maximum risk for our portfolio, over all covariance matrices consistent with the given bounds?

We define the worst-case variance of the portfolio as

$$
\sigma_{\rm wc}^2 = \sup\{x^T\Sigma x \mid L_{ij} \le \Sigma_{ij} \le U_{ij},\ i, j = 1, \ldots, n,\ \Sigma \succeq 0\}.
$$

We have added the condition $\Sigma \succeq 0$, which the covariance matrix must, of course, satisfy.

We can find $\sigma_{\rm wc}$ by solving the SDP

$$
\begin{array}{ll}
\text{maximize} & x^T\Sigma x\\
\text{subject to} & L_{ij} \le \Sigma_{ij} \le U_{ij}, \quad i, j = 1, \ldots, n\\
& \Sigma \succeq 0
\end{array}
$$

with variable $\Sigma \in \mathbf{S}^n$ (and problem parameters $x$, $L$, and $U$); the objective $x^T\Sigma x = \mathbf{tr}(xx^T\Sigma)$ is linear in $\Sigma$. The optimal $\Sigma$ is the worst covariance matrix consistent with our given bounds on the entries, where 'worst' means largest risk with the (given) portfolio $x$. We can easily construct a distribution for $p$ that is consistent with the given bounds, and achieves the worst-case variance, from an optimal $\Sigma$ for the SDP. For example, we can take $p = \bar{p} + \Sigma^{1/2}v$, where $v$ is any random vector with $\mathbf{E}\,v = 0$ and $\mathbf{E}\,vv^T = I$.
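The construction $p = \bar{p} + \Sigma^{1/2}v$ can be sketched with NumPy. Here $\Sigma$ is a hypothetical stand-in for an optimal SDP solution (no SDP is solved), $\bar{p}$ and $x$ are made-up values, and $v$ has independent $\pm 1$ entries, one of many choices with $\mathbf{E}\,v = 0$ and $\mathbf{E}\,vv^T = I$. The sampled portfolio variance should be close to $x^T\Sigma x$.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root S^{1/2} via eigendecomposition (S assumed PSD)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])   # hypothetical "worst-case" covariance
p_bar = np.array([0.05, 0.08])                   # hypothetical mean price change
x = np.array([1.0, 2.0])                         # the given portfolio

rng = np.random.default_rng(3)
v = rng.choice([-1.0, 1.0], size=(200_000, 2))   # independent signs: E v = 0, E v v^T = I
p = p_bar + v @ psd_sqrt(Sigma).T                # each row is p_bar + Sigma^{1/2} v

print(x @ Sigma @ x, np.var(p @ x))              # exact vs sampled portfolio variance
```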

Evidently we can use the same method to determine $\sigma_{\rm wc}$ for any prior information about $\Sigma$ that is convex. We list here some examples.

• Known variance of certain portfolios. We might have equality constraints such as

$$
u_k^T\Sigma u_k = \sigma_k^2,
$$

where $u_k$ and $\sigma_k$ are given. This corresponds to prior knowledge that certain known portfolios (given by $u_k$) have known (or very accurately estimated) variance.

• Including effects of estimation error. If the covariance $\Sigma$ is estimated from empirical data, the estimation method will give an estimate $\hat{\Sigma}$, and some information about the reliability of the estimate, such as a confidence ellipsoid. This can be expressed as

$$
C(\Sigma - \hat{\Sigma}) \le \alpha,
$$

where $C$ is a positive definite quadratic form on $\mathbf{S}^n$, and the constant $\alpha$ determines the confidence level.

• Factor models. The covariance might have the form

$$
\Sigma = F\Sigma_{\rm factor}F^T + D,
$$

where $F \in \mathbf{R}^{n\times k}$, $\Sigma_{\rm factor} \in \mathbf{S}^k$, and $D$ is diagonal. This corresponds to a model of the price changes of the form

$$
p = Fz + d,
$$

where $z$ is a random variable (the underlying factors that affect the price changes) and the $d_i$ are independent (additional volatility of each asset price). We assume that the factor matrix $F$ is known. Since $\Sigma$ is linearly related to $\Sigma_{\rm factor}$ and $D$, we can impose any convex constraint on them (representing prior information) and still compute $\sigma_{\rm wc}$ using convex optimization.

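• Information about correlation coefficients. In the simplest case, the diagonal entries of $\Sigma$ (i.e., the volatilities of each asset price) are known, and bounds on correlation coefficients between price changes are known:

$$
l_{ij} \le \rho_{ij} = \frac{\Sigma_{ij}}{\Sigma_{ii}^{1/2}\Sigma_{jj}^{1/2}} \le u_{ij}, \quad i, j = 1, \ldots, n.
$$

Since the $\Sigma_{ii}$ are known, but the $\Sigma_{ij}$ for $i \ne j$ are not, these are linear inequalities (multiply through by the known positive constant $\Sigma_{ii}^{1/2}\Sigma_{jj}^{1/2}$).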
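Multiplying through by the known scale turns correlation bounds into entrywise bounds of the form $L_{ij} \le \Sigma_{ij} \le U_{ij}$ used in the worst-case variance SDP. A NumPy sketch with a hypothetical helper name and made-up data:

```python
import numpy as np

def correlation_to_entry_bounds(sigma_diag, l, u):
    """Convert l_ij <= rho_ij <= u_ij into L_ij <= Sigma_ij <= U_ij.

    sigma_diag holds the known variances Sigma_ii; scaling the correlation bounds
    by the known constant Sigma_ii^{1/2} Sigma_jj^{1/2} makes them linear in Sigma_ij.
    """
    d = np.asarray(sigma_diag, dtype=float)
    scale = np.outer(np.sqrt(d), np.sqrt(d))
    L, U = l * scale, u * scale
    np.fill_diagonal(L, d)     # diagonal entries are known exactly
    np.fill_diagonal(U, d)
    return L, U

# Hypothetical data: three assets, every correlation known to lie in [-0.2, 0.6].
L, U = correlation_to_entry_bounds([0.04, 0.09, 0.01],
                                   np.full((3, 3), -0.2), np.full((3, 3), 0.6))
print(L.round(4))
print(U.round(4))
```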

Fastest mixing Markov chain on a graph

We consider an undirected graph, with nodes $1, \ldots, n$, and a set of edges $\mathcal{E} \subseteq \{1, \ldots, n\} \times \{1, \ldots, n\}$.

Here $(i, j) \in \mathcal{E}$ means that nodes $i$ and $j$ are connected by an edge. Since the graph is undirected, $\mathcal{E}$ is symmetric: $(i, j) \in \mathcal{E}$ if and only if $(j, i) \in \mathcal{E}$. We allow the possibility of self-loops, i.e., we can have $(i, i) \in \mathcal{E}$.

We define a Markov chain, with state $X(t) \in \{1, \ldots, n\}$, for $t \in \mathbf{Z}_+$ (the set of nonnegative integers), as follows. With each edge $(i, j) \in \mathcal{E}$ we associate a probability $P_{ij}$, which is the probability that $X$ makes a transition between nodes $i$ and $j$. State transitions can only occur across edges; we have $P_{ij} = 0$ for $(i, j) \notin \mathcal{E}$.

The probabilities associated with the edges must be nonnegative, and for each node, the sum of the probabilities of links connected to the node (including a self-loop, if there is one) must equal one.

The Markov chain has transition probability matrix

$$
P_{ij} = \mathbf{prob}(X(t+1) = i \mid X(t) = j), \quad i, j = 1, \ldots, n.
$$

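This matrix must satisfy

$$
P_{ij} \ge 0, \quad i, j = 1, \ldots, n, \qquad \mathbf{1}^TP = \mathbf{1}^T, \qquad P = P^T, \qquad (4.53)
$$

and also

$$
P_{ij} = 0 \quad \text{for } (i, j) \notin \mathcal{E}. \qquad (4.54)
$$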
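Conditions (4.53) and (4.54) are easy to check for a candidate matrix. This NumPy sketch uses 0-based node labels and a hypothetical graph (a path with self-loops at its two ends); `is_feasible_transition` is a made-up helper name.

```python
import numpy as np

def is_feasible_transition(P, edges, tol=1e-9):
    """Check (4.53) and (4.54) for a candidate P; nodes are 0-based here."""
    n = P.shape[0]
    allowed = np.zeros((n, n), dtype=bool)
    for i, j in edges:
        allowed[i, j] = allowed[j, i] = True          # the edge set is symmetric
    return bool(np.all(P >= -tol)                         # P_ij >= 0
                and np.allclose(P, P.T, atol=tol)         # P = P^T
                and np.allclose(P.sum(axis=0), 1.0, atol=tol)  # 1^T P = 1^T
                and np.all(np.abs(P[~allowed]) <= tol))   # P_ij = 0 off the edges

# Hypothetical graph: path 0 - 1 - 2 with self-loops at the two end nodes.
edges = [(0, 0), (0, 1), (1, 2), (2, 2)]
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
print(is_feasible_transition(P, edges))
```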

Since $P$ is symmetric and $\mathbf{1}^TP = \mathbf{1}^T$, we conclude $P\mathbf{1} = \mathbf{1}$, so the uniform distribution $(1/n)\mathbf{1}$ is an equilibrium distribution for the Markov chain. Convergence of the distribution of $X(t)$ to $(1/n)\mathbf{1}$ is determined by the second largest (in magnitude) eigenvalue of $P$, i.e., by $r = \max\{\lambda_2, -\lambda_n\}$, where

$$
1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n
$$

are the eigenvalues of $P$. We refer to $r$ as the mixing rate of the Markov chain.

If r = 1, then the distribution of X(t) need not converge to (1/n)1 (which means the Markov chain does not mix). When r < 1, the distribution of X(t) approaches (1/n)1 asymptotically as r^t, as t → ∞. Thus, the smaller r is, the faster the Markov chain mixes.
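A small NumPy experiment illustrates the $r^t$ behavior. The chain below is a hypothetical 3-node path with self-loops at its ends; its eigenvalues are $1$, $1/2$, $-1/2$, so $r = 1/2$. Because $P_{ij}$ is the probability of moving from $j$ to $i$, the distribution evolves as $\pi(t+1) = P\pi(t)$.

```python
import numpy as np

def mixing_rate(P):
    """r = max{lambda_2, -lambda_n} for a symmetric stochastic matrix P."""
    lam = np.sort(np.linalg.eigvalsh(P))[::-1]    # lambda_1 >= ... >= lambda_n
    return max(lam[1], -lam[-1])

P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
r = mixing_rate(P)

dist = np.array([1.0, 0.0, 0.0])      # X(0) is the first node with probability one
deviations = []
for t in range(1, 9):
    dist = P @ dist                   # distribution of X(t)
    deviations.append(np.abs(dist - 1 / 3).max())
print(r, [f"{dev:.4f}" for dev in deviations])
```

For this start and this chain the largest deviation from uniform stays below $r^t$ at every step shown.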

The fastest mixing Markov chain problem is to find $P$, subject to the constraints (4.53) and (4.54), that minimizes $r$. (The problem data is the graph, i.e., $\mathcal{E}$.) We will show that this problem can be formulated as an SDP.

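Since the eigenvalue $\lambda_1 = 1$ is associated with the eigenvector $\mathbf{1}$, we can express the mixing rate as the norm of the matrix $P$, restricted to the subspace $\mathbf{1}^\perp$: $r = \|QPQ\|_2$, where $Q = I - (1/n)\mathbf{1}\mathbf{1}^T$ is the matrix representing orthogonal projection on $\mathbf{1}^\perp$. Using the property $P\mathbf{1} = \mathbf{1}$, we have

$$
\begin{aligned}
r &= \|QPQ\|_2\\
&= \|(I - (1/n)\mathbf{1}\mathbf{1}^T)\,P\,(I - (1/n)\mathbf{1}\mathbf{1}^T)\|_2\\
&= \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2.
\end{aligned}
$$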
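The three expressions for $r$ can be compared numerically. This NumPy sketch builds a hypothetical random feasible $P$ on the complete graph (random symmetric off-diagonal weights, scaled so no row sum exceeds one, with the remainder placed on the diagonal) and evaluates the eigenvalue definition, $\|QPQ\|_2$, and $\|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
W = np.triu(rng.random((n, n)), 1)
W = W + W.T                                   # symmetric, zero diagonal
W /= W.sum(axis=1).max()                      # every row sum is now <= 1
P = W + np.diag(1.0 - W.sum(axis=1))          # P >= 0, P = P^T, P 1 = 1

lam = np.sort(np.linalg.eigvalsh(P))[::-1]
r_eig = max(lam[1], -lam[-1])                 # definition via eigenvalues

J = np.ones((n, n)) / n                       # (1/n) 1 1^T
Q = np.eye(n) - J                             # orthogonal projection onto 1-perp
r_proj = np.linalg.norm(Q @ P @ Q, 2)         # ||Q P Q||_2
r_norm = np.linalg.norm(P - J, 2)             # ||P - (1/n) 1 1^T||_2
print(r_eig, r_proj, r_norm)
```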

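This shows that the mixing rate $r$ is a convex function of $P$, so the fastest mixing Markov chain problem can be cast as the convex optimization problem

$$
\begin{array}{ll}
\text{minimize} & \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2\\
\text{subject to} & P\mathbf{1} = \mathbf{1}\\
& P_{ij} \ge 0, \quad i, j = 1, \ldots, n\\
& P_{ij} = 0 \quad \text{for } (i, j) \notin \mathcal{E},
\end{array}
$$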

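with variable $P \in \mathbf{S}^n$. We can express the problem as an SDP by introducing a scalar variable $t$ to bound the norm of $P - (1/n)\mathbf{1}\mathbf{1}^T$ (for a symmetric matrix $M$, $\|M\|_2 \le t$ if and only if $-tI \preceq M \preceq tI$):

$$
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & -tI \preceq P - (1/n)\mathbf{1}\mathbf{1}^T \preceq tI\\
& P\mathbf{1} = \mathbf{1}\\
& P_{ij} \ge 0, \quad i, j = 1, \ldots, n\\
& P_{ij} = 0 \quad \text{for } (i, j) \notin \mathcal{E}.
\end{array}
\qquad (4.55)
$$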

From the document Convex Optimization (pages 182-188)