Convex optimization problems
4.6 Generalized inequality constraints
4.6.2 Semidefinite programming
When $K$ is $\mathbf{S}^k_+$, the cone of positive semidefinite $k \times k$ matrices, the associated conic form problem is called a semidefinite program (SDP), and has the form
$$
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & x_1 F_1 + \cdots + x_n F_n + G \preceq 0 \\
& Ax = b,
\end{array}
\qquad (4.50)
$$
where $G, F_1, \ldots, F_n \in \mathbf{S}^k$, and $A \in \mathbf{R}^{p \times n}$. The inequality here is a linear matrix inequality (see example 2.10).
If the matrices $G, F_1, \ldots, F_n$ are all diagonal, then the LMI in (4.50) is equivalent to a set of $k$ linear inequalities (one for each diagonal entry), and the SDP (4.50) reduces to a linear program.
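The reduction to a linear program can be sketched in a few lines of Python. This is only an illustration: the helper names `diagonal_lmi_to_lp` and `satisfies` and the numerical data are assumptions made for the example, and each diagonal matrix is stored as the list of its diagonal entries.

```python
def diagonal_lmi_to_lp(F_diags, G_diag):
    """Rewrite x1*F1 + ... + xn*Fn + G <= 0 (all matrices diagonal) as A_lp x <= b_lp.

    F_diags[j] holds the diagonal of F_{j+1}; G_diag holds the diagonal of G.
    Row i of the result is the scalar inequality for diagonal entry i.
    """
    k = len(G_diag)
    n = len(F_diags)
    A_lp = [[F_diags[j][i] for j in range(n)] for i in range(k)]
    b_lp = [-G_diag[i] for i in range(k)]
    return A_lp, b_lp


def satisfies(A_lp, b_lp, x):
    """Check the componentwise inequality A_lp x <= b_lp."""
    return all(
        sum(a * xj for a, xj in zip(row, x)) <= b for row, b in zip(A_lp, b_lp)
    )


# Hypothetical data: F1 = diag(1, -1, 0), F2 = diag(0, 1, 2), G = diag(-1, -2, -4).
F_diags = [[1.0, -1.0, 0.0], [0.0, 1.0, 2.0]]
G_diag = [-1.0, -2.0, -4.0]
A_lp, b_lp = diagonal_lmi_to_lp(F_diags, G_diag)
```

Each row of `A_lp` corresponds to one diagonal entry of the LMI, so a diagonal LMI of size $3 \times 3$ gives three scalar inequalities.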
Standard and inequality form semidefinite programs
Following the analogy to LP, a standard form SDP has linear equality constraints, and a (matrix) nonnegativity constraint on the variable $X \in \mathbf{S}^n$:
$$
\begin{array}{ll}
\mbox{minimize} & \mathbf{tr}(CX) \\
\mbox{subject to} & \mathbf{tr}(A_i X) = b_i, \quad i = 1, \ldots, p \\
& X \succeq 0,
\end{array}
\qquad (4.51)
$$
where $C, A_1, \ldots, A_p \in \mathbf{S}^n$. (Recall that $\mathbf{tr}(CX) = \sum_{i,j=1}^n C_{ij} X_{ij}$ is the form of a general real-valued linear function on $\mathbf{S}^n$.) This form should be compared to the standard form linear program (4.28). In LP and SDP standard forms, we minimize a linear function of the variable, subject to $p$ linear equality constraints on the variable, and a nonnegativity constraint on the variable.
An inequality form SDP, analogous to an inequality form LP (4.29), has no equality constraints, and one LMI:
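$$
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & x_1 A_1 + \cdots + x_n A_n \preceq B,
\end{array}
$$
with variable $x \in \mathbf{R}^n$, and parameters $B, A_1, \ldots, A_n \in \mathbf{S}^k$, $c \in \mathbf{R}^n$.
Multiple LMIs and linear inequalities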
It is common to refer to a problem with linear objective, linear equality and inequality constraints, and several LMI constraints, i.e.,
$$
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & F^{(i)}(x) = x_1 F_1^{(i)} + \cdots + x_n F_n^{(i)} + G^{(i)} \preceq 0, \quad i = 1, \ldots, K \\
& Gx \preceq h, \quad Ax = b,
\end{array}
$$
as an SDP as well. Such problems are readily transformed to an SDP, by forming a large block diagonal LMI from the individual LMIs and linear inequalities:
$$
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & \mathbf{diag}(Gx - h, F^{(1)}(x), \ldots, F^{(K)}(x)) \preceq 0 \\
& Ax = b.
\end{array}
$$
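The block diagonal construction works because a block diagonal symmetric matrix is negative semidefinite exactly when each of its diagonal blocks is. A minimal Python sketch of the assembly step follows; the helpers `block_diag` and `diag_from_vector` and the numbers are hypothetical, chosen only to show the layout.

```python
def block_diag(*blocks):
    """Place square blocks (lists of lists) along the diagonal of one large matrix."""
    size = sum(len(b) for b in blocks)
    M = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, val in enumerate(row):
                M[offset + i][offset + j] = val
        offset += len(b)
    return M


def diag_from_vector(v):
    """Turn the vector Gx - h into a diagonal matrix, one scalar inequality per entry."""
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


# Hypothetical values at some point x: Gx - h = (-1, -2) and one 2 x 2 LMI value F1(x).
residual = [-1.0, -2.0]
F1_val = [[-3.0, 1.0], [1.0, -3.0]]
big = block_diag(diag_from_vector(residual), F1_val)
```

Here `big` is a single $4 \times 4$ symmetric matrix whose negative semidefiniteness encodes both linear inequalities and the LMI.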
4.6.3 Examples
Second-order cone programming
The SOCP (4.36) can be expressed as a conic form problem
$$
\begin{array}{ll}
\mbox{minimize} & c^T x \\
\mbox{subject to} & -(A_i x + b_i, \; c_i^T x + d_i) \preceq_{K_i} 0, \quad i = 1, \ldots, m \\
& Fx = g,
\end{array}
$$
in which
$$
K_i = \{ (y, t) \in \mathbf{R}^{n_i + 1} \mid \|y\|_2 \leq t \},
$$
i.e., the second-order cone in $\mathbf{R}^{n_i + 1}$. This explains the name second-order cone program for the optimization problem (4.36).
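As an illustration, membership in the second-order cone and the corresponding SOCP constraint $\|A_i x + b_i\|_2 \leq c_i^T x + d_i$ can be checked directly. The function names and data below are assumptions made for this sketch.

```python
import math


def in_second_order_cone(y, t):
    """Return True when (y, t) satisfies ||y||_2 <= t."""
    return math.sqrt(sum(v * v for v in y)) <= t


def socp_constraint_holds(A, b, c, d, x):
    """Check that (A x + b, c^T x + d) lies in the second-order cone."""
    y = [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
    t = sum(cj * xj for cj, xj in zip(c, x)) + d
    return in_second_order_cone(y, t)


# Hypothetical constraint ||x||_2 <= 5, written with A = I, b = 0, c = 0, d = 5.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = [0.0, 0.0]
d = 5.0
```

With this data, `socp_constraint_holds(A, b, c, d, [3.0, 4.0])` holds with equality, while the point `[6.0, 8.0]` violates the constraint.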
Matrix norm minimization
Let $A(x) = A_0 + x_1 A_1 + \cdots + x_n A_n$, where $A_i \in \mathbf{R}^{p \times q}$. We consider the unconstrained problem
$$
\mbox{minimize} \quad \|A(x)\|_2,
$$
where $\| \cdot \|_2$ denotes the spectral norm (maximum singular value), and $x \in \mathbf{R}^n$ is the variable. This is a convex problem since $\|A(x)\|_2$ is a convex function of $x$.
Using the fact that $\|A\|_2 \leq s$ if and only if $A^T A \preceq s^2 I$ (and $s \geq 0$), we can express the problem in the form
$$
\begin{array}{ll}
\mbox{minimize} & s \\
\mbox{subject to} & A(x)^T A(x) \preceq sI,
\end{array}
$$
with variables $x$ and $s$. (Here $s$ bounds the squared norm $\|A(x)\|_2^2$, so the optimal $x$ is unchanged.) Since the function $A(x)^T A(x) - sI$ is matrix convex in $(x, s)$, this is a convex optimization problem with a single $q \times q$ matrix inequality constraint.
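We can also formulate the problem using a single linear matrix inequality of size $(p + q) \times (p + q)$, using the fact that
$$
A^T A \preceq t^2 I \;\; (\mbox{and } t \geq 0)
\quad \Longleftrightarrow \quad
\left[ \begin{array}{cc} tI & A \\ A^T & tI \end{array} \right] \succeq 0
$$
(see §A.5.5). This results in the SDP
$$
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & \left[ \begin{array}{cc} tI & A(x) \\ A(x)^T & tI \end{array} \right] \succeq 0
\end{array}
$$
in the variables $x$ and $t$.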
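The equivalence between the norm bound and the $(p + q) \times (p + q)$ LMI can be checked with a Schur complement argument. The following is a sketch for the case $t > 0$, with $I_p$ and $I_q$ denoting identity matrices of the indicated sizes (notation introduced here for clarity):

```latex
\left[ \begin{array}{cc} t I_p & A \\ A^T & t I_q \end{array} \right] \succeq 0
\;\Longleftrightarrow\;
t I_q - A^T (t I_p)^{-1} A \succeq 0
\;\Longleftrightarrow\;
t^2 I_q - A^T A \succeq 0
\;\Longleftrightarrow\;
\|A\|_2 \leq t .
```

When $t = 0$, both the block matrix inequality and the condition $A^T A \preceq 0$ force $A = 0$, so the equivalence still holds.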
Moment problems
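Let $t$ be a random variable in $\mathbf{R}$. The expected values $\mathbf{E}\, t^k$ (assuming they exist) are called the (power) moments of the distribution of $t$. The following classical results give a characterization of a moment sequence.
If there is a probability distribution on $\mathbf{R}$ such that $x_k = \mathbf{E}\, t^k$, $k = 0, \ldots, 2n$, then $x_0 = 1$ and
$$
H(x_0, \ldots, x_{2n}) =
\left[ \begin{array}{ccccc}
x_0 & x_1 & x_2 & \cdots & x_n \\
x_1 & x_2 & x_3 & \cdots & x_{n+1} \\
x_2 & x_3 & x_4 & \cdots & x_{n+2} \\
\vdots & \vdots & \vdots & & \vdots \\
x_n & x_{n+1} & x_{n+2} & \cdots & x_{2n}
\end{array} \right] \succeq 0.
\qquad (4.52)
$$
(The matrix $H$ is called the Hankel matrix associated with $x_0, \ldots, x_{2n}$.) This is easy to see: for any $y = (y_0, \ldots, y_n) \in \mathbf{R}^{n+1}$ we have
$$
y^T H(x_0, \ldots, x_{2n}) y = \sum_{i,j=0}^n y_i y_j \, \mathbf{E}\, t^{i+j} = \mathbf{E} (y_0 + y_1 t + \cdots + y_n t^n)^2 \geq 0.
$$
The following partial converse is less obvious: If $x_0 = 1$ and $H(x) \succ 0$, then there exists a probability distribution on $\mathbf{R}$ such that $x_i = \mathbf{E}\, t^i$, $i = 0, \ldots, 2n$. (For a proof, see exercise 2.37.) Now suppose that $x_0 = 1$, and $H(x) \succeq 0$ (but possibly $H(x) \not\succ 0$), i.e., the linear matrix inequality (4.52) holds, but possibly not strictly. In this case, there is a sequence of distributions on $\mathbf{R}$, whose moments converge to $x$. In summary: the condition that $x_0, \ldots, x_{2n}$ be the moments of some distribution on $\mathbf{R}$ (or the limit of the moments of a sequence of distributions) can be expressed as the linear matrix inequality (4.52) in the variable $x$, together with the linear equality $x_0 = 1$. Using this fact, we can cast some interesting problems involving moments as SDPs.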
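The necessity of the Hankel condition can be illustrated numerically. The sketch below assumes the Hankel matrix has entries $H_{ij} = x_{i+j}$ for $i, j = 0, \ldots, n$, and uses a small discrete distribution chosen only for the example. It checks that the quadratic form $y^T H y$ equals $\mathbf{E}(y_0 + y_1 t + \cdots + y_n t^n)^2$, which is nonnegative. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def moments(points, probs, order):
    """Return [E t^0, ..., E t^order] for a discrete distribution."""
    return [sum(p * t**k for t, p in zip(points, probs)) for k in range(order + 1)]


def hankel(x):
    """Build the (n+1) x (n+1) Hankel matrix with entries H[i][j] = x[i + j]."""
    n = (len(x) - 1) // 2
    return [[x[i + j] for j in range(n + 1)] for i in range(n + 1)]


def quad_form(H, y):
    """Evaluate y^T H y."""
    return sum(y[i] * H[i][j] * y[j] for i in range(len(y)) for j in range(len(y)))


# Hypothetical distribution: t takes values -1, 0, 2 with probabilities 1/4, 1/2, 1/4.
points = [Fraction(-1), Fraction(0), Fraction(2)]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
x = moments(points, probs, 4)  # n = 2, so moments up to order 2n = 4
H = hankel(x)

y = [Fraction(1), Fraction(-2), Fraction(3)]
lhs = quad_form(H, y)
# E (y0 + y1 t + y2 t^2)^2, computed directly from the distribution
rhs = sum(p * (y[0] + y[1] * t + y[2] * t**2) ** 2 for t, p in zip(points, probs))
```

The two quantities `lhs` and `rhs` agree for every choice of `y`, which is the argument showing $H \succeq 0$.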
Suppose $t$ is a random variable on $\mathbf{R}$. We do not know its distribution, but we do know some bounds on the moments, i.e.,
$$
\underline{\mu}_k \leq \mathbf{E}\, t^k \leq \overline{\mu}_k, \quad k = 1, \ldots, 2n
$$
(which includes, as a special case, knowing exact values of some of the moments).
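Let $p(t) = c_0 + c_1 t + \cdots + c_{2n} t^{2n}$ be a given polynomial in $t$. The expected value of $p(t)$ is linear in the moments $\mathbf{E}\, t^i$:
$$
\mathbf{E}\, p(t) = \sum_{i=0}^{2n} c_i \, \mathbf{E}\, t^i = \sum_{i=0}^{2n} c_i x_i.
$$
We can compute upper and lower bounds for $\mathbf{E}\, p(t)$,
$$
\begin{array}{ll}
\mbox{minimize (maximize)} & \mathbf{E}\, p(t) \\
\mbox{subject to} & \underline{\mu}_k \leq \mathbf{E}\, t^k \leq \overline{\mu}_k, \quad k = 1, \ldots, 2n,
\end{array}
$$
over all probability distributions that satisfy the given moment bounds, by solving the SDP
$$
\begin{array}{ll}
\mbox{minimize (maximize)} & c_1 x_1 + \cdots + c_{2n} x_{2n} \\
\mbox{subject to} & \underline{\mu}_k \leq x_k \leq \overline{\mu}_k, \quad k = 1, \ldots, 2n \\
& H(1, x_1, \ldots, x_{2n}) \succeq 0
\end{array}
$$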
with variables $x_1, \ldots, x_{2n}$. This gives bounds on $\mathbf{E}\, p(t)$ (after adding the constant $c_0$ to the optimal value), over all probability distributions that satisfy the known moment constraints. The bounds are sharp in the sense that there exists a sequence of distributions, whose moments satisfy the given moment bounds, for which $\mathbf{E}\, p(t)$ converges to the upper and lower bounds found by these SDPs.
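As a small worked case, take $n = 1$, so that only the first two moments appear. Assuming the Hankel matrix has entries $H_{ij} = x_{i+j}$, the LMI reduces to a familiar scalar condition by a Schur complement (the leading entry is $1 > 0$):

```latex
H(1, x_1, x_2) =
\left[ \begin{array}{cc} 1 & x_1 \\ x_1 & x_2 \end{array} \right] \succeq 0
\;\Longleftrightarrow\;
x_2 - x_1^2 \geq 0 .
```

This is the statement that the variance $\mathbf{E}\, t^2 - (\mathbf{E}\, t)^2$ is nonnegative. For $n = 1$ the SDP therefore optimizes $c_1 x_1 + c_2 x_2$ over the moment bounds together with $x_2 \geq x_1^2$.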
Bounding portfolio risk with incomplete covariance information
We consider once again the setup for the classical Markowitz portfolio problem (see page 155). We have a portfolio of $n$ assets or stocks, with $x_i$ denoting the amount of asset $i$ that is held over some investment period, and $p_i$ denoting the relative price change of asset $i$ over the period. The change in total value of the portfolio is $p^T x$. The price change vector $p$ is modeled as a random vector, with mean and covariance
$$
\bar{p} = \mathbf{E}\, p, \qquad \Sigma = \mathbf{E} (p - \bar{p})(p - \bar{p})^T.
$$
The change in value of the portfolio is therefore a random variable with mean $\bar{p}^T x$ and standard deviation $\sigma = (x^T \Sigma x)^{1/2}$. The risk of a large loss, i.e., a change in portfolio value that is substantially below its expected value, is directly related to the standard deviation $\sigma$, and increases with it. For this reason the standard deviation $\sigma$ (or the variance $\sigma^2$) is used as a measure of the risk associated with the portfolio.
In the classical portfolio optimization problem, the portfolio $x$ is the optimization variable, and we minimize the risk subject to a minimum mean return and other constraints. The price change statistics $\bar{p}$ and $\Sigma$ are known problem parameters. In the risk bounding problem considered here, we turn the problem around: we assume the portfolio $x$ is known, but only partial information is available about the covariance matrix $\Sigma$. We might have, for example, an upper and lower bound on each entry:
$$
L_{ij} \leq \Sigma_{ij} \leq U_{ij}, \quad i, j = 1, \ldots, n,
$$
where L and U are given. We now pose the question: what is the maximum risk for our portfolio, over all covariance matrices consistent with the given bounds?
We define the worst-case variance of the portfolio as
$$
\sigma_{\mathrm{wc}}^2 = \sup \{ x^T \Sigma x \mid L_{ij} \leq \Sigma_{ij} \leq U_{ij}, \; i, j = 1, \ldots, n, \; \Sigma \succeq 0 \}.
$$
We have added the condition $\Sigma \succeq 0$, which the covariance matrix must, of course, satisfy.
We can find $\sigma_{\mathrm{wc}}$ by solving the SDP
$$
\begin{array}{ll}
\mbox{maximize} & x^T \Sigma x \\
\mbox{subject to} & L_{ij} \leq \Sigma_{ij} \leq U_{ij}, \quad i, j = 1, \ldots, n \\
& \Sigma \succeq 0
\end{array}
$$
with variable $\Sigma \in \mathbf{S}^n$ (and problem parameters $x$, $L$, and $U$). The optimal $\Sigma$ is the worst covariance matrix consistent with our given bounds on the entries, where 'worst' means largest risk with the (given) portfolio $x$. We can easily construct a distribution for $p$ that is consistent with the given bounds, and achieves the worst-case variance, from an optimal $\Sigma$ for the SDP. For example, we can take $p = \bar{p} + \Sigma^{1/2} v$, where $v$ is any random vector with $\mathbf{E}\, v = 0$ and $\mathbf{E}\, v v^T = I$.
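In general this SDP needs a numerical solver, but a special case with $n = 2$ can be worked out by hand and shows why the constraint $\Sigma \succeq 0$ matters. The sketch below makes several assumptions that are not part of the text: the holdings are nonnegative, the diagonal upper bounds are nonnegative, and the lower bounds $L$ are loose enough to be inactive. Under these assumptions the objective is nondecreasing in every entry of $\Sigma$, and raising the diagonal entries only loosens the condition $\Sigma_{12}^2 \leq \Sigma_{11} \Sigma_{22}$, so the maximum is attained at the diagonal upper bounds. The function name is hypothetical.

```python
import math


def worst_case_variance_2x2(x, U):
    """Worst-case x^T Sigma x for n = 2, assuming x >= 0 and inactive lower bounds."""
    assert x[0] >= 0 and x[1] >= 0, "this closed form needs nonnegative holdings"
    a, c = U[0][0], U[1][1]             # largest allowed variances
    b = min(U[0][1], math.sqrt(a * c))  # Sigma >= 0 requires b^2 <= a * c
    return x[0] ** 2 * a + 2 * x[0] * x[1] * b + x[1] ** 2 * c


# Hypothetical data: the entrywise bound U12 = 2 is not achievable by a valid covariance.
x = [1.0, 1.0]
U = [[1.0, 2.0], [2.0, 1.0]]
with_psd = worst_case_variance_2x2(x, U)
ignoring_psd = (
    x[0] ** 2 * U[0][0] + 2 * x[0] * x[1] * U[0][1] + x[1] ** 2 * U[1][1]
)
```

With this data the semidefinite constraint caps the covariance at 1, so the worst-case variance is 4 rather than the value 6 obtained from the entrywise bounds alone.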
Evidently we can use the same method to determine $\sigma_{\mathrm{wc}}$ for any prior information about $\Sigma$ that is convex. We list here some examples.
• Known variance of certain portfolios. We might have equality constraints such as
$$
u_k^T \Sigma u_k = \sigma_k^2,
$$
where $u_k$ and $\sigma_k$ are given. This corresponds to prior knowledge that certain known portfolios (given by $u_k$) have known (or very accurately estimated) variance.
• Including effects of estimation error. If the covariance $\Sigma$ is estimated from empirical data, the estimation method will give an estimate $\hat{\Sigma}$, and some information about the reliability of the estimate, such as a confidence ellipsoid. This can be expressed as
$$
C(\Sigma - \hat{\Sigma}) \leq \alpha,
$$
where $C$ is a positive definite quadratic form on $\mathbf{S}^n$, and the constant $\alpha$ determines the confidence level.
• Factor models. The covariance might have the form
$$
\Sigma = F \Sigma_{\mathrm{factor}} F^T + D,
$$
where $F \in \mathbf{R}^{n \times k}$, $\Sigma_{\mathrm{factor}} \in \mathbf{S}^k$, and $D$ is diagonal. This corresponds to a model of the price changes of the form
$$
p = F z + d,
$$
where $z$ is a random variable (the underlying factors that affect the price changes) and $d_i$ are independent (additional volatility of each asset price). We assume that the factors are known. Since $\Sigma$ is linearly related to $\Sigma_{\mathrm{factor}}$ and $D$, we can impose any convex constraint on them (representing prior information) and still compute $\sigma_{\mathrm{wc}}$ using convex optimization.
• Information about correlation coefficients. In the simplest case, the diagonal entries of $\Sigma$ (i.e., the volatilities of each asset price) are known, and bounds on correlation coefficients between price changes are known:
$$
l_{ij} \leq \rho_{ij} = \frac{\Sigma_{ij}}{\Sigma_{ii}^{1/2} \Sigma_{jj}^{1/2}} \leq u_{ij}, \quad i, j = 1, \ldots, n.
$$
Since $\Sigma_{ii}$ are known, but $\Sigma_{ij}$ for $i \neq j$ are not, these are linear inequalities.
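The last example in the list can be made concrete: with the diagonal of $\Sigma$ fixed, each correlation bound becomes a simple bound on the corresponding entry of $\Sigma$, obtained by multiplying through by $(\Sigma_{ii} \Sigma_{jj})^{1/2}$. A minimal sketch, with a hypothetical function name and made-up data:

```python
import math


def correlation_to_entry_bounds(variances, l, u):
    """Turn l[i][j] <= rho_ij <= u[i][j] into entry bounds L[i][j] <= Sigma_ij <= U[i][j].

    variances[i] is the known diagonal entry Sigma_ii.
    """
    n = len(variances)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal entries are known exactly.
                L[i][j] = U[i][j] = variances[i]
            else:
                scale = math.sqrt(variances[i] * variances[j])
                L[i][j] = l[i][j] * scale
                U[i][j] = u[i][j] * scale
    return L, U


# Hypothetical data: variances 4 and 9, correlation known to lie in [-0.5, 0.25].
variances = [4.0, 9.0]
l = [[1.0, -0.5], [-0.5, 1.0]]
u = [[1.0, 0.25], [0.25, 1.0]]
L, U = correlation_to_entry_bounds(variances, l, u)
```

The resulting matrices `L` and `U` have the same form as the entrywise bounds $L_{ij} \leq \Sigma_{ij} \leq U_{ij}$ used in the worst-case variance problem.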
Fastest mixing Markov chain on a graph
We consider an undirected graph, with nodes $1, \ldots, n$, and a set of edges $\mathcal{E} \subseteq \{1, \ldots, n\} \times \{1, \ldots, n\}$.
Here $(i, j) \in \mathcal{E}$ means that nodes $i$ and $j$ are connected by an edge. Since the graph is undirected, $\mathcal{E}$ is symmetric: $(i, j) \in \mathcal{E}$ if and only if $(j, i) \in \mathcal{E}$. We allow the possibility of self-loops, i.e., we can have $(i, i) \in \mathcal{E}$.
We define a Markov chain, with state $X(t) \in \{1, \ldots, n\}$, for $t \in \mathbf{Z}_+$ (the set of nonnegative integers), as follows. With each edge $(i, j) \in \mathcal{E}$ we associate a probability $P_{ij}$, which is the probability that $X$ makes a transition between nodes $i$ and $j$. State transitions can only occur across edges; we have $P_{ij} = 0$ for $(i, j) \notin \mathcal{E}$.
The probabilities associated with the edges must be nonnegative, and for each node, the sum of the probabilities of links connected to the node (including a self-loop, if there is one) must equal one.
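The Markov chain has transition probability matrix
$$
P_{ij} = \mathbf{prob}(X(t + 1) = i \mid X(t) = j), \quad i, j = 1, \ldots, n.
$$
This matrix must satisfy
$$
P_{ij} \geq 0, \quad i, j = 1, \ldots, n, \qquad \mathbf{1}^T P = \mathbf{1}^T, \qquad P = P^T,
\qquad (4.53)
$$
and also
$$
P_{ij} = 0 \mbox{ for } (i, j) \notin \mathcal{E}.
\qquad (4.54)
$$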
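The conditions (4.53) and (4.54) are easy to check for a candidate matrix. The sketch below is illustrative only: the function name, the tolerance, and the use of 0-based node indices are assumptions, and the edge set is stored as a Python set of index pairs.

```python
def is_valid_transition_matrix(P, edges, tol=1e-12):
    """Check (4.53) and (4.54) for a candidate P; nodes are indexed from 0 here."""
    n = len(P)
    for i in range(n):
        for j in range(n):
            if P[i][j] < -tol:                      # nonnegativity
                return False
            if abs(P[i][j] - P[j][i]) > tol:        # symmetry, P = P^T
                return False
            if (i, j) not in edges and abs(P[i][j]) > tol:  # zero off the graph
                return False
    # Column sums equal one: 1^T P = 1^T.
    return all(abs(sum(P[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))


# Hypothetical graph: a path 0 - 1 - 2 with self-loops at the two end nodes.
edges = {(0, 0), (0, 1), (1, 0), (1, 2), (2, 1), (2, 2)}
P_path = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
P_bad = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
```

Here `P_path` satisfies all the conditions, while `P_bad` puts probability on the pairs (0, 2) and (1, 1), which are not edges of this graph.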
Since $P$ is symmetric and $\mathbf{1}^T P = \mathbf{1}^T$, we conclude $P \mathbf{1} = \mathbf{1}$, so the uniform distribution $(1/n)\mathbf{1}$ is an equilibrium distribution for the Markov chain. Convergence of the distribution of $X(t)$ to $(1/n)\mathbf{1}$ is determined by the second largest (in magnitude) eigenvalue of $P$, i.e., by $r = \max\{\lambda_2, -\lambda_n\}$, where
$$
1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n
$$
are the eigenvalues of $P$. We refer to $r$ as the mixing rate of the Markov chain.
If $r = 1$, then the distribution of $X(t)$ need not converge to $(1/n)\mathbf{1}$ (which means the Markov chain does not mix). When $r < 1$, the distribution of $X(t)$ approaches $(1/n)\mathbf{1}$ asymptotically as $r^t$, as $t \to \infty$. Thus, the smaller $r$ is, the faster the Markov chain mixes.
The fastest mixing Markov chain problem is to find $P$, subject to the constraints (4.53) and (4.54), that minimizes $r$. (The problem data is the graph, i.e., $\mathcal{E}$.) We will show that this problem can be formulated as an SDP.
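Since the eigenvalue $\lambda_1 = 1$ is associated with the eigenvector $\mathbf{1}$, we can express the mixing rate as the norm of the matrix $P$, restricted to the subspace $\mathbf{1}^\perp$: $r = \|QPQ\|_2$, where $Q = I - (1/n)\mathbf{1}\mathbf{1}^T$ is the matrix representing orthogonal projection on $\mathbf{1}^\perp$. Using the property $P \mathbf{1} = \mathbf{1}$, we have
$$
\begin{array}{rcl}
r &=& \|QPQ\|_2 \\
&=& \|(I - (1/n)\mathbf{1}\mathbf{1}^T) P (I - (1/n)\mathbf{1}\mathbf{1}^T)\|_2 \\
&=& \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2.
\end{array}
$$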
This shows that the mixing rate r is a convex function of P , so the fastest mixing Markov chain problem can be cast as the convex optimization problem
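$$
\begin{array}{ll}
\mbox{minimize} & \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2 \\
\mbox{subject to} & P \mathbf{1} = \mathbf{1} \\
& P_{ij} \geq 0, \quad i, j = 1, \ldots, n \\
& P_{ij} = 0 \mbox{ for } (i, j) \notin \mathcal{E},
\end{array}
$$
with variable $P \in \mathbf{S}^n$. We can express the problem as an SDP by introducing a scalar variable $t$ to bound the norm of $P - (1/n)\mathbf{1}\mathbf{1}^T$:
$$
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & -tI \preceq P - (1/n)\mathbf{1}\mathbf{1}^T \preceq tI \\
& P \mathbf{1} = \mathbf{1} \\
& P_{ij} \geq 0, \quad i, j = 1, \ldots, n \\
& P_{ij} = 0 \mbox{ for } (i, j) \notin \mathcal{E}.
\end{array}
\qquad (4.55)
$$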
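The objective of this problem, the mixing rate $r = \|P - (1/n)\mathbf{1}\mathbf{1}^T\|_2$, can be evaluated for a given symmetric $P$ without an eigenvalue routine. The sketch below does not solve the SDP; it only estimates $r$ for a fixed matrix by power iteration on the square of $M = P - (1/n)\mathbf{1}\mathbf{1}^T$, which is one of several possible methods. The function name, iteration count, and starting vector are assumptions, and the estimate can be wrong if the starting vector happens to be orthogonal to the dominant eigenvectors.

```python
import math


def mixing_rate(P, iters=100):
    """Estimate r = ||P - (1/n) 1 1^T||_2 for a symmetric P by power iteration on M^2."""
    n = len(P)
    M = [[P[i][j] - 1.0 / n for j in range(n)] for i in range(n)]

    def matvec(A, v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in A]

    v = [float(i + 1) for i in range(n)]  # arbitrary start, not a multiple of 1
    ratio = 0.0
    for _ in range(iters):
        w = matvec(M, matvec(M, v))       # apply M^2; its eigenvalues are squares
        norm_v = math.sqrt(sum(c * c for c in v))
        norm_w = math.sqrt(sum(c * c for c in w))
        if norm_w == 0.0:
            return 0.0                    # v was mapped to zero (e.g. M = 0)
        ratio = norm_w / norm_v           # converges to the largest eigenvalue of M^2
        v = [c / norm_w for c in w]
    return math.sqrt(ratio)


# Hypothetical chain on a 3-node path with self-loops at the ends; eigenvalues 1, 1/2, -1/2.
P_path = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
r_path = mixing_rate(P_path)

# Uniform chain on two nodes: P equals (1/n) 1 1^T, so it mixes in one step.
r_uniform = mixing_rate([[0.5, 0.5], [0.5, 0.5]])
```

Working with $M^2$ rather than $M$ keeps the iteration well behaved when $\lambda_2$ and $-\lambda_n$ have the same magnitude, as in the path example.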