Convex functions

3.6 Convexity with respect to generalized inequalities

3.6.2 Convexity with respect to a generalized inequality

Suppose K ⊆ R^m is a proper cone with associated generalized inequality ⪯_K. We say f : Rⁿ → R^m is K-convex if for all x, y, and 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y).

The function is strictly K-convex if

f(θx + (1 − θ)y) ≺_K θf(x) + (1 − θ)f(y)

for all x ≠ y and 0 < θ < 1. These definitions reduce to ordinary convexity and strict convexity when m = 1 (and K = R₊).

Example 3.47 Convexity with respect to componentwise inequality. A function f : Rⁿ → R^m is convex with respect to componentwise inequality (i.e., the generalized inequality induced by R^m₊) if and only if for all x, y and 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ⪯ θf(x) + (1 − θ)f(y),

i.e., each component f_i is a convex function. The function f is strictly convex with respect to componentwise inequality if and only if each component f_i is strictly convex.

Example 3.48 Matrix convexity. Suppose f is a symmetric matrix valued function, i.e., f : Rⁿ → S^m. The function f is convex with respect to matrix inequality if

f(θx + (1 − θ)y) ⪯ θf(x) + (1 − θ)f(y)

for any x and y, and for θ ∈ [0, 1]. This is sometimes called matrix convexity. An equivalent definition is that the scalar function z^T f(x) z is convex for all vectors z. (This is often a good way to prove matrix convexity.) A matrix function is strictly matrix convex if

f(θx + (1 − θ)y) ≺ θf(x) + (1 − θ)f(y)

when x ≠ y and 0 < θ < 1, or, equivalently, if z^T f(x) z is strictly convex for every z ≠ 0.

Some examples:

• The function f(X) = XX^T, where X ∈ R^{n×m}, is matrix convex, since for fixed z the function z^T X X^T z = ‖X^T z‖₂² is a convex quadratic function of (the components of) X. For the same reason, f(X) = X² is matrix convex on Sⁿ.

• The function X^p is matrix convex on Sⁿ₊₊ for 1 ≤ p ≤ 2 or −1 ≤ p ≤ 0, and matrix concave for 0 ≤ p ≤ 1.

• The function f(X) = e^X is not matrix convex on Sⁿ, for n ≥ 2.
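The first example in the list can be spot-checked numerically. The sketch below is only an illustration, not a proof: it samples random matrices and tests that θf(X) + (1 − θ)f(Y) − f(θX + (1 − θ)Y) is positive semidefinite for f(X) = XX^T. The helper name `matrix_convexity_gap`, the matrix sizes, the seed, and the tolerance are all assumptions made for this example.

```python
import numpy as np

def matrix_convexity_gap(f, X, Y, theta):
    """Smallest eigenvalue of theta*f(X) + (1-theta)*f(Y) - f(theta*X + (1-theta)*Y)."""
    gap = theta * f(X) + (1 - theta) * f(Y) - f(theta * X + (1 - theta) * Y)
    gap = (gap + gap.T) / 2  # symmetrize to remove rounding asymmetry
    return np.linalg.eigvalsh(gap).min()

rng = np.random.default_rng(0)  # arbitrary seed
f = lambda X: X @ X.T           # the matrix convex function from the first bullet

# A nonnegative minimum (up to rounding) is consistent with matrix convexity.
min_gap = min(
    matrix_convexity_gap(
        f,
        rng.standard_normal((3, 4)),
        rng.standard_normal((3, 4)),
        rng.uniform(),
    )
    for _ in range(200)
)
print("smallest eigenvalue of the gap over all samples:", min_gap)
```

For this particular f the gap equals θ(1 − θ)(X − Y)(X − Y)^T, which explains why the smallest eigenvalue never goes negative beyond rounding error.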

Many of the results for convex functions have extensions to K-convex functions. As a simple example, a function is K-convex if and only if its restriction to any line in its domain is K-convex. In the rest of this section we list a few results for K-convexity that we will use later; more results are explored in the exercises.

Dual characterization of K-convexity

A function f is K-convex if and only if for every w ⪰_{K^∗} 0, the (real-valued) function w^T f is convex (in the ordinary sense); f is strictly K-convex if and only if for every nonzero w ⪰_{K^∗} 0 the function w^T f is strictly convex. (These follow directly from the definitions and properties of dual inequality.)

Differentiable K-convex functions

A differentiable function f is K-convex if and only if its domain is convex, and for all x, y ∈ dom f,

f(y) ⪰_K f(x) + Df(x)(y − x).

(Here Df(x) ∈ R^{m×n} is the derivative or Jacobian matrix of f at x; see §A.4.1.) The function f is strictly K-convex if and only if for all x, y ∈ dom f with x ≠ y,

f(y) ≻_K f(x) + Df(x)(y − x).
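As a small illustration of the first-order condition, take K to be the nonnegative orthant in R², so that the generalized inequality is componentwise. The function `f` below is a hypothetical example chosen for this sketch (both of its components are convex), and `Df` is its hand-computed Jacobian; neither comes from the text.

```python
import numpy as np

def f(x):
    # Hypothetical example: each component is a convex function of x.
    return np.array([x[0] ** 2 + x[1] ** 2, np.exp(x[0] - x[1])])

def Df(x):
    # Jacobian of f at x (a 2 x 2 matrix here).
    e = np.exp(x[0] - x[1])
    return np.array([[2 * x[0], 2 * x[1]], [e, -e]])

def first_order_slack(x, y):
    """f(y) - f(x) - Df(x)(y - x); for K = R^2_+ this must be componentwise >= 0."""
    return f(y) - f(x) - Df(x) @ (y - x)

rng = np.random.default_rng(1)  # arbitrary seed
worst = min(
    first_order_slack(rng.uniform(-2, 2, 2), rng.uniform(-2, 2, 2)).min()
    for _ in range(500)
)
print("smallest slack component over all samples:", worst)
```

A negative slack at any pair (x, y) would disprove K-convexity; nonnegative slacks on samples are only consistent with it.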

Composition theorem

Many of the results on composition can be generalized to K-convexity. For example, if g : Rⁿ → R^p is K-convex, h : R^p → R is convex, and h̃ (the extended-value extension of h) is K-nondecreasing, then h ∘ g is convex. This generalizes the fact that a nondecreasing convex function of a convex function is convex. The condition that h̃ be K-nondecreasing implies that dom h − K = dom h.

Example 3.49 The quadratic matrix function g : R^{m×n} → Sⁿ defined by

g(X) = X^T A X + B^T X + X^T B + C,

where A ∈ S^m, B ∈ R^{m×n}, and C ∈ Sⁿ, is convex when A ⪰ 0. The function h : Sⁿ → R defined by h(Y) = −log det(−Y) is convex and increasing on dom h = −Sⁿ₊₊. By the composition theorem, we conclude that

f(X) = −log det(−(X^T A X + B^T X + X^T B + C))

is convex on

dom f = {X ∈ R^{m×n} | X^T A X + B^T X + X^T B + C ≺ 0}.

This generalizes the fact that −log(−(ax² + bx + c)) is convex on {x ∈ R | ax² + bx + c < 0}, provided a ≥ 0.
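The conclusion of this example can be checked along a line segment. The sketch below builds random data with A positive semidefinite and picks C (an assumption made purely for the demo) so that two sample points X and Y lie in dom f; it then evaluates the convexity gap of f at several values of θ. The helper names `g` and `f` mirror the symbols in the example, while the sizes and seed are arbitrary.

```python
import numpy as np

def g(X, A, B, C):
    """Quadratic matrix function g(X) = X^T A X + B^T X + X^T B + C."""
    return X.T @ A @ X + B.T @ X + X.T @ B + C

def f(X, A, B, C):
    """f(X) = -log det(-g(X)); returns +inf outside dom f."""
    G = -g(X, A, B, C)
    G = (G + G.T) / 2
    if np.linalg.eigvalsh(G).min() <= 0:
        return np.inf
    _, logdet = np.linalg.slogdet(G)
    return -logdet

rng = np.random.default_rng(2)  # arbitrary seed
m, n = 3, 2
M = rng.standard_normal((m, m))
A = M @ M.T                      # positive semidefinite by construction
B = rng.standard_normal((m, n))
X, Y = rng.standard_normal((m, n)), rng.standard_normal((m, n))

# Shift C far enough that g(X) and g(Y) are both negative definite.
zero = np.zeros((n, n))
shift = max(np.linalg.eigvalsh(g(Z, A, B, zero)).max() for Z in (X, Y)) + 1.0
C = -shift * np.eye(n)

thetas = np.linspace(0.0, 1.0, 11)
gaps = [
    t * f(X, A, B, C) + (1 - t) * f(Y, A, B, C) - f(t * X + (1 - t) * Y, A, B, C)
    for t in thetas
]
print("smallest convexity gap along the segment:", min(gaps))
```

Because g is matrix convex when A ⪰ 0, every point on the segment between X and Y stays in dom f, so all the gaps are finite.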

Bibliography

The standard reference on convex analysis is Rockafellar [Roc70]. Other books on convex functions are Stoer and Witzgall [SW70], Roberts and Varberg [RV73], Van Tiel [vT84], Hiriart-Urruty and Lemaréchal [HUL93], Ekeland and Témam [ET99], Borwein and Lewis [BL00], Florenzano and Le Van [FL01], Barvinok [Bar02], and Bertsekas, Nedić, and Ozdaglar [Ber03]. Most nonlinear programming texts also include chapters on convex functions (see, for example, Mangasarian [Man94], Bazaraa, Sherali, and Shetty [BSS93], Bertsekas [Ber99], Polyak [Pol87], and Peressini, Sullivan, and Uhl [PSU88]).

Jensen’s inequality appears in [Jen06]. A general study of inequalities, in which Jensen’s inequality plays a central role, is presented by Hardy, Littlewood, and Pólya [HLP52], and Beckenbach and Bellman [BB65].

The term perspective function is from Hiriart-Urruty and Lemaréchal [HUL93, volume 1, page 100]. For the definitions in example 3.19 (relative entropy and Kullback-Leibler divergence), and the related exercise 3.13, see Cover and Thomas [CT91].

Some important early references on quasiconvex functions (as well as other extensions of convexity) are Nikaidô [Nik54], Mangasarian [Man94, chapter 9], Arrow and Enthoven [AE61], Ponstein [Pon67], and Luenberger [Lue68]. For a more comprehensive reference list, we refer to Bazaraa, Sherali, and Shetty [BSS93, page 126].

Prékopa [Pré80] gives a survey of log-concave functions. Log-convexity of the Laplace transform is mentioned in Barndorff-Nielsen [BN78, §7]. For a proof of the integration result of log-concave functions, see Prékopa [Pré71, Pré73].

Generalized inequalities are used extensively in the recent literature on cone programming, starting with Nesterov and Nemirovski [NN94, page 156]; see also Ben-Tal and Nemirovski [BTN01] and the references at the end of chapter 4. Convexity with respect to generalized inequalities also appears in the work of Luenberger [Lue69, §8.2] and Isii [Isi64]. Matrix monotonicity and matrix convexity are attributed to Löwner [Löw34], and are discussed in detail by Davis [Dav63], Roberts and Varberg [RV73, page 216] and Marshall and Olkin [MO79, §16E]. For the result on convexity and concavity of the function X^p in example 3.48, see Bondar [Bon94, theorem 16.1]. For a simple example that demonstrates that e^X is not matrix convex, see Marshall and Olkin [MO79, page 474].

Exercises

Definition of convexity

3.1 Suppose f : R → R is convex, and a, b ∈ dom f with a < b.

(a) Show that

f(x) ≤ ((b − x)/(b − a)) f(a) + ((x − a)/(b − a)) f(b)

for all x ∈ [a, b].

(b) Show that

(f(x) − f(a))/(x − a) ≤ (f(b) − f(a))/(b − a) ≤ (f(b) − f(x))/(b − x)

for all x ∈ (a, b). Draw a sketch that illustrates this inequality.

(c) Suppose f is differentiable. Use the result in (b) to show that

f′(a) ≤ (f(b) − f(a))/(b − a) ≤ f′(b).

Note that these inequalities also follow from (3.2):

f(b) ≥ f(a) + f′(a)(b − a),   f(a) ≥ f(b) + f′(b)(a − b).

(d) Suppose f is twice differentiable. Use the result in (c) to show that f″(a) ≥ 0 and f″(b) ≥ 0.

3.2 Level sets of convex, concave, quasiconvex, and quasiconcave functions. Some level sets of a function f are shown below. The curve labeled 1 shows {x | f(x) = 1}, etc.

[Figure: level curves of f labeled 1, 2, 3.]

Could f be convex (concave, quasiconvex, quasiconcave)? Explain your answer. Repeat for the level curves shown below.

[Figure: level curves of f labeled 1, 2, 3, 4, 5, 6.]

3.3 Inverse of an increasing convex function. Suppose f : R→ R is increasing and convex on its domain (a, b). Let g denote its inverse, i.e., the function with domain (f (a), f (b)) and g(f (x)) = x for a < x < b. What can you say about convexity or concavity of g?

3.4 [RV73, page 15] Show that a continuous function f : Rⁿ → R is convex if and only if for every line segment, its average value on the segment is less than or equal to the average of its values at the endpoints of the segment: For every x, y ∈ Rⁿ,

∫₀¹ f(x + λ(y − x)) dλ ≤ (f(x) + f(y))/2.
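The inequality in this exercise is easy to observe numerically for a known convex function. The following sketch approximates the integral with a midpoint rule and compares it with the endpoint average; the choice of log-sum-exp as the test function, the helper name `segment_average`, and the sample count are assumptions made for illustration.

```python
import numpy as np

def segment_average(f, x, y, num=2000):
    """Midpoint-rule approximation of the integral of f(x + lam*(y - x)) over lam in [0, 1]."""
    lam = (np.arange(num) + 0.5) / num
    return float(np.mean([f(x + l * (y - x)) for l in lam]))

f = lambda z: np.log(np.sum(np.exp(z)))  # log-sum-exp, convex on R^n

rng = np.random.default_rng(5)  # arbitrary seed
gaps = []
for _ in range(50):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # Endpoint average minus segment average; convexity predicts a nonnegative value.
    gaps.append(0.5 * (f(x) + f(y)) - segment_average(f, x, y))
print("smallest gap over all sampled segments:", min(gaps))
```

This only illustrates the "only if" direction on one function; the exercise asks for a proof of both directions.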

3.5 [RV73, page 22] Running average of a convex function. Suppose f : R → R is convex, with R₊ ⊆ dom f. Show that its running average F, defined as

F(x) = (1/x) ∫₀ˣ f(t) dt,   dom F = R₊₊,

is convex. You can assume f is differentiable.

3.6 Functions and epigraphs. When is the epigraph of a function a halfspace? When is the epigraph of a function a convex cone? When is the epigraph of a function a polyhedron?

3.7 Suppose f : Rⁿ→ R is convex with dom f = Rⁿ, and bounded above on Rⁿ. Show that f is constant.

3.8 Second-order condition for convexity. Prove that a twice differentiable function f is convex if and only if its domain is convex and ∇²f(x) ⪰ 0 for all x ∈ dom f. Hint. First consider the case f : R → R. You can use the first-order condition for convexity (which was proved on page 70).

3.9 Second-order conditions for convexity on an affine set. Let F ∈ R^{n×m}, x̂ ∈ Rⁿ. The restriction of f : Rⁿ → R to the affine set {Fz + x̂ | z ∈ R^m} is defined as the function f̃ : R^m → R with

f̃(z) = f(Fz + x̂),   dom f̃ = {z | Fz + x̂ ∈ dom f}.

Suppose f is twice differentiable with a convex domain.

(a) Show that f̃ is convex if and only if for all z ∈ dom f̃

3.10 An extension of Jensen’s inequality. One interpretation of Jensen’s inequality is that randomization or dithering hurts, i.e., raises the average value of a convex function: For f convex and v a zero mean random variable, we have E f(x₀ + v) ≥ f(x₀). This leads to the following conjecture. If f is convex, then the larger the variance of v, the larger E f(x₀ + v).

(a) Give a counterexample that shows that this conjecture is false. Find zero mean random variables v and w, with var(v) > var(w), a convex function f, and a point x₀, such that E f(x₀ + v) < E f(x₀ + w).

(b) The conjecture is true when v and w are scaled versions of each other. Show that E f(x₀ + tv) is monotone increasing in t ≥ 0, when f is convex and v is zero mean.
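Part (b) can be illustrated with a discrete random variable, for which the expectation is an exact finite sum. In the sketch below the three-point distribution, the convex function `f`, and the point `x0` are all hypothetical choices made for the example.

```python
import numpy as np

values = np.array([-2.0, 1.0, 1.0])     # outcomes of v (hypothetical example)
probs = np.array([1 / 3, 1 / 3, 1 / 3]) # probabilities, chosen so that E v = 0
f = lambda x: np.abs(x) + 0.5 * x ** 2  # an arbitrary convex function
x0 = 0.7

def expected_value(t):
    """E f(x0 + t v) for the discrete random variable v defined above."""
    return float(probs @ f(x0 + t * values))

ts = np.linspace(0.0, 5.0, 101)
ev = np.array([expected_value(t) for t in ts])
# Successive differences should be nonnegative if E f(x0 + t v) is nondecreasing in t.
print("smallest increment:", np.diff(ev).min())
```

At t = 0 the expectation reduces to f(x0), so the same data also illustrates the basic form of Jensen's inequality stated at the start of the exercise.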

3.11 Monotone mappings. A function ψ : Rⁿ → Rⁿ is called monotone if for all x, y ∈ dom ψ,

(ψ(x) − ψ(y))^T (x − y) ≥ 0.

(Note that ‘monotone’ as defined here is not the same as the definition given in §3.6.1. Both definitions are widely used.) Suppose f : Rⁿ → R is a differentiable convex function. Show that its gradient ∇f is monotone. Is the converse true, i.e., is every monotone mapping the gradient of a convex function?

3.12 Suppose f : Rⁿ → R is convex, g : Rⁿ → R is concave, dom f = dom g = Rⁿ, and for all x, g(x)≤ f(x). Show that there exists an affine function h such that for all x, g(x)≤ h(x) ≤ f(x). In other words, if a concave function g is an underestimator of a convex function f , then we can fit an affine function between f and g.

3.13 Kullback-Leibler divergence and the information inequality. Let D_kl be the Kullback-Leibler divergence, as defined in (3.17). Prove the information inequality: D_kl(u, v) ≥ 0 for all u, v ∈ Rⁿ₊₊. Also show that D_kl(u, v) = 0 if and only if u = v.

Hint. The Kullback-Leibler divergence can be expressed as

D_kl(u, v) = f(u) − f(v) − ∇f(v)^T (u − v),

where f(v) = ∑_{i=1}^n v_i log v_i is the negative entropy of v.
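The expression in the hint can be evaluated directly, which gives a quick numerical sanity check of the information inequality. The sketch below uses only the formula from the hint; the function names `neg_entropy` and `kl_bregman` are made up for this example, and the sampled vectors are arbitrary positive vectors.

```python
import numpy as np

def neg_entropy(v):
    """f(v) = sum_i v_i log v_i, the negative entropy."""
    return np.sum(v * np.log(v))

def kl_bregman(u, v):
    """D_kl(u, v) written as in the hint: f(u) - f(v) - grad f(v)^T (u - v)."""
    grad = np.log(v) + 1.0  # gradient of the negative entropy at v
    return neg_entropy(u) - neg_entropy(v) - grad @ (u - v)

rng = np.random.default_rng(6)  # arbitrary seed
samples = [
    kl_bregman(rng.uniform(0.1, 3.0, 5), rng.uniform(0.1, 3.0, 5))
    for _ in range(200)
]
u = rng.uniform(0.1, 3.0, 5)
print("smallest sampled divergence:", min(samples))
print("divergence of u from itself:", kl_bregman(u, u))
```

Expanding the hint's expression gives ∑ (u_i log(u_i/v_i) − u_i + v_i), which should agree with the definition the exercise refers to.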

3.14 Convex-concave functions and saddle-points. We say the function f : Rⁿ × R^m → R is convex-concave if f(x, z) is a concave function of z, for each fixed x, and a convex function of x, for each fixed z. We also require its domain to have the product form dom f = A × B, where A ⊆ Rⁿ and B ⊆ R^m are convex.

(a) Give a second-order condition for a twice differentiable function f : Rⁿ × R^m → R to be convex-concave, in terms of its Hessian ∇²f(x, z).

(b) Suppose that f : Rⁿ × R^m → R is convex-concave and differentiable, with ∇f(x̃, z̃) = 0. Show that the saddle-point property holds: for all x, z, we have

f(x̃, z) ≤ f(x̃, z̃) ≤ f(x, z̃).

Show that this implies that f satisfies the strong max-min property:

sup_z inf_x f(x, z) = inf_x sup_z f(x, z).

(c) Now suppose that f : Rⁿ × R^m → R is differentiable, but not necessarily convex-concave, and the saddle-point property holds at x̃, z̃:

f(x̃, z) ≤ f(x̃, z̃) ≤ f(x, z̃)

for all x, z. Show that ∇f(x̃, z̃) = 0.

Examples

3.15 A family of concave utility functions. For 0 < α ≤ 1 let

u_α(x) = (x^α − 1)/α,

with dom u_α = R₊. We also define u₀(x) = log x (with dom u₀ = R₊₊).

(a) Show that for x > 0, u₀(x) = lim_{α→0} u_α(x).

(b) Show that u_α are concave, monotone increasing, and all satisfy u_α(1) = 0.

These functions are often used in economics to model the benefit or utility of some quantity of goods or money. Concavity of uα means that the marginal utility (i.e., the increase in utility obtained for a fixed increase in the goods) decreases as the amount of goods increases. In other words, concavity models the effect of satiation.
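The limit in part (a) of this exercise can be observed numerically. The sketch below evaluates u_α at a fixed point for a decreasing sequence of α and records the distance to log x; the function name `u` and the test point are choices made for this illustration.

```python
import math

def u(alpha, x):
    """u_alpha(x) = (x**alpha - 1)/alpha for 0 < alpha <= 1, and log x for alpha = 0."""
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha

x = 3.0
alphas = (1e-1, 1e-2, 1e-3, 1e-4)
errors = [abs(u(alpha, x) - math.log(x)) for alpha in alphas]
for alpha, err in zip(alphas, errors):
    print(f"alpha = {alpha:g}: |u_alpha(x) - log x| = {err:.3e}")
```

The error shrinks roughly in proportion to α, consistent with the expansion x^α = 1 + α log x + O(α²).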

3.16 For each of the following functions determine whether it is convex, concave, quasiconvex, or quasiconcave.

3.17 Suppose p < 1, p ≠ 0. Show that the function

f(x) = … (∑_{i=1}^n 1/x_i)⁻¹. Hint. Adapt the proofs for the log-sum-exp function and the geometric mean in §3.1.5.

3.18 Adapt the proof of concavity of the log-determinant function in §3.1.5 to show the following.

(a) f(X) = tr(X⁻¹) is convex on dom f = Sⁿ₊₊.

(b) f(X) = (det X)^{1/n} is concave on dom f = Sⁿ₊₊.

3.19 Nonnegative weighted sums and integrals.

(a) Show that f(x) = ∑_{i=1}^r α_i x_[i] is a convex function of x, where α₁ ≥ α₂ ≥ ··· ≥ α_r ≥ 0, and x_[i] denotes the ith largest component of x. (You can use the fact that f(x) = ∑_{i=1}^k x_[i] is convex on Rⁿ.)

(b) Let T(x, ω) denote the trigonometric polynomial

T(x, ω) = x₁ + x₂ cos ω + x₃ cos 2ω + ··· + xₙ cos(n − 1)ω.

3.20 Composition with an affine function. Show that the following functions f : Rⁿ→ R are convex.

3.21 Pointwise maximum and supremum. Show that the following functions f : R → R are … the absolute value of x, componentwise), and |x|_[i] is the ith largest component of |x|. In other words, |x|_[1], |x|_[2], …, |x|_[n] are the absolute values of the components of x, sorted in nonincreasing order.

3.22 Composition rules. Show that the following functions are convex.

(a) f (x) =− log(− log(Pm

3.24 Some functions on the probability simplex. Let x be a real-valued random variable which takes values in {a₁, …, aₙ} where a₁ < a₂ < ··· < aₙ, with prob(x = a_i) = p_i, i = 1, …, n. For each of the following functions of p (on the probability simplex {p ∈ Rⁿ₊ | 1^T p = 1}), determine if the function is convex, concave, quasiconvex, or quasiconcave.

(a) E x.

(b) prob(x≥ α).

(d) ∑_{i=1}^n p_i log p_i, the negative entropy of the distribution.

(e) var x = E(x− E x)².

(f) quartile(x) = inf{β | prob(x ≤ β) ≥ 0.25}.

(g) The cardinality of the smallest set A ⊆ {a₁, …, aₙ} with probability ≥ 90%. (By cardinality we mean the number of elements in A.)

(h) The minimum width interval that contains 90% of the probability, i.e., inf{β − α | prob(α ≤ x ≤ β) ≥ 0.9} .

3.25 Maximum probability distance between distributions. Let p, q ∈ Rⁿ represent two probability distributions on {1, …, n} (so p, q ⪰ 0, 1^T p = 1^T q = 1). We define the maximum probability distance d_mp(p, q) between p and q as the maximum difference in probability assigned by p and q, over all events:

d_mp(p, q) = max{|prob(p, C) − prob(q, C)| | C ⊆ {1, …, n}}.

Here prob(p, C) is the probability of C, under the distribution p, i.e., prob(p, C) = ∑_{i∈C} p_i.

… of a matrix X ∈ Sⁿ. We have already seen several functions of the eigenvalues that are convex or concave functions of X.

• The maximum eigenvalue λ₁(X) is convex (example 3.10). The minimum eigenvalue λₙ(X) is concave.

• The sum of the eigenvalues (or trace), tr X = λ₁(X) + ··· + λₙ(X), is linear.

• The sum of the inverses of the eigenvalues (or trace of the inverse), tr(X⁻¹) = ∑_{i=1}^n 1/λ_i(X), is convex on Sⁿ₊₊ (exercise 3.18).

• The geometric mean of the eigenvalues, (det X)^{1/n} = (∏_{i=1}^n λ_i(X))^{1/n}, and the logarithm of the product of the eigenvalues, log det X = ∑_{i=1}^n log λ_i(X), are concave on X ∈ Sⁿ₊₊ (exercise 3.18 and page 74).

In this problem we explore some more functions of eigenvalues, by exploiting variational characterizations.

(a) Sum of k largest eigenvalues. Show that ∑_{i=1}^k λ_i(X) is convex on Sⁿ. Hint. [HJ85, page 191] Use the variational characterization

∑_{i=1}^k λ_i(X) = sup{tr(V^T X V) | V ∈ R^{n×k}, V^T V = I}.

(b) Geometric mean of k smallest eigenvalues. Show that (∏_{i=n−k+1}^n λ_i(X))^{1/k} is … ∑_{i=n−k+1}^n log λ_i(X) is concave
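The variational characterization quoted in the hint for the sum of the k largest eigenvalues can be explored numerically. The sketch below checks that the top-k eigenvectors attain the supremum and that randomly drawn feasible V never exceed it; the helper names, the sizes n and k, and the use of a QR factorization to generate orthonormal columns are assumptions made for this illustration.

```python
import numpy as np

def sum_k_largest_eigs(X, k):
    """Sum of the k largest eigenvalues of a symmetric matrix X."""
    return np.sort(np.linalg.eigvalsh(X))[::-1][:k].sum()

def random_orthonormal(n, k, rng):
    """n x k matrix with orthonormal columns, from a QR factorization of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q

rng = np.random.default_rng(3)  # arbitrary seed
n, k = 6, 2
S = rng.standard_normal((n, n))
X = (S + S.T) / 2
target = sum_k_largest_eigs(X, k)

# The eigenvectors of the k largest eigenvalues attain the supremum.
_, U = np.linalg.eigh(X)        # eigenvalues are returned in ascending order
V_best = U[:, -k:]
attained = np.trace(V_best.T @ X @ V_best)

# Random feasible V give values that stay below the supremum.
trials = [
    np.trace(V.T @ X @ V)
    for V in (random_orthonormal(n, k, rng) for _ in range(1000))
]
print("sum of k largest eigenvalues:", target)
print("value at top-k eigenvectors: ", attained)
print("largest value over random V: ", max(trials))
```

Since tr(V^T X V) is linear in X for each fixed V, the characterization expresses the sum as a pointwise supremum of linear functions, which is the route to convexity suggested by the hint.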

3.27 Diagonal elements of Cholesky factor. Each X ∈ Sⁿ₊₊ has a unique Cholesky factorization X = LL^T, where L is lower triangular, with L_ii > 0. Show that L_ii is a concave function

Operations that preserve convexity

3.28 Expressing a convex function as the pointwise supremum of a family of affine functions.

In this problem we extend the result proved on page 83 to the case where dom f ≠ Rⁿ. Let f : Rⁿ → R be a convex function. Define f̃ : Rⁿ → R as the pointwise supremum of all affine functions that are global underestimators of f:

f̃(x) = sup{g(x) | g affine, g(z) ≤ f(z) for all z}.

(a) Show that f(x) = f̃(x) for x ∈ int dom f.

(b) Show that f = f̃ if f is closed (i.e., epi f is a closed set; see §A.3.3).

3.29 Representation of piecewise-linear convex functions. A function f : Rⁿ → R, with dom f = Rⁿ, is called piecewise-linear if there exists a partition of Rⁿ as

Rⁿ = X₁ ∪ X₂ ∪ ··· ∪ X_L,

where int X_i ≠ ∅ and int X_i ∩ int X_j = ∅ for i ≠ j, and a family of affine functions a₁^T x + b₁, …, a_L^T x + b_L such that f(x) = a_i^T x + b_i for x ∈ X_i.

Show that this means that f(x) = max{a₁^T x + b₁, …, a_L^T x + b_L}.

3.30 Convex hull or envelope of a function. The convex hull or convex envelope of a function f : Rⁿ→ R is defined as

g(x) = inf{t | (x, t) ∈ conv epi f}.

Geometrically, the epigraph of g is the convex hull of the epigraph of f .

Show that g is the largest convex underestimator of f . In other words, show that if h is convex and satisfies h(x)≤ f(x) for all x, then h(x) ≤ g(x) for all x.

3.31 [Roc70, page 35] Largest homogeneous underestimator. Let f be a convex function. Define the function g as

(b) Show that g is the largest homogeneous underestimator of f : If h is homogeneous and h(x)≤ f(x) for all x, then we have h(x) ≤ g(x) for all x.

3.32 Products and ratios of convex functions. In general the product or ratio of two convex functions is not convex. However, there are some results that apply to functions on R.

Prove the following.

(a) If f and g are convex, both nondecreasing (or nonincreasing), and positive functions on an interval, then f g is convex.

(b) If f , g are concave, positive, with one nondecreasing and the other nonincreasing, then f g is concave.

3.33 Direct proof of perspective theorem. Give a direct proof that the perspective function g, as defined in §3.2.6, of a convex function f is convex: Show that dom g is a convex set, and that for (x, t), (y, s) ∈ dom g, and 0 ≤ θ ≤ 1, we have

g(θx + (1− θ)y, θt + (1 − θ)s) ≤ θg(x, t) + (1 − θ)g(y, s).

3.34 The Minkowski function. The Minkowski function of a convex set C is defined as

M_C(x) = inf{t > 0 | t⁻¹x ∈ C}.

(a) Draw a picture giving a geometric interpretation of how to find M_C(x).

(b) Show that M_C is homogeneous, i.e., M_C(αx) = αM_C(x) for α ≥ 0.

(d) Show that M_C is a convex function.

(e) Suppose C is also closed, symmetric (if x ∈ C then −x ∈ C), and has nonempty interior. Show that M_C is a norm. What is the corresponding unit ball?
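The definition of the Minkowski function translates into a simple numerical procedure when a membership test for C is available. The sketch below uses bisection on t and assumes, beyond what the exercise states, that C is convex with 0 ∈ C (so the feasible set of t is an interval unbounded above) and that the starting upper bound `t_hi` is feasible. The function name `minkowski` and the unit-box example are illustrative choices.

```python
import numpy as np

def minkowski(x, in_C, t_hi=1e6, iters=200):
    """Approximate M_C(x) = inf{t > 0 | x/t in C} by bisection.

    Assumes C is convex with 0 in C and that x/t_hi already lies in C.
    """
    lo, hi = 0.0, t_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid > 0 and in_C(x / mid):
            hi = mid   # mid is feasible, so the infimum is at most mid
        else:
            lo = mid   # mid is infeasible, so the infimum is at least mid
    return hi

# C = unit box {z | |z_i| <= 1}: closed, symmetric, with nonempty interior.
in_box = lambda z: bool(np.all(np.abs(z) <= 1.0))
x = np.array([0.5, -3.0, 2.0])
print("M_C(x)  =", minkowski(x, in_box))
print("M_C(2x) =", minkowski(2 * x, in_box))
```

For the unit box the computed value matches the infinity norm of x, in line with part (e), and doubling x doubles the result, in line with the homogeneity property of part (b).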

3.35 Support function calculus. Recall that the support function of a set C ⊆ Rⁿ is defined as S_C(y) = sup{y^T x | x ∈ C}. On page 81 we showed that S_C is a convex function.

3.36 Derive the conjugates of the following functions.

(a) Max function. f(x) = max_{i=1,…,n} x_i on Rⁿ.

(f) Negative generalized logarithm for second-order cone. f(x, t) = −log(t² − x^T x) on {(x, t) ∈ Rⁿ × R | ‖x‖₂ < t}.

3.37 Show that the conjugate of f(X) = tr(X⁻¹) with dom f = Sⁿ₊₊ is given by

f^∗(Y) = −2 tr(−Y)^{1/2},   dom f^∗ = −Sⁿ₊.

Hint. The gradient of f is ∇f(X) = −X⁻².
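The formula for the conjugate of tr(X⁻¹) can be checked numerically for a negative definite Y. Following the hint, the stationarity condition Y = −X⁻² suggests the candidate maximizer X = (−Y)^{−1/2}; the sketch below evaluates tr(YX) − tr(X⁻¹) there and compares it with −2 tr(−Y)^{1/2}. The helper names, the random Y, and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def sym_sqrt(P):
    """Symmetric square root of a positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(P)
    return (U * np.sqrt(w)) @ U.T

rng = np.random.default_rng(4)  # arbitrary seed
n = 4
S = rng.standard_normal((n, n))
Y = -(S @ S.T + np.eye(n))      # a negative definite Y, chosen for the demo

def objective(X):
    """tr(Y X) - tr(X^{-1}), the quantity maximized in the definition of the conjugate."""
    return np.trace(Y @ X) - np.trace(np.linalg.inv(X))

R = sym_sqrt(-Y)                # (-Y)^{1/2}
X_star = np.linalg.inv(R)       # candidate maximizer, solves Y = -X^{-2}
value_at_X_star = objective(X_star)
closed_form = -2.0 * np.trace(R)

print("objective at candidate maximizer:", value_at_X_star)
print("closed form -2 tr(-Y)^{1/2}:     ", closed_form)
```

Evaluating `objective` at other positive definite matrices gives smaller values, which is consistent with X_star being the maximizer of a concave function.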

3.38 Young’s inequality. Let f : R → R be an increasing function, with f(0) = 0, and let g be its inverse. Define F and G as

F(x) = ∫₀ˣ f(a) da,   G(y) = ∫₀ʸ g(a) da.

Show that F and G are conjugates. Give a simple graphical interpretation of Young’s inequality,

xy ≤ F(x) + G(y).

3.39 Properties of conjugate functions.

(a) Conjugate of convex plus affine function. Define g(x) = f(x) + c^T x + d, where f is convex. Express g^∗ in terms of f^∗ (and c, d).

(b) Conjugate of perspective. Express the conjugate of the perspective of a convex function f in terms of f^∗.

(c) Conjugate and minimization. Let f(x, z) be convex in (x, z) and define g(x) = inf_z f(x, z). Express the conjugate g^∗ in terms of f^∗.

As an application, express the conjugate of g(x) = inf_z {h(z) | Az + b = x}, where h is convex, in terms of h^∗, A, and b.

(d) Conjugate of conjugate. Show that the conjugate of the conjugate of a closed convex function is itself: f = f^∗∗ if f is closed and convex. (A function is closed if its epigraph is closed; see §A.3.3.) Hint. Show that f^∗∗ is the pointwise supremum of all affine global underestimators of f. Then apply the result of exercise 3.28.

3.40 Gradient and Hessian of conjugate function. Suppose f : Rⁿ → R is convex and twice continuously differentiable. Suppose ȳ and x̄ are related by ȳ = ∇f(x̄), and that ∇²f(x̄) ≻ 0.

(a) Show that ∇f^∗(ȳ) = x̄.

(b) Show that ∇²f^∗(ȳ) = ∇²f(x̄)⁻¹.

3.41 Domain of conjugate function. Suppose f : Rⁿ → R is a twice differentiable convex function and x ∈ dom f. Show that for small enough u we have

y = ∇f(x) + ∇²f(x)u ∈ dom f^∗,

i.e., y^T x − f(x) is bounded above. It follows that dim(dom f^∗) ≥ rank ∇²f(x).

Hint. Consider ∇f(x + tv), where t is small, and v is any vector in Rⁿ.

Quasiconvex functions

3.42 Approximation width. Let f₀, …, fₙ : R → R be given continuous functions. We consider the problem of approximating f₀ as a linear combination of f₁, …, fₙ. For x ∈ Rⁿ, we say that f = x₁f₁ + ··· + xₙfₙ approximates f₀ with tolerance ε > 0 over the interval [0, T] if |f(t) − f₀(t)| ≤ ε for 0 ≤ t ≤ T. Now we choose a fixed tolerance ε > 0 and define the approximation width as the largest T such that f approximates f₀ over the interval [0, T]:

W(x) = sup{T | |x₁f₁(t) + ··· + xₙfₙ(t) − f₀(t)| ≤ ε for 0 ≤ t ≤ T}.

Show that W is quasiconcave.

3.43 First-order condition for quasiconvexity. Prove the first-order condition for quasiconvexity given in §3.4.3: A differentiable function f : Rⁿ → R, with dom f convex, is quasiconvex if and only if for all x, y ∈ dom f,

f(y) ≤ f(x) ⟹ ∇f(x)^T (y − x) ≤ 0.

Hint. It suffices to prove the result for a function on R; the general result follows by restriction to an arbitrary line.

3.44 Second-order conditions for quasiconvexity. In this problem we derive alternate representations of the second-order conditions for quasiconvexity given in §3.4.3. Prove the following.

(a) A point x ∈ dom f satisfies (3.21) if and only if there exists a σ such that

∇²f(x) + σ∇f(x)∇f(x)^T ⪰ 0.   (3.26)

It satisfies (3.22) for all y ≠ 0 if and only if there exists a σ such that

∇²f(x) + σ∇f(x)∇f(x)^T ≻ 0.   (3.27)

Hint. We can assume without loss of generality that ∇²f(x) is diagonal.

(b) A point x ∈ dom f satisfies (3.21) if and only if either ∇f(x) = 0 and ∇²f(x) ⪰ 0, … has exactly one negative eigenvalue. It satisfies (3.22) for all y ≠ 0 if and only if H(x) has exactly one nonpositive eigenvalue.

Hint. You can use the result of part (a). The following result, which follows from the eigenvalue interlacing theorem in linear algebra, may also be useful: If B ∈ Sⁿ and a ∈ Rⁿ, then

3.45 Use the first and second-order conditions for quasiconvexity given in §3.4.3 to verify quasiconvexity of the function f(x) = −x₁x₂, with dom f = R²₊₊.

3.46 Quasilinear functions with domain Rⁿ. A function on R that is quasilinear (i.e., quasiconvex and quasiconcave) is monotone, i.e., either nondecreasing or nonincreasing. In this problem we consider a generalization of this result to functions on Rⁿ.

Suppose the function f : Rⁿ → R is quasilinear and continuous with dom f = Rⁿ. Show that it can be expressed as f(x) = g(a^T x), where g : R → R is monotone and a ∈ Rⁿ. In other words, a quasilinear function with domain Rⁿ must be a monotone function of a linear function. (The converse is also true.)

Log-concave and log-convex functions

3.49 Show that the following functions are log-concave.

(a) Logistic function: f (x) = e^x/(1 + e^x) with dom f = R.

3.50 Coefficients of a polynomial as a function of the roots. Show that the coefficients of a polynomial with real negative roots are log-concave functions of the roots. In other words, the functions ai: Rⁿ→ R, defined by the identity

3.51 [BL00, page 41] Let p be a polynomial on R, with all its roots real. Show that it is log-concave on any interval on which it is positive.

3.52 [MO79, §3.E.2] Log-convexity of moment functions. Suppose f : R → R is nonnegative with R₊ ⊆ dom f. For x ≥ 0 define

φ(x) = ∫₀^∞ u^x f(u) du.

Show that φ is a log-convex function. (If x is a positive integer, and f is a probability density function, then φ(x) is the xth moment of the distribution.)

Use this to show that the Gamma function,

Γ(x) = ∫₀^∞ u^{x−1} e^{−u} du,

is log-convex for x ≥ 1.
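The log-convexity of the Gamma function claimed at the end of this exercise can be spot-checked with the standard library, since `math.lgamma` returns log Γ(x) for positive x. The sketch below tests midpoint convexity of log Γ on a grid in [1, 10.75]; the grid, the helper name `log_convexity_gap`, and the use of midpoints only are choices made for this illustration.

```python
import math

def log_convexity_gap(logphi, a, b, theta):
    """theta*logphi(a) + (1-theta)*logphi(b) - logphi(theta*a + (1-theta)*b).

    This is nonnegative whenever logphi is convex, i.e. phi is log-convex.
    """
    return theta * logphi(a) + (1 - theta) * logphi(b) - logphi(theta * a + (1 - theta) * b)

grid = [1.0 + 0.25 * i for i in range(40)]  # sample points in [1, 10.75]
gaps = [log_convexity_gap(math.lgamma, a, b, 0.5) for a in grid for b in grid]
print("smallest midpoint gap of log Gamma on the grid:", min(gaps))
```

A grid check of this kind cannot replace the argument the exercise asks for, but a negative gap would immediately signal an error in the claim or in the code.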

3.53 Suppose x and y are independent random vectors in Rⁿ, with log-concave probability density functions f and g, respectively. Show that the probability density function of the sum z = x + y is log-concave.

3.54 Log-concavity of Gaussian cumulative distribution function. The cumulative distribution function of a Gaussian random variable,

f(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt,

is log-concave. This follows from the general result that the convolution of two log-concave functions is log-concave. In this problem we guide you through a simple self-contained proof that f is log-concave. Recall that f is log-concave if and only if f″(x)f(x) ≤ f′(x)² for all x.

(a) Verify that f″(x)f(x) ≤ f′(x)² for x ≥ 0. That leaves us the hard part, which is to show the inequality for x < 0.

(b) Verify that for any t and x we have t²/2 ≥ −x²/2 + xt.
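Independently of the proof outlined in this exercise, the log-concavity of the Gaussian cumulative distribution function can be observed numerically. The sketch below writes the CDF through the complementary error function from the standard library and tests midpoint concavity of its logarithm on a grid; the grid range is restricted to [−6, 6] as an assumption, to stay clear of floating-point underflow in the far left tail.

```python
import math

def log_Phi(x):
    """Log of the standard Gaussian CDF, computed through the complementary error function."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

grid = [-6.0 + 0.5 * i for i in range(25)]  # sample points in [-6, 6]
# For a concave function the value at the midpoint is at least the average of the endpoint values.
gaps = [
    log_Phi(0.5 * (a + b)) - 0.5 * (log_Phi(a) + log_Phi(b))
    for a in grid
    for b in grid
]
print("smallest midpoint gap of log Phi on the grid:", min(gaps))
```

The gaps are largest when one endpoint lies deep in the left tail, which is also the region x < 0 that the exercise singles out as the hard part.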

3.55 Log-concavity of the cumulative distribution function of a log-concave probability density.

In this problem we extend the result of exercise 3.54. Let g(t) = exp(−h(t)) be a differentiable log-concave probability density function, and let

f(x) = ∫_{−∞}^x g(t) dt = ∫_{−∞}^x e^{−h(t)} dt

be its cumulative distribution. We will show that f is log-concave, i.e., it satisfies f″(x)f(x) ≤ (f′(x))² for all x.

(a) Express the derivatives of f in terms of the function h. Verify that f″(x)f(x) ≤ (f′(x))² if h′(x) ≥ 0.

(b) Assume that h′(x) < 0. Use the inequality

h(t) ≥ h(x) + h′(x)(t − x)

(which follows from convexity of h), to show that

∫_{−∞}^x e^{−h(t)} dt ≤ e^{−h(x)}/(−h′(x)).

Use this inequality to verify that f″(x)f(x) ≤ (f′(x))² if h′(x) < 0.

3.56 More log-concave densities. Show that the following densities are log-concave.

(a) [MO79, page 493] The gamma density, defined by

f(x) = (α^λ/Γ(λ)) x^{λ−1} e^{−αx},

with dom f = R₊. The parameters λ and α satisfy λ ≥ 1, α > 0.

(b) [MO79, page 306] The Dirichlet density

f (x) = Γ(1^Tλ)

Convexity with respect to a generalized inequality

3.57 Show that the function f(X) = X⁻¹ is matrix convex on Sⁿ₊₊.

3.58 Schur complement. Suppose X ∈ Sⁿ partitioned as

X = … (see §A.5.5). Show that the Schur complement, viewed as a function from Sⁿ into S^{n−k}, is matrix concave on Sⁿ₊₊.

3.59 Second-order conditions for K-convexity. Let K ⊆ R^m be a proper convex cone, with associated generalized inequality ⪯_K. Show that a twice differentiable function f : Rⁿ → R^m, with convex domain, is K-convex if and only if for all x ∈ dom f and all y ∈ Rⁿ,

3.60 Sublevel sets and epigraph of K-convex functions. Let K ⊆ R be a proper convex cone

From the document Convex Optimization (pages 123-141)