1.5.6 Comments on exercises
Each chapter concludes with a set of exercises. Some involve working out the details of an argument or claim made in the text. Others focus on determining, or establishing, convexity of some given sets, functions, or problems; or more generally, convex optimization problem formulation. Some chapters include numerical exercises, which require some (but not much) programming in an appropriate high-level language. The difficulty level of the exercises is mixed, and varies without warning from quite straightforward to rather tricky.
1.6 Notation
Our notation is more or less standard, with a few exceptions. In this section we describe our basic notation; a more complete list appears on page 697.
We use $\mathbf{R}$ to denote the set of real numbers, $\mathbf{R}_+$ to denote the set of nonnegative real numbers, and $\mathbf{R}_{++}$ to denote the set of positive real numbers. The set of real $n$-vectors is denoted $\mathbf{R}^n$, and the set of real $m \times n$ matrices is denoted $\mathbf{R}^{m \times n}$. We delimit vectors and matrices with square brackets, with the components separated by space. We use parentheses to construct column vectors from comma-separated lists. For example, if $a, b, c \in \mathbf{R}$, we have
$$
(a, b, c) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a & b & c \end{bmatrix}^T,
$$
which is an element of $\mathbf{R}^3$. The symbol $\mathbf{1}$ denotes a vector all of whose components are one (with dimension determined from context). The notation $x_i$ can refer to the $i$th component of the vector $x$, or to the $i$th element of a set or sequence of vectors $x_1, x_2, \ldots$. The context, or the text, makes it clear which is meant.
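As a rough illustration of this notation, the following NumPy sketch builds the column vector (a, b, c) in both forms and constructs the all-ones vector. The numeric values and variable names are assumptions chosen for the example, and storing a vector as a 3 x 1 array is only one possible convention.

```python
import numpy as np

# Assumed example values for the scalars a, b, c.
a, b, c = 1.0, 2.0, 3.0

# (a, b, c): a column vector in R^3, stored here as a 3x1 array.
col = np.array([[a], [b], [c]])

# [ a b c ]^T: a 1x3 row, transposed into a column.
row_t = np.array([[a, b, c]]).T

# Both constructions give the same element of R^3.
same = np.array_equal(col, row_t)

# The vector 1; its dimension (3) is taken from context.
ones = np.ones((3, 1))

# 1^T x is the sum of the components of x.
total = (ones.T @ col).item()

# x_2, the second component (NumPy indices start at 0).
x_2 = col[1, 0]
```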
We use $\mathbf{S}^k$ to denote the set of symmetric $k \times k$ matrices, $\mathbf{S}^k_+$ to denote the set of symmetric nonnegative definite $k \times k$ matrices, and $\mathbf{S}^k_{++}$ to denote the set of symmetric positive definite $k \times k$ matrices. The curled inequality symbol $\succeq$ (and its strict form $\succ$) is used to denote generalized inequality: between vectors, it represents componentwise inequality; between symmetric matrices, it represents matrix inequality. With a subscript, the symbol $\preceq_K$ (or $\prec_K$) denotes generalized inequality with respect to the cone $K$ (explained in §2.4.1).
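The two meanings of the curled inequality can be contrasted numerically. The sketch below is a minimal illustration, assuming the usual reading that matrix inequality between symmetric matrices X and Y means X - Y is nonnegative definite; the helper names `vec_geq` and `mat_geq` and the tolerance `tol` are hypothetical choices made for the example.

```python
import numpy as np

def vec_geq(x, y):
    """Componentwise inequality between vectors: every x_i >= y_i."""
    return bool(np.all(np.asarray(x) >= np.asarray(y)))

def mat_geq(X, Y, tol=1e-10):
    """Matrix inequality between symmetric matrices: X - Y is in S^k_+.

    tol is an assumed numerical tolerance, not part of the definition.
    """
    D = np.asarray(X, dtype=float) - np.asarray(Y, dtype=float)
    if not np.allclose(D, D.T):
        raise ValueError("X - Y must be symmetric")
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return bool(np.linalg.eigvalsh(D)[0] >= -tol)

I = np.eye(2)
X = np.array([[2.0, 0.0], [0.0, 1.0]])
Z = np.array([[1.0, 2.0], [2.0, 1.0]])

x_dominates_identity = mat_geq(X, I)                 # X - I = diag(1, 0)
z_entries_nonneg = vec_geq(Z.ravel(), np.zeros(4))   # entries of Z, read as a vector
z_is_nonneg_definite = mat_geq(Z, np.zeros((2, 2)))  # Z has eigenvalues -1 and 3
```

Here Z has nonnegative entries but a negative eigenvalue, so it passes the componentwise test while failing the matrix one; the two readings of the symbol do not coincide.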
Our notation for describing functions deviates a bit from standard notation, but we hope it will cause no confusion. We use the notation $f : \mathbf{R}^p \to \mathbf{R}^q$ to mean that $f$ is an $\mathbf{R}^q$-valued function on some subset of $\mathbf{R}^p$, specifically, its domain, which we denote $\mathbf{dom}\, f$. We can think of our use of the notation $f : \mathbf{R}^p \to \mathbf{R}^q$ as a declaration of the function type, as in a computer language: $f : \mathbf{R}^p \to \mathbf{R}^q$ means that the function $f$ takes as argument a real $p$-vector, and returns a real $q$-vector.
The set $\mathbf{dom}\, f$, the domain of the function $f$, specifies the subset of $\mathbf{R}^p$ of points $x$ for which $f(x)$ is defined. As an example, we describe the logarithm function as $\log : \mathbf{R} \to \mathbf{R}$, with $\mathbf{dom} \log = \mathbf{R}_{++}$. The notation $\log : \mathbf{R} \to \mathbf{R}$ means that the logarithm function accepts and returns a real number; $\mathbf{dom} \log = \mathbf{R}_{++}$ means that the logarithm is defined only for positive numbers.
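The analogy with a type declaration can be made concrete. In the sketch below, the Python signature plays the role of the declared type (a real number in, a real number out), while a separate membership test plays the role of the domain. The helper name `in_dom_log` and the choice to raise an exception outside the domain are assumptions made for illustration only.

```python
import math

def in_dom_log(x: float) -> bool:
    """Membership test for dom log = R_++ (the positive reals)."""
    return x > 0.0

def log(x: float) -> float:
    """Declared type R -> R, but defined only for x in dom log."""
    if not in_dom_log(x):
        raise ValueError("x is outside dom log = R_++")
    return math.log(x)
```

The signature accepts any float, just as the declaration accepts any real number; the domain check is what restricts where the function is actually defined.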
We use $\mathbf{R}^n$ as a generic finite-dimensional vector space. We will encounter several other finite-dimensional vector spaces, e.g., the space of polynomials of a variable with a given maximum degree, or the space $\mathbf{S}^k$ of symmetric $k \times k$ matrices.
By identifying a basis for a vector space, we can always identify it with $\mathbf{R}^n$ (where $n$ is its dimension), and therefore the generic results, stated for the vector space $\mathbf{R}^n$, can be applied. We usually leave it to the reader to translate general results or statements to other vector spaces. For example, any linear function $f : \mathbf{R}^n \to \mathbf{R}$ can be represented in the form $f(x) = c^T x$, where $c \in \mathbf{R}^n$. The corresponding statement for the vector space $\mathbf{S}^k$ can be found by choosing a basis and translating. This results in the statement: any linear function $f : \mathbf{S}^k \to \mathbf{R}$ can be represented in the form $f(X) = \mathbf{tr}(CX)$, where $C \in \mathbf{S}^k$.
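The translation between the two statements can be checked numerically. The sketch below picks one particular basis for the symmetric matrices, namely the upper-triangular entries with off-diagonal entries scaled by the square root of 2, so that the trace form on symmetric matrices becomes an ordinary inner product of coordinate vectors. The function name `svec`, this choice of basis, and the random test matrices are all assumptions for illustration; any other basis would work with a correspondingly different coordinate vector.

```python
import numpy as np

def svec(X):
    """Coordinates of a symmetric k x k matrix in R^n, n = k(k+1)/2.

    Off-diagonal entries are scaled by sqrt(2) so that
    svec(C) @ svec(X) equals tr(C X) for symmetric C and X.
    """
    k = X.shape[0]
    i, j = np.triu_indices(k)
    scale = np.where(i == j, 1.0, np.sqrt(2.0))
    return scale * X[i, j]

# Two arbitrary symmetric 3 x 3 matrices (seeded for reproducibility).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = (A + A.T) / 2
X = (B + B.T) / 2

trace_form = np.trace(C @ X)        # f(X) = tr(C X) on symmetric matrices
inner_form = svec(C) @ svec(X)      # f(x) = c^T x on R^6
```

With this basis, the vector c in the generic statement is simply `svec(C)`, and the two expressions agree up to floating-point rounding.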
Bibliography
Least-squares is a very old subject; see, for example, the treatise written (in Latin) by Gauss in the 1820s, and recently translated by Stewart [Gau95]. More recent work includes the books by Lawson and Hanson [LH95] and Björck [Bjö96]. References on linear programming can be found in chapter 4.
There are many good texts on local methods for nonlinear programming, including Gill, Murray, and Wright [GMW81], Nocedal and Wright [NW99], Luenberger [Lue84], and Bertsekas [Ber99].
Global optimization is covered in the books by Horst and Pardalos [HP94], Pinter [Pin95], and Tuy [Tuy98]. Using convex optimization to find bounds for nonconvex problems is an active research topic, and is addressed in the books above on global optimization, the book by Ben-Tal and Nemirovski [BTN01, §4.3], and the survey by Nesterov, Wolkowicz, and Ye [NWY00]. Some notable papers on this subject are Goemans and Williamson [GW95], Nesterov [Nes00, Nes98], Ye [Ye99], and Parrilo [Par03]. Randomized methods are discussed in Motwani and Raghavan [MR95].
Convex analysis, the mathematics of convex sets, functions, and optimization problems, is a well developed subfield of mathematics. Basic references include the books by Rockafellar [Roc70], Hiriart-Urruty and Lemaréchal [HUL93, HUL01], Borwein and Lewis [BL00], and Bertsekas, Nedić, and Ozdaglar [Ber03]. More references on convex analysis can be found in chapters 2–5.
Nesterov and Nemirovski [NN94] were the first to point out that interior-point methods can solve many convex optimization problems; see also the references in chapter 11. The book by Ben-Tal and Nemirovski [BTN01] covers modern convex optimization, interior-point methods, and applications.
Solution methods for convex optimization that we do not cover in this book include subgradient methods [Sho85], bundle methods [HUL93], cutting-plane methods [Kel60, EM75, GLY96], and the ellipsoid method [Sho91, BGT81].
The idea that convex optimization problems are tractable is not new. It has long been recognized that the theory of convex optimization is far more straightforward (and complete) than the theory of general nonlinear optimization. In this context Rockafellar stated, in his 1993 SIAM Review survey paper [Roc93],
In fact the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.
The first formal argument that convex optimization problems are easier to solve than general nonlinear optimization problems was made by Nemirovski and Yudin, in their 1983 book Problem Complexity and Method Efficiency in Optimization [NY83]. They showed that the information-based complexity of convex optimization problems is far lower than that of general nonlinear optimization problems. A more recent book on this topic is Vavasis [Vav91].
The low (theoretical) complexity of interior-point methods is integral to modern research in this area. Much of the research focuses on proving that an interior-point (or other) method can solve some class of convex optimization problems with a number of operations that grows no faster than a polynomial of the problem dimensions and $\log(1/\epsilon)$, where $\epsilon > 0$ is the required accuracy. (We will see some simple results like these in chapter 11.) The first comprehensive work on this topic is the book by Nesterov and Nemirovski [NN94]. Other books include Ben-Tal and Nemirovski [BTN01, lecture 5] and Renegar [Ren01]. The polynomial-time complexity of interior-point methods for various convex optimization problems is in marked contrast to the situation for a number of nonconvex optimization problems, for which all known algorithms require, in the worst case, a number of operations that is exponential in the problem dimensions.
Convex optimization has been used in many application areas, too numerous to cite here. Convex analysis is central in economics and finance, where it is the basis of many results. For example, the separating hyperplane theorem, together with a no-arbitrage assumption, is used to deduce the existence of prices and risk-neutral probabilities (see, e.g., Luenberger [Lue95, Lue98] and Ross [Ros99]). Convex optimization, especially our ability to solve semidefinite programs, has recently received particular attention in automatic control theory. Applications of convex optimization in control theory can be found in the books by Boyd and Barratt [BB91], Boyd, El Ghaoui, Feron, and Balakrishnan [BGFB94], Dahleh and Diaz-Bobillo [DDB95], El Ghaoui and Niculescu [EN00], and Dullerud and Paganini [DP00]. A good example of embedded (convex) optimization is model predictive control, an automatic control technique that requires the solution of a (convex) quadratic program at each step. Model predictive control is now widely used in the chemical process control industry; see Morari and Zafiriou [MZ89]. Another application area where convex optimization (and especially, geometric programming) has a long history is electronic circuit design. Research papers on this topic include Fishburn and Dunlop [FD85], Sapatnekar, Rao, Vaidya, and Kang [SRVK93], and Hershenson, Boyd, and Lee [HBL01]. Luo [Luo03] gives a survey of applications in signal processing and communications. More references on applications of convex optimization can be found in chapters 4 and 6–8.
High-quality implementations of recent interior-point methods for convex optimization problems are available in the LOQO [Van97] and MOSEK [MOS02] software packages, and the codes listed in chapter 11. Software systems for specifying optimization problems include AMPL [FGK99] and GAMS [BKMR98]. Both provide some support for recognizing problems that can be transformed to linear programs.