
Convex optimization problems

4.7 Vector optimization

4.7.5 Multicriterion optimization

When a vector optimization problem involves the cone K = R^q₊, it is called a multicriterion or multi-objective optimization problem. The components of f0, say, F1, . . . , Fq, can be interpreted as q different scalar objectives, each of which we would like to minimize. We refer to Fi as the ith objective of the problem. A multicriterion optimization problem is convex if f1, . . . , fm are convex, h1, . . . , hp are affine, and the objectives F1, . . . , Fq are convex.

Since multicriterion problems are vector optimization problems, all of the material of §4.7.1–§4.7.4 applies. For multicriterion problems, though, we can be a bit more specific in the interpretations. If x is feasible, we can think of Fi(x) as its score or value, according to the ith objective. If x and y are both feasible, Fi(x) ≤ Fi(y) means that x is at least as good as y, according to the ith objective; Fi(x) < Fi(y) means that x is better than y, or x beats y, according to the ith objective. If x and y are both feasible, we say that x is better than y, or x dominates y, if Fi(x) ≤ Fi(y) for i = 1, . . . , q, and for at least one j, Fj(x) < Fj(y). Roughly speaking, x is better than y if x meets or beats y on all objectives, and beats it in at least one objective.

In a multicriterion problem, an optimal point x^⋆ satisfies

$$
F_i(x^\star) \leq F_i(y), \quad i = 1, \ldots, q,
$$

for every feasible y. In other words, x^⋆ is simultaneously optimal for each of the scalar problems

$$
\begin{array}{ll}
\mbox{minimize} & F_j(x) \\
\mbox{subject to} & f_i(x) \leq 0, \quad i = 1, \ldots, m \\
 & h_i(x) = 0, \quad i = 1, \ldots, p,
\end{array}
$$

for j = 1, . . . , q. When there is an optimal point, we say that the objectives are noncompeting, since no compromises have to be made among the objectives; each objective is as small as it could be made, even if the others were ignored.

A Pareto optimal point x^po satisfies the following: if y is feasible and Fi(y)≤ Fi(x^po) for i = 1, . . . , q, then Fi(x^po) = Fi(y), i = 1, . . . , q. This can be restated as: a point is Pareto optimal if and only if it is feasible and there is no better feasible point. In particular, if a feasible point is not Pareto optimal, there is at least one other feasible point that is better. In searching for good points, then, we can clearly limit our search to Pareto optimal points.
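The definitions of "better" and "Pareto optimal" translate directly into a test on objective values. The following is a minimal sketch for a finite list of candidate objective vectors; the function names and the sample data are hypothetical, chosen only for illustration, and a finite list stands in for the (generally infinite) feasible set.

```python
from typing import List, Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Return True if fx is better than fy: no worse in every objective
    and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def pareto_optimal(values: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the objective vectors that no other vector in the list dominates."""
    return [v for v in values if not any(dominates(w, v) for w in values)]


# Hypothetical objective values (F1, F2) of five feasible points.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
front = pareto_optimal(candidates)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Here (3, 3) and (4, 4) are discarded because (2, 3) is better than both, while the three remaining values are mutually incomparable.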

Trade-off analysis

Now suppose that x and y are Pareto optimal points with, say,

$$
\begin{array}{ll}
F_i(x) < F_i(y), & i \in A \\
F_i(x) = F_i(y), & i \in B \\
F_i(x) > F_i(y), & i \in C,
\end{array}
$$

where A ∪ B ∪ C = {1, . . . , q}. In other words, A is the set of (indices of) objectives for which x beats y, B is the set of objectives for which the points x and y are tied, and C is the set of objectives for which y beats x. If A and C are empty, then the two points x and y have exactly the same objective values. If this is not the case, then both A and C must be nonempty. In other words, when comparing two Pareto optimal points, they either obtain the same performance (i.e., all objectives equal), or each beats the other in at least one objective.
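The reason both A and C must be nonempty is that if only one of them were nonempty, one point would be better than the other, contradicting Pareto optimality of the losing point. As a small illustration, the index sets can be computed as follows; the function name and the objective values are hypothetical.

```python
from typing import Sequence, Set, Tuple


def compare(fx: Sequence[float], fy: Sequence[float]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split the objective indices 1..q into A (x beats y), B (tie), C (y beats x)."""
    pairs = list(enumerate(zip(fx, fy), start=1))
    A = {i for i, (a, b) in pairs if a < b}
    B = {i for i, (a, b) in pairs if a == b}
    C = {i for i, (a, b) in pairs if a > b}
    return A, B, C


# Hypothetical objective values of two Pareto optimal points, with q = 3.
A, B, C = compare((1.0, 2.0, 5.0), (3.0, 2.0, 4.0))
print(A, B, C)  # {1} {2} {3}
```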

In comparing the point x to y, we say that we have traded or traded off better objective values for i∈ A for worse objective values for i ∈ C. Optimal trade-off analysis (or just trade-off analysis) is the study of how much worse we must do in one or more objectives in order to do better in some other objectives, or more generally, the study of what sets of objective values are achievable.

As an example, consider a bi-criterion (i.e., two-criterion) problem. Suppose x is a Pareto optimal point, with objectives F1(x) and F2(x). We might ask how much larger F2(z) would have to be, in order to obtain a feasible point z with F1(z) ≤ F1(x) − a, where a > 0 is some constant. Roughly speaking, we are asking how much we must pay in the second objective to obtain an improvement of a in the first objective. If a large increase in F2 must be accepted to realize a small decrease in F1, we say that there is a strong trade-off between the objectives, near the Pareto optimal value (F1(x), F2(x)). If, on the other hand, a large decrease in F1 can be obtained with only a small increase in F2, we say that the trade-off between the objectives is weak (near the Pareto optimal value (F1(x), F2(x))).

We can also consider the case in which we trade worse performance in the first objective for an improvement in the second. Here we find how much smaller F2(z) can be made, to obtain a feasible point z with F1(z) ≤ F1(x) + a, where a > 0 is some constant. In this case we receive a benefit in the second objective, i.e., a reduction in F2 compared to F2(x). If this benefit is large (i.e., by increasing F1 a small amount we obtain a large reduction in F2), we say the objectives exhibit a strong trade-off. If it is small, we say the objectives trade off weakly (near the Pareto optimal value (F1(x), F2(x))).

Optimal trade-off surface

The set of Pareto optimal values for a multicriterion problem is called the optimal trade-off surface (in general, when q > 2) or the optimal trade-off curve (when q = 2). (Since it would be foolish to accept any point that is not Pareto optimal, we can restrict our trade-off analysis to Pareto optimal points.) Trade-off analysis is also sometimes called exploring the optimal trade-off surface. (The optimal trade-off surface is usually, but not always, a surface in the usual sense. If the problem has an optimal point, for example, the optimal trade-off surface consists of a single point, the optimal value.)

An optimal trade-off curve is readily interpreted. An example is shown in figure 4.11, on page 185, for a (convex) bi-criterion problem. From this curve we can easily visualize and understand the trade-offs between the two objectives.

• The endpoint at the right shows the smallest possible value of F2, without any consideration of F1.

• The endpoint at the left shows the smallest possible value of F1, without any consideration of F2.

• By finding the intersection of the curve with a vertical line at F1 = α, we can see how large F2 must be to achieve F1 ≤ α.

• By finding the intersection of the curve with a horizontal line at F2 = β, we can see how large F1 must be to achieve F2 ≤ β.

• The slope of the optimal trade-off curve at a point on the curve (i.e., a Pareto optimal value) shows the local optimal trade-off between the two objectives. Where the slope is steep, small changes in F1 are accompanied by large changes in F2.

• A point of large curvature is one where small decreases in one objective can only be accomplished by a large increase in the other. This is the proverbial knee of the trade-off curve, and in many applications represents a good compromise solution.

All of these have simple extensions to a trade-off surface, although visualizing a surface with more than three objectives is difficult.

Scalarizing multicriterion problems

When we scalarize a multicriterion problem by forming the weighted sum objective

$$
\lambda^T f_0(x) = \sum_{i=1}^q \lambda_i F_i(x),
$$

where λ ≻ 0, we can interpret λi as the weight we attach to the ith objective. The weight λi can be thought of as quantifying our desire to make Fi small (or our objection to having Fi large). In particular, we should take λi large if we want Fi to be small; if we care much less about Fi, we can take λi small. We can interpret the ratio λi/λj as the relative weight or relative importance of the ith objective compared to the jth objective. Alternatively, we can think of λi/λj as the exchange rate between the two objectives, since in the weighted sum objective a decrease (say) in Fi by α is considered the same as an increase in Fj in the amount (λi/λj)α.
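The exchange-rate reading can be checked directly from the weighted sum. As a short sketch, suppose only two objectives change: the ith decreases by α and the jth increases by β (the symbol β is introduced here for illustration). The weighted sum is unchanged exactly when

```latex
\lambda_i \bigl(F_i - \alpha\bigr) + \lambda_j \bigl(F_j + \beta\bigr)
  = \lambda_i F_i + \lambda_j F_j
\quad\Longleftrightarrow\quad
\lambda_j \beta = \lambda_i \alpha
\quad\Longleftrightarrow\quad
\beta = \frac{\lambda_i}{\lambda_j}\,\alpha .
```

So a decrease of α in the ith objective is worth an increase of (λi/λj)α in the jth.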

These interpretations give us some intuition about how to set or change the weights while exploring the optimal trade-off surface. Suppose, for example, that the weight vector λ ≻ 0 yields the Pareto optimal point x^po, with objective values F1(x^po), . . . , Fq(x^po). To find a (possibly) new Pareto optimal point which trades off a better kth objective value (say), for (possibly) worse objective values for the other objectives, we form a new weight vector λ̃ with

$$
\tilde\lambda_k > \lambda_k, \qquad \tilde\lambda_j = \lambda_j, \quad j \neq k, \quad j = 1, \ldots, q,
$$

i.e., we increase the weight on the kth objective. This yields a new Pareto optimal point x̃^po with Fk(x̃^po) ≤ Fk(x^po) (and usually, Fk(x̃^po) < Fk(x^po)), i.e., a new Pareto optimal point with an improved kth objective.
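The claim that the kth objective cannot get worse follows from the two optimality conditions. A short derivation, writing δ for the (positive) increase in the kth weight (δ is notation introduced here, not part of the original text):

```latex
\begin{aligned}
\lambda^T f_0(x^{\rm po}) &\leq \lambda^T f_0(\tilde x^{\rm po})
  && \text{since $x^{\rm po}$ minimizes $\lambda^T f_0$,} \\
\lambda^T f_0(\tilde x^{\rm po}) + \delta F_k(\tilde x^{\rm po})
  &\leq \lambda^T f_0(x^{\rm po}) + \delta F_k(x^{\rm po})
  && \text{since $\tilde x^{\rm po}$ minimizes $\tilde\lambda^T f_0 = \lambda^T f_0 + \delta F_k$.}
\end{aligned}
```

Adding the two inequalities and cancelling the common terms gives δFk(x̃^po) ≤ δFk(x^po), and dividing by δ > 0 yields the stated inequality.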

We can also see that at any point where the optimal trade-off surface is smooth, λ gives the inward normal to the surface at the associated Pareto optimal point.

In particular, when we choose a weight vector λ and apply scalarization, we obtain a Pareto optimal point where λ gives the local trade-offs among objectives.
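For a bi-criterion problem this normal-vector statement can be made concrete. The following is a sketch under an added assumption: near the Pareto optimal value in question, the optimal trade-off curve can be written as F2 = φ(F1) with φ differentiable (φ and u are symbols introduced here). Since the scalarized solution minimizes the weighted sum over the curve,

```latex
\frac{d}{du}\Bigl(\lambda_1 u + \lambda_2 \phi(u)\Bigr) = 0
\quad\Longrightarrow\quad
\phi'(u) = -\frac{\lambda_1}{\lambda_2},
\qquad
\lambda^T \begin{bmatrix} 1 \\ \phi'(u) \end{bmatrix}
  = \lambda_1 - \lambda_2 \frac{\lambda_1}{\lambda_2} = 0 .
```

The tangent direction (1, φ′(u)) is therefore orthogonal to λ, and the slope of the curve equals −λ1/λ2, the negative of the exchange rate between the objectives.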

In practice, optimal trade-off surfaces are explored by ad hoc adjustment of the weights, based on the intuitive ideas above. We will see later (in chapter 5) that the basic idea of scalarization, i.e., minimizing a weighted sum of objectives, and then adjusting the weights to obtain a suitable solution, is the essence of duality.

4.7.6 Examples

Regularized least-squares

We are given A ∈ R^(m×n) and b ∈ R^m, and want to choose x ∈ R^n taking into account two quadratic objectives:

• F1(x) = ‖Ax − b‖₂² = x^T A^T A x − 2 b^T A x + b^T b is a measure of the misfit between Ax and b,

• F2(x) = ‖x‖₂² = x^T x is a measure of the size of x.

Our goal is to find x that gives a good fit (i.e., small F1) and that is not large (i.e., small F2). We can formulate this problem as a vector optimization problem with respect to the cone R²₊, i.e., a bi-criterion problem (with no constraints):

$$
\mbox{minimize (w.r.t. } \mathbf{R}^2_+ \mbox{)} \quad f_0(x) = (F_1(x), F_2(x)).
$$

Figure 4.11 Optimal trade-off curve for a regularized least-squares problem (horizontal axis: F1(x) = ‖Ax − b‖₂²; vertical axis: F2(x) = ‖x‖₂²). The shaded set is the set of achievable values (‖Ax − b‖₂², ‖x‖₂²). The optimal trade-off curve, shown darker, is the lower left part of the boundary.

We can scalarize this problem by taking λ1 > 0 and λ2 > 0 and minimizing the scalar weighted sum objective

$$
\begin{array}{rcl}
\lambda^T f_0(x) &=& \lambda_1 F_1(x) + \lambda_2 F_2(x) \\
 &=& x^T (\lambda_1 A^T A + \lambda_2 I) x - 2 \lambda_1 b^T A x + \lambda_1 b^T b,
\end{array}
$$

which yields

$$
x(\mu) = (\lambda_1 A^T A + \lambda_2 I)^{-1} \lambda_1 A^T b = (A^T A + \mu I)^{-1} A^T b,
$$

where µ = λ2/λ1. For any µ > 0, this point is Pareto optimal for the bi-criterion problem. We can interpret µ = λ2/λ1 as the relative weight we assign F2 compared to F1.

This method produces all Pareto optimal points, except two, associated with the extremes µ → ∞ and µ → 0. In the first case we have the Pareto optimal solution x = 0, which would be obtained by scalarization with λ = (0, 1). At the other extreme we have the Pareto optimal solution A^†b, where A^† is the pseudo-inverse of A. This Pareto optimal solution is obtained as the limit of the optimal solution of the scalarized problem as µ→ 0, i.e., as λ → (1, 0). (We will encounter the regularized least-squares problem again in §6.3.2.)

Figure 4.11 shows the optimal trade-off curve and the set of achievable values for a regularized least-squares problem with problem data A ∈ R^(100×10), b ∈ R^100. (See exercise 4.50 for more discussion.)
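To see how points on such a trade-off curve are generated, we can evaluate x(µ) = (A^T A + µI)⁻¹ A^T b for several values of µ and record the pair (F1, F2). The sketch below uses only the Python standard library and a small made-up data set (not the 100 × 10 instance behind figure 4.11); the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(M: Matrix, r: Vector) -> Vector:
    """Solve M z = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def regularized_ls(A: Matrix, b: Vector, mu: float) -> Vector:
    """Return x(mu) = (A^T A + mu I)^{-1} A^T b."""
    m, n = len(A), len(A[0])
    gram = [[sum(A[k][i] * A[k][j] for k in range(m)) + (mu if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(gram, atb)


def objectives(A: Matrix, b: Vector, x: Vector) -> Tuple[float, float]:
    """Return (F1, F2) = (||Ax - b||_2^2, ||x||_2^2)."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return sum(r * r for r in residual), sum(xj * xj for xj in x)


# Small made-up problem data, for illustration only.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [1.0, -1.0]]
b = [1.0, 2.0, 1.0, 0.5]

mus = [0.01, 0.1, 1.0, 10.0, 100.0]
curve = [objectives(A, b, regularized_ls(A, b, mu)) for mu in mus]
for mu, (F1, F2) in zip(mus, curve):
    print(f"mu = {mu:7.2f}   F1 = {F1:.4f}   F2 = {F2:.4f}")
```

As µ increases, more weight is placed on F2, so the printed values move along the curve toward smaller ‖x‖₂² and larger ‖Ax − b‖₂².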

Risk-return trade-off in portfolio optimization

The classical Markowitz portfolio optimization problem described on page 155 is naturally expressed as a bi-criterion problem, where the objectives are the negative mean return (since we wish to maximize mean return) and the variance of the return:

$$
\begin{array}{ll}
\mbox{minimize (w.r.t. } \mathbf{R}^2_+ \mbox{)} & (F_1(x), F_2(x)) = (-p^T x, \; x^T \Sigma x) \\
\mbox{subject to} & \mathbf{1}^T x = 1, \quad x \succeq 0.
\end{array}
$$

In forming the associated scalarized problem, we can (without loss of generality) take λ1 = 1 and λ2 = µ > 0:

$$
\begin{array}{ll}
\mbox{minimize} & -p^T x + \mu x^T \Sigma x \\
\mbox{subject to} & \mathbf{1}^T x = 1, \quad x \succeq 0,
\end{array}
$$

which is a QP. In this example too, we get all Pareto optimal portfolios except for the two limiting cases corresponding to µ → 0 and µ → ∞. Roughly speaking, in the first case we get a maximum mean return, without regard for return variance; in the second case we form a minimum variance return, without regard for mean return. Assuming that p_k > p_i for i ≠ k, i.e., that asset k is the unique asset with maximum mean return, the portfolio allocation x = e_k is the only one corresponding to µ → 0. (In other words, we concentrate the portfolio entirely in the asset that has maximum mean return.) In many portfolio problems asset n corresponds to a risk-free investment, with (deterministic) return r_rf. Assuming that Σ, with its last row and column (which are zero) removed, is full rank, then the other extreme Pareto optimal portfolio is x = e_n, i.e., the portfolio is concentrated entirely in the risk-free asset.

As a specific example, we consider a simple portfolio optimization problem with 4 assets, with price change mean and standard deviations given in the following table.

Asset    p_i    Σ_ii^(1/2)
1        12%    20%
2        10%    10%
3        7%     5%
4        3%     0%

Asset 4 is a risk-free asset, with a (certain) 3% return. Assets 3, 2, and 1 have increasing mean returns, ranging from 7% to 12%, as well as increasing standard deviations, which range from 5% to 20%. The correlation coefficients between the assets are ρ12 = 30%, ρ13 = −40%, and ρ23 = 0%.
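From these data the covariance matrix can be assembled entry by entry, using the standard relation between covariance, correlation, and standard deviation. Correlations involving asset 4 are not listed, but they do not matter, since its standard deviation is zero. As a worked instance (with σi written for the standard deviation of asset i, a shorthand introduced here):

```latex
\Sigma_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j, \qquad
\Sigma =
\begin{bmatrix}
0.04   & 0.006 & -0.004 & 0 \\
0.006  & 0.01  & 0      & 0 \\
-0.004 & 0     & 0.0025 & 0 \\
0      & 0     & 0      & 0
\end{bmatrix}.
```

For example, Σ12 = 0.30 · 0.20 · 0.10 = 0.006 and Σ13 = −0.40 · 0.20 · 0.05 = −0.004.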

Figure 4.12 shows the optimal trade-off curve for this portfolio optimization problem. The plot is given in the conventional way, with the horizontal axis showing standard deviation (i.e., square root of variance) and the vertical axis showing expected return. The lower plot shows the optimal asset allocation vector x for each Pareto optimal point.

The results in this simple example agree with our intuition. For small risk, the optimal allocation consists mostly of the risk-free asset, with a mixture of the other assets in smaller quantities. Note that a mixture of asset 3 and asset 1, which are negatively correlated, gives some hedging, i.e., lowers variance for a given level of mean return. At the other end of the trade-off curve, we see that aggressive growth portfolios (i.e., those with large mean returns) concentrate the allocation in assets 1 and 2, the ones with the largest mean returns (and variances).
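These qualitative conclusions can be reproduced numerically by solving the scalarized problem for a few values of µ. The sketch below is not the method used to produce figure 4.12 (any QP solver would do); it substitutes a simple projected gradient iteration so that only the Python standard library is needed. The step size rule, iteration count, chosen values of µ, and helper names are all assumptions made for illustration.

```python
from math import sqrt
from typing import List, Tuple

# Data from the table: mean returns and standard deviations of the four assets.
p = [0.12, 0.10, 0.07, 0.03]
sigma = [0.20, 0.10, 0.05, 0.00]
# Correlations from the text (0-based indices); pairs involving the risk-free
# asset are irrelevant because its standard deviation is zero.
rho = {(0, 1): 0.30, (0, 2): -0.40, (1, 2): 0.0}

n = len(p)
Sigma = [[0.0] * n for _ in range(n)]
for i in range(n):
    Sigma[i][i] = sigma[i] ** 2
for (i, j), r in rho.items():
    Sigma[i][j] = Sigma[j][i] = r * sigma[i] * sigma[j]


def project_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {x : sum(x) = 1, x >= 0}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:
            theta = t  # the last k passing this test gives the threshold
    return [max(vi - theta, 0.0) for vi in v]


def solve_scalarized(mu: float, iters: int = 5000) -> List[float]:
    """Minimize -p^T x + mu * x^T Sigma x over the simplex by projected gradient."""
    # Step 1/L, where L = 2 * mu * trace(Sigma) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * mu * sum(Sigma[i][i] for i in range(n)))
    x = [1.0 / n] * n
    for _ in range(iters):
        grad = [-p[i] + 2.0 * mu * sum(Sigma[i][j] * x[j] for j in range(n))
                for i in range(n)]
        x = project_simplex([x[i] - step * grad[i] for i in range(n)])
    return x


def mean_and_std(x: List[float]) -> Tuple[float, float]:
    """Return the mean return p^T x and the standard deviation sqrt(x^T Sigma x)."""
    mean = sum(pi * xi for pi, xi in zip(p, x))
    var = sum(x[i] * Sigma[i][j] * x[j] for i in range(n) for j in range(n))
    return mean, sqrt(max(var, 0.0))


frontier = []
for mu in [0.01, 1.0, 10.0, 1000.0]:
    x = solve_scalarized(mu)
    mean, std = mean_and_std(x)
    frontier.append((mu, x, mean, std))
    print(f"mu = {mu:8.2f}  mean = {mean:.4f}  std = {std:.4f}  "
          f"x = {[round(xi, 3) for xi in x]}")
```

For very small µ the allocation concentrates in asset 1, the asset with the largest mean return, and for very large µ it concentrates in the risk-free asset 4, matching the two limiting cases described earlier.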

From the document Convex Optimization (pages 195-200)